Hadoop HA Setup

Environment:
Ubuntu 16.04.2, three nodes
First, let's look at the architecture diagram.

We will go straight into the deployment steps, and briefly cover the principles at the end.

Deploying ZooKeeper

tickTime=2000
# The number of ticks that the initial 
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between 
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just 
# example sakes.
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the 
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
dataDir=/apps/svr/install/apache-zookeeper-3.5.7-bin/data
dataLogDir=/apps/svr/install/apache-zookeeper-3.5.7-bin/log
server.0=192.168.1.7:2888:3888
server.1=192.168.1.8:2888:3888
server.2=192.168.1.9:2888:3888

Edit zoo.cfg as above, create the corresponding data and log directories, and create a myid file in the data directory on each node. Once everything is in place, start ZooKeeper.
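The myid value on each host must match that host's server.N index in zoo.cfg (0 on 192.168.1.7, 1 on 192.168.1.8, 2 on 192.168.1.9). A small sketch that derives it from the config file; the helper name myid_for_ip is my own, not part of ZooKeeper:

```shell
# Hypothetical helper: given a zoo.cfg path and this host's IP,
# print the N from the matching "server.N=ip:port:port" line.
myid_for_ip() {
  grep -E "^server\.[0-9]+=$2:" "$1" | sed -E 's/^server\.([0-9]+)=.*/\1/'
}

# On 192.168.1.8, for example, this would write "1" into data/myid:
#   ZK_HOME=/apps/svr/install/apache-zookeeper-3.5.7-bin
#   mkdir -p "$ZK_HOME/data" "$ZK_HOME/log"
#   myid_for_ip "$ZK_HOME/conf/zoo.cfg" 192.168.1.8 > "$ZK_HOME/data/myid"
#   "$ZK_HOME/bin/zkServer.sh" start
```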

Hadoop HA deployment. Here we take the all-in-one approach: write the complete configuration up front.
Let's look at the configuration first.

###core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
<!--Note: Hadoop 2.x defaults to port 9000; Hadoop 3.x defaults to 9820-->
        <property>
                <name>fs.defaultFS</name>
                <value>hdfs://mycluster</value>
        </property>
        <!--Note: create this temporary directory yourself-->
        <property>
                <name>hadoop.tmp.dir</name>
                <value>/opt/hadoop/ha</value>
        </property>
        <property>
            <name>ha.zookeeper.quorum</name>
            <value>192.168.1.7:2181,192.168.1.8:2181,192.168.1.9:2181</value>
        </property>
</configuration>
###hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
        <!--Note: the default replication factor is 3 if not configured-->
        <property>
                <name>dfs.replication</name>
                <value>1</value>
        </property>
        <property>
            <!--Set the SecondaryNameNode to node2; the Hadoop 2.x port is 50090. (With HA enabled, the standby NameNode handles checkpointing, so this is not strictly needed.)-->
            <name>dfs.namenode.secondary.http-address</name>
            <value>ubuntu-node2:50090</value>
        </property>
        <!--Disable HDFS permission checking-->
        <property>
            <name>dfs.permissions.enabled</name>
            <value>false</value>
        </property>
        <property>
            <name>dfs.nameservices</name>
            <value>mycluster</value>
        </property>
        <property>
            <name>dfs.ha.namenodes.mycluster</name>
            <value>node1,node2</value>
        </property>
        <property>
            <name>dfs.namenode.rpc-address.mycluster.node1</name>
            <value>ubuntu-node1:8020</value>
        </property>
        <property>
            <name>dfs.namenode.rpc-address.mycluster.node2</name>
            <value>ubuntu-node2:8020</value>
        </property>
        <property>
            <name>dfs.namenode.http-address.mycluster.node1</name>
            <value>ubuntu-node1:50070</value>
        </property>
        <property>
            <name>dfs.namenode.http-address.mycluster.node2</name>
            <value>ubuntu-node2:50070</value>
        </property>
        <property>
            <name>dfs.namenode.shared.edits.dir</name>
            <value>qjournal://ubuntu-node1:8485;ubuntu-node2:8485;ubuntu-node3:8485/mycluster</value>
        </property>
        <property>
            <name>dfs.client.failover.proxy.provider.mycluster</name>
            <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
        </property>
        <property>
            <name>dfs.ha.fencing.methods</name>
            <value>sshfence</value>
        </property>
        <property>
            <name>dfs.ha.fencing.ssh.private-key-files</name>
            <value>/home/ubuntu/.ssh/id_rsa</value>
        </property>
        <property>
            <name>dfs.ha.automatic-failover.enabled</name>
            <value>true</value>
        </property>
        <property>
            <name>dfs.journalnode.edits.dir</name>
            <!--Note: /tmp is cleared on reboot; use a persistent path in production-->
            <value>/tmp/hadoop/journalnode/data</value>
        </property>
</configuration>

Startup: launch the JournalNodes (QJM) on an odd number of nodes (all three here):

sbin/hadoop-daemon.sh start journalnode

First, on namenode1, run:

bin/hdfs namenode -format

Then, also on namenode1, run:

bin/hdfs zkfc -formatZK

Start HDFS:

sbin/start-dfs.sh

On namenode2, bootstrap the standby (this copies namenode1's metadata):

bin/hdfs namenode -bootstrapStandby

Start the NameNode on namenode2:

sbin/hadoop-daemon.sh start namenode
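Putting the first-time startup sequence together, assuming passwordless SSH, the hostnames ubuntu-node1..3 used above, and a HADOOP_HOME variable pointing at the install directory (my assumption), the flow looks roughly like this sketch, not a tested script:

```shell
# 1. Start a JournalNode on each of the three nodes.
for h in ubuntu-node1 ubuntu-node2 ubuntu-node3; do
  ssh "$h" "$HADOOP_HOME/sbin/hadoop-daemon.sh start journalnode"
done
# 2. Format HDFS and the ZKFC znode on namenode1 (ubuntu-node1) only.
ssh ubuntu-node1 "$HADOOP_HOME/bin/hdfs namenode -format"
ssh ubuntu-node1 "$HADOOP_HOME/bin/hdfs zkfc -formatZK"
# 3. Start HDFS (NameNode, DataNodes, ZKFCs).
ssh ubuntu-node1 "$HADOOP_HOME/sbin/start-dfs.sh"
# 4. Copy namenode1's metadata to namenode2, then start its NameNode.
ssh ubuntu-node2 "$HADOOP_HOME/bin/hdfs namenode -bootstrapStandby"
ssh ubuntu-node2 "$HADOOP_HOME/sbin/hadoop-daemon.sh start namenode"
```

The order matters: the JournalNodes must be up before formatting, and namenode1 must be running before bootstrapStandby can copy from it.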

At this point, Hadoop HA is fully set up.
Command to check a NameNode's HA state:

bin/hdfs haadmin -getServiceState <id>
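As a small convenience, the state query can be wrapped to report which id is currently active. A sketch; the function name active_nn and its query-command parameter are my own inventions, not Hadoop APIs:

```shell
# Hypothetical helper: print the first id whose reported state is
# "active". $1 is the state-query command (normally
# "bin/hdfs haadmin -getServiceState"); the remaining args are the
# configured NameNode ids.
active_nn() {
  query="$1"; shift
  for id in "$@"; do
    if [ "$($query "$id" 2>/dev/null)" = "active" ]; then
      echo "$id"
      return 0
    fi
  done
  return 1
}

# On the cluster:
#   active_nn "bin/hdfs haadmin -getServiceState" node1 node2
```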

Now for the YARN HA setup (yarn-site.xml):

<?xml version="1.0"?>
<configuration>
	<!--Enable ResourceManager HA-->
	<property>
		<name>yarn.resourcemanager.ha.enabled</name>
		<value>true</value>
	</property>
	<property>
		<name>yarn.resourcemanager.cluster-id</name>
		<value>yarn-cluster</value>
	</property>
	<property>
		<name>yarn.resourcemanager.ha.rm-ids</name>
		<value>rm1,rm2</value>
	</property>
	<property>
		<name>yarn.resourcemanager.hostname.rm1</name>
		<value>ubuntu-node1</value>
	</property>
	<property>
		<name>yarn.resourcemanager.hostname.rm2</name>
		<value>ubuntu-node2</value>
	</property>
	<!--ZooKeeper quorum-->
	<property>
		<name>yarn.resourcemanager.zk-address</name>
		<value>192.168.1.7:2181,192.168.1.8:2181,192.168.1.9:2181</value>
	</property>
	<!--Auxiliary service run by the NodeManager; must be mapreduce_shuffle for MapReduce jobs to run-->
	<property>
		<name>yarn.nodemanager.aux-services</name>
		<value>mapreduce_shuffle</value>
	</property>
	<!--Enable automatic recovery-->
	<property>
		<name>yarn.resourcemanager.recovery.enabled</name>
		<value>true</value>
	</property>

	<!--Store ResourceManager state in the ZooKeeper cluster-->
	<property>
		<name>yarn.resourcemanager.store.class</name>
		<value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
	</property>
	<!--Log aggregation-->
	<property>
		<name>yarn.log-aggregation-enable</name>
		<value>true</value>
	</property>
	<!--Job history server-->
	<property>
		<name>yarn.log.server.url</name>
		<value>http://ubuntu-node1:19888/jobhistory/logs/</value>
	</property>
	<!--How long aggregated logs are kept on HDFS, in seconds (86400 = 1 day)-->
	<property>
		<name>yarn.log-aggregation.retain-seconds</name>
		<value>86400</value>
	</property>
</configuration>

Startup command:

sbin/start-yarn.sh

Start the standby ResourceManager:

sbin/yarn-daemon.sh start resourcemanager

Command to check a ResourceManager's state:

bin/yarn rmadmin -getServiceState <id>

Manual failover command (shown for HDFS; the YARN counterpart is bin/yarn rmadmin -transitionToActive):

bin/hdfs haadmin -transitionToActive [--forcemanual] <id>
