Environment:
3 machines running Ubuntu 16.04.2
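The configs below refer to the three machines sometimes by IP and sometimes by hostname. The original does not spell out the mapping, but judging from the configs, an /etc/hosts along these lines is assumed on every node:

# /etc/hosts (assumed mapping, inferred from the configs below)
192.168.1.7  ubuntu-node1
192.168.1.8  ubuntu-node2
192.168.1.9  ubuntu-node3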
Let's first take a look at the architecture diagram:
[architecture diagram]
Below we will go straight into the deployment process, and briefly cover the underlying principles at the end.
ZooKeeper deployment
### zoo.cfg
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
dataDir=/apps/svr/install/apache-zookeeper-3.5.7-bin/data
dataLogDir=/apps/svr/install/apache-zookeeper-3.5.7-bin/log
server.0=192.168.1.7:2888:3888
server.1=192.168.1.8:2888:3888
server.2=192.168.1.9:2888:3888
Edit zoo.cfg, create the corresponding directories, create a myid file under the data directory, and once everything is in place start ZooKeeper; a sketch of those steps follows.
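A minimal sketch of that step, using the paths from zoo.cfg above; the value written into myid must match this machine's server.N id:

mkdir -p /apps/svr/install/apache-zookeeper-3.5.7-bin/data
mkdir -p /apps/svr/install/apache-zookeeper-3.5.7-bin/log
echo 0 > /apps/svr/install/apache-zookeeper-3.5.7-bin/data/myid   # 1 on 192.168.1.8, 2 on 192.168.1.9
bin/zkServer.sh start
bin/zkServer.sh status   # across the cluster you should see one leader and two followers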
Hadoop HA deployment. Here we take the one-shot approach, laying down the full HA configuration up front.
First, let's look at the configuration.
### core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <!-- Note: the default port is 9000 on Hadoop 2.x and 9820 on Hadoop 3.x -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://mycluster</value>
    </property>
    <!-- Note: create this temp directory yourself -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/hadoop/ha</value>
    </property>
    <property>
        <name>ha.zookeeper.quorum</name>
        <value>192.168.1.7:2181,192.168.1.8:2181,192.168.1.9:2181</value>
    </property>
</configuration>
### hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <!-- Note: the replication factor defaults to 3 if not configured -->
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <!-- Set the secondaryNameNode to the node02 VM; the port is 50090 on Hadoop 2.x
         (not actually used once HA is enabled: the standby NameNode takes over checkpointing) -->
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>ubuntu-node2:50090</value>
    </property>
    <!-- Disable HDFS permission checking -->
    <property>
        <name>dfs.permissions.enabled</name>
        <value>false</value>
    </property>
    <property>
        <name>dfs.nameservices</name>
        <value>mycluster</value>
    </property>
    <property>
        <name>dfs.ha.namenodes.mycluster</name>
        <value>node1,node2</value>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.mycluster.node1</name>
        <value>ubuntu-node1:8020</value>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.mycluster.node2</name>
        <value>ubuntu-node2:8020</value>
    </property>
    <property>
        <name>dfs.namenode.http-address.mycluster.node1</name>
        <value>ubuntu-node1:50070</value>
    </property>
    <property>
        <name>dfs.namenode.http-address.mycluster.node2</name>
        <value>ubuntu-node2:50070</value>
    </property>
    <property>
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://ubuntu-node1:8485;ubuntu-node2:8485;ubuntu-node3:8485/mycluster</value>
    </property>
    <property>
        <name>dfs.client.failover.proxy.provider.mycluster</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>
    <property>
        <name>dfs.ha.fencing.methods</name>
        <value>sshfence</value>
    </property>
    <property>
        <name>dfs.ha.fencing.ssh.private-key-files</name>
        <value>/home/ubuntu/.ssh/id_rsa</value>
    </property>
    <property>
        <name>dfs.ha.automatic-failover.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>dfs.journalnode.edits.dir</name>
        <value>/tmp/hadoop/journalnode/data</value>
    </property>
</configuration>
Startup: start the QJM (JournalNodes) on an odd number of nodes, all three in this cluster:
sbin/hadoop-daemon.sh start journalnode
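This has to be done on each of the three JournalNode hosts. One way to do it from a single node, assuming passwordless SSH is in place (it is required for sshfence anyway) and HADOOP_HOME is the same path everywhere:

for host in ubuntu-node1 ubuntu-node2 ubuntu-node3; do
    ssh "$host" "$HADOOP_HOME/sbin/hadoop-daemon.sh start journalnode"
done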
First, run this on namenode1 to format the NameNode:
bin/hdfs namenode -format
Then, also on namenode1, initialize the HA state in ZooKeeper:
bin/hdfs zkfc -formatZK
Start the daemons:
sbin/start-dfs.sh
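To sanity-check, run jps on namenode1; you should see roughly the following (the exact set depends on which daemons are colocated on the node):

jps
# e.g. NameNode, DataNode, JournalNode, DFSZKFailoverController, QuorumPeerMain (ZooKeeper)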
On namenode2, run the standby bootstrap, which copies over the metadata formatted on namenode1:
bin/hdfs namenode -bootstrapStandby
Start the NameNode on namenode2:
sbin/hadoop-daemon.sh start namenode
At this point, the Hadoop HA setup is complete.
The command to check each NameNode's state:
bin/hdfs haadmin -getServiceState <id>
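Here <id> is one of the NameNode IDs declared in dfs.ha.namenodes.mycluster above, so for this cluster:

bin/hdfs haadmin -getServiceState node1   # prints "active" or "standby"
bin/hdfs haadmin -getServiceState node2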
Now let's walk through the YARN HA setup.
### yarn-site.xml
<?xml version="1.0"?>
<configuration>
    <!-- Enable ResourceManager HA -->
    <property>
        <name>yarn.resourcemanager.ha.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.resourcemanager.cluster-id</name>
        <value>yarn-cluster</value>
    </property>
    <property>
        <name>yarn.resourcemanager.ha.rm-ids</name>
        <value>rm1,rm2</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname.rm1</name>
        <value>ubuntu-node1</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname.rm2</name>
        <value>ubuntu-node2</value>
    </property>
    <!-- ZooKeeper -->
    <property>
        <name>yarn.resourcemanager.zk-address</name>
        <value>192.168.1.7:2181,192.168.1.8:2181,192.168.1.9:2181</value>
    </property>
    <!-- Auxiliary service run on the NodeManager; must be set to mapreduce_shuffle for MapReduce jobs to run -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <!-- Enable automatic recovery -->
    <property>
        <name>yarn.resourcemanager.recovery.enabled</name>
        <value>true</value>
    </property>
    <!-- Store ResourceManager state in the ZooKeeper cluster -->
    <property>
        <name>yarn.resourcemanager.store.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
    </property>
    <!-- Log aggregation -->
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
    <!-- Job history server -->
    <property>
        <name>yarn.log.server.url</name>
        <value>http://ubuntu-node1:19888/jobhistory/logs/</value>
    </property>
    <!-- How long aggregated logs are kept on HDFS (seconds) -->
    <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>86400</value>
    </property>
</configuration>
Start it with:
sbin/start-yarn.sh
Start the ResourceManager on the standby node as well (start-yarn.sh only launches the ResourceManager on the node it is run from):
sbin/yarn-daemon.sh start resourcemanager
The command to check each ResourceManager's state:
bin/yarn rmadmin -getServiceState <id>
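Here <id> is one of the IDs from yarn.resourcemanager.ha.rm-ids:

bin/yarn rmadmin -getServiceState rm1
bin/yarn rmadmin -getServiceState rm2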
The command to manually force a NameNode to active (with automatic failover enabled, --forcemanual is required; the YARN counterpart is bin/yarn rmadmin -transitionToActive):
bin/hdfs haadmin -transitionToActive [--forcemanual] <id>
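Finally, since dfs.ha.automatic-failover.enabled is true, the ZKFCs should promote the standby on their own. A hedged sketch of how to verify that, assuming namenode1 is currently the active node:

jps                                       # on namenode1: note the NameNode pid
kill -9 <NameNode pid>                    # simulate a crash of the active NameNode
bin/hdfs haadmin -getServiceState node2   # after a few seconds, should report "active"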
Feel free to follow my WeChat official account.