Table of Contents
1. Hadoop HA principles
2. Hadoop HA features
3. Zookeeper configuration
4. Installing the Hadoop cluster
5. Hadoop HA configuration
Environment Setup
Component | Version | Download |
---|---|---|
CentOS | 6.5 64-bit | Download link |
Hadoop | 2.5.1 | Download link |
Zookeeper | 3.4.5 | Download link |
Hadoop HA configuration files | - | Download link |

Cluster layout (NN = NameNode, DN = DataNode, RM = ResourceManager, NM = NodeManager):

Host | IP | Roles |
---|---|---|
ch01 | 192.168.128.121 | NN, DN, RM |
ch02 | 192.168.128.122 | NN, DN, NM |
ch03 | 192.168.128.123 | DN, NM |
In a typical HA cluster, each NameNode runs on its own server. At any given moment only one NameNode is in the active state and the other is in standby. The active NameNode handles all client operations, while the standby NameNode acts as a follower, maintaining a copy of the namespace state so that it is ready to take over at any time.

To keep their data in sync, the two NameNodes communicate through a group of independent processes called JournalNodes. Whenever the active NameNode modifies its namespace, it records the change on a majority of the JournalNodes. The standby NameNode reads these changes from the JournalNodes, continuously monitors the edit log, and applies the changes to its own namespace. This ensures that when the cluster fails over, the standby's namespace state is already fully synchronized, as shown in Figure 3.
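Once the cluster built in the rest of this article is running, the active/standby roles can be observed directly. A minimal sketch, assuming the nn1/nn2 NameNode IDs configured in hdfs-site.xml below:

```
# Ask each NameNode for its HA state; one should answer "active", the other "standby".
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2
```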
1) Configure zoo.cfg (there is no zoo.cfg by default; copy zoo_sample.cfg and rename the copy to zoo.cfg)
```
[root@ch01 conf]# vi /opt/hadoop/zookeeper-3.4.5-cdh5.6.0/conf/zoo.cfg

# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
# the port at which the clients will connect
clientPort=2181
dataDir=/opt/hadoop/zookeeper-3.4.5-cdh5.6.0/data
dataLogDir=/opt/hadoop/zookeeper-3.4.5-cdh5.6.0/logs
server.1=ch01:2888:3888
server.2=ch02:2888:3888
server.3=ch03:2888:3888
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
```
2) Copy the Zookeeper directory from ch01 to ch02 and ch03:
```
scp -r /opt/hadoop/zookeeper-3.4.5-cdh5.6.0/ root@ch02:/opt/hadoop/
scp -r /opt/hadoop/zookeeper-3.4.5-cdh5.6.0/ root@ch03:/opt/hadoop/
```
3) On ch01, ch02, and ch03, create the directory /opt/hadoop/zookeeper-3.4.5-cdh5.6.0/data and, inside it, a file named myid whose content is the number configured for the corresponding server.x entry in zoo.cfg:
```
ch01 = 1
ch02 = 2
ch03 = 3
```
Commands:
```
[root@ch01 ~]# mkdir /opt/hadoop/zookeeper-3.4.5-cdh5.6.0/data          # create the data directory
[root@ch01 ~]# echo 1 > /opt/hadoop/zookeeper-3.4.5-cdh5.6.0/data/myid  # write the id with echo
[root@ch01 ~]# ssh ch02                                                 # log in to ch02
Last login: Mon Feb 20 03:15:04 2017 from 192.168.128.1
[root@ch02 ~]# mkdir /opt/hadoop/zookeeper-3.4.5-cdh5.6.0/data          # create the data directory
[root@ch02 ~]# echo 2 > /opt/hadoop/zookeeper-3.4.5-cdh5.6.0/data/myid  # write the id with echo
[root@ch02 ~]# exit                                                     # leave ch02
logout
Connection to ch02 closed.
[root@ch01 ~]# ssh ch03                                                 # log in to ch03
Last login: Sun Feb 19 16:13:53 2017 from 192.168.128.1
[root@ch03 ~]# mkdir /opt/hadoop/zookeeper-3.4.5-cdh5.6.0/data          # create the data directory
[root@ch03 ~]# echo 3 > /opt/hadoop/zookeeper-3.4.5-cdh5.6.0/data/myid  # write the id with echo
[root@ch03 ~]# exit                                                     # leave ch03
```
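Since passwordless SSH between the nodes is already in place for this cluster, the same three steps can also be scripted from ch01. This is only a convenience sketch, assuming the host-to-id mapping shown above:

```
#!/bin/bash
# Create the Zookeeper data directory and myid file on every node.
ZK_HOME=/opt/hadoop/zookeeper-3.4.5-cdh5.6.0
declare -A IDS=( [ch01]=1 [ch02]=2 [ch03]=3 )

for host in ch01 ch02 ch03; do
  ssh "root@${host}" "mkdir -p ${ZK_HOME}/data && echo ${IDS[$host]} > ${ZK_HOME}/data/myid"
done
```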
Configuration files that need to be modified:
1. core-site.xml
2. hadoop-env.sh
3. hdfs-site.xml
4. mapred-site.xml
5. yarn-site.xml
6. slaves
core-site.xml
```
<configuration>
  <property>
    <!-- HDFS default file system. Because this is an HA setup with two NameNodes,
         the value must be the logical nameservice, not a single NameNode address. -->
    <name>fs.defaultFS</name>
    <value>hdfs://mycluster</value>
  </property>
  <property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>
  </property>
  <property>
    <!-- Zookeeper quorum used for automatic failover -->
    <name>ha.zookeeper.quorum</name>
    <value>ch01:2181,ch02:2181,ch03:2181</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/opt/hadoop/hadoop-2.6.0-cdh5.6.0/tmp/</value>
  </property>
</configuration>
```
hdfs-site.xml
```
<configuration>
  <!-- Logical nameservice ID -->
  <property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>
  </property>
  <!-- NameNode IDs nn1 and nn2 under the nameservice mycluster -->
  <property>
    <name>dfs.ha.namenodes.mycluster</name>
    <value>nn1,nn2</value>
  </property>
  <!-- RPC addresses of nn1 and nn2 -->
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn1</name>
    <value>ch01:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn2</name>
    <value>ch02:8020</value>
  </property>
  <!-- Web UI addresses of nn1 and nn2 -->
  <property>
    <name>dfs.namenode.http-address.mycluster.nn1</name>
    <value>ch01:50070</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.mycluster.nn2</name>
    <value>ch02:50070</value>
  </property>
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://ch01:8485;ch02:8485;ch03:8485/mycluster</value>
  </property>
  <!-- Note: as described in the troubleshooting section below, this value must be
       a plain absolute path without the file: prefix, or the JournalNode will not start. -->
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>file:/opt/hadoop/hadoop-2.6.0-cdh5.6.0/tmp/dfs/journalnode</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/opt/hadoop/hadoop-2.6.0-cdh5.6.0/tmp/dfs/name</value>
  </property>
  <property>
    <name>dfs.namenode.data.dir</name>
    <value>file:/opt/hadoop/hadoop-2.6.0-cdh5.6.0/tmp/dfs/data</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.mycluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/root/.ssh/id_rsa</value>
  </property>
  <!-- Enable automatic failover -->
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.replication.max</name>
    <value>32767</value>
  </property>
</configuration>
```
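Before distributing the files, the HA keys can be sanity-checked locally. A small sketch; `hdfs getconf` only reads the local configuration and does not contact the cluster:

```
hdfs getconf -confKey dfs.nameservices
hdfs getconf -confKey dfs.ha.namenodes.mycluster
hdfs getconf -confKey dfs.namenode.shared.edits.dir
```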
mapred-site.xml
```
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
```
yarn-site.xml
```
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>ch01</value>
  </property>
</configuration>
```
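Note that this yarn-site.xml keeps a single ResourceManager on ch01, matching the host table above; only HDFS is made highly available in this article.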
slaves
```
ch01
ch02
ch03
```
Configure the Zookeeper environment variables
```
[root@ch01 ~]# vi /etc/profile

# ZOOKEEPER
ZOOKEEPER_HOME=/opt/hadoop/zookeeper-3.4.5-cdh5.6.0   # installation directory
PATH=$PATH:$ZOOKEEPER_HOME/bin:$ZOOKEEPER_HOME/sbin
export ZOOKEEPER_HOME PATH
```
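To make the new variables take effect in the current shell and confirm the Zookeeper scripts are on the PATH (a small sketch):

```
source /etc/profile
echo $ZOOKEEPER_HOME
which zkServer.sh
```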
Start Zookeeper
1) Run the following on every machine (ch01, ch02, ch03); the output below is from ch01:
```
root@ch01:zkServer.sh start
JMX enabled by default
Using config: /opt/hadoop/zookeeper-3.4.5/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
root@ch01:/home/hadoop# /opt/hadoop/zookeeper-3.4.5/bin/zkServer.sh status
JMX enabled by default
Using config: /opt/hadoop/zookeeper-3.4.5/bin/../conf/zoo.cfg
Mode: follower
```
2) Run `zkServer.sh status` on every machine to check its role; one node will report itself as the leader and the others as followers.
3) Test whether Zookeeper started successfully by connecting with the client below; a prompt showing (CONNECTED) indicates success.
zkCli.sh
4) Format Zookeeper for HA on ch01; a log line reporting that the HA znode was successfully created in ZK indicates success.
hdfs zkfc -formatZK
5) Verify that zkfc formatted Zookeeper successfully: if a hadoop-ha znode now exists, it worked.
zkCli.sh
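A minimal non-interactive check from any node (assuming the 3.4.x client accepts a command on the command line, as it does here):

```
# List the root znodes; a hadoop-ha entry means zkfc -formatZK succeeded.
zkCli.sh -server ch01:2181 ls /
# Expected to end with something like: [hadoop-ha, zookeeper]
```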
1) Start the JournalNodes: run the following on ch01, ch02, and ch03 in turn
hadoop-daemon.sh start journalnode
2) Format one NameNode of the cluster (ch01). There are two ways to do this; I used the first:
hdfs namenode -format
3) Start the freshly formatted NameNode on ch01:
hadoop-daemon.sh start namenode
4) Copy the NameNode metadata from ch01 over to ch02: run the following on ch02
hdfs namenode -bootstrapStandby
5) Start the NameNode on ch02:
hadoop-daemon.sh start namenode
Browse to http://ch02:50070/dfshealth.jsp to see the state of the NameNode on ch02.
At this point the web UIs show that both NameNodes (ch01 and ch02) are in the standby state.
6) Start all DataNodes by running the following on ch01:
hadoop-daemons.sh start datanode
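To confirm the daemons actually came up on every host, a quick jps sweep from ch01 (a sketch using the passwordless SSH already in place):

```
for host in ch01 ch02 ch03; do
  echo "== ${host} =="
  ssh "root@${host}" jps | grep -E 'NameNode|DataNode|JournalNode|QuorumPeerMain'
done
```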
7) Start YARN by running the following on ch01:
start-yarn.sh
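Once YARN is up, the NodeManagers that registered with the ResourceManager on ch01 can be listed (a quick check, run on any node):

```
# Expect one entry per NodeManager host.
yarn node -list
```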
8) Start the ZooKeeperFailoverController (zkfc) on ch01 and then ch02 with the command below. If you reload the 50070 web UIs now, ch01 has switched to active while ch02 remains standby.
hadoop-daemon.sh start zkfc
9) Test that HDFS is usable:
hdfs dfs -ls /
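To exercise the automatic failover itself, a common manual test (a sketch, not part of the original walkthrough) is to stop the currently active NameNode and watch the standby take over:

```
# On ch01 (currently active): stop its NameNode.
hadoop-daemon.sh stop namenode

# After a few seconds zkfc should promote ch02; verify from any node:
hdfs haadmin -getServiceState nn2   # expected: active

# Bring ch01 back; it rejoins as standby.
hadoop-daemon.sh start namenode
```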
JournalNode fails to start
```
[root@ch01 hadoop]# hadoop-daemon.sh start journalnode
starting journalnode, logging to /opt/hadoop/hadoop-2.6.0-cdh5.6.0/logs/hadoop-root-journalnode-ch01.out
Exception in thread "main" java.lang.IllegalArgumentException: Journal dir 'file:/opt/hadoop/hadoop-2.6.0-cdh5.6.0/tmp/dfs/journalnode' should be an absolute path
        at org.apache.hadoop.hdfs.qjournal.server.JournalNode.validateAndCreateJournalDir(JournalNode.java:120)
        at org.apache.hadoop.hdfs.qjournal.server.JournalNode.start(JournalNode.java:144)
        at org.apache.hadoop.hdfs.qjournal.server.JournalNode.run(JournalNode.java:134)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
        at org.apache.hadoop.hdfs.qjournal.server.JournalNode.main(JournalNode.java:307)
```
Solution: set the value of the dfs.journalnode.edits.dir property in hdfs-site.xml to a plain absolute path; do not add the file: prefix.
DataNode fails to start
```
java.io.IOException: All specified directories are failed to load.
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:477)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1394)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1355)
        at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:317)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:228)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:829)
        at java.lang.Thread.run(Thread.java:745)
2017-02-20 10:25:39,363 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for Block pool (Datanode Uuid unassigned) service to ch02/192.168.128.122:8020. Exiting.
java.io.IOException: All specified directories are failed to load.
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:477)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1394)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1355)
        at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:317)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:228)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:829)
        at java.lang.Thread.run(Thread.java:745)
```
Solution: the failure is caused by a mismatch between the clusterID recorded by the NameNode (for example CID-5a00c610-f0e3-4ecd-b298-129cc5544e7d) and the clusterID stored by the DataNode.
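The post only names the cause; one way to confirm and repair it (a sketch, assuming the storage directories configured above, and only acceptable on a test cluster whose DataNode data you can afford to wipe):

```
# Compare the clusterID recorded by the NameNode and by the DataNode.
grep clusterID /opt/hadoop/hadoop-2.6.0-cdh5.6.0/tmp/dfs/name/current/VERSION
grep clusterID /opt/hadoop/hadoop-2.6.0-cdh5.6.0/tmp/dfs/data/current/VERSION

# If they differ, either copy the NameNode's clusterID into the DataNode's
# VERSION file, or on a freshly formatted cluster remove the DataNode storage
# directory and restart the DataNode so it re-registers:
# rm -rf /opt/hadoop/hadoop-2.6.0-cdh5.6.0/tmp/dfs/data
# hadoop-daemon.sh start datanode
```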