In non-HA mode the metadata held by the NameNode can be kept reliable, but once the NameNode goes down the cluster can no longer serve requests; in other words, service availability is not guaranteed. Hadoop 2.x therefore provides an HA deployment mode to solve this problem.
Several problems have to be addressed first.
Approach: within a NameNode cluster, only one node should be in the active state and answering client requests at any given time, while the other nodes stay in standby. Each cluster has one active and one standby NameNode; such a pair forms one federation unit, and additional federated clusters can be added to scale out the service. As a result, clients no longer configure the host of a specific NameNode; they configure the nameservice of the federation instead, such as ns1, ns2 and so on, where each nameservice contains two NameNodes, nn1 and nn2.
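For instance, once fs.defaultFS points at the nameservice, a client addresses the cluster through ns1 rather than through any single NameNode host. A minimal sketch (the path is only illustrative):

```shell
# the client resolves ns1 to whichever NameNode is currently active
hdfs dfs -mkdir -p hdfs://ns1/user/parallels
hdfs dfs -ls hdfs://ns1/
```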
HOST | IP | SOFTWARE | PROCESSES |
---|---|---|---|
hdcluster01 | 10.211.55.22 | jdk, hadoop | NameNode, DFSZKFailoverController(zkfc) |
hdcluster02 | 10.211.55.23 | jdk, hadoop | NameNode, DFSZKFailoverController(zkfc) |
hdcluster03 | 10.211.55.27 | jdk, hadoop | ResourceManager |
hdcluster04 | 10.211.55.28 | jdk, hadoop | ResourceManager |
zk01 | 10.211.55.24 | jdk, hadoop, zookeeper | DataNode, NodeManager, JournalNode, QuorumPeerMain |
zk02 | 10.211.55.25 | jdk, hadoop, zookeeper | DataNode, NodeManager, JournalNode, QuorumPeerMain |
zk03 | 10.211.55.26 | jdk, hadoop, zookeeper | DataNode, NodeManager, JournalNode, QuorumPeerMain |
Configure hdcluster01
1) Configure hadoop-env.sh
export JAVA_HOME=/home/parallels/app/jdk1.7.0_65
2) Configure core-site.xml
```xml
<configuration>
  <!-- Set the HDFS nameservice to ns1 -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://ns1/</value>
  </property>
  <!-- Hadoop temporary directory -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/parallels/app/hadoop-2.4.1/data/</value>
  </property>
  <!-- ZooKeeper quorum addresses -->
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>zk01:2181,zk02:2181,zk03:2181</value>
  </property>
</configuration>
```
3) Configure hdfs-site.xml
```xml
<configuration>
  <!-- Set the HDFS nameservice to ns1; must match core-site.xml -->
  <property>
    <name>dfs.nameservices</name>
    <value>ns1</value>
  </property>
  <!-- ns1 has two NameNodes: nn1 and nn2 -->
  <property>
    <name>dfs.ha.namenodes.ns1</name>
    <value>nn1,nn2</value>
  </property>
  <!-- RPC address of nn1 -->
  <property>
    <name>dfs.namenode.rpc-address.ns1.nn1</name>
    <value>hdcluster01:9000</value>
  </property>
  <!-- HTTP address of nn1 -->
  <property>
    <name>dfs.namenode.http-address.ns1.nn1</name>
    <value>hdcluster01:50070</value>
  </property>
  <!-- RPC address of nn2 -->
  <property>
    <name>dfs.namenode.rpc-address.ns1.nn2</name>
    <value>hdcluster02:9000</value>
  </property>
  <!-- HTTP address of nn2 -->
  <property>
    <name>dfs.namenode.http-address.ns1.nn2</name>
    <value>hdcluster02:50070</value>
  </property>
  <!-- Where the NameNode's shared edit log is stored on the JournalNodes -->
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://zk01:8485;zk02:8485;zk03:8485/ns1</value>
  </property>
  <!-- Where each JournalNode keeps its data on local disk -->
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/home/parallels/app/hadoop-2.4.1/journaldata</value>
  </property>
  <!-- Enable automatic NameNode failover -->
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  <!-- Failover proxy provider used by clients to find the active NameNode -->
  <property>
    <name>dfs.client.failover.proxy.provider.ns1</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <!-- Fencing methods; multiple methods are separated by newlines, one per line -->
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>
      sshfence
      shell(/bin/true)
    </value>
  </property>
  <!-- sshfence needs passwordless SSH; point it at the private key -->
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/home/parallels/.ssh/id_rsa</value>
  </property>
  <!-- sshfence connection timeout -->
  <property>
    <name>dfs.ha.fencing.ssh.connect-timeout</name>
    <value>30000</value>
  </property>
  <!-- If a DataNode process dies or a network fault cuts it off from the
       NameNode, the NameNode does not declare it dead immediately; it waits for
       a timeout. The HDFS default is 10 minutes + 30 seconds, computed as:
       timeout = 2 * dfs.namenode.heartbeat.recheck-interval + 10 * dfs.heartbeat.interval -->
  <property>
    <name>dfs.namenode.heartbeat.recheck-interval</name>
    <!-- unit: milliseconds -->
    <value>2000</value>
  </property>
  <property>
    <name>dfs.heartbeat.interval</name>
    <!-- unit: seconds -->
    <value>1</value>
  </property>
  <!-- A common situation in day-to-day operation: a node is declared dead
       because of a network fault or a crashed DataNode process, and HDFS
       immediately starts re-replicating its blocks. When the node rejoins the
       cluster its data is intact, so some blocks end up with more replicas than
       the configured replication factor. By default it takes about an hour for
       these excess replicas to be cleaned up, because each DataNode reports all
       of its blocks to the NameNode only once per hour. The parameter below
       changes that block report interval. -->
  <property>
    <name>dfs.blockreport.intervalMsec</name>
    <value>10000</value>
    <description>Determines block reporting interval in milliseconds.</description>
  </property>
</configuration>
```
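As a quick check of the formula in the comment above: with dfs.namenode.heartbeat.recheck-interval = 2000 ms and dfs.heartbeat.interval = 1 s, the dead-node timeout becomes 2 × 2000 ms + 10 × 1 s = 14 s, whereas the defaults (300000 ms and 3 s) give 2 × 300 s + 10 × 3 s = 630 s, i.e. the 10 minutes + 30 seconds mentioned above. The short values here are only meant to make failure handling visible quickly on a test cluster.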
4) Configure mapred-site.xml
```xml
<configuration>
  <!-- Run MapReduce on YARN -->
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
```
5) Configure yarn-site.xml
```xml
<configuration>
  <!-- Enable ResourceManager HA -->
  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>
  <!-- Cluster id of the RM pair -->
  <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>yrc</value>
  </property>
  <!-- Logical names of the RMs -->
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
  </property>
  <!-- Host of each RM -->
  <property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>hdcluster03</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>hdcluster04</value>
  </property>
  <!-- ZooKeeper quorum addresses -->
  <property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>zk01:2181,zk02:2181,zk03:2181</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
```
6) Edit the slaves file
```
[parallels@hdcluster01 hadoop]$ less slaves
zk01
zk02
zk03
```
The slaves file specifies the worker nodes. Because HDFS is started from hdcluster01 and YARN is started from hdcluster03, the slaves file on hdcluster01 lists where the DataNodes run, while the slaves file on hdcluster03 lists where the NodeManagers run.
7) Use scp to copy the configured Hadoop installation directory to the other six nodes
8) Configure passwordless SSH login
hdcluster01 needs passwordless login to hdcluster02 and to zk01, zk02, zk03. First generate a key pair:
ssh-keygen -t rsa
Then copy the public key to the nodes listed above:
```
ssh-copy-id hdcluster01
ssh-copy-id hdcluster02
ssh-copy-id zk01
ssh-copy-id zk02
ssh-copy-id zk03
```
hdcluster03 starts the ResourceManager and therefore needs passwordless login to the DataNode hosts; configure SSH keys from hdcluster03 to zk01, zk02 and zk03 in the same way.
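A minimal sketch of those analogous steps on hdcluster03 (assuming the same parallels user and default key locations):

```shell
# run on hdcluster03
ssh-keygen -t rsa
ssh-copy-id zk01
ssh-copy-id zk02
ssh-copy-id zk03
```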
9) Start the ZooKeeper cluster (start zk01, zk02 and zk03 separately; zk01 is shown as an example)
```
[parallels@zk01 bin]$ ./zkServer.sh start
JMX enabled by default
Using config: /home/parallels/app/zookeeper-3.4.5/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
```
Check its status:
```
[parallels@zk01 bin]$ ./zkServer.sh status
JMX enabled by default
Using config: /home/parallels/app/zookeeper-3.4.5/bin/../conf/zoo.cfg
Mode: leader
```
10) Start a JournalNode on each of zk01, zk02 and zk03
```
[parallels@zk01 sbin]$ ./hadoop-daemon.sh start journalnode
starting journalnode, logging to /home/parallels/app/hadoop-2.4.1/logs/hadoop-parallels-journalnode-zk01.out
[parallels@zk01 sbin]$ jps
11528 Jps
8530 QuorumPeerMain
11465 JournalNode
[parallels@zk01 sbin]$ pwd
/home/parallels/app/hadoop-2.4.1/sbin
```
11) Format HDFS (on hdcluster01)
hdfs namenode -format
To make sure the initial fsimage is consistent on both NameNodes, you can either manually copy the hadoop.tmp.dir directory configured in core-site.xml over to hdcluster02, or, after HDFS has been started on hdcluster01, run the following command on hdcluster02:
hdfs namenode -bootstrapStandby
However, doing it this way stops the NameNode on hdcluster02, so it has to be started again afterwards.
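A minimal sketch of the manual-copy alternative mentioned above, assuming hadoop.tmp.dir is the /home/parallels/app/hadoop-2.4.1/data/ directory from core-site.xml:

```shell
# run on hdcluster01 right after formatting, before starting HDFS;
# copies the freshly formatted metadata to the standby NameNode
scp -r /home/parallels/app/hadoop-2.4.1/data/ parallels@hdcluster02:/home/parallels/app/hadoop-2.4.1/
```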
12) Format ZKFC (on hdcluster01)
hdfs zkfc -formatZK
Looking at the ZooKeeper data afterwards, you can see:
```
[zk: localhost:2181(CONNECTED) 0] ls /
[hadoop-ha, zkData, zookeeper]
[zk: localhost:2181(CONNECTED) 2] ls /hadoop-ha
[ns1]
[zk: localhost:2181(CONNECTED) 3] ls /hadoop-ha/ns1
[]
[zk: localhost:2181(CONNECTED) 3] get /hadoop-ha/ns1
cZxid = 0x300000003
ctime = Tue Oct 02 14:32:43 CST 2018
mZxid = 0x300000003
mtime = Tue Oct 02 14:32:43 CST 2018
pZxid = 0x30000000e
cversion = 4
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 0
numChildren = 2
```
13) Start HDFS (on hdcluster01)
```
[parallels@hdcluster01 sbin]$ start-dfs.sh
Starting namenodes on [hdcluster01 hdcluster02]
hdcluster01: starting namenode, logging to /home/parallels/app/hadoop-2.4.1/logs/hadoop-parallels-namenode-hdcluster01.out
hdcluster02: starting namenode, logging to /home/parallels/app/hadoop-2.4.1/logs/hadoop-parallels-namenode-hdcluster02.out
zk01: starting datanode, logging to /home/parallels/app/hadoop-2.4.1/logs/hadoop-parallels-datanode-zk01.out
zk03: starting datanode, logging to /home/parallels/app/hadoop-2.4.1/logs/hadoop-parallels-datanode-zk03.out
zk02: starting datanode, logging to /home/parallels/app/hadoop-2.4.1/logs/hadoop-parallels-datanode-zk02.out
Starting journal nodes [zk01 zk02 zk03]
zk03: starting journalnode, logging to /home/parallels/app/hadoop-2.4.1/logs/hadoop-parallels-journalnode-zk03.out
zk01: starting journalnode, logging to /home/parallels/app/hadoop-2.4.1/logs/hadoop-parallels-journalnode-zk01.out
zk02: starting journalnode, logging to /home/parallels/app/hadoop-2.4.1/logs/hadoop-parallels-journalnode-zk02.out
Starting ZK Failover Controllers on NN hosts [hdcluster01 hdcluster02]
hdcluster01: starting zkfc, logging to /home/parallels/app/hadoop-2.4.1/logs/hadoop-parallels-zkfc-hdcluster01.out
hdcluster02: starting zkfc, logging to /home/parallels/app/hadoop-2.4.1/logs/hadoop-parallels-zkfc-hdcluster02.out
```
14) Start YARN (on hdcluster03 and hdcluster04)
```
[parallels@hdcluster03 sbin]$ ./start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /home/parallels/app/hadoop-2.4.1/logs/yarn-parallels-resourcemanager-hdcluster03.out
zk03: starting nodemanager, logging to /home/parallels/app/hadoop-2.4.1/logs/yarn-parallels-nodemanager-zk03.out
zk01: starting nodemanager, logging to /home/parallels/app/hadoop-2.4.1/logs/yarn-parallels-nodemanager-zk01.out
zk02: starting nodemanager, logging to /home/parallels/app/hadoop-2.4.1/logs/yarn-parallels-nodemanager-zk02.out
[parallels@hdcluster03 sbin]$ jps
29964 Jps
28916 ResourceManager
```
```
[parallels@hdcluster04 sbin]$ yarn-daemon.sh start resourcemanager
starting resourcemanager, logging to /home/parallels/app/hadoop-2.4.1/logs/yarn-parallels-resourcemanager-hdcluster04.out
[parallels@hdcluster04 sbin]$ jps
29404 ResourceManager
29455 Jps
```
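It is also worth checking which ResourceManager ended up active (a quick sketch; whether rm1 or rm2 is active depends on startup order):

```shell
yarn rmadmin -getServiceState rm1
yarn rmadmin -getServiceState rm2
```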
At this point the HA deployment of the Hadoop cluster is complete. Check the cluster state:
```
[parallels@hdcluster01 bin]$ hdfs dfsadmin -report
Configured Capacity: 64418205696 (59.99 GB)
Present Capacity: 60574326784 (56.41 GB)
DFS Remaining: 60140142592 (56.01 GB)
DFS Used: 434184192 (414.07 MB)
DFS Used%: 0.72%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Datanodes available: 3 (3 total, 0 dead)

Live datanodes:
Name: 10.211.55.24:50010 (zk01)
Hostname: zk01
Decommission Status : Normal
Configured Capacity: 21472735232 (20.00 GB)
DFS Used: 144728064 (138.02 MB)
Non DFS Used: 1281323008 (1.19 GB)
DFS Remaining: 20046684160 (18.67 GB)
DFS Used%: 0.67%
DFS Remaining%: 93.36%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Last contact: Tue Oct 02 15:41:47 CST 2018

Name: 10.211.55.26:50010 (zk03)
Hostname: zk03
Decommission Status : Normal
Configured Capacity: 21472735232 (20.00 GB)
DFS Used: 144728064 (138.02 MB)
Non DFS Used: 1281269760 (1.19 GB)
DFS Remaining: 20046737408 (18.67 GB)
DFS Used%: 0.67%
DFS Remaining%: 93.36%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Last contact: Tue Oct 02 15:41:47 CST 2018

Name: 10.211.55.25:50010 (zk02)
Hostname: zk02
Decommission Status : Normal
Configured Capacity: 21472735232 (20.00 GB)
DFS Used: 144728064 (138.02 MB)
Non DFS Used: 1281286144 (1.19 GB)
DFS Remaining: 20046721024 (18.67 GB)
DFS Used%: 0.67%
DFS Remaining%: 93.36%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Last contact: Tue Oct 02 15:41:47 CST 2018
```
```
[parallels@hdcluster01 bin]$ hdfs haadmin -getServiceState nn1
standby
[parallels@hdcluster01 bin]$ hdfs haadmin -getServiceState nn2
active
```
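To exercise automatic failover, one approach (sketched here, not taken from the original run) is to kill the currently active NameNode, nn2 on hdcluster02 in the output above, and confirm that nn1 takes over:

```shell
# on hdcluster02: find and kill the active NameNode process (test clusters only)
jps | grep NameNode
kill -9 <NameNode-pid>

# on hdcluster01: nn1 should report "active" shortly afterwards
hdfs haadmin -getServiceState nn1
```

The stopped NameNode and its zkfc can then be brought back on hdcluster02 as the new standby with: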
sbin/hadoop-daemon.sh start namenode
sbin/hadoop-daemon.sh start zkfc