1. Environment
1) OS: Debian 8.1
2) Java: jdk1.7.0_80
3) Hadoop: 2.6.0-cdh5.15.1 (compiled with protoc 2.5.0)
2. Architecture
1) HA with dual NameNodes plus a ZooKeeper ensemble, using sshfence for fencing
HA NameNode overview:
The NameNode manages the filesystem namespace. It maintains the filesystem tree and the metadata for every file and directory in that tree. This information lives in two files, the namespace image (fsimage) and the edit log. Both are cached in RAM and also persisted to local disk. The NameNode also records which DataNodes hold the blocks of each file, but it does not persist this block-location information, because it is rebuilt from the DataNodes when the system starts.
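If you want to see what the fsimage and edit log actually contain, Hadoop ships offline viewers for both. The following is a minimal sketch, not part of the original walkthrough: it assumes the NameNode metadata directory used later in this post (/space/hadoop/data/dfs/nn), and the transaction-ID file names are placeholders that will differ on a real cluster.
# dump the namespace image to XML with the offline image viewer
sudo -u hdfs hdfs oiv -p XML -i /space/hadoop/data/dfs/nn/current/fsimage_0000000000000000000 -o /tmp/fsimage.xml
# dump one edit log segment with the offline edits viewer
sudo -u hdfs hdfs oev -p xml -i /space/hadoop/data/dfs/nn/current/edits_0000000000000000001-0000000000000000010 -o /tmp/edits.xml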
In the Hadoop 1 era there was only one NameNode. If that NameNode lost its data or stopped working, the whole cluster could not be recovered. This was Hadoop 1's single point of failure and the reason it was considered unreliable.
To remove that single point of failure, Hadoop 2 no longer limits you to one NameNode; there can be more than one (only two are supported as of 2.4.1), each with the same role. One is in the active state and the other in standby. While the cluster is running, only the active NameNode does the real work; the standby NameNode waits and continuously synchronizes the active NameNode's data. If the active NameNode fails, the standby can be switched to active, manually or automatically, and the cluster keeps working. That is what gives us high availability.
JournalNodes share data between the active and standby NameNodes.
With two NameNodes, one active and one standby, the standby has to pick up every change the active makes promptly; otherwise the two would drift apart, and an inconsistent pair can hardly be called highly available.
To keep their data in sync, the two NameNodes communicate through a group of independent processes called JournalNodes (JNs). Whenever the active NameNode modifies its namespace, it writes the change to a majority of the JournalNodes. The standby NameNode reads those changes from the JNs, keeps watching the edit log, and applies every change to its own namespace. This ensures that when the cluster fails over, the standby's namespace is already fully up to date.
When the active NameNode fails, for example the machine goes down, something has to promote the standby NameNode to active. That is where ZooKeeper comes in: both NameNodes register themselves in ZooKeeper, and when the active one fails, ZooKeeper detects it and the standby NameNode is automatically switched to active.
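Once the cluster described below is up, you can watch this mechanism directly. The following is a sketch, not part of the original notes; it assumes the nameservice is called mycluster and ZooKeeper runs on master1 as configured later, and uses the zookeeper-client wrapper that ships with the CDH zookeeper package.
# inspect the HA znodes that the ZKFCs maintain under /hadoop-ha
zookeeper-client -server master1:2181 ls /hadoop-ha/mycluster
# request a failover by hand; with automatic failover enabled this is carried out through the ZKFCs
sudo -u hdfs hdfs haadmin -failover master1 master2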
2) HA with three JournalNodes
As described above, the JournalNodes are what carry the active NameNode's edits to the standby. If there were only one JournalNode and it went down, the metadata could no longer be synchronized to the standby NameNode, so three JournalNodes are deployed here for redundancy; the quorum journal manager only needs a majority of them to be alive.
3) Three DataNodes
Not much to say about DataNodes: they are the nodes that actually store the data blocks.
3. Preparation
1) Set the hostnames
root@master1:/usr/local/src# hostname master1
root@master1:/usr/local/src# vim /etc/hostname
master1
root@master2:/usr/local/src# hostname master2
root@master2:/usr/local/src# vim /etc/hostname
master2
root@slaver1:/usr/local/src# hostname slaver1
root@slaver1:/usr/local/src# vim /etc/hostname
slaver1
root@slaver2:/usr/local/src# hostname slaver2
root@slaver2:/usr/local/src# vim /etc/hostname
slaver2
root@slaver3:/usr/local/src# hostname slaver3
root@slaver3:/usr/local/src# vim /etc/hostname
slaver3
2) Set up SSH trust (passwordless login from master1 to the other hosts, so everything can be driven with Ansible)
root@master1:/usr/local/src# hosts=(master1 master2 slaver1 slaver2 slaver3)
root@master1:/usr/local/src# for host in ${hosts[@]};do ssh-copy-id $host;done
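The Ansible commands throughout this post address host groups named nodes, namenodes and zookeepernodes, but the inventory itself never appears in the original notes. A minimal sketch of what /etc/ansible/hosts could look like, with group membership inferred from the commands used later, is:
[nodes]
master1
master2
slaver1
slaver2
slaver3

[namenodes]
master1
master2

[zookeepernodes]
master1
master2
slaver1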
3) Configure /etc/hosts
root@master1:/usr/local/src# ansible nodes -m shell -a "echo '172.31.14.129 master1\n172.31.2.152 master2\n172.31.5.49 slaver1\n172.31.4.230 slaver2\n172.31.3.155 slaver3' >>/etc/hosts" master2 | success | rc=0 >> slaver1 | success | rc=0 >> master1 | success | rc=0 >> slaver2 | success | rc=0 >> slaver3 | success | rc=0 >> root@master1:/usr/local/src# ansible nodes -m command -a "cat /etc/hosts" master2 | success | rc=0 >> 127.0.0.1 localhost ::1 localhost ip6-localhost ip6-loopback ff02::1 ip6-allnodes ff02::2 ip6-allrouters 172.31.14.129 master1 172.31.2.152 master2 172.31.5.49 slaver1 172.31.4.230 slaver2 172.31.3.155 slaver3 slaver2 | success | rc=0 >> 127.0.0.1 localhost slaver2 ::1 localhost ip6-localhost ip6-loopback ff02::1 ip6-allnodes ff02::2 ip6-allrouters 172.31.4.230 slaver2 172.31.14.129 master1 172.31.2.152 master2 172.31.5.49 slaver1 172.31.4.230 slaver2 172.31.3.155 slaver3 slaver1 | success | rc=0 >> 127.0.0.1 localhost slaver1 ::1 localhost ip6-localhost ip6-loopback ff02::1 ip6-allnodes ff02::2 ip6-allrouters 172.31.5.49 slaver1 172.31.14.129 master1 172.31.2.152 master2 172.31.5.49 slaver1 172.31.4.230 slaver2 172.31.3.155 slaver3 master1 | success | rc=0 >> 127.0.0.1 localhost ::1 localhost ip6-localhost ip6-loopback ff02::1 ip6-allnodes ff02::2 ip6-allrouters 172.31.14.129 master1 172.31.2.152 master2 172.31.5.49 slaver1 172.31.4.230 slaver2 172.31.3.155 slaver3 slaver3 | success | rc=0 >> 127.0.0.1 localhost slaver3 ::1 localhost ip6-localhost ip6-loopback ff02::1 ip6-allnodes ff02::2 ip6-allrouters 172.31.3.155 slaver3 172.31.14.129 master1 172.31.2.152 master2 172.31.5.49 slaver1 172.31.4.230 slaver2 172.31.3.155 slaver3
4) Add the Cloudera CDH apt repository
root@master1:/usr/local/src# ansible nodes -m shell -a "echo 'deb http://cloudera.proxy.ustclug.org/cdh5/debian/jessie/amd64/cdh jessie-cdh5 contrib\ndeb-src https://archive.cloudera.com/cdh5/debian/jessie/amd64/cdh jessie-cdh5 contrib' >/etc/apt/sources.list.d/cloudera-cdh5.list" root@master1:/usr/local/src# ansible nodes -m command -a "apt-get update" root@master1:/usr/local/src# ansible nodes -m shell -a "curl -s http://archive.cloudera.com/cdh5/debian/wheezy/amd64/cdh/archive.key | sudo apt-key add -"
5) Install and configure Java
Download:
http://www.oracle.com/technetwork/java/archive-139210.html
root@master1:/usr/local/src# tar -zxvf server-jre-7u80-linux-x64.tar.gz
root@master1:/usr/local/src# mv jdk1.7.0_80 /usr/lib/java/
root@master1:/usr/local/src# ansible nodes -m shell -a "echo 'export JAVA_HOME=/usr/lib/java/jdk1.7.0_80\nexport JRE_HOME=/usr/lib/java/jdk1.7.0_80/jre\nexport CLASSPATH=.:\$JAVA_HOME/lib:\$JAVA_HOME/jre/lib\nexport PATH=\$PATH:\$JAVA_HOME/bin:\$JAVA_HOME/jre/bin' >>/etc/profile"
root@master1:/usr/local/src# ansible nodes -m shell -a "source /etc/profile"
root@master1:/usr/local/src# ansible nodes -m command -a "java -version"
slaver1 | success | rc=0 >>
java version "1.7.0_181"
OpenJDK Runtime Environment (IcedTea 2.6.14) (7u181-2.6.14-1~deb8u1)
OpenJDK 64-Bit Server VM (build 24.181-b01, mixed mode)

master2 | success | rc=0 >>
java version "1.7.0_181"
OpenJDK Runtime Environment (IcedTea 2.6.14) (7u181-2.6.14-1~deb8u1)
OpenJDK 64-Bit Server VM (build 24.181-b01, mixed mode)

master1 | success | rc=0 >>
java version "1.7.0_181"
OpenJDK Runtime Environment (IcedTea 2.6.14) (7u181-2.6.14-1~deb8u1)
OpenJDK 64-Bit Server VM (build 24.181-b01, mixed mode)

slaver2 | success | rc=0 >>
java version "1.7.0_181"
OpenJDK Runtime Environment (IcedTea 2.6.14) (7u181-2.6.14-1~deb8u1)
OpenJDK 64-Bit Server VM (build 24.181-b01, mixed mode)

slaver3 | success | rc=0 >>
java version "1.7.0_181"
OpenJDK Runtime Environment (IcedTea 2.6.14) (7u181-2.6.14-1~deb8u1)
OpenJDK 64-Bit Server VM (build 24.181-b01, mixed mode)
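Note that the java -version output above still comes from the distribution's OpenJDK: sourcing /etc/profile through Ansible does not change what /usr/bin/java points to. If you want the Oracle JDK to be the default java binary as well, one option is to register it with update-alternatives on every node. This is a sketch, not part of the original notes, and it assumes the unpacked jdk1.7.0_80 directory has been copied to /usr/lib/java/ on every node, not just master1.
# register the Oracle JDK and make it the selected java on all nodes
ansible nodes -m shell -a "update-alternatives --install /usr/bin/java java /usr/lib/java/jdk1.7.0_80/bin/java 100 && update-alternatives --set java /usr/lib/java/jdk1.7.0_80/bin/java"
ansible nodes -m command -a "java -version"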
4. Install the Hadoop components
1) Install and configure the ZooKeeper ensemble for the HA NameNodes
NameNode failover is implemented on top of ZooKeeper. After adding the CDH repository, the zookeeper version that plain apt wanted to install was too new and blocked the installation of the namenode and other packages, so a matching build was fetched from Cloudera and installed by hand.
root@master1:/usr/local/src# wget
root@master1:/usr/local/src# ansible zookeepernodes -m copy -a "src=./zookeeper_3.4.5+cdh5.15.0+144-1.cdh5.15.0.p0.52~jessie-cdh5.15.0_all.deb dest=/usr/local/src/"
root@master1:/usr/local/src# ansible zookeepernodes -m command -a "dpkg -i /usr/local/src/zookeeper_3.4.5+cdh5.15.0+144-1.cdh5.15.0.p0.52~jessie-cdh5.15.0_all.deb"
root@master1:/usr/local/src# cat /etc/zookeeper/conf/zoo.cfg
maxClientCnxns=50
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181
dataLogDir=/var/lib/zookeeper
server.1=master1:2888:3888
server.2=master2:2888:3888
server.3=slaver1:2888:3888
root@master1:/usr/local/src# ansible zookeepernodes -m copy -a "src=/etc/zookeeper/conf/zoo.cfg dest=/etc/zookeeper/conf/"
root@master1:/usr/local/src# zookeepers=(master1 master2 slaver1)
root@master1:/usr/local/src# count=1
root@master1:/usr/local/src# for loop in ${zookeepers[@]};do ssh $loop "echo $count >/var/lib/zookeeper/myid";let count+=1 ;done
root@master1:/usr/local/src# ansible zookeepernodes -m command -a "service zookeeper-server init"
root@master1:/usr/local/src# ansible zookeepernodes -m command -a "service zookeeper-server start"
root@master1:/usr/local/src# ansible zookeepernodes -m command -a "/usr/lib/zookeeper/bin/zkServer.sh status"
master1 | success | rc=0 >>
Mode: follower
JMX enabled by default
Using config: /usr/lib/zookeeper/bin/../conf/zoo.cfg

slaver1 | success | rc=0 >>
Mode: leader
JMX enabled by default
Using config: /usr/lib/zookeeper/bin/../conf/zoo.cfg

master2 | success | rc=0 >>
Mode: follower
JMX enabled by default
Using config: /usr/lib/zookeeper/bin/../conf/zoo.cfg
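Before moving on it is worth double-checking that the ensemble is healthy. A quick way, not in the original notes and assuming nc is available on master1, is to use ZooKeeper's four-letter-word commands over the client port:
# each server should answer "imok"
for zk in master1 master2 slaver1; do echo -n "$zk: "; echo ruok | nc $zk 2181; echo; done
# confirm which node is the leader
echo stat | nc master1 2181 | grep Mode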
2) hadoop-yarn-resourcemanager
2.1 YARN basics
YARN is the resource management system introduced in Hadoop 2.0. With it, Hadoop is no longer limited to MapReduce-style computation and can host a variety of computing frameworks. It consists of two kinds of services: the ResourceManager and the NodeManager.
The basic idea of the MRv2/YARN architecture is to split the JobTracker's two main responsibilities, resource management and job scheduling/monitoring, into separate daemons: a global ResourceManager (RM) and a per-application ApplicationMaster (AM). In MRv2, the ResourceManager and the per-node NodeManagers (NM) form the data-computation framework. The ResourceManager effectively replaces the JobTracker, and NodeManagers run on the worker nodes in place of the TaskTracker daemons. The per-application ApplicationMaster is a framework-specific library whose job is to negotiate resources with the ResourceManager and work with the NodeManager(s) to execute and monitor the tasks.
ResourceManager (RM)
Accepts job submissions from clients, receives and monitors the resource reports from the NodeManagers (NM), is responsible for allocating and scheduling resources, and starts and monitors the ApplicationMasters (AM).
2.2 Install hadoop-yarn-resourcemanager
root@master1:/usr/local/src# ansible namenodes -m command -a "apt-get install hadoop-yarn-resourcemanager -y" root@master1:/usr/local/src# ansible namenodes -m command -a "/etc/init.d/hadoop-yarn-resourcemanager start"
3) Install the JournalNodes
root@master1:/usr/local/src# ansible zookeepernodes -m command -a "apt-get install hadoop-hdfs-journalnode -y" root@master1:/usr/local/src# ansible zookeepernodes -m command -a "service hadoop-hdfs-journalnode start"
4) Install the NameNodes
root@master1:/usr/local/src# ansible namenodes -m command -a "apt-get install hadoop-hdfs-namenode"
root@master1:/usr/local/src# ansible namenodes -m command -a "apt-get install hadoop-hdfs-zkfc"
root@master1:/usr/local/src# ansible namenodes -m command -a "service hadoop-hdfs-zkfc start"
4.1 Configure NameNode HA
root@master1:/usr/local/src# ansible namenodes -m command -a "cp -r /etc/hadoop/conf.empty /etc/hadoop/conf.my_cluster" root@master1:/usr/local/src# ansible namenodes -m command -a "update-alternatives --install /etc/hadoop/conf hadoop-conf /etc/hadoop/conf.my_cluster 50" root@master1:/usr/local/src# ansible namenodes -m command -a "update-alternatives --set hadoop-conf /etc/hadoop/conf.my_cluster" root@master1:/usr/local/src# ansible namenodes -m command -a "update-alternatives --display hadoop-conf" hadoop-conf - manual mode link currently points to /etc/hadoop/conf.my_cluster /etc/hadoop/conf.empty - priority 10 /etc/hadoop/conf.impala - priority 5 /etc/hadoop/conf.my_cluster - priority 50 Current 'best' version is '/etc/hadoop/conf.my_cluster'.
4.2 Main NameNode configuration files
1) hdfs-site.xml
root@master1:/etc/hadoop/conf.my_cluster# cat hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <!-- The JournalNode quorum -->
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://master1:8485;master2:8485;slaver1:8485/cluster</value>
  </property>
  <property>
    <!-- Directory where the JournalNodes store their edits -->
    <name>dfs.journalnode.edits.dir</name>
    <value>/space/hadoop/data/dfs/cache/journal</value>
  </property>
  <property>
    <!-- Address the NameNode HTTP server binds to -->
    <name>dfs.namenode.http-bind-host</name>
    <value>0.0.0.0</value>
  </property>
  <property>
    <!-- Directory where the NameNode keeps its metadata -->
    <name>dfs.namenode.name.dir</name>
    <value>file:///space/hadoop/data/dfs/nn</value>
  </property>
  <property>
    <!-- Directory where the DataNodes store block data -->
    <name>dfs.datanode.data.dir</name>
    <value>file:///space/hadoop/data/dfs/dn</value>
  </property>
  <property>
    <!-- Name of the nameservice, referenced by the properties below -->
    <name>dfs.nameservices</name>
    <value>mycluster</value>
  </property>
  <property>
    <!-- NameNode IDs in the nameservice: dfs.ha.namenodes.<nameservice> -->
    <name>dfs.ha.namenodes.mycluster</name>
    <value>master1,master2</value>
  </property>
  <property>
    <!-- RPC address of NameNode master1: dfs.namenode.rpc-address.<nameservice>.<namenode-id> -->
    <name>dfs.namenode.rpc-address.mycluster.master1</name>
    <value>master1:8020</value>
  </property>
  <property>
    <!-- HTTP address of NameNode master1: dfs.namenode.http-address.<nameservice>.<namenode-id> -->
    <name>dfs.namenode.http-address.mycluster.master1</name>
    <value>master1:50070</value>
  </property>
  <property>
    <!-- Addresses for the second node; note: both properties must be set for every NameNode -->
    <name>dfs.namenode.rpc-address.mycluster.master2</name>
    <value>master2:8020</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.mycluster.master2</name>
    <value>master2:50070</value>
  </property>
  <property>
    <!-- Enable automatic failover for the NameNode pair -->
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.mycluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <property>
    <!-- On failover, log in to the previously active NameNode over SSH and fence it -->
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/var/lib/hadoop-hdfs/.ssh/id_rsa</value>
  </property>
  <property>
    <!-- Number of replicas per block -->
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.permissions.superusergroup</name>
    <value>hadoop</value>
  </property>
</configuration>
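Once the files are in place, a cheap way to confirm that Hadoop is really reading this configuration directory (a quick check, not in the original notes) is to query individual keys:
hdfs getconf -confKey dfs.nameservices
hdfs getconf -confKey dfs.ha.namenodes.mycluster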
2) core-site.xml
root@master1:/etc/hadoop/conf.my_cluster# cat core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <!-- Default filesystem: the nameservice defined in hdfs-site.xml -->
    <name>fs.defaultFS</name>
    <value>hdfs://mycluster</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/space/hadoop/data/temp</value>
  </property>
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>master1:2181,master2:2181,slaver1:2181</value>
  </property>
  <property>
    <!-- Trash retention time, in minutes -->
    <name>fs.trash.interval</name>
    <value>120</value>
  </property>
</configuration>
3) Sync the configuration to the standby NameNode
root@master1:/usr/local/src# scp -r /etc/hadoop/conf.my_cluster/* master2:/etc/hadoop/conf.my_cluster/
4) Prepare the Hadoop data directories
root@master1:/usr/local/src# ansible namenodes -m shell -a "mkdir -p /space/hadoop/data/dfs/nn" root@master1:/usr/local/src# ansible namenodes -m shell -a "mkdir -p /space/hadoop/data/dfs/cache/journal && chown -R /space/hadoop/data/"
5) Passwordless SSH for the hdfs user
5.1 Active NameNode
root@master1:/usr/local/src# su hdfs
hdfs@master1:/usr/local/src$ ssh-keygen
hdfs@master1:/usr/local/src$ ssh-copy-id master2
5.2 Standby NameNode
root@master2:/usr/local/src# su hdfs
hdfs@master2:/usr/local/src$ ssh-keygen
hdfs@master2:/usr/local/src$ ssh-copy-id master1
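sshfence logs in as the hdfs user with the private key named in dfs.ha.fencing.ssh.private-key-files, so it is worth confirming now that each hdfs user really can reach the other NameNode without a password. A quick check, not in the original notes and assuming the hdfs user's home is /var/lib/hadoop-hdfs as in the key path above:
# from master1
sudo -u hdfs ssh -i /var/lib/hadoop-hdfs/.ssh/id_rsa -o BatchMode=yes master2 hostname
# from master2
sudo -u hdfs ssh -i /var/lib/hadoop-hdfs/.ssh/id_rsa -o BatchMode=yes master1 hostname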
6) Initialize the NameNodes
6.1 Initialize the HA state in ZooKeeper from one of the nodes
root@master1:/etc/hadoop/conf.my_cluster# sudo -u hdfs hdfs zkfc -formatZK
6.2 Format HDFS on the primary NameNode
root@master1:/etc/hadoop/conf.my_cluster# sudo -u hdfs hdfs namenode -format
(press Y when prompted)
root@master1:/etc/hadoop/conf.my_cluster# /etc/init.d/hadoop-hdfs-namenode start
6.3 Start the standby NameNode
root@master2:/etc/hadoop/conf.my_cluster# sudo -u hdfs hdfs namenode -bootstrapStandby
root@master2:/etc/hadoop/conf.my_cluster# sudo -u hdfs hdfs namenode -initializeSharedEdits
root@master2:/etc/hadoop/conf.my_cluster# /etc/init.d/hadoop-hdfs-namenode start
5. Verify failover
1) Check the state of both NameNodes
root@master1:/etc/hadoop/conf.my_cluster# sudo -u hdfs hdfs haadmin -getServiceState master1
active
root@master1:/etc/hadoop/conf.my_cluster# sudo -u hdfs hdfs haadmin -getServiceState master2
standby
2) Upload a file to HDFS
root@master1:/etc/hadoop/conf.my_cluster# hadoop fs -mkdir /test
root@master1:/etc/hadoop/conf.my_cluster# hadoop fs -put /space/test.txt /test/
root@master1:/etc/hadoop/conf.my_cluster# hadoop fs -ls /test
Found 1 items
-rw-r--r-- 3 hdfs hadoop 2 2018-08-24 18:48 /test/test.txt
3) Stop the active NameNode service
root@master1:/etc/hadoop/conf.my_cluster# /etc/init.d/hadoop-hdfs-namenode stop
root@master1:/etc/hadoop/conf.my_cluster# sudo -u hdfs hdfs haadmin -getServiceState master1
standby
root@master1:/etc/hadoop/conf.my_cluster# sudo -u hdfs hdfs haadmin -getServiceState master2
active
master2 has automatically become the active NameNode.
4) Verify that the file is still there
root@master2:/usr/local/src# hadoop fs -ls /space
Found 1 items
-rw-r--r-- 3 hdfs hadoop 2 2018-08-24 18:48 /input/test1.txt
6. Troubleshooting notes
1) The NameNodes would not fail over automatically. Debian 8 ships the newer OpenSSH 6.7, which no longer offers the older key-exchange algorithms Hadoop's sshfence client relies on, so fencing failed with a connection error.
Adding "KexAlgorithms curve25519-sha256@libssh.org,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256,diffie-hellman-group14-sha1,diffie-hellman-group-exchange-sha1,diffie-hellman-group1-sha1" to /etc/ssh/sshd_config and restarting ssh resolves it.
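A sketch of applying that change on both NameNodes with Ansible and then confirming that an SSH connection using one of the legacy algorithms is accepted (these commands are not in the original notes; adjust to taste):
# append the KexAlgorithms line on both NameNodes and restart sshd
ansible namenodes -m shell -a "echo 'KexAlgorithms curve25519-sha256@libssh.org,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256,diffie-hellman-group14-sha1,diffie-hellman-group-exchange-sha1,diffie-hellman-group1-sha1' >>/etc/ssh/sshd_config && service ssh restart"
# force the legacy key exchange from the hdfs user to mimic what sshfence negotiates
sudo -u hdfs ssh -o KexAlgorithms=diffie-hellman-group1-sha1 master2 hostname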
2) While verifying failover with hadoop fs -put, HDFS turned out to be readable but not writable; the NameNode reported it was in safe mode.
Troubleshooting steps:
2.1 After the active NameNode went offline, the standby showed as active, but its Live Nodes count was 0 and no disk or block information was visible.
2.2 Checked the NameNode logs; nothing abnormal was found.
2.3 The DataNodes complained that they could not connect to master1, but reported nothing about failing to reach master2.
2.4 Restarting a DataNode surfaced the error: WARN org.apache.hadoop.hdfs.DFSUtil: Namenode for mycluster remains unresolved for ID master2. Check your hdfs-site.xml file to ensure namenodes are configured properly.
2.5 Checking hdfs-site.xml showed that I had mis-written master2's property, putting the value into the name element. After correcting it, the DataNodes started normally and, on retesting, HDFS was readable and writable again.
3) Various ports unreachable: open them in the firewall, disable SELinux, and if these are cloud instances, allow the ports in the security group as well.
That completes the highly available NameNode setup; ResourceManager HA will be covered in a later post.