Hadoop Series (6): HA Deployment Mode for a Distributed Cluster

Analysis of the HA Mode

In non-HA mode the reliability of the NameNode's metadata can be guaranteed, but once the NameNode goes down nothing can keep serving external requests, i.e. the availability of the service cannot be guaranteed. Hadoop 2.x therefore provides an HA deployment mode to solve this problem.
Let us analyze a few problems that have to be faced:

  • To make the NameNode highly available it has to be deployed as a cluster. How, then, should client requests be served when there are multiple NameNodes?

    Approach: within a NameNode cluster, only one node should be in the active state and answer client requests at any given time; the other nodes stay in standby. One active node plus one standby node form a unit, and several such units (a federation) can be added to scale the service out. Clients therefore no longer put a NameNode host in their configuration files; they configure the unit's nameservice instead, e.g. ns1, ns2, and under each nameservice there are two nodes, nn1 and nn2.

  • How can a standby node take over the active role quickly and seamlessly?
    Approach: first, the NameNodes must keep their metadata consistent at all times, so the edits log is kept on a distributed cluster. Hadoop provides the qjournal (Quorum Journal Manager) framework for this; each qjournal node is called a JournalNode, and an edit is considered committed once a majority of JournalNodes have accepted it. The edits files are thus managed by an external cluster, and the NameNodes read them from that cluster to keep their data in sync.
    Second, how does a standby NameNode learn that the active node has gone down?
    A NameNode can register an EPHEMERAL node with the ZooKeeper cluster; an ephemeral node's lifetime is bound to the client session, which means that if the session expires the node is removed automatically. When the active node crashes its ephemeral node disappears, so the standby node learns that the service has failed and that it needs to switch to the active state.
    In addition, a monitoring process (called the zkfc process in Hadoop) can be started on each NameNode host; it constantly monitors the health of the local NameNode and writes that state into the ZooKeeper cluster, where the standby side can watch it.
  • After the active node crashes and the standby switches to active, the original active node may come back up, leaving the NameNode cluster with two active nodes that both write to the edits cluster at the same time. This is the split-brain problem.
    Approach: when the standby node detects through ZooKeeper that the active node is down, it does not switch to active immediately. Hadoop provides a fencing mechanism: it first sends a command over SSH to kill the NameNode process on the old active node and waits for the result, and only after confirming that the process has been killed does it switch state. But given network problems, how can we be sure that the SSH command gets through, or that its return value comes back? If no correct response ever arrives, the standby could never become active. Hadoop's answer is a timeout: if no normal return value has been received by the time the SSH command times out, the zkfc process can run a user-defined shell script that deals with the node and makes sure the crashed active node can never become active again (a sketch of such a script follows below).
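    As an illustration only, here is a minimal sketch of such a custom fencing script, of the kind that can be wired in through shell(...) in dfs.ha.fencing.methods. The use of the target_host environment variable, the process-matching pattern and the IPMI power-off fallback are assumptions made for this sketch, not part of the original deployment.

    #!/bin/bash
    # Hypothetical fencing script, run by zkfc when sshfence cannot confirm
    # that the old active NameNode is dead. Hadoop's shell fencer exports the
    # target's details in environment variables such as $target_host; the
    # rest must be adapted to your own environment.

    # Try once more to kill the NameNode process over ssh, with a short timeout.
    if timeout 10 ssh "$target_host" \
        "pkill -9 -f org.apache.hadoop.hdfs.server.namenode.NameNode"; then
        echo "NameNode process on $target_host killed"
        exit 0
    fi

    # Host unreachable: as a last resort, cut its power through an out-of-band
    # management interface (address and credentials are placeholders).
    ipmitool -I lanplus -H "${target_host}-ipmi" -U admin -P secret chassis power off && exit 0

    exit 1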

HA Mode Configuration and Deployment

  • Cluster plan
HOST         IP            SOFTWARE                 PROCESSES
hdcluster01  10.211.55.22  jdk, hadoop              NameNode, DFSZKFailoverController(zkfc)
hdcluster02  10.211.55.23  jdk, hadoop              NameNode, DFSZKFailoverController(zkfc)
hdcluster03  10.211.55.27  jdk, hadoop              ResourceManager
hdcluster04  10.211.55.28  jdk, hadoop              ResourceManager
zk01         10.211.55.24  jdk, hadoop, zookeeper   DataNode, NodeManager, JournalNode, QuorumPeerMain
zk02         10.211.55.25  jdk, hadoop, zookeeper   DataNode, NodeManager, JournalNode, QuorumPeerMain
zk03         10.211.55.26  jdk, hadoop, zookeeper   DataNode, NodeManager, JournalNode, QuorumPeerMain
  • Notes
    1) In Hadoop 2.0 an HA setup usually consists of two NameNodes, one active and one standby. The active NameNode serves all external requests; the standby serves none and only mirrors the active NameNode's state so that it can take over quickly if the active one fails.
    Hadoop 2.0 officially offers two HDFS HA solutions, one based on NFS and one based on QJM; here we use the simpler QJM. In this scheme the active and standby NameNodes share metadata through a group of JournalNodes, and an edit is considered written successfully once it has been written to a majority of the JournalNodes. An odd number of JournalNodes is normally configured.
    A ZooKeeper cluster is also set up for ZKFC (DFSZKFailoverController) failover: when the active NameNode dies, the standby NameNode is automatically switched to the active state.
    2) hadoop-2.2.0 still has the problem that there is only one ResourceManager, a single point of failure. hadoop-2.4.1 solves this: there are two ResourceManagers, one active and one standby, with their state coordinated through ZooKeeper.
  • Configuring hdcluster01
    1) Configure hadoop-env.sh

    export JAVA_HOME=/home/parallels/app/jdk1.7.0_65

    2) Configure core-site.xml

    <configuration>
        <!-- Set the HDFS nameservice to ns1 -->
        <property>
                <name>fs.defaultFS</name>
                <value>hdfs://ns1/</value>
        </property>
        <!-- Hadoop temporary directory -->
        <property>
                <name>hadoop.tmp.dir</name>
                <value>/home/parallels/app/hadoop-2.4.1/data/</value>
        </property>
        <!-- ZooKeeper quorum addresses -->
        <property>
                <name>ha.zookeeper.quorum</name>
                <value>zk01:2181,zk02:2181,zk03:2181</value>
        </property>
        
    </configuration>
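    With fs.defaultFS pointing at the nameservice rather than at a single host, clients address HDFS through ns1 and the client library works out which NameNode is currently active. For example (the paths are only illustrative):

    hdfs dfs -mkdir -p hdfs://ns1/user/parallels
    hdfs dfs -put test.txt hdfs://ns1/user/parallels/
    hdfs dfs -ls /user/parallels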

    3) Configure hdfs-site.xml

    <configuration>
        <!-- The HDFS nameservice, ns1; must match core-site.xml -->
        <property>
                <name>dfs.nameservices</name>
                <value>ns1</value>
        </property>
        <!-- ns1 contains two NameNodes, nn1 and nn2 -->
        <property>
                <name>dfs.ha.namenodes.ns1</name>
                <value>nn1,nn2</value>
        </property>
        <!-- RPC address of nn1 -->
        <property>
                <name>dfs.namenode.rpc-address.ns1.nn1</name>
                <value>hdcluster01:9000</value>
        </property>
        <!-- HTTP address of nn1 -->
        <property>
                <name>dfs.namenode.http-address.ns1.nn1</name>
                <value>hdcluster01:50070</value>
        </property>
        <!-- RPC address of nn2 -->
        <property>
                <name>dfs.namenode.rpc-address.ns1.nn2</name>
                <value>hdcluster02:9000</value>
        </property>
        <!-- HTTP address of nn2 -->
        <property>
                <name>dfs.namenode.http-address.ns1.nn2</name>
                <value>hdcluster02:50070</value>
        </property>
        <!-- Where the NameNode's shared edits are stored on the JournalNodes -->
        <property>
                <name>dfs.namenode.shared.edits.dir</name>
                <value>qjournal://zk01:8485;zk02:8485;zk03:8485/ns1</value>
        </property>
        <!-- Where each JournalNode keeps its data on local disk -->
        <property>
                <name>dfs.journalnode.edits.dir</name>
                <value>/home/parallels/app/hadoop-2.4.1/journaldata</value>
        </property>
        <!-- Enable automatic failover of the NameNode -->
        <property>
                <name>dfs.ha.automatic-failover.enabled</name>
                <value>true</value>
        </property>
        <!-- How clients locate the currently active NameNode (failover proxy provider) -->
        <property>
                <name>dfs.client.failover.proxy.provider.ns1</name>
                <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
        </property>
        <!-- Fencing methods; multiple methods are separated by newlines, one per line -->
        <property>
                <name>dfs.ha.fencing.methods</name>
                <value>
                        sshfence
                        shell(/bin/true)
                </value>
        </property>
        <!-- sshfence requires passwordless SSH login -->
        <property>
                <name>dfs.ha.fencing.ssh.private-key-files</name>
                <value>/home/parallels/.ssh/id_rsa</value>
        </property>
        <!-- Timeout for the sshfence method -->
        <property>
                <name>dfs.ha.fencing.ssh.connect-timeout</name>
                <value>30000</value>
        </property>
        <!-- When a DataNode process dies or a network failure cuts a DataNode off from
        the NameNode, the NameNode does not declare the node dead immediately; a timeout
        has to pass first. The HDFS default timeout is 10 minutes + 30 seconds. Calling
        this timeout "timeout", it is computed as:
        timeout = 2 * dfs.namenode.heartbeat.recheck-interval + 10 * dfs.heartbeat.interval -->
        <property>
                <name>dfs.namenode.heartbeat.recheck-interval</name>
                <!-- unit: milliseconds -->
                <value>2000</value>
        </property>
        <property>
                <name>dfs.heartbeat.interval</name>
                <!-- unit: seconds -->
                <value>1</value>
        </property>
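        <!-- Added for illustration: with the two values above, the dead-node timeout
             works out to 2 * 2000 ms + 10 * 1000 ms = 14000 ms, i.e. 14 seconds instead
             of the default 10 minutes 30 seconds. -->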
        <!-- In day-to-day maintenance of a Hadoop cluster you may see the following:
        a node is declared dead by the NameNode because of a network failure or because
        its DataNode process died, and HDFS immediately starts making extra copies of its
        blocks. When the node later rejoins the cluster its data is intact, so some blocks
        end up with more replicas than the configured number. By default it takes one hour
        before these excess replicas are cleaned up, and that delay is tied to the block
        report interval: each DataNode periodically reports all of its blocks to the
        NameNode, once per hour by default. The parameter below changes that interval. -->
        <property>
                <name>dfs.blockreport.intervalMsec</name>
                <value>10000</value>
                <description>Determines block reporting interval in milliseconds.</description>
        </property>
    </configuration>
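    After the configuration has been distributed to all nodes (step 7 below), a quick sanity check confirms that the nameservice settings are picked up (an illustrative session):

    hdfs getconf -confKey dfs.nameservices        # should print ns1
    hdfs getconf -confKey dfs.ha.namenodes.ns1    # should print nn1,nn2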

    4) Configure mapred-site.xml

    <configuration>
        <!-- Run MapReduce on YARN -->
        <property>
                <name>mapreduce.framework.name</name>
                <value>yarn</value>
        </property>
    </configuration>

    5) Configure yarn-site.xml

    <configuration>
        <!-- Enable ResourceManager HA -->
        <property>
                <name>yarn.resourcemanager.ha.enabled</name>
                <value>true</value>
        </property>
        <!-- Cluster id of the ResourceManager pair -->
        <property>
                <name>yarn.resourcemanager.cluster-id</name>
                <value>yrc</value>
        </property>
        <!-- Logical ids of the ResourceManagers -->
        <property>
                <name>yarn.resourcemanager.ha.rm-ids</name>
                <value>rm1,rm2</value>
        </property>
        <!-- Hostname of each ResourceManager -->
        <property>
                <name>yarn.resourcemanager.hostname.rm1</name>
                <value>hdcluster03</value>
        </property>
        <property>
                <name>yarn.resourcemanager.hostname.rm2</name>
                <value>hdcluster04</value>
        </property>
        <!-- ZooKeeper cluster addresses -->
        <property>
                <name>yarn.resourcemanager.zk-address</name>
                <value>zk01:2181,zk02:2181,zk03:2181</value>
        </property>
        <property>
                <name>yarn.nodemanager.aux-services</name>
                <value>mapreduce_shuffle</value>
        </property>
    
    </configuration>

    6) Edit the slaves file

    [parallels@hdcluster01 hadoop]$ less slaves
    zk01
    zk02
    zk03

    The slaves file lists the worker nodes. Because HDFS is started from hdcluster01 and YARN from hdcluster03, the slaves file on hdcluster01 determines where the DataNodes run, while the slaves file on hdcluster03 determines where the NodeManagers run.
    7) scp the configured Hadoop installation directory to the other six nodes
    8) Set up passwordless SSH login
    hdcluster01 needs passwordless login to hdcluster02 and to zk01, zk02 and zk03. First generate an SSH key pair:

    ssh-keygen -t rsa

    Then copy the public key to the nodes listed above:

    ssh-copy-id hdcluster01
    ssh-copy-id hdcluster02
    ssh-copy-id zk01
    ssh-copy-id zk02
    ssh-copy-id zk03

    hdcluster03 starts the ResourceManager and needs passwordless login to the DataNode hosts, so set up SSH keys from hdcluster03 to zk01, zk02 and zk03 in the same way (a quick check is sketched below).
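    To verify the keys, each of these commands should print the remote hostname without asking for a password (an illustrative check, not part of the original text):

    ssh hdcluster02 hostname
    ssh zk01 hostname
    ssh zk02 hostname
    ssh zk03 hostname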
    9) Start the ZooKeeper cluster (start zk01, zk02 and zk03 in turn; zk01 is shown here)

    [parallels@zk01 bin]$ ./zkServer.sh start
    JMX enabled by default
    Using config: /home/parallels/app/zookeeper-3.4.5/bin/../conf/zoo.cfg
    Starting zookeeper ... STARTED

    Check its status:

    [parallels@zk01 bin]$ ./zkServer.sh status
    JMX enabled by default
    Using config: /home/parallels/app/zookeeper-3.4.5/bin/../conf/zoo.cfg
    Mode: leader

    10) Start a JournalNode on each of zk01, zk02 and zk03

    [parallels@zk01 sbin]$ ./hadoop-daemon.sh start journalnode
    starting journalnode, logging to /home/parallels/app/hadoop-2.4.1/logs/hadoop-parallels-journalnode-zk01.out
    [parallels@zk01 sbin]$ jps
    11528 Jps
    8530 QuorumPeerMain
    11465 JournalNode
    [parallels@zk01 sbin]$ pwd
    /home/parallels/app/hadoop-2.4.1/sbin

    11) Format HDFS (on hdcluster01)

    hdfs namenode -format

    To give both NameNodes a consistent initial fsimage, either copy the hadoop.tmp.dir directory configured in core-site.xml from hdcluster01 to hdcluster02 by hand, or, once HDFS has been started on hdcluster01, run the following on hdcluster02:

    hdfs namenode -bootstrapStandby

    Note, however, that this stops the NameNode on hdcluster02, so it has to be started again afterwards (both options are sketched below).
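    For reference, the two options look roughly like this (the paths follow the configuration above and the commands are illustrative):

    # Option 1: copy the freshly formatted metadata directory to hdcluster02
    scp -r /home/parallels/app/hadoop-2.4.1/data/ parallels@hdcluster02:/home/parallels/app/hadoop-2.4.1/

    # Option 2: bootstrap the standby, then start its NameNode again
    hdfs namenode -bootstrapStandby
    sbin/hadoop-daemon.sh start namenode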

    12) Format ZKFC (on hdcluster01)

    hdfs zkfc -formatZK

    Looking at the ZooKeeper data at this point shows:

    [zk: localhost:2181(CONNECTED) 0] ls /
    [hadoop-ha, zkData, zookeeper]
    [zk: localhost:2181(CONNECTED) 2] ls /hadoop-ha
    [ns1]
    [zk: localhost:2181(CONNECTED) 3] ls /hadoop-ha/ns1
    []
    [zk: localhost:2181(CONNECTED) 3] get /hadoop-ha/ns1
    
    cZxid = 0x300000003
    ctime = Tue Oct 02 14:32:43 CST 2018
    mZxid = 0x300000003
    mtime = Tue Oct 02 14:32:43 CST 2018
    pZxid = 0x30000000e
    cversion = 4
    dataVersion = 0
    aclVersion = 0
    ephemeralOwner = 0x0
    dataLength = 0
    numChildren = 2

    13) Start HDFS (from hdcluster01)

    [parallels@hdcluster01 sbin]$ start-dfs.sh
    Starting namenodes on [hdcluster01 hdcluster02]
    hdcluster01: starting namenode, logging to /home/parallels/app/hadoop-2.4.1/logs/hadoop-parallels-namenode-hdcluster01.out
    hdcluster02: starting namenode, logging to /home/parallels/app/hadoop-2.4.1/logs/hadoop-parallels-namenode-hdcluster02.out
    zk01: starting datanode, logging to /home/parallels/app/hadoop-2.4.1/logs/hadoop-parallels-datanode-zk01.out
    zk03: starting datanode, logging to /home/parallels/app/hadoop-2.4.1/logs/hadoop-parallels-datanode-zk03.out
    zk02: starting datanode, logging to /home/parallels/app/hadoop-2.4.1/logs/hadoop-parallels-datanode-zk02.out
    Starting journal nodes [zk01 zk02 zk03]
    zk03: starting journalnode, logging to /home/parallels/app/hadoop-2.4.1/logs/hadoop-parallels-journalnode-zk03.out
    zk01: starting journalnode, logging to /home/parallels/app/hadoop-2.4.1/logs/hadoop-parallels-journalnode-zk01.out
    zk02: starting journalnode, logging to /home/parallels/app/hadoop-2.4.1/logs/hadoop-parallels-journalnode-zk02.out
    Starting ZK Failover Controllers on NN hosts [hdcluster01 hdcluster02]
    hdcluster01: starting zkfc, logging to /home/parallels/app/hadoop-2.4.1/logs/hadoop-parallels-zkfc-hdcluster01.out
    hdcluster02: starting zkfc, logging to /home/parallels/app/hadoop-2.4.1/logs/hadoop-parallels-zkfc-hdcluster02.out
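    On each NameNode host, jps should now show both the NameNode and the zkfc process (the process ids below are illustrative):

    [parallels@hdcluster01 sbin]$ jps
    4132 NameNode
    4390 DFSZKFailoverController
    4476 Jps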

    14) Start YARN (on hdcluster03 and hdcluster04)

    [parallels@hdcluster03 sbin]$ ./start-yarn.sh
    starting yarn daemons
    starting resourcemanager, logging to /home/parallels/app/hadoop-2.4.1/logs/yarn-parallels-resourcemanager-hdcluster03.out
    zk03: starting nodemanager, logging to /home/parallels/app/hadoop-2.4.1/logs/yarn-parallels-nodemanager-zk03.out
    zk01: starting nodemanager, logging to /home/parallels/app/hadoop-2.4.1/logs/yarn-parallels-nodemanager-zk01.out
    zk02: starting nodemanager, logging to /home/parallels/app/hadoop-2.4.1/logs/yarn-parallels-nodemanager-zk02.out
    [parallels@hdcluster03 sbin]$ jps
    29964 Jps
    28916 ResourceManager
    [parallels@hdcluster04 sbin]$ yarn-daemon.sh start resourcemanager
    starting resourcemanager, logging to /home/parallels/app/hadoop-2.4.1/logs/yarn-parallels-resourcemanager-hdcluster04.out
    [parallels@hdcluster04 sbin]$ jps
    29404 ResourceManager
    29455 Jps
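    The HA state of the two ResourceManagers can be checked in the same way the NameNode state is checked below (an illustrative session):

    [parallels@hdcluster03 bin]$ yarn rmadmin -getServiceState rm1
    active
    [parallels@hdcluster03 bin]$ yarn rmadmin -getServiceState rm2
    standby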

    At this point the HA deployment of the Hadoop cluster is complete.

Testing the Cluster's Working State

  • Check the status of every HDFS node
[parallels@hdcluster01 bin]$ hdfs dfsadmin -report
Configured Capacity: 64418205696 (59.99 GB)
Present Capacity: 60574326784 (56.41 GB)
DFS Remaining: 60140142592 (56.01 GB)
DFS Used: 434184192 (414.07 MB)
DFS Used%: 0.72%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Datanodes available: 3 (3 total, 0 dead)

Live datanodes:
Name: 10.211.55.24:50010 (zk01)
Hostname: zk01
Decommission Status : Normal
Configured Capacity: 21472735232 (20.00 GB)
DFS Used: 144728064 (138.02 MB)
Non DFS Used: 1281323008 (1.19 GB)
DFS Remaining: 20046684160 (18.67 GB)
DFS Used%: 0.67%
DFS Remaining%: 93.36%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Last contact: Tue Oct 02 15:41:47 CST 2018


Name: 10.211.55.26:50010 (zk03)
Hostname: zk03
Decommission Status : Normal
Configured Capacity: 21472735232 (20.00 GB)
DFS Used: 144728064 (138.02 MB)
Non DFS Used: 1281269760 (1.19 GB)
DFS Remaining: 20046737408 (18.67 GB)
DFS Used%: 0.67%
DFS Remaining%: 93.36%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Last contact: Tue Oct 02 15:41:47 CST 2018


Name: 10.211.55.25:50010 (zk02)
Hostname: zk02
Decommission Status : Normal
Configured Capacity: 21472735232 (20.00 GB)
DFS Used: 144728064 (138.02 MB)
Non DFS Used: 1281286144 (1.19 GB)
DFS Remaining: 20046721024 (18.67 GB)
DFS Used%: 0.67%
DFS Remaining%: 93.36%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Last contact: Tue Oct 02 15:41:47 CST 2018
  • Get the HA state of a NameNode
[parallels@hdcluster01 bin]$ hdfs haadmin -getServiceState nn1
standby
[parallels@hdcluster01 bin]$ hdfs haadmin -getServiceState nn2
active
  • Start a single NameNode process on its own
sbin/hadoop-daemon.sh start namenode
  • Start a single zkfc process on its own
sbin/hadoop-daemon.sh start zkfc
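  • A simple failover test, sketched here on the assumption that nn2 (hdcluster02) is currently active as in the output above: kill the active NameNode and confirm that the standby takes over (the process id and output are illustrative).

[parallels@hdcluster02 ~]$ jps
5021 NameNode
[parallels@hdcluster02 ~]$ kill -9 5021
[parallels@hdcluster01 bin]$ hdfs haadmin -getServiceState nn1
active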