High Availability With QJM: Detailed Deployment Steps

Hadoop cluster setup:

Configure /etc/hosts (identical on all 4 nodes)

192.168.83.11  hd1 
192.168.83.22  hd2
192.168.83.33  hd3
192.168.83.44  hd4

Configure the hostname (takes effect after reboot)

[hadoop@hd1 ~]$ more /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=hd1

Configure the user and group

[hadoop@hd1 ~]$ id hadoop
uid=1001(hadoop) gid=10010(hadoop) groups=10010(hadoop)

Configure the JDK

[hadoop@hd1 ~]$ env|grep JAVA
JAVA_HOME=/usr/java/jdk1.8.0_11
[hadoop@hd1 ~]$ java -version
java version "1.8.0_11"
Java(TM) SE Runtime Environment (build 1.8.0_11-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.11-b03, mixed mode)

Configure passwordless SSH login

ssh-keygen -t rsa
ssh-keygen -t dsa
cat ~/.ssh/*.pub >> ~/.ssh/authorized_keys
scp ~/.ssh/authorized_keys hadoop@hd2:~/.ssh/authorized_keys
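
A minimal sketch for pushing the merged key file to the remaining nodes and verifying the passwordless login, assuming the hadoop user and the ~/.ssh directory already exist on hd2, hd3 and hd4:

for host in hd2 hd3 hd4; do
    scp ~/.ssh/authorized_keys hadoop@${host}:~/.ssh/authorized_keys
    ssh hadoop@${host} "chmod 600 ~/.ssh/authorized_keys && hostname"
done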

Configure environment variables:

export JAVA_HOME=/usr/java/jdk1.8.0_11
export JRE_HOME=/usr/java/jdk1.8.0_11/jre
export CLASSPATH=.:$CLASSPATH:$JAVA_HOME/lib:$JRE_HOME/lib

export HADOOP_INSTALL=/home/hadoop/hadoop-2.7.1
export PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin:$HADOOP_INSTALL/bin:$HADOOP_INSTALL/sbin
export PATH=$PATH:/home/hadoop/zookeeper-3.4.6/bin

Note: Hadoop has a small quirk here: setting JAVA_HOME in ~/.bash_profile does not take effect, so JAVA_HOME must be set in hadoop-env.sh instead.

# The java implementation to use.
#export JAVA_HOME=${JAVA_HOME}
export JAVA_HOME=/usr/java/jdk1.8.0_11
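
One way to apply this edit on every node without opening an editor (a sketch, assuming the hadoop-2.7.1 installation lives under /home/hadoop on all hosts):

for host in hd1 hd2 hd3 hd4; do
    ssh ${host} "sed -i 's|^export JAVA_HOME=.*|export JAVA_HOME=/usr/java/jdk1.8.0_11|' /home/hadoop/hadoop-2.7.1/etc/hadoop/hadoop-env.sh"
done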

Software preparation:

[hadoop@hd1 software]$ ls -l 
total 931700
-rw-r--r-- 1 hadoop hadoop  17699306 Oct  6 17:30 zookeeper-3.4.6.tar.gz
-rw-r--r--.  1 hadoop hadoop 336424960 Jul 18 23:13 hadoop-2.7.1.tar

tar -xvf /usr/hadoop/hadoop-2.7.1.tar -C /home/hadoop/

tar -xvf ../software/zookeeper-3.4.6.tar.gz -C /home/hadoop/

Hadoop HA configuration

Reference: http://hadoop.apache.org/docs/r2.7.3/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html

Purpose: "Using the Quorum Journal Manager or Conventional Shared Storage". The JNs (Quorum JournalNodes) exist precisely to solve the shared-storage problem. The official documentation also offers NFS as an option, but I am wary of NFS performance issues and chose not to use it.

Architecture

The official documentation describes the architecture in detail but is heavy going, so the following excerpt is reproduced from yameing's CSDN blog (full text: https://blog.csdn.net/yameing/article/details/39696151?utm_source=copy).

In a typical HA cluster, two separate machines are configured as NameNodes. At any point in time exactly one of them is Active and the other is Standby. The Active NN handles all client communication, while the Standby NN acts as a simple slave, maintaining enough state to provide a fast failover when required.

So that the Standby NN can keep its state synchronized with the Active NN, both nodes communicate with a group of separate processes called JournalNodes (JNs). Any edit performed by the Active NN is durably logged to a majority of the JNs. The Standby NN reads the edits from the JNs and constantly watches for changes; as it sees new edits it applies them to its own namespace, keeping the two NNs in sync. During a failover, the Standby makes sure it has read all of the edits from the JNs before promoting itself to Active, which guarantees that the namespace is fully synchronized before the failover occurs.

To provide fast failover, it is also necessary for the Standby NN to have up-to-date block location information. To achieve this, the DNs are configured with the addresses of both NNs and send block location reports and heartbeats to both.

It is vital that only one NN be Active at any time. Otherwise the state of the two NNs would quickly diverge, risking data loss or other incorrect results. To ensure this property and prevent the so-called split-brain scenario, the JNs allow only a single NN to write edits at a time. During a failover, the NN that is to become Active simply takes over the role of writing to the JNs, which effectively prevents the other NN from continuing to act as Active and allows the new Active node to complete the failover safely.

Node and instance planning:

NameNode machines: the machines running the Active and Standby NNs should have equivalent hardware, the same as would be used in a non-HA cluster.

JournalNode machines: the JN daemons are relatively lightweight, so they can reasonably be collocated with other Hadoop daemons, e.g. the NNs, the JobTracker, or the ResourceManager. Note: there must be at least 3 JN daemons, since the edits must be written to a majority of the JNs; this allows the system to tolerate the failure of a single machine. You may run more than 3 JNs, but to actually increase fault tolerance you should run an odd number. With N JNs, the system can tolerate (N-1)/2 machine failures and keep working normally (e.g. 1 failure with 3 JNs, 2 failures with 5).

Note: in an HA cluster the Standby NN also performs checkpoints of the namespace, so it is not necessary to run a Secondary NameNode, CheckpointNode, or BackupNode, and doing so is in fact an error. This also allows the hardware previously dedicated to the Secondary NameNode to be reused when reconfiguring a non-HA HDFS cluster to be HA.

Configuration details

To configure HA NameNodes, you must add several configuration options to your hdfs-site.xml configuration file.

  • dfs.nameservices - the logical name for this new nameservice

<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>
  • dfs.ha.namenodes.[nameservice ID] - unique identifiers for each NameNode in the nameservice

<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2</value>
</property>
  • dfs.namenode.rpc-address.[nameservice ID].[name node ID] - the fully-qualified RPC address for each NameNode to listen on

<property>
  <name>dfs.namenode.rpc-address.mycluster.nn1</name>
  <value>machine1.example.com:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn2</name>
  <value>machine2.example.com:8020</value>
</property>
  • dfs.namenode.http-address.[nameservice ID].[name node ID] - the fully-qualified HTTP address for each NameNode to listen on
<property>
  <name>dfs.namenode.http-address.mycluster.nn1</name>
  <value>machine1.example.com:50070</value>
</property>
<property>
  <name>dfs.namenode.http-address.mycluster.nn2</name>
  <value>machine2.example.com:50070</value>
</property>
  • dfs.namenode.shared.edits.dir - the URI which identifies the group of JNs where the NameNodes will write/read edits
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://node1.example.com:8485;node2.example.com:8485;node3.example.com:8485/mycluster</value>
</property>
  • dfs.client.failover.proxy.provider.[nameservice ID] - the Java class that HDFS clients use to contact the Active NameNode
<property>
  <name>dfs.client.failover.proxy.provider.mycluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
  • dfs.ha.fencing.methods - a list of scripts or Java classes which will be used to fence the Active NameNode during a failover.

Importantly, when using the Quorum Journal Manager, only one NameNode will ever be allowed to write to the JournalNodes, so there is no potential for corrupting the file system metadata from a split-brain scenario.

sshfence - SSH to the Active NameNode and kill the process

<property>
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence</value>
</property>

<property>
  <name>dfs.ha.fencing.ssh.private-key-files</name>
  <value>/home/exampleuser/.ssh/id_rsa</value>
</property>
  • fs.defaultFS - the default path prefix used by the Hadoop FS client when none is given. In your core-site.xml file, add:
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://mycluster</value>
</property>
  • dfs.journalnode.edits.dir - the path where the JournalNode daemon will store its local state
<property>
  <name>dfs.journalnode.edits.dir</name>
  <value>/path/to/journal/node/local/data</value>
</property>
  • Configure the DataNodes (the slaves file)

[hadoop@hd1 hadoop]$ more slaves 
hd2
hd3
hd4

Once the above configuration is complete, copy the configuration files to the remaining nodes to finish the Hadoop cluster configuration.
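
A minimal sketch of that copy step, assuming the same installation path on every node:

for host in hd2 hd3 hd4; do
    scp /home/hadoop/hadoop-2.7.1/etc/hadoop/{core-site.xml,hdfs-site.xml,slaves} \
        hadoop@${host}:/home/hadoop/hadoop-2.7.1/etc/hadoop/
done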

Startup:

  • Start the JNs by running the command "hadoop-daemon.sh start journalnode" on each JN host.

  • For a brand-new cluster, first format the NN by running "hdfs namenode -format" on one of the NN nodes.
  • Once the format is done, copy the NN metadata to the other node by running "hdfs namenode -bootstrapStandby" on the NN node that has not been formatted. (Note: before copying the metadata, start the already-formatted NN, and only that one node, with "hadoop-daemon.sh start namenode".)
  • If you are converting a non-HA NameNode into an HA one, run "hdfs namenode -initializeSharedEdits".
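
A condensed sketch of the bring-up order described above, assuming a brand-new cluster with hd1 formatted first and hd2 joining as the standby:

# on each JournalNode host (hd2, hd3, hd4)
hadoop-daemon.sh start journalnode

# on hd1: format and start the first NameNode
hdfs namenode -format
hadoop-daemon.sh start namenode

# on hd2: pull the metadata and initialize the standby NameNode
hdfs namenode -bootstrapStandby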

 

  1. Formatting the NN with "hdfs namenode -format" failed with the following error:
18/10/07 21:42:34 INFO ipc.Client: Retrying connect to server: hd2/192.168.83.22:8485. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
18/10/07 21:42:34 WARN namenode.NameNode: Encountered exception during format: 
org.apache.hadoop.hdfs.qjournal.client.QuorumException: Unable to check if JNs are ready for formatting. 1 exceptions thrown:
192.168.83.22:8485: Call From hd1/192.168.83.11 to hd2:8485 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
        at org.apache.hadoop.hdfs.qjournal.client.QuorumException.create(QuorumException.java:81)
        at org.apache.hadoop.hdfs.qjournal.client.QuorumCall.rethrowException(QuorumCall.java:223)
        at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.hasSomeData(QuorumJournalManager.java:232)
        at org.apache.hadoop.hdfs.server.common.Storage.confirmFormat(Storage.java:900)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.confirmFormat(FSImage.java:184)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:987)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1429)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1554)
18/10/07 21:42:34 INFO ipc.Client: Retrying connect to server: hd3/192.168.83.33:8485. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

Resolution: formatting the NN requires connecting to the JNs, and this error appears whenever that connection fails or times out. First check whether the JNs are running at all; if the failure is only caused by network latency, the error can be worked around by raising the retry settings:

<!-- Adjust the ipc parameters in core-site.xml to avoid ConnectException when connecting to the journalnode service -->
<property>
    <name>ipc.client.connect.max.retries</name>
    <value>100</value>
    <description>Indicates the number of retries a client will make to establish a server connection.</description>
</property>
<property>
    <name>ipc.client.connect.retry.interval</name>
    <value>10000</value>
    <description>Indicates the number of milliseconds a client will wait for before retrying to establish a server connection.</description>
</property>

---------------------

This troubleshooting note is from 銳湃's CSDN blog; full text: https://blog.csdn.net/chuyouyinghe/article/details/78976933?utm_source=copy

Notes:

  1) Adjusting the ipc parameters in core-site.xml only helps when the connection timeout is caused by the target service not having finished starting yet. If the target service itself failed to start, tuning the ipc parameters has no effect.

  2) This configuration makes the namenode keep trying to reach the journalnodes for up to 1000 s (maxRetries=100, sleepTime=10000 ms). If the cluster has many nodes or the network is unstable and the connection still takes longer than 1000 s, the namenode will still shut down.
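
Before touching the ipc settings it is worth a quick check that the JournalNode process is actually up and listening on each JN host (a sketch, assuming the default JN port 8485 used in dfs.namenode.shared.edits.dir above):

for host in hd2 hd3 hd4; do
    ssh ${host} 'jps | grep JournalNode; ss -tln | grep 8485'
done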

Automatic Failover:

The sections above describe how to configure manual failover. In that mode the system will not automatically trigger a failover from the Active to the Standby NN even if the Active node has failed. The following describes how to configure and deploy automatic failover.

Automatic failover adds two new components to an HDFS deployment: a ZooKeeper quorum, and the ZKFailoverController process (abbreviated as ZKFC).

Apache ZooKeeper is a highly available service for maintaining small amounts of coordination data, notifying clients of changes in that data, and monitoring clients for failures. The implementation of automatic HDFS failover relies on ZooKeeper for the following things:

  • Failure detection - each of the NameNode machines in the cluster maintains a persistent session in ZooKeeper. If the machine crashes, the ZooKeeper session will expire, notifying the other NameNode that a failover should be triggered.

  • Active NameNode election - ZooKeeper provides a simple mechanism to exclusively select a node as active. If the current active NameNode crashes, another node may take a special exclusive lock in ZooKeeper indicating that it should become the next active.

The ZKFailoverController (ZKFC) is a new component which is a ZooKeeper client which also monitors and manages the state of the NameNode. Each of the machines which runs a NameNode also runs a ZKFC, and that ZKFC is responsible for:

  • Health monitoring - the ZKFC pings its local NameNode on a periodic basis with a health-check command. So long as the NameNode responds in a timely fashion with a healthy status, the ZKFC considers the node healthy. If the node has crashed, frozen, or otherwise entered an unhealthy state, the health monitor will mark it as unhealthy.

  • ZooKeeper session management - when the local NameNode is healthy, the ZKFC holds a session open in ZooKeeper. If the local NameNode is active, it also holds a special 「lock」 znode. This lock uses ZooKeeper’s support for 「ephemeral」 nodes; if the session expires, the lock node will be automatically deleted.

  • ZooKeeper-based election - if the local NameNode is healthy, and the ZKFC sees that no other node currently holds the lock znode, it will itself try to acquire the lock. If it succeeds, then it has 「won the election」, and is responsible for running a failover to make its local NameNode active. The failover process is similar to the manual failover described above: first, the previous active is fenced if necessary, and then the local NameNode transitions to active state.

Deploying ZooKeeper 

Before you begin configuring automatic failover, you should shut down your cluster. It is not currently possible to transition from a manual failover setup to an automatic failover setup while the cluster is running.

Install ZooKeeper:

  • Deploy ZK in replicated (cluster) mode

Reference: https://zookeeper.apache.org/doc/r3.4.6/zookeeperStarted.html

Extract zookeeper-3.4.6.tar.gz

vi conf/zoo.cfg

# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/home/hadoop/zookeeper-3.4.6/tmp
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
server.1=hd1:2888:3888
server.2=hd2:2888:3888
server.3=hd3:2888:3888
  • Configure the server ID

The entries of the form server.X list the servers that make up the ZooKeeper service. When the server starts up, it knows which server it is by looking for the file myid in the data directory. That file contains the server number.

server.1, server.2, and server.3 above are the ZooKeeper server IDs.

mkdir -p /home/hadoop/zookeeper-3.4.6/tmp

vi /home/hadoop/zookeeper-3.4.6/tmp/myid 

[hadoop@hd1 tmp]$ more myid 
1
  • Write 1 into /home/hadoop/zookeeper-3.4.6/tmp/myid on hd1, 2 on hd2, 3 on hd3, and so on.
  • That completes the ZK configuration; copy the configuration files to the other ZK nodes to finish setting up the ZK ensemble.
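
A minimal sketch for pushing zoo.cfg and writing each node's myid in one pass, assuming passwordless SSH and the same ZooKeeper path on every node:

id=1
for host in hd1 hd2 hd3; do
    scp /home/hadoop/zookeeper-3.4.6/conf/zoo.cfg hadoop@${host}:/home/hadoop/zookeeper-3.4.6/conf/
    ssh hadoop@${host} "mkdir -p /home/hadoop/zookeeper-3.4.6/tmp && echo ${id} > /home/hadoop/zookeeper-3.4.6/tmp/myid"
    id=$((id + 1))
done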

Start ZK:

[hadoop@hd1 bin]$ sh zkServer.sh start
JMX enabled by default
Using config: /home/hadoop/zookeeper-3.4.6/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
[hadoop@hd1 bin]$ jps
1957 QuorumPeerMain
1976 Jps
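
Once ZooKeeper has been started on all three nodes, the quorum can be checked from any one of them; zkServer.sh status prints each node's mode, and exactly one node should report itself as the leader (a sketch, assuming the same install path everywhere):

for host in hd1 hd2 hd3; do
    ssh ${host} '/home/hadoop/zookeeper-3.4.6/bin/zkServer.sh status'
done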

Configuring automatic failover:

  •  In your hdfs-site.xml file, add:
<property>
   <name>dfs.ha.automatic-failover.enabled</name>
   <value>true</value>
 </property>
  •  In your core-site.xml file, add:
<property>
   <name>ha.zookeeper.quorum</name>
   <value>hd1:2181,hd2:2181,hd3:2181</value>
 </property>

This lists the host-port pairs running the ZooKeeper service.

Initializing HA state in ZooKeeper

After the configuration keys have been added, the next step is to initialize required state in ZooKeeper. You can do so by running the following command from one of the NameNode hosts.

hdfs zkfc -formatZK

This will create a znode in ZooKeeper inside of which the automatic failover system stores its data.
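
To confirm that the znode exists, it can be listed with the ZooKeeper command-line client (a sketch; the /hadoop-ha/mycluster path matches the nameservice configured earlier):

# open an interactive ZK shell, then list the HA parent znode
/home/hadoop/zookeeper-3.4.6/bin/zkCli.sh -server hd1:2181
ls /hadoop-ha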

Starting the cluster with 「start-dfs.sh」

Since automatic failover has been enabled in the configuration, the start-dfs.sh script will now automatically start a ZKFC daemon on any machine that runs a NameNode. When the ZKFCs start, they will automatically select one of the NameNodes to become active.

HA startup summary

 

  • Start the JNs (hd2, hd3, hd4):

[hadoop@hd4 ~]$ hadoop-daemon.sh  start journalnode
starting journalnode, logging to /usr/hadoop/hadoop-2.7.1/logs/hadoop-hadoop-journalnode-hd4.out
[hadoop@hd4 ~]$ jps
1843 JournalNode
1879 Jps
  • Format the NN (run on either NN node, hd1 or hd2):
[hadoop@hd1 sbin]$ hdfs namenode -format 
18/10/07 05:54:30 INFO namenode.NameNode: STARTUP_MSG: 
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = hd1/192.168.83.11
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 2.7.1
........
18/10/07 05:54:34 INFO namenode.FSImage: Allocated new BlockPoolId: BP-841723191-192.168.83.11-1538862874971
18/10/07 05:54:34 INFO common.Storage: Storage directory /usr/hadoop/hadoop-2.7.1/dfs/name has been successfully formatted.
18/10/07 05:54:35 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
18/10/07 05:54:35 INFO util.ExitUtil: Exiting with status 0
18/10/07 05:54:35 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at hd1/192.168.83.11
************************************************************/
  • Copy the NN metadata (note: before copying the metadata, start the NN that has already been formatted, and only that one node)

  • Start the formatted NN
[hadoop@hd1 current]$ hadoop-daemon.sh start namenode 
starting namenode, logging to /usr/hadoop/hadoop-2.7.1/logs/hadoop-hadoop-namenode-hd1.out
[hadoop@hd1 current]$ jps
1777 QuorumPeerMain
2177 Jps
  • Run the metadata copy command on the NN node that was not formatted (hd2)
[hadoop@hd2 ~]$ hdfs namenode -bootstrapStandby
18/10/07 06:07:15 INFO namenode.NameNode: STARTUP_MSG: 
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = localhost.localdomain/127.0.0.1
STARTUP_MSG:   args = [-bootstrapStandby]
STARTUP_MSG:   version = 2.7.1
。。。。。。。。。。。。。。。。。。。。。。
************************************************************/
18/10/07 06:07:15 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
18/10/07 06:07:15 INFO namenode.NameNode: createNameNode [-bootstrapStandby]
18/10/07 06:07:16 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
=====================================================
About to bootstrap Standby ID nn2 from:
           Nameservice ID: mycluster
        Other Namenode ID: nn1
  Other NN's HTTP address: http://hd1:50070
  Other NN's IPC  address: hd1/192.168.83.11:8020
             Namespace ID: 1626081692
            Block pool ID: BP-841723191-192.168.83.11-1538862874971
               Cluster ID: CID-230e9e54-e6d1-4baf-a66a-39cc69368ed8
           Layout version: -63
       isUpgradeFinalized: true
=====================================================
18/10/07 06:07:17 INFO common.Storage: Storage directory /usr/hadoop/hadoop-2.7.1/dfs/name has been successfully formatted.
18/10/07 06:07:18 INFO namenode.TransferFsImage: Opening connection to http://hd1:50070/imagetransfer?getimage=1&txid=0&storageInfo=-63:1626081692:0:CID-230e9e54-e6d1-4baf-a66a-39cc69368ed8
18/10/07 06:07:18 INFO namenode.TransferFsImage: Image Transfer timeout configured to 60000 milliseconds
18/10/07 06:07:18 INFO namenode.TransferFsImage: Transfer took 0.01s at 0.00 KB/s
18/10/07 06:07:18 INFO namenode.TransferFsImage: Downloaded file fsimage.ckpt_0000000000000000000 size 353 bytes.
18/10/07 06:07:18 INFO util.ExitUtil: Exiting with status 0
18/10/07 06:07:18 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at localhost.localdomain/127.0.0.1
************************************************************/
  • Initialize the HA state in ZK
hdfs zkfc -formatZK

Formatting the HA state in ZK failed with the following error:

8/10/07 22:34:06 INFO zookeeper.ClientCnxn: Opening socket connection to server hd1/192.168.83.11:2181. Will not attempt to authenticate using SASL (unknown error)
18/10/07 22:34:06 INFO zookeeper.ClientCnxn: Socket connection established to hd1/192.168.83.11:2181, initiating session
18/10/07 22:34:06 INFO zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
18/10/07 22:34:07 INFO zookeeper.ClientCnxn: Opening socket connection to server hd2/192.168.83.22:2181. Will not attempt to authenticate using SASL (unknown error)
18/10/07 22:34:07 INFO zookeeper.ClientCnxn: Socket connection established to hd2/192.168.83.22:2181, initiating session
18/10/07 22:34:07 INFO zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
18/10/07 22:34:07 ERROR ha.ActiveStandbyElector: Connection timed out: couldn't connect to ZooKeeper in 5000 milliseconds
18/10/07 22:34:07 INFO zookeeper.ZooKeeper: Session: 0x0 closed
18/10/07 22:34:07 INFO zookeeper.ClientCnxn: EventThread shut down
18/10/07 22:34:07 FATAL ha.ZKFailoverController: Unable to start failover controller. Unable to connect to ZooKeeper quorum at hd1:2181,hd2:2181,hd3:2181. Please check the configured value for ha.zookeeper.quorum and ensure that ZooKeeper is running.
[hadoop@hd1 ~]$

 

Looks like your zookeeper quorum was not able to elect a master. Maybe you have misconfigured your zookeeper?

Make sure that you have entered all 3 servers in your zoo.cfg with a unique ID. Make sure you have the same config on all 3 of your machines, and make sure that every server is using the correct myid as specified in the cfg.

After correcting the ZooKeeper configuration, run the command again:

[hadoop@hd1 bin]$ hdfs zkfc -formatZK
18/10/09 20:27:21 INFO zookeeper.ZooKeeper: Client environment:java.library.path=/home/hadoop/hadoop-2.7.1/lib/native
18/10/09 20:27:21 INFO zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
18/10/09 20:27:21 INFO zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
18/10/09 20:27:21 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux
18/10/09 20:27:21 INFO zookeeper.ZooKeeper: Client environment:os.arch=amd64
18/10/09 20:27:21 INFO zookeeper.ZooKeeper: Client environment:os.version=2.6.32-504.el6.x86_64
18/10/09 20:27:21 INFO zookeeper.ZooKeeper: Client environment:user.name=hadoop
18/10/09 20:27:21 INFO zookeeper.ZooKeeper: Client environment:user.home=/home/hadoop
18/10/09 20:27:21 INFO zookeeper.ZooKeeper: Client environment:user.dir=/home/hadoop/zookeeper-3.4.6/bin
18/10/09 20:27:21 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=hd1:2181,hd2:2181,hd3:2181 sessionTimeout=5000 watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@5119fb47
18/10/09 20:27:21 INFO zookeeper.ClientCnxn: Opening socket connection to server hd1/192.168.83.11:2181. Will not attempt to authenticate using SASL (unknown error)
18/10/09 20:27:22 INFO zookeeper.ClientCnxn: Socket connection established to hd1/192.168.83.11:2181, initiating session
18/10/09 20:27:22 INFO zookeeper.ClientCnxn: Session establishment complete on server hd1/192.168.83.11:2181, sessionid = 0x16658c662c80000, negotiated timeout = 5000
18/10/09 20:27:22 INFO ha.ActiveStandbyElector: Session connected.
18/10/09 20:27:22 INFO ha.ActiveStandbyElector: Successfully created /hadoop-ha/mycluster in ZK.
18/10/09 20:27:22 INFO zookeeper.ZooKeeper: Session: 0x16658c662c80000 closed
18/10/09 20:27:22 INFO zookeeper.ClientCnxn: EventThread shut down

Start ZK:

[hadoop@hd1 bin]$ ./zkServer.sh start 
JMX enabled by default
Using config: /home/hadoop/zookeeper-3.4.6/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED

Start everything:

[hadoop@hd1 bin]$ start-dfs.sh
18/10/09 20:36:58 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [hd1 hd2]
hd2: namenode running as process 2065. Stop it first.
hd1: namenode running as process 2011. Stop it first.
hd2: starting datanode, logging to /home/hadoop/hadoop-2.7.1/logs/hadoop-hadoop-datanode-hd2.out
hd4: starting datanode, logging to /home/hadoop/hadoop-2.7.1/logs/hadoop-hadoop-datanode-hd4.out
hd3: starting datanode, logging to /home/hadoop/hadoop-2.7.1/logs/hadoop-hadoop-datanode-hd3.out
Starting journal nodes [hd2 hd3 hd4]
hd4: journalnode running as process 1724. Stop it first.
hd2: journalnode running as process 1839. Stop it first.
hd3: journalnode running as process 1725. Stop it first.
18/10/09 20:37:10 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting ZK Failover Controllers on NN hosts [hd1 hd2]
hd1: zkfc running as process 3045. Stop it first.
hd2: zkfc running as process 2601. Stop it first.

Check the DN log on hd2:

[hadoop@hd2 logs]$ jps
1984 QuorumPeerMain
2960 Jps
2065 NameNode
2601 DFSZKFailoverController
1839 JournalNode

2018-10-09 20:37:07,674 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on /home/hadoop/hadoop-2.7.1/dfs/data/in_use.lock acquired by nodename 2787@hd2
2018-10-09 20:37:07,674 WARN org.apache.hadoop.hdfs.server.common.Storage: java.io.IOException: Incompatible clusterIDs in /home/hadoop/hadoop-2.7.1/dfs/data: namenode clusterID = CID-e28f1182-d452
-4f23-9b37-9a59d4bdeaa0; datanode clusterID = CID-876d5634-38e8-464c-be02-714ee8c72878
2018-10-09 20:37:07,675 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for Block pool <registering> (Datanode Uuid unassigned) service to hd2/192.168.83.22:8020. Exiti
ng. 
java.io.IOException: All specified directories are failed to load.
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:477)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1361)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1326)
        at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:316)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:223)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:801)
        at java.lang.Thread.run(Thread.java:745)
2018-10-09 20:37:07,676 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for Block pool <registering> (Datanode Uuid unassigned) service to hd1/192.168.83.11:8020. Exiti
ng. 
java.io.IOException: All specified directories are failed to load.
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:477)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1361)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1326)
        at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:316)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:223)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:801)
        at java.lang.Thread.run(Thread.java:745)
2018-10-09 20:37:07,683 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Ending block pool service for: Block pool <registering> (Datanode Uuid unassigned) service to hd1/192.168.83.11:8020
2018-10-09 20:37:07,684 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Ending block pool service for: Block pool <registering> (Datanode Uuid unassigned) service to hd2/192.168.83.22:8020
2018-10-09 20:37:07,687 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Removed Block pool <registering> (Datanode Uuid unassigned)
2018-10-09 20:37:09,688 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Exiting Datanode
2018-10-09 20:37:09,689 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 0
2018-10-09 20:37:09,698 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at hd2/192.168.83.22
************************************************************/

The DataNode on hd2 did not start. The log shows namenode clusterID = CID-e28f1182-d452-4f23-9b37-9a59d4bdeaa0 while datanode clusterID = CID-876d5634-38e8-464c-be02-714ee8c72878; the mismatch between the NN and DN cluster IDs caused the startup failure. Looking back at my own steps, the NN had been formatted more than once, which changed the NN's clusterID while the DN's clusterID stayed the same. The fix is simple: delete the DataNode's data and restart the DataNode.
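
A sketch of that fix on hd2, using the data directory shown in the log above; wiping the DataNode storage is only acceptable here because the cluster is brand new and holds no data:

# on hd2: remove the stale DataNode storage, then restart the DataNode
rm -rf /home/hadoop/hadoop-2.7.1/dfs/data/*
hadoop-daemon.sh start datanode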

Check the hd2 node again:

[hadoop@hd2 dfs]$ jps
1984 QuorumPeerMain
2065 NameNode
3123 DataNode
3268 Jps
2601 DFSZKFailoverController
1839 JournalNode

At this point every instance on every node has been started; here is the full list:
hd1:

[hadoop@hd1 bin]$ jps
4180 Jps
3045 DFSZKFailoverController
2135 QuorumPeerMain
2011 NameNode

hd2:

[hadoop@hd2 dfs]$ jps
1984 QuorumPeerMain
2065 NameNode
3123 DataNode
3268 Jps
2601 DFSZKFailoverController
1839 JournalNode

hd3:

[hadoop@hd3 bin]$ jps
2631 Jps
2523 DataNode
1725 JournalNode
1807 QuorumPeerMain

hd4:

[hadoop@hd4 ~]$ jps
2311 DataNode
2425 Jps
1724 JournalNode

 

經過web界面訪問NN(任意一個NN):

http://192.168.83.11:50070 

    http://192.168.83.22:50070 

[hadoop@hd1 bin]$ hdfs dfs -put zookeeper.out /
18/10/09 21:11:01 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[hadoop@hd1 bin]$ hdfs dfs -ls /
18/10/09 21:11:15 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 1 items
-rw-r--r--   3 hadoop supergroup      25698 2018-10-09 21:11 /zookeeper.out

Next, configure MapReduce.

mapred-site.xml

<configuration>
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
</configuration>

yarn-site.xml

<configuration>
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>hd1</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
</configuration>
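
With mapred-site.xml and yarn-site.xml in place, YARN can be started with the stock script on hd1, the host configured as the ResourceManager (a sketch):

start-yarn.sh
jps    # hd1 should now also list a ResourceManager process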

Manage MR through the web UI:

http://hd1:8088/

The default ports for the various services can be found in the configuration section at http://hadoop.apache.org/docs/r2.7.3/hadoop-project-dist/hadoop-common/ClusterSetup.html.

Manual NN administration:

 

[hadoop@hd1 bin]$ hdfs haadmin
Usage: haadmin
    [-transitionToActive [--forceactive] <serviceId>]
    [-transitionToStandby <serviceId>]    -- serviceId is nn1 or nn2 as defined earlier
    [-failover [--forcefence] [--forceactive] <serviceId> <serviceId>]
    [-getServiceState <serviceId>]
    [-checkHealth <serviceId>]
    [-help <command>]

Generic options supported are
-conf <configuration file>     specify an application configuration file
-D <property=value>            use value for given property
-fs <local|namenode:port>      specify a namenode
-jt <local|resourcemanager:port>    specify a ResourceManager
-files <comma separated list of files>    specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars>    specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives>    specify comma separated archives to be unarchived on the compute machines.

This guide describes high-level uses of each of these subcommands. For specific usage information of each subcommand, you should run 「hdfs haadmin -help <command>」.

  • transitionToActive and transitionToStandby - transition the state of the given NameNode to Active or Standby

    These subcommands cause a given NameNode to transition to the Active or Standby state, respectively. These commands do not attempt to perform any fencing, and thus should rarely be used. Instead, one should almost always prefer to use the 「hdfs haadmin -failover」 subcommand.

  • failover - initiate a failover between two NameNodes

    This subcommand causes a failover from the first provided NameNode to the second. If the first NameNode is in the Standby state, this command simply transitions the second to the Active state without error. If the first NameNode is in the Active state, an attempt will be made to gracefully transition it to the Standby state. If this fails, the fencing methods (as configured by dfs.ha.fencing.methods) will be attempted in order until one succeeds. Only after this process will the second NameNode be transitioned to the Active state. If no fencing method succeeds, the second NameNode will not be transitioned to the Active state, and an error will be returned.

  • getServiceState - determine whether the given NameNode is Active or Standby

    Connect to the provided NameNode to determine its current state, printing either 「standby」 or 「active」 to STDOUT appropriately. This subcommand might be used by cron jobs or monitoring scripts which need to behave differently based on whether the NameNode is currently Active or Standby.

  • checkHealth - check the health of the given NameNode

    Connect to the provided NameNode to check its health. The NameNode is capable of performing some diagnostics on itself, including checking if internal services are running as expected. This command will return 0 if the NameNode is healthy, non-zero otherwise. One might use this command for monitoring purposes.

    Note: This is not yet implemented, and at present will always return success, unless the given NameNode is completely down.
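
For example, with the nn1 and nn2 IDs configured earlier, the current role of each NameNode can be checked as follows; each command prints either "active" or "standby":

hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2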
