For how to set up and configure the CentOS virtual machines, see "Kafka: ZK+Kafka+Spark Streaming cluster setup (1) Installing four CentOS VMs in VMware, making them reachable from the host, and giving them internet access".
For how to install hadoop2.9.0, see "Kafka: ZK+Kafka+Spark Streaming cluster setup (2) Installing hadoop2.9.0".
For how to configure zookeeper3.4.12, see "Kafka: ZK+Kafka+Spark Streaming cluster setup (8) Installing zookeeper-3.4.12".
Servers running Hadoop:
192.168.0.120 master
192.168.0.121 slave1
192.168.0.122 slave2
192.168.0.123 slave3
Configuring a high-availability (HA) cluster keeps the cluster running safely, stably, and efficiently in production. The roles of the individual components in an HA setup are explained below.
1. ZooKeeper's Java process, QuorumPeerMain: distributed services (such as HDFS, HBase, and the ResourceManager) register themselves with ZooKeeper. This makes it possible to start two or more identical processes (one in the active state, the rest standby); if the active node fails, ZooKeeper coordinates the switch to a standby node and makes it active, keeping the service running.
2. NameNode: the HDFS master node. It holds the metadata HDFS needs to operate (stored in the edits and fsimage files). If it goes down the whole cluster goes down with it, which is why more than one NameNode must be configured.
3. DataNode: an HDFS data node; it stores the actual data.
4. JournalNode: a log node. The edits files are moved off the NameNode onto a small JournalNode cluster, so the standby NameNode stays in sync and the cluster no longer depends on a single NameNode.
5. zkfc: short for DFSZKFailoverController. It monitors the NameNode's state and manages failover between NameNodes.
6. ResourceManager: manages and allocates the cluster's resources, for example assigning compute resources to the NodeManagers.
7. NodeManager: manages the resources (containers) on each worker node; it normally runs on the same machines as the DataNodes and is started along with them.
Ideally each process would run on its own node, but the machines here cannot run that many virtual machines, so several roles share a node; only the critical processes (NameNode and ResourceManager) are kept on different nodes. The resulting layout is summarized below.
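For reference, based on the jps output shown later in this post, the processes end up distributed roughly as follows (this is simply the layout used here, not a requirement):

master : NameNode (nn1), ResourceManager (rm1), JournalNode, DFSZKFailoverController, QuorumPeerMain (ZooKeeper)
slave1 : NameNode (nn2), ResourceManager (rm2), DataNode, NodeManager, JournalNode, DFSZKFailoverController, QuorumPeerMain (ZooKeeper)
slave2 : DataNode, NodeManager, JournalNode, QuorumPeerMain (ZooKeeper)
slave3 : DataNode, NodeManager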
First, the system environment:
For how to set up and configure the CentOS virtual machines, see "Kafka: ZK+Kafka+Spark Streaming cluster setup (1) Installing four CentOS VMs in VMware, making them reachable from the host, and giving them internet access".
For how to install hadoop2.9.0, see "Kafka: ZK+Kafka+Spark Streaming cluster setup (2) Installing hadoop2.9.0".
Then set up zookeeper3.4.12 as described in "Kafka: ZK+Kafka+Spark Streaming cluster setup (8) Installing zookeeper-3.4.12".
3. Modify the configuration files
Modify the configuration files on the master node first, then copy them to the other nodes.
On master: stop the Hadoop services and remove the dfs, logs, and tmp directories
cd /opt/hadoop-2.9.0
sbin/stop-all.sh
rm -r dfs
rm -r logs
rm -r tmp
On the slaves: stop the Hadoop services and clear out the Hadoop installation directory
cd /opt/hadoop-2.9.0
sbin/stop-all.sh
rm -r *
cd /opt/hadoop-2.9.0/etc/hadoop
vi hadoop-env.sh

export JAVA_HOME=path_to_jdk    # add this setting
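For example, with the JDK installed under /opt/jdk1.8.0_171 (the same path that shows up in the startup logs later in this post), the line would read:

export JAVA_HOME=/opt/jdk1.8.0_171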
cd /opt/hadoop-2.9.0/etc/hadoop
vi core-site.xml

<configuration>
  <!-- Logical nameservice name for HDFS; here it is HA -->
  <property>
    <name>fs.defaultFS</name>
    <!-- <value>hdfs://master:9000/</value> -->
    <value>hdfs://HA</value>
  </property>
  <!-- Hadoop temporary directory -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/hadoop-2.9.0/tmp</value>
    <description>By default this path is the common directory where the NameNode, DataNode, JournalNode, etc. store their data</description>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131702</value>
  </property>
  <!-- ZooKeeper ensemble address -->
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>master:2181,slave1:2181,slave2:2181</value>
    <description>Addresses and ports of the ZooKeeper ensemble. Note that the number of nodes must be odd and no fewer than three</description>
  </property>
  <!-- The following setting helps avoid timeout exceptions when the NameNode connects to the JournalNodes -->
  <property>
    <name>ipc.client.connect.retry.interval</name>
    <value>10000</value>
    <description>Indicates the number of milliseconds a client will wait for before retrying to establish a server connection.</description>
  </property>
</configuration>
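As a quick sanity check that the new core-site.xml is being picked up, hdfs getconf can print the effective value of a key (an optional check, run from the Hadoop installation directory):

cd /opt/hadoop-2.9.0
bin/hdfs getconf -confKey fs.defaultFS    # expected output: hdfs://HA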
cd /opt/hadoop-2.9.0/etc/hadoop
vi hdfs-site.xml

<configuration>
  <!-- Nameservice for HDFS, named HA; it must match the value in core-site.xml -->
  <property>
    <name>dfs.nameservices</name>
    <value>HA</value>
  </property>
  <!-- The HA nameservice has two NameNodes, nn1 and nn2 -->
  <property>
    <name>dfs.ha.namenodes.HA</name>
    <value>nn1,nn2</value>
  </property>
  <!-- RPC address of nn1 -->
  <property>
    <name>dfs.namenode.rpc-address.HA.nn1</name>
    <value>master:9000</value>
    <description>9000 is the HDFS client access port (for both the command line and programs); some setups use 8020</description>
  </property>
  <!-- HTTP address of nn1 -->
  <property>
    <name>dfs.namenode.http-address.HA.nn1</name>
    <value>master:50070</value>
    <description>Web address of the NameNode</description>
  </property>
  <!-- RPC address of nn2 -->
  <property>
    <name>dfs.namenode.rpc-address.HA.nn2</name>
    <value>slave1:9000</value>
    <description>9000 is the HDFS client access port (for both the command line and programs); some setups use 8020</description>
  </property>
  <!-- HTTP address of nn2 -->
  <property>
    <name>dfs.namenode.http-address.HA.nn2</name>
    <value>slave1:50070</value>
    <description>Web address of the NameNode</description>
  </property>
  <!-- Where the NameNode edits metadata is stored on the JournalNodes -->
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://master:8485;slave1:8485;slave2:8485/HA</value>
    <description>The JournalNode cluster used by the two NameNodes to share the edits directory; used on the master and slave1 hosts</description>
  </property>
  <!-- Where the JournalNode stores its data on local disk -->
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/opt/hadoop-2.9.0/tmp/journal</value>
  </property>
  <!-- Enable automatic failover when a NameNode fails -->
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  <!-- Failover proxy provider used by clients -->
  <property>
    <name>dfs.client.failover.proxy.provider.HA</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <!-- Fencing methods; multiple methods are separated by newlines, one per line -->
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>
      sshfence
      shell(/bin/true)
    </value>
  </property>
  <!-- sshfence requires passwordless SSH login -->
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/home/spark/.ssh/id_rsa</value>
    <description>
      [spark@master hadoop]$ cd /home/spark/.ssh/
      [spark@master .ssh]$ ls
      authorized_keys  id_rsa.pub  id_rsa.pub.slave2  known_hosts
      id_rsa  id_rsa.pub.slave1  id_rsa.pub.slave3
    </description>
  </property>
  <!-- Timeout for the sshfence fencing method -->
  <property>
    <name>dfs.ha.fencing.ssh.connect-timeout</name>
    <value>30000</value>
  </property>
  <!--
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>master:9001</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/opt/hadoop-2.9.0/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/opt/hadoop-2.9.0/dfs/data</value>
  </property>
  -->
  <property>
    <name>dfs.replication</name>
    <value>3</value>
    <description>Number of block replicas stored on the DataNodes. The default is 3; with 4 DataNodes here, any value no larger than 4 is fine</description>
  </property>
  <property>
    <name>dfs.blocksize</name>
    <value>134217728</value>
    <description>
      The default block size for new files, in bytes.
      You can use the following suffix (case insensitive):
      k(kilo), m(mega), g(giga), t(tera), p(peta), e(exa) to specify the size (such as 128k, 512m, 1g, etc.),
      Or provide complete size in bytes (such as 134217728 for 128 MB).
      Note: in 1.x and earlier the default was 64 MB and the property was named dfs.block.size
    </description>
  </property>
  <property>
    <name>dfs.permissions.enabled</name>
    <value>false</value>
    <description>Note: if permission problems persist, run "/opt/hadoop-2.9.0/bin/hdfs dfs -chmod -R 777 /"</description>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
</configuration>
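The sshfence method above relies on the spark user's passwordless SSH keys listed in the description. A quick way to confirm fencing will be able to log in between the two NameNode hosts (an optional sanity check, not part of the Hadoop setup itself):

ssh spark@slave1 hostname    # run on master; should print slave1 without asking for a password
ssh spark@master hostname    # run on slave1; should print master without asking for a password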
Note: the following problem may appear here: The ratio of reported blocks 1.0000 has reached the threshold 0.9990. Safe mode will be turned off automatically in 27 seconds.
Cause:
By default Hadoop starts in safe mode. The hadoop dfsadmin -safemode command can be used to query the state and control whether safe mode is enabled. Its parameters are:
enter  - enter safe mode
leave  - force the NameNode to leave safe mode
get    - report whether safe mode is on
wait   - wait until safe mode has ended
There are two ways to deal with this:
(1) Run hadoop dfsadmin -safemode leave to leave safe mode, but this has to be done by hand every time.
(2) Configure the dfs.safemode.threshold.pct parameter. Its default is 0.999f and it is documented in hdfs-default.xml. Setting it to 0 disables safe mode permanently.
Add the following to hdfs-site.xml:
<property>
  <name>dfs.safemode.threshold.pct</name>
  <value>0f</value>
  <description>
    Specifies the percentage of blocks that should satisfy the minimal replication requirement defined by dfs.replication.min.
    Values less than or equal to 0 mean not to wait for any particular percentage of blocks before exiting safemode.
    Values greater than 1 will make safe mode permanent.
  </description>
</property>
Then restart the NameNode.
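For reference, the safe-mode state can also be checked and cleared directly from the command line; in Hadoop 2.x, hdfs dfsadmin is the current form of the older hadoop dfsadmin:

cd /opt/hadoop-2.9.0
bin/hdfs dfsadmin -safemode get      # prints whether safe mode is ON or OFF
bin/hdfs dfsadmin -safemode leave    # force the NameNode out of safe mode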
cd /opt/hadoop-2.9.0/etc/hadoop
vi mapred-site.xml

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobtracker.http.address</name>
    <value>master:50030</value>
    <description>Note: this differs on every machine and must be changed to the local hostname (the port stays the same), e.g. slave1:50030, slave2:50030, slave3:50030; adjust it after copying the file over</description>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>master:10020</value>
    <description>Note: this differs on every machine and must be changed to the local hostname (the port stays the same), e.g. slave1:10020, slave2:10020, slave3:10020; adjust it after copying the file over</description>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>master:19888</value>
    <description>Note: this differs on every machine and must be changed to the local hostname (the port stays the same), e.g. slave1:19888, slave2:19888, slave3:19888; adjust it after copying the file over</description>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>http://master:9001</value>
    <description>Note: this differs on every machine and must be changed to the local hostname (the port stays the same), e.g. http://slave1:9001, http://slave2:9001, http://slave3:9001; adjust it after copying the file over</description>
  </property>
</configuration>
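Since mapreduce.jobhistory.* is configured above (and yarn.log.server.url below points at port 19888), the MapReduce JobHistory Server has to be started separately once the cluster is up for those addresses to respond; a minimal sketch, run on master:

cd /opt/hadoop-2.9.0
sbin/mr-jobhistory-daemon.sh start historyserver
jps    # should now also show a JobHistoryServer process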
cd /opt/hadoop-2.9.0/etc/hadoop
vi yarn-site.xml

<configuration>
  <!--
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>master:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>master:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>master:8035</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>master:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>master:8088</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>2048</value>
  </property>
  <property>
    <name>yarn.nodemanager.pmem-check-enabled</name>
    <value>false</value>
  </property>
  <property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
    <description>Whether virtual memory limits will be enforced for containers</description>
  </property>
  <property>
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
    <value>4</value>
    <description>Ratio between virtual memory to physical memory when setting memory limits for containers</description>
  </property>
  -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <!-- Enable ResourceManager HA -->
  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>
  <!-- Cluster id of the RM -->
  <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>yarn-cluster</value>
  </property>
  <!-- Ids and hosts of the two RMs -->
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>master</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>slave1</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address.rm1</name>
    <value>master:8088</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address.rm2</name>
    <value>slave1:8088</value>
  </property>
  <!-- ZooKeeper ensemble address -->
  <property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>master:2181,slave1:2181,slave2:2181</value>
  </property>
  <property>
    <name>yarn.resourcemanager.recovery.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.store.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
    <description>By default the RM stores its state under /rmstore in ZooKeeper; the path can be changed with yarn.resourcemanager.zk-state-store.parent-path</description>
  </property>
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
    <description>Enable log aggregation: the local log files produced on each machine that ran a task are copied to a central location on HDFS, so job logs can be viewed from any machine in the cluster</description>
  </property>
  <property>
    <name>yarn.log.server.url</name>
    <value>http://master:19888/jobhistory/logs</value>
    <description>Note: this differs on every machine and must be changed to the local hostname (the port stays the same), e.g. http://slave1:19888/jobhistory/logs, http://slave2:19888/jobhistory/logs, http://slave3:19888/jobhistory/logs; adjust it after copying the file over</description>
  </property>
</configuration>
cd /opt/hadoop-2.9.0/etc/hadoop
vi slaves

[spark@master hadoop]$ more slaves
#localhost
slave1
slave2
slave3
scp -r /opt/hadoop-2.9.0 spark@slave1:/opt/
scp -r /opt/hadoop-2.9.0 spark@slave2:/opt/
scp -r /opt/hadoop-2.9.0 spark@slave3:/opt/
Then modify the configuration on slave1, slave2, and slave3 individually:
Modify yarn-site.xml

cd /opt/hadoop-2.9.0/etc/hadoop
vi yarn-site.xml

Change the following:

<property>
  <name>yarn.log.server.url</name>
  <value>http://master:19888/jobhistory/logs</value>
  <description>Note: this differs on every machine and must be changed to the local hostname (the port stays the same), e.g. http://slave1:19888/jobhistory/logs, http://slave2:19888/jobhistory/logs, http://slave3:19888/jobhistory/logs; adjust it after copying the file over</description>
</property>
Modify mapred-site.xml

cd /opt/hadoop-2.9.0/etc/hadoop
vi mapred-site.xml

Change the following:

<configuration>
  <property>
    <name>mapreduce.jobtracker.http.address</name>
    <value>master:50030</value>
    <description>Note: this differs on every machine and must be changed to the local hostname (the port stays the same), e.g. slave1:50030, slave2:50030, slave3:50030; adjust it after copying the file over</description>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>master:10020</value>
    <description>Note: this differs on every machine and must be changed to the local hostname (the port stays the same), e.g. slave1:10020, slave2:10020, slave3:10020; adjust it after copying the file over</description>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>master:19888</value>
    <description>Note: this differs on every machine and must be changed to the local hostname (the port stays the same), e.g. slave1:19888, slave2:19888, slave3:19888; adjust it after copying the file over</description>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>http://master:9001</value>
    <description>Note: this differs on every machine and must be changed to the local hostname (the port stays the same), e.g. http://slave1:9001, http://slave2:9001, http://slave3:9001; adjust it after copying the file over</description>
  </property>
</configuration>
Start ZooKeeper (on master, slave1, and slave2) and check its status:

cd /opt/zookeeper-3.4.12/bin
./zkServer.sh start
./zkServer.sh status
master
[spark@master hadoop-2.9.0]$ cd /opt/zookeeper-3.4.12/bin
[spark@master bin]$ ./zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /opt/zookeeper-3.4.12/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
[spark@master bin]$ ./zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /opt/zookeeper-3.4.12/bin/../conf/zoo.cfg
Mode: follower
slave1
[spark@slave1 hadoop]$ cd /opt/zookeeper-3.4.12/bin
[spark@slave1 bin]$ ./zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /opt/zookeeper-3.4.12/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
[spark@slave1 bin]$ ./zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /opt/zookeeper-3.4.12/bin/../conf/zoo.cfg
Mode: leader
slave2
[spark@slave2 hadoop]$ cd /opt/zookeeper-3.4.12/bin
[spark@slave2 bin]$ ./zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /opt/zookeeper-3.4.12/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
[spark@slave2 bin]$ ./zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /opt/zookeeper-3.4.12/bin/../conf/zoo.cfg
Mode: follower
Start the JournalNodes (on master, slave1, and slave2):

cd /opt/hadoop-2.9.0
sbin/hadoop-daemon.sh start journalnode
master
[spark@master hadoop]$ cd /opt/hadoop-2.9.0
[spark@master hadoop-2.9.0]$ sbin/hadoop-daemon.sh start journalnode
starting journalnode, logging to /opt/hadoop-2.9.0/logs/hadoop-spark-journalnode-master.out
[spark@master hadoop-2.9.0]$ jps
1808 Jps
1757 JournalNode
1662 QuorumPeerMain
[spark@master hadoop-2.9.0]$
slave1
[spark@slave1 bin]$ cd /opt/hadoop-2.9.0
[spark@slave1 hadoop-2.9.0]$ sbin/hadoop-daemon.sh start journalnode
starting journalnode, logging to /opt/hadoop-2.9.0/logs/hadoop-spark-journalnode-slave1.out
[spark@slave1 hadoop-2.9.0]$ jps
2003 JournalNode
2054 Jps
1931 QuorumPeerMain
[spark@slave1 hadoop-2.9.0]$
slave2
[spark@slave2 bin]$ cd /opt/hadoop-2.9.0
[spark@slave2 hadoop-2.9.0]$ sbin/hadoop-daemon.sh start journalnode
starting journalnode, logging to /opt/hadoop-2.9.0/logs/hadoop-spark-journalnode-slave2.out
[spark@slave2 hadoop-2.9.0]$ jps
1978 QuorumPeerMain
2044 JournalNode
2095 Jps
[spark@slave2 hadoop-2.9.0]$
Format the NameNode (on master):

cd /opt/hadoop-2.9.0
bin/hdfs namenode -format    # equivalent to bin/hadoop namenode -format

[spark@master HA]$ cd /opt/hadoop-2.9.0
[spark@master hadoop-2.9.0]$ bin/hdfs namenode -format
18/07/01 22:26:11 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = master/192.168.0.120
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 2.9.0
STARTUP_MSG:   classpath = /opt/hadoop-2.9.0/etc/hadoop:/opt/hadoop-2.9.0/share/hadoop/common/lib/nimbus-jose-jwt-3.9.jar:/opt/hadoop-2.9.0/share/hadoop/common/li....
STARTUP_MSG:   build = https://git-wip-us.apache.org/repos/asf/hadoop.git -r 756ebc8394e473ac25feac05fa493f6d612e6c50; compiled by 'arsuresh' on 2017-11-13T23:15Z
STARTUP_MSG:   java = 1.8.0_171
************************************************************/
18/07/01 22:26:11 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
18/07/01 22:26:11 INFO namenode.NameNode: createNameNode [-format]
Formatting using clusterid: CID-01f8ecdf-532b-4415-807e-90b4aa179e29
18/07/01 22:26:12 INFO namenode.FSEditLog: Edit logging is async:true
18/07/01 22:26:12 INFO namenode.FSNamesystem: KeyProvider: null
18/07/01 22:26:12 INFO namenode.FSNamesystem: fsLock is fair: true
18/07/01 22:26:12 INFO namenode.FSNamesystem: Detailed lock hold time metrics enabled: false
18/07/01 22:26:12 INFO namenode.FSNamesystem: fsOwner = spark (auth:SIMPLE)
18/07/01 22:26:12 INFO namenode.FSNamesystem: supergroup = supergroup
18/07/01 22:26:12 INFO namenode.FSNamesystem: isPermissionEnabled = false
18/07/01 22:26:12 INFO namenode.FSNamesystem: Determined nameservice ID: HA
18/07/01 22:26:12 INFO namenode.FSNamesystem: HA Enabled: true
18/07/01 22:26:12 INFO common.Util: dfs.datanode.fileio.profiling.sampling.percentage set to 0. Disabling file IO profiling
18/07/01 22:26:12 INFO blockmanagement.DatanodeManager: dfs.block.invalidate.limit: configured=1000, counted=60, effected=1000
18/07/01 22:26:12 INFO blockmanagement.DatanodeManager: dfs.namenode.datanode.registration.ip-hostname-check=true
18/07/01 22:26:12 INFO blockmanagement.BlockManager: dfs.namenode.startup.delay.block.deletion.sec is set to 000:00:00:00.000
18/07/01 22:26:12 INFO blockmanagement.BlockManager: The block deletion will start around 2018 Jul 01 22:26:12
18/07/01 22:26:12 INFO util.GSet: Computing capacity for map BlocksMap
18/07/01 22:26:12 INFO util.GSet: VM type = 64-bit
18/07/01 22:26:12 INFO util.GSet: 2.0% max memory 966.7 MB = 19.3 MB
18/07/01 22:26:12 INFO util.GSet: capacity = 2^21 = 2097152 entries
18/07/01 22:26:12 INFO blockmanagement.BlockManager: dfs.block.access.token.enable=false
18/07/01 22:26:12 WARN conf.Configuration: No unit for dfs.namenode.safemode.extension(30000) assuming MILLISECONDS
18/07/01 22:26:12 INFO blockmanagement.BlockManagerSafeMode: dfs.namenode.safemode.threshold-pct = 0.9990000128746033
18/07/01 22:26:12 INFO blockmanagement.BlockManagerSafeMode: dfs.namenode.safemode.min.datanodes = 0
18/07/01 22:26:12 INFO blockmanagement.BlockManagerSafeMode: dfs.namenode.safemode.extension = 30000
18/07/01 22:26:12 INFO blockmanagement.BlockManager: defaultReplication = 3
18/07/01 22:26:12 INFO blockmanagement.BlockManager: maxReplication = 512
18/07/01 22:26:12 INFO blockmanagement.BlockManager: minReplication = 1
18/07/01 22:26:12 INFO blockmanagement.BlockManager: maxReplicationStreams = 2
18/07/01 22:26:12 INFO blockmanagement.BlockManager: replicationRecheckInterval = 3000
18/07/01 22:26:12 INFO blockmanagement.BlockManager: encryptDataTransfer = false
18/07/01 22:26:12 INFO blockmanagement.BlockManager: maxNumBlocksToLog = 1000
18/07/01 22:26:12 INFO namenode.FSNamesystem: Append Enabled: true
18/07/01 22:26:12 INFO util.GSet: Computing capacity for map INodeMap
18/07/01 22:26:12 INFO util.GSet: VM type = 64-bit
18/07/01 22:26:12 INFO util.GSet: 1.0% max memory 966.7 MB = 9.7 MB
18/07/01 22:26:12 INFO util.GSet: capacity = 2^20 = 1048576 entries
18/07/01 22:26:12 INFO namenode.FSDirectory: ACLs enabled? false
18/07/01 22:26:12 INFO namenode.FSDirectory: XAttrs enabled? true
18/07/01 22:26:12 INFO namenode.NameNode: Caching file names occurring more than 10 times
18/07/01 22:26:12 INFO snapshot.SnapshotManager: Loaded config captureOpenFiles: falseskipCaptureAccessTimeOnlyChange: false
18/07/01 22:26:12 INFO util.GSet: Computing capacity for map cachedBlocks
18/07/01 22:26:12 INFO util.GSet: VM type = 64-bit
18/07/01 22:26:12 INFO util.GSet: 0.25% max memory 966.7 MB = 2.4 MB
18/07/01 22:26:12 INFO util.GSet: capacity = 2^18 = 262144 entries
18/07/01 22:26:12 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.window.num.buckets = 10
18/07/01 22:26:12 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.num.users = 10
18/07/01 22:26:12 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.windows.minutes = 1,5,25
18/07/01 22:26:12 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
18/07/01 22:26:12 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
18/07/01 22:26:12 INFO util.GSet: Computing capacity for map NameNodeRetryCache
18/07/01 22:26:12 INFO util.GSet: VM type = 64-bit
18/07/01 22:26:12 INFO util.GSet: 0.029999999329447746% max memory 966.7 MB = 297.0 KB
18/07/01 22:26:12 INFO util.GSet: capacity = 2^15 = 32768 entries
Re-format filesystem in Storage Directory /opt/hadoop-2.9.0/tmp/dfs/name ? (Y or N) y
Re-format filesystem in QJM to [192.168.0.120:8485, 192.168.0.121:8485, 192.168.0.122:8485] ? (Y or N) y
18/07/01 22:26:19 INFO namenode.FSImage: Allocated new BlockPoolId: BP-4950294-192.168.0.120-1530455179314
18/07/01 22:26:19 INFO common.Storage: Will remove files: [/opt/hadoop-2.9.0/tmp/dfs/name/current/VERSION, /opt/hadoop-2.9.0/tmp/dfs/name/current/seen_txid]
18/07/01 22:26:19 INFO common.Storage: Storage directory /opt/hadoop-2.9.0/tmp/dfs/name has been successfully formatted.
18/07/01 22:26:19 INFO namenode.FSImageFormatProtobuf: Saving image file /opt/hadoop-2.9.0/tmp/dfs/name/current/fsimage.ckpt_0000000000000000000 using no compression
18/07/01 22:26:19 INFO namenode.FSImageFormatProtobuf: Image file /opt/hadoop-2.9.0/tmp/dfs/name/current/fsimage.ckpt_0000000000000000000 of size 322 bytes saved in 0 seconds.
18/07/01 22:26:19 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
18/07/01 22:26:19 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at master/192.168.0.120
************************************************************/
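After the format, the shared edits directory configured in dfs.journalnode.edits.dir should exist on each JournalNode host (master, slave1, slave2), with a subdirectory named after the nameservice; an optional quick check:

ls /opt/hadoop-2.9.0/tmp/journal    # expected to contain a directory named HA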
First, start the NameNode on master:
cd /opt/hadoop-2.9.0
sbin/hadoop-daemon.sh start namenode

[spark@master hadoop-2.9.0]$ cd /opt/hadoop-2.9.0
[spark@master hadoop-2.9.0]$ sbin/hadoop-daemon.sh start namenode
starting namenode, logging to /opt/hadoop-2.9.0/logs/hadoop-spark-namenode-master.out
[spark@master hadoop-2.9.0]$ jps
2016 Jps
1939 NameNode
1757 JournalNode
1662 QuorumPeerMain
[spark@master hadoop-2.9.0]$
Then run the following on slave1:
cd /opt/hadoop-2.9.0
bin/hdfs namenode -bootstrapStandby

[spark@slave1 HA]$ cd /opt/hadoop-2.9.0
[spark@slave1 hadoop-2.9.0]$ bin/hdfs namenode -bootstrapStandby
18/07/01 22:28:58 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = slave1/192.168.0.121
STARTUP_MSG:   args = [-bootstrapStandby]
STARTUP_MSG:   version = 2.9.0
STARTUP_MSG:   classpath = /opt/hadoop-2.9.0/etc/hadoop:/opt/hadoop-2.9.0/share/hadoop/common/lib...
STARTUP_MSG:   build = https://git-wip-us.apache.org/repos/asf/hadoop.git -r 756ebc8394e473ac25feac05fa493f6d612e6c50; compiled by 'arsuresh' on 2017-11-13T23:15Z
STARTUP_MSG:   java = 1.8.0_171
************************************************************/
18/07/01 22:28:58 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
18/07/01 22:28:58 INFO namenode.NameNode: createNameNode [-bootstrapStandby]
=====================================================
About to bootstrap Standby ID nn2 from:
           Nameservice ID: HA
        Other Namenode ID: nn1
  Other NN's HTTP address: http://master:50070
   Other NN's IPC address: master/192.168.0.120:9000
             Namespace ID: 1875840257
            Block pool ID: BP-4950294-192.168.0.120-1530455179314
               Cluster ID: CID-01f8ecdf-532b-4415-807e-90b4aa179e29
           Layout version: -63
       isUpgradeFinalized: true
=====================================================
18/07/01 22:28:59 INFO common.Storage: Storage directory /opt/hadoop-2.9.0/tmp/dfs/name has been successfully formatted.
18/07/01 22:28:59 INFO namenode.FSEditLog: Edit logging is async:true
18/07/01 22:29:00 INFO namenode.TransferFsImage: Opening connection to http://master:50070/imagetransfer?getimage=1&txid=0&storageInfo=-63:1875840257:1530455179314:CID-01f8ecdf-532b-4415-807e-90b4aa179e29&bootstrapstandby=true
18/07/01 22:29:00 INFO namenode.TransferFsImage: Image Transfer timeout configured to 60000 milliseconds
18/07/01 22:29:00 INFO namenode.TransferFsImage: Combined time for fsimage download and fsync to all disks took 0.02s. The fsimage download took 0.01s at 0.00 KB/s. Synchronous (fsync) write to disk of /opt/hadoop-2.9.0/tmp/dfs/name/current/fsimage.ckpt_0000000000000000000 took 0.00s.
18/07/01 22:29:00 INFO namenode.TransferFsImage: Downloaded file fsimage.ckpt_0000000000000000000 size 322 bytes.
18/07/01 22:29:00 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at slave1/192.168.0.121
************************************************************/
Format the HA state znode in ZooKeeper (on master):

cd /opt/hadoop-2.9.0
bin/hdfs zkfc -formatZK

[spark@master hadoop-2.9.0]$ cd /opt/hadoop-2.9.0
[spark@master hadoop-2.9.0]$ bin/hdfs zkfc -formatZK
18/07/01 22:31:25 INFO tools.DFSZKFailoverController: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting DFSZKFailoverController
STARTUP_MSG:   host = master/192.168.0.120
STARTUP_MSG:   args = [-formatZK]
STARTUP_MSG:   version = 2.9.0
STARTUP_MSG:   classpath = /opt/hadoop-2.9.0/etc/hadoop:/opt/hadoop-2.9.0/share/hadoop/common/lib/nimbus-jose-jwt-3.9.jar:/opt/hadoop-2.9.0/share/hadoop/common/lib/java-xmlbuilder-0.4.jar....
STARTUP_MSG:   build = https://git-wip-us.apache.org/repos/asf/hadoop.git -r 756ebc8394e473ac25feac05fa493f6d612e6c50; compiled by 'arsuresh' on 2017-11-13T23:15Z
STARTUP_MSG:   java = 1.8.0_171
************************************************************/
18/07/01 22:31:25 INFO tools.DFSZKFailoverController: registered UNIX signal handlers for [TERM, HUP, INT]
18/07/01 22:31:25 INFO tools.DFSZKFailoverController: Failover controller configured for NameNode NameNode at master/192.168.0.120:9000
18/07/01 22:31:25 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT
18/07/01 22:31:25 INFO zookeeper.ZooKeeper: Client environment:host.name=master
18/07/01 22:31:25 INFO zookeeper.ZooKeeper: Client environment:java.version=1.8.0_171
18/07/01 22:31:25 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Oracle Corporation
18/07/01 22:31:25 INFO zookeeper.ZooKeeper: Client environment:java.home=/opt/jdk1.8.0_171/jre
18/07/01 22:31:25 INFO zookeeper.ZooKeeper: Client environment:java.class.path=/opt/hadoop-2.9.0/etc/hadoop:/opt/hadoop-2.9.0/share/hadoop/common/lib/nimbu.....
18/07/01 22:31:25 INFO zookeeper.ZooKeeper: Client environment:java.library.path=/opt/hadoop-2.9.0/lib/native
18/07/01 22:31:25 INFO zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
18/07/01 22:31:25 INFO zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
18/07/01 22:31:25 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux
18/07/01 22:31:25 INFO zookeeper.ZooKeeper: Client environment:os.arch=amd64
18/07/01 22:31:25 INFO zookeeper.ZooKeeper: Client environment:os.version=3.10.0-862.el7.x86_64
18/07/01 22:31:25 INFO zookeeper.ZooKeeper: Client environment:user.name=spark
18/07/01 22:31:25 INFO zookeeper.ZooKeeper: Client environment:user.home=/home/spark
18/07/01 22:31:25 INFO zookeeper.ZooKeeper: Client environment:user.dir=/opt/hadoop-2.9.0
18/07/01 22:31:25 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=master:2181,slave1:2181,slave2:2181 sessionTimeout=5000 watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@6ca8564a
18/07/01 22:31:26 INFO zookeeper.ClientCnxn: Opening socket connection to server master/192.168.0.120:2181. Will not attempt to authenticate using SASL (unknown error)
18/07/01 22:31:26 INFO zookeeper.ClientCnxn: Socket connection established to master/192.168.0.120:2181, initiating session
18/07/01 22:31:26 INFO zookeeper.ClientCnxn: Session establishment complete on server master/192.168.0.120:2181, sessionid = 0x100006501250000, negotiated timeout = 5000
18/07/01 22:31:26 INFO ha.ActiveStandbyElector: Successfully created /hadoop-ha/HA in ZK.
18/07/01 22:31:26 INFO zookeeper.ZooKeeper: Session: 0x100006501250000 closed
18/07/01 22:31:26 WARN ha.ActiveStandbyElector: Ignoring stale result from old client with sessionId 0x100006501250000
18/07/01 22:31:26 INFO zookeeper.ClientCnxn: EventThread shut down
18/07/01 22:31:26 INFO tools.DFSZKFailoverController: SHUTDOWN_MSG:
/************************************************************
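To confirm the znode was created, the ZooKeeper command-line client (shipped with ZooKeeper) can list it; the output should include the HA nameservice created above:

cd /opt/zookeeper-3.4.12/bin
./zkCli.sh -server master:2181
ls /hadoop-ha        # run inside the zkCli shell; expected output: [HA]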
1. Make sure ZooKeeper is running on master, slave1, and slave2 (it was started earlier):

cd /opt/zookeeper-3.4.12/bin
./zkServer.sh start
2. Start HDFS and YARN on master:

cd /opt/hadoop-2.9.0
sbin/start-dfs.sh
sbin/start-yarn.sh

[spark@master sbin]$ cd /opt/hadoop-2.9.0
[spark@master hadoop-2.9.0]$ sbin/start-dfs.sh
Starting namenodes on [master slave1]
slave1: starting namenode, logging to /opt/hadoop-2.9.0/logs/hadoop-spark-namenode-slave1.out
master: namenode running as process 1939. Stop it first.
slave3: starting datanode, logging to /opt/hadoop-2.9.0/logs/hadoop-spark-datanode-slave3.out
slave2: starting datanode, logging to /opt/hadoop-2.9.0/logs/hadoop-spark-datanode-slave2.out
slave1: starting datanode, logging to /opt/hadoop-2.9.0/logs/hadoop-spark-datanode-slave1.out
Starting journal nodes [master slave1 slave2]
slave2: journalnode running as process 2044. Stop it first.    # already started manually in the earlier step
master: journalnode running as process 1757. Stop it first.    # already started manually in the earlier step
slave1: journalnode running as process 2003. Stop it first.    # already started manually in the earlier step
Starting ZK Failover Controllers on NN hosts [master slave1]
master: starting zkfc, logging to /opt/hadoop-2.9.0/logs/hadoop-spark-zkfc-master.out
slave1: starting zkfc, logging to /opt/hadoop-2.9.0/logs/hadoop-spark-zkfc-slave1.out
[spark@master hadoop-2.9.0]$ sbin/start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /opt/hadoop-2.9.0/logs/yarn-spark-resourcemanager-master.out
slave1: starting nodemanager, logging to /opt/hadoop-2.9.0/logs/yarn-spark-nodemanager-slave1.out
slave2: starting nodemanager, logging to /opt/hadoop-2.9.0/logs/yarn-spark-nodemanager-slave2.out
slave3: starting nodemanager, logging to /opt/hadoop-2.9.0/logs/yarn-spark-nodemanager-slave3.out
[spark@master hadoop-2.9.0]$ jps
2466 DFSZKFailoverController
1939 NameNode
2567 ResourceManager
2839 Jps
1757 JournalNode
1662 QuorumPeerMain
[spark@master hadoop-2.9.0]$
3. The ResourceManager on slave1 must be started manually:
cd /opt/hadoop-2.9.0
sbin/yarn-daemon.sh start resourcemanager

[spark@slave1 hadoop-2.9.0]$ jps
2003 JournalNode
2292 DataNode
2501 NodeManager
2613 Jps
2424 DFSZKFailoverController
1931 QuorumPeerMain
2191 NameNode
[spark@slave1 sbin]$ cd /opt/hadoop-2.9.0
[spark@slave1 hadoop-2.9.0]$ cd sbin/
[spark@slave1 sbin]$ ls
distribute-exclude.sh  hadoop-daemons.sh  httpfs.sh  refresh-namenodes.sh  start-all.sh  start-dfs.sh  start-yarn.sh  stop-balancer.sh  stop-secure-dns.sh  yarn-daemon.sh
FederationStateStore  hdfs-config.cmd  kms.sh  slaves.sh  start-balancer.sh  start-secure-dns.sh  stop-all.cmd  stop-dfs.cmd  stop-yarn.cmd  yarn-daemons.sh
hadoop-daemon.sh  hdfs-config.sh  mr-jobhistory-daemon.sh  start-all.cmd  start-dfs.cmd  start-yarn.cmd  stop-all.sh  stop-dfs.sh  stop-yarn.sh
[spark@slave1 sbin]$ ./yarn-daemon.sh start resourcemanager
starting resourcemanager, logging to /opt/hadoop-2.9.0/logs/yarn-spark-resourcemanager-slave1.out
[spark@slave1 sbin]$ jps
2689 ResourceManager
2003 JournalNode
2292 DataNode
2740 Jps
2501 NodeManager
2424 DFSZKFailoverController
1931 QuorumPeerMain
2191 NameNode
[spark@slave1 sbin]$
At this point, slave2:
[spark@slave2 HA]$ jps
2208 DataNode
2470 Jps
2345 NodeManager
1978 QuorumPeerMain
2044 JournalNode
[spark@slave2 HA]$
And slave3:
[spark@slave3 hadoop-2.9.0]$ jps
2247 Jps
2123 NodeManager
2013 DataNode
[spark@slave3 hadoop-2.9.0]$
The high-availability Hadoop setup is now complete. The web UIs can be checked in a browser:
http://192.168.0.120:50070    // namenode, active
http://192.168.0.121:50070    // namenode, standby
http://192.168.0.120:8088     // resourcemanager, active
http://192.168.0.121:8088     // resourcemanager, standby
At this point http://192.168.0.120:50070/ is reachable (it shows 'master:9000' (active)) and http://192.168.0.121:50070/ is reachable (it shows 'slave1:9000' (standby)).
http://192.168.0.121:8088/ is not reachable, while http://192.168.0.120:8088/ is.
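The active/standby state can also be queried from the command line instead of the browser, using the nameservice HA and the rm1/rm2 ids defined in the configuration above:

cd /opt/hadoop-2.9.0
bin/hdfs haadmin -getServiceState nn1    # expected: active
bin/hdfs haadmin -getServiceState nn2    # expected: standby
bin/yarn rmadmin -getServiceState rm1    # expected: active
bin/yarn rmadmin -getServiceState rm2    # expected: standby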
When shutting the cluster down, stop the services in the reverse order of how they were started.
To test failover, first kill the active NameNode on master:
kill -9 <pid of NN>
[spark@master hadoop-2.9.0]$ jps
2466 DFSZKFailoverController
1939 NameNode
2567 ResourceManager
1757 JournalNode
1662 QuorumPeerMain
2910 Jps
[spark@master hadoop-2.9.0]$ kill -9 1939
[spark@master hadoop-2.9.0]$ jps
2466 DFSZKFailoverController
2567 ResourceManager
2920 Jps
1757 JournalNode
1662 QuorumPeerMain
[spark@master hadoop-2.9.0]$
Then check slave1: the NameNode that was in the standby state has become active.
jps on slave1 at this point:
[spark@slave1 sbin]$ jps
2689 ResourceManager
2881 Jps
2003 JournalNode
2292 DataNode
2501 NodeManager
2424 DFSZKFailoverController
1931 QuorumPeerMain
2191 NameNode
[spark@slave1 sbin]$
Now http://192.168.0.120:50070/ is no longer reachable, while http://192.168.0.121:50070/ is (it shows 'slave1:9000' (active)).
http://192.168.0.121:8088/ is still not reachable, while http://192.168.0.120:8088/ is.
In the same way, after killing the active ResourceManager, the standby ResourceManager becomes active and carries on working:
[spark@master hadoop-2.9.0]$ jps
2466 DFSZKFailoverController
2948 Jps
2567 ResourceManager
1757 JournalNode
1662 QuorumPeerMain
[spark@master hadoop-2.9.0]$ kill -9 2567
[spark@master hadoop-2.9.0]$ jps
2466 DFSZKFailoverController
1757 JournalNode
1662 QuorumPeerMain
2958 Jps
Now http://192.168.0.121:8088/ is reachable, while http://192.168.0.120:8088/ is not.
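To bring the killed daemons back after the failover test (rather than leaving the cluster degraded), they can be restarted individually with the same daemon scripts used earlier; a sketch, run on master:

cd /opt/hadoop-2.9.0
sbin/hadoop-daemon.sh start namenode          # the restarted NameNode should rejoin as standby
sbin/yarn-daemon.sh start resourcemanager     # the restarted ResourceManager should rejoin as standby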
http://192.168.0.120:50070/dfshealth.html#tab-overview
http://192.168.0.121:50070/dfshealth.html#tab-overview
http://192.168.0.120:8088/cluster/apps/RUNNING
http://192.168.0.121:8088/cluster/apps/ACCEPTED
References:
https://www.cnblogs.com/jiangzhengjun/p/6347916.html
https://blog.csdn.net/qq_32166627/article/details/51553394