First, configure passwordless SSH login.
Seven nodes were prepared:
192.168.101.172 node1 192.168.101.206 node2 192.168.101.207 node3 192.168.101.215 node4 192.168.101.216 node5 192.168.101.217 node6 192.168.101.218 node7
Modify the hosts file on every machine as follows:
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.101.172 node1
192.168.101.206 node2
192.168.101.207 node3
192.168.101.215 node4
192.168.101.216 node5
192.168.101.217 node6
192.168.101.218 node7
Change the hostname:
vim /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=node2
Disable the firewall:
CentOS 6:
service iptables stop
CentOS 7:
systemctl stop firewalld
systemctl disable firewalld
Create the user:
groupadd hadoop
useradd hadoop -g hadoop
mkdir /main/bigdata
chown -R hadoop:hadoop /main/bigdata/
passwd hadoop
Time synchronization (CentOS 6):
yum -y install ntp
ntpdate cn.pool.ntp.org
Now configure passwordless login:
The core idea of passwordless login: if server B's authorized_keys contains server A's public key (the "lock"), then A can log in to B without a password.
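As a minimal sketch of what ssh-copy-id automates (the file name /tmp/a_id_rsa.pub below is hypothetical, standing for A's public key copied over by hand):
# on server A: this single line is the public key ("lock") to hand out
cat ~/.ssh/id_rsa.pub
# on server B: append that line to authorized_keys and tighten permissions (sshd refuses loose permissions)
cat /tmp/a_id_rsa.pub >> ~/.ssh/authorized_keys
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys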
First, generate a key pair on each of the 7 machines:
# install the ssh client
yum -y install openssh-clients
su hadoop
rm -rf ~/.ssh/*
ssh-keygen -t rsa    # press Enter through every prompt
# then append the public key to the authorized_keys file
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
Next, make one machine able to reach all the others.
Reference: https://blog.csdn.net/ITYang_/article/details/70144395
First, run the following command on the other 6 machines to send each machine's public key into the authorized_keys on node1 (172):
#centos6
ssh-copy-id "-p 2121 192.168.101.172"
#centos7
ssh-copy-id -p 2121 hadoop@node1
# or more explicitly: if the previous step generated a DSA key, you must point at id_dsa.pub here
ssh-copy-id -i ~/.ssh/id_dsa.pub "-p 2121 192.168.101.172"
This hands all of our public keys to node 172. On 172 (node1), run:
cat .ssh/authorized_keys
You will see that the other nodes' keys have all been collected.
Test:
Now run the following on any other node:
ssh node1 -p 2121
You can log in to node1 without a password!
The next step is to distribute the .ssh/authorized_keys file we collected on node1 back to every node:
On node1, run a shell script:
yum install expect
Then:
#!/bin/bash SERVERS="192.168.101.172 192.168.101.206 192.168.101.207 192.168.101.215 192.168.101.216 192.168.101.217 192.168.101.218" PASSWORD=機器密碼
# 將當前機器的密鑰copy到其餘的節點上,爲了保證下一步scp的時候能夠免密登陸 auto_ssh_copy_id() { expect -c "set timeout -1;
# 若是缺省端口的話 能夠直接spawn ssh-copy-id $1; spawn ssh-copy-id \" -p 2121 $1 \"; expect { *(yes/no)* {send -- yes\r;exp_continue;} *assword:* {send -- $2\r;exp_continue;} eof {exit 0;} }"; }
# 循環全部的機器,開始copy ssh_copy_id_to_all() { for SERVER in $SERVERS do auto_ssh_copy_id $SERVER $PASSWORD done }
# 調用上面的方法 ssh_copy_id_to_all
# 循環全部的機器ip,scp把當前機器的密鑰全都copy到別的機器上去(端口2121,若是缺省的話能夠不填寫) 用戶名hadoop根據實際狀況更換 for SERVER in $SERVERS do scp -P 2121 ~/.ssh/authorized_keys hadoop@$SERVER:~/.ssh/ done
The ssh-copy-id syntax changed on CentOS 7, so the script becomes:
#!/bin/bash
SERVERS="node1 node2 node3 node4 node5 node6 node7 node8"
USERNAME=hadoop
PASSWORD=machine-password   # the machines' login password

# copy this machine's key to the other nodes so the scp step below is passwordless
auto_ssh_copy_id() {
    expect -c "set timeout -1;
        # with the default port you can simply: spawn ssh-copy-id $1;
        spawn ssh-copy-id -p 2121 $2@$1 ;
        expect {
            *(yes/no)* {send -- yes\r;exp_continue;}
            *assword:* {send -- $3\r;exp_continue;}
            eof {exit 0;}
        }";
}

# loop over all machines and copy
ssh_copy_id_to_all() {
    for SERVER in $SERVERS
    do
        auto_ssh_copy_id $SERVER $USERNAME $PASSWORD
    done
}

# invoke the function above
ssh_copy_id_to_all

# loop over all machines and scp this machine's authorized_keys to each of them
# (port 2121; omit -P if using the default port). Change the user hadoop as needed.
for SERVER in $SERVERS
do
    scp -P 2121 ~/.ssh/authorized_keys $USERNAME@$SERVER:~/.ssh/
done
However, a strange thing showed up in practice when using a non-root user:
A CentOS 6 side note:
After node1 had collected the keys of the other 6 nodes and ran ssh-copy-id back to them, it still could not ssh to them without a password; scp kept asking for a password...
But once the other nodes deleted their local ~/.ssh/authorized_keys, it worked...
So I ran rm ~/.ssh/authorized_keys on the other 6 nodes, ran the script on node1 again, and no password was needed anymore...
The root user does not have this problem...
Very strange. If any reader figures it out, please get in touch so we can learn from each other.
On node1, run ssh nodeX -p xxxx against each node and answer yes; this caches the hosts in known_hosts.
Then scp that file to the other machines:
#!/bin/bash SERVERS="node1 node2 node3 node4 node5 node6 node7 node8" for SERVER in $SERVERS do scp -P 2121 ~/.ssh/known_hosts hadoop@$SERVER:~/.ssh/ done
This way the other machines do not have to answer yes on their first connection.
At this point, passwordless login between the machines is done.
Install Java 8:
cd /main
tar -zvxf soft/jdk-8u171-linux-x64.tar.gz -C .
vim /etc/profile
# append at the end
export JAVA_HOME=/main/jdk1.8.0_171
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=$JAVA_HOME/lib
export PATH=:$PATH:$JAVA_HOME/bin:$JRE_HOME/bin
rm -rf /usr/bin/java*
# take effect immediately
source /etc/profile
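A quick sanity check after sourcing the profile (the version string below is what the jdk-8u171 package should report):
source /etc/profile
java -version      # should print: java version "1.8.0_171"
echo $JAVA_HOME    # should print: /main/jdk1.8.0_171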
Zookeeper installation and configuration is omitted; it is basically extract, adjust the paths, change the ports if you feel like it, and set myid.
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial synchronization phase can take
initLimit=10
# The number of ticks that can pass between sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just example sakes.
dataDir=/main/zookeeper/data
dataLogDir=/main/zookeeper/logs
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
# Be sure to read the maintenance section of the administrator guide before turning on autopurge.
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
server.1=192.168.101.173:2888:3888
server.2=192.168.101.183:2888:3888
server.3=192.168.101.193:2888:3888
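A minimal sketch of the remaining Zookeeper steps, assuming it is unpacked under /main/zookeeper with the dataDir above; the number written to myid must match the server.N line for that host:
# on 192.168.101.173
echo 1 > /main/zookeeper/data/myid
# on 192.168.101.183
echo 2 > /main/zookeeper/data/myid
# on 192.168.101.193
echo 3 > /main/zookeeper/data/myid
# then on each node
/main/zookeeper/bin/zkServer.sh start
/main/zookeeper/bin/zkServer.sh status   # one node reports leader, the others follower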
Install Hadoop:
Install on 5 nodes.
The plan is to install on the five machines node1 through node5, with node1/node2 as masters and node3-node5 as slaves.
Configure on node1:
cd /main
tar -zvxf soft/hadoop-2.8.4.tar.gz -C .
chown -R hadoop:hadoop /main/hadoop-2.8.4/
su hadoop
mkdir /main/hadoop-2.8.4/data
mkdir /main/hadoop-2.8.4/data/journal
mkdir /main/hadoop-2.8.4/data/tmp
mkdir /main/hadoop-2.8.4/data/hdfs
mkdir /main/hadoop-2.8.4/data/hdfs/namenode
mkdir /main/hadoop-2.8.4/data/hdfs/datanode
Next, edit the following files under /main/hadoop-2.8.4/etc/hadoop/:
hadoop-env.sh
mapred-env.sh
yarn-env.sh
and set JAVA_HOME in all three files:
export JAVA_HOME=/main/jdk1.8.0_171
Also add the SSH port on the last line of hadoop-env.sh:
export HADOOP_SSH_OPTS="-p 2121"
Edit /main/hadoop-2.8.4/etc/hadoop/slaves:
node3
node4
node5
Edit /main/hadoop-2.8.4/etc/hadoop/core-site.xml:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Licensed under the Apache License, Version 2.0; see http://www.apache.org/licenses/LICENSE-2.0 -->
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <!-- fs.defaultFS must be the HDFS logical service name (it must match dfs.nameservices in hdfs-site.xml) -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hjbdfs</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/main/hadoop-2.8.4/data/tmp</value>
  </property>
  <property>
    <name>hadoop.http.staticuser.user</name>
    <value>hadoop</value>
  </property>
  <!-- the three Zookeeper nodes configured above -->
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>192.168.101.173:2181,192.168.101.183:2181,192.168.101.193:2181</value>
  </property>
</configuration>
More parameters: http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/core-default.xml
Edit hdfs-site.xml:
Note: if sshd is not listening on the default port 22, the sshfence entry must be written as sshfence([[username][:port]]), e.g. sshfence(hadoop:2121); otherwise, when the active namenode dies, sshfence cannot log in to the other machine and automatic failover will not happen.
<?xml version="1.0" encoding="UTF-8"?> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?> <!-- Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. See accompanying LICENSE file. --> <!-- Put site-specific property overrides in this file. --> <configuration> <property> <name>dfs.replication</name> <value>3</value> </property> <property> <name>dfs.permissions.enabled</name> <value>false</value> </property> <!-- HDFS NN的邏輯名稱,使用上面crore中fs.defaultFS設置的hjbdfs --> <property> <name>dfs.nameservices</name> <value>hjbdfs</value> </property> <property> <name>dfs.blocksize</name> <value>134217728</value> </property> <!-- 給定服務邏輯名稱myhdfs的節點列表 --> <property> <name>dfs.ha.namenodes.hjbdfs</name> <value>nn1,nn2</value> </property> <!-- nn1的RPC通訊地址,nn1所在地址 --> <property> <name>dfs.namenode.rpc-address.hjbdfs.nn1</name> <value>node1:8020</value> </property> <!-- nn1的http通訊地址,外部訪問地址 --> <property> <name>dfs.namenode.http-address.hjbdfs.nn1</name> <value>node1:50070</value> </property> <!-- nn2的RPC通訊地址,nn2所在地址 --> <property> <name>dfs.namenode.rpc-address.hjbdfs.nn2</name> <value>node2:8020</value> </property> <!-- nn2的http通訊地址,外部訪問地址 --> <property> <name>dfs.namenode.http-address.hjbdfs.nn2</name> <value>node2:50070</value> </property> <!-- 指定NameNode的元數據在JournalNode日誌上的存放位置(通常和zookeeper部署在一塊兒) --> <!-- 設置一組 journalNode 的 URI 地址,active NN 將 edit log 寫入這些JournalNode,而 standby NameNode 讀取這些 edit log,並做用在內存中的目錄樹中。若是journalNode有多個節點則使用分號分割。該屬性值應符合如下格式qjournal://host1:port1;host2:port2;host3:port3/journalId --> <property> <name>dfs.namenode.shared.edits.dir</name> <value>qjournal://node3:8485;node4:8485;node5:8485/hjbdf_journal</value> </property> <!-- 指定JournalNode在本地磁盤存放數據的位置 --> <property> <name>dfs.journalnode.edits.dir</name> <value>/main/hadoop-2.8.4/data/journal</value> </property> <!--指定cluster1出故障時,哪一個實現類負責執行故障切換 --> <property> <name>dfs.client.failover.proxy.provider.hjbdfs</name> <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value> </property> <!--解決HA集羣腦裂問題(即出現兩個 master 同時對外提供服務,致使系統處於不一致狀態)。在 HDFS HA中,JournalNode 只容許一個 NameNode 寫數據,不會出現兩個 active NameNode 的問題.這是配置自動切換的方法,有多種使用方法,具體能夠看官網,在文末會給地址,這裏是遠程登陸殺死的方法 --> <property> <name>dfs.ha.fencing.methods</name> <value>sshfence(hadoop:2121)</value> <description>how to communicate in the switch process</description> </property> <!-- 這個是使用sshfence隔離機制時才須要配置ssh免登錄 --> <property> <name>dfs.ha.fencing.ssh.private-key-files</name> <value>/home/hadoop/.ssh/id_rsa</value> </property> <!-- 配置sshfence隔離機制超時時間,這個屬性同上,若是你是用腳本的方法切換,這個應該是能夠不配置的 --> <property> <name>dfs.ha.fencing.ssh.connect-timeout</name> <value>30000</value> </property> <!-- 這個是開啓自動故障轉移,若是你沒有自動故障轉移,這個能夠先不配 --> <property> <name>dfs.ha.automatic-failover.enabled</name> <value>true</value> </property> <property> <name>dfs.datanode.data.dir</name> <value>/main/hadoop-2.8.4/data/hdfs/datanode</value> </property> <property> <name>dfs.namenode.name.dir</name> <value>/main/hadoop-2.8.4/data/hdfs/namenode</value> </property> </configuration>
Reference: http://ju.outofmemory.cn/entry/95494
https://www.cnblogs.com/meiyuanbao/p/3545929.html
Official parameters: http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
Rename mapred-site.xml.template to mapred-site.xml and edit it:
<?xml version="1.0"?> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?> <!-- Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. See accompanying LICENSE file. --> <!-- Put site-specific property overrides in this file. --> <configuration> <property> <name>mapreduce.framework.name</name> <value>yarn</value> </property> <property> <name>mapreduce.jobhistory.address</name> <value>node1:10020</value> </property> <property> <name>mapreduce.jobhistory.webapp.address</name> <value>node1:19888</value> </property> </configuration>
Configure yarn-site.xml:
<?xml version="1.0"?> <!-- Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. See accompanying LICENSE file. --> <configuration> <property> <name>yarn.nodemanager.aux-services</name> <value>mapreduce_shuffle</value> </property> <!-- Site specific YARN configuration properties --> <!--啓用resourcemanager ha--> <!--是否開啓RM ha,默認是開啓的--> <property> <name>yarn.resourcemanager.ha.enabled</name> <value>true</value> </property> <!--聲明兩臺resourcemanager的地址--> <property> <name>yarn.resourcemanager.cluster-id</name> <value>rmcluster</value> </property> <property> <name>yarn.resourcemanager.ha.rm-ids</name> <value>rm1,rm2</value> </property> <property> <name>yarn.resourcemanager.hostname.rm1</name> <value>node1</value> </property> <property> <name>yarn.resourcemanager.hostname.rm2</name> <value>node2</value> </property> <!--指定zookeeper集羣的地址--> <property> <name>yarn.resourcemanager.zk-address</name> <value>192.168.101.173:2181,192.168.101.173:2181,192.168.101.173:2181</value> </property> <!--啓用自動恢復,當任務進行一半,rm壞掉,就要啓動自動恢復,默認是false--> <property> <name>yarn.resourcemanager.recovery.enabled</name> <value>true</value> </property> <!--指定resourcemanager的狀態信息存儲在zookeeper集羣,默認是存放在FileSystem裏面。--> <property> <name>yarn.resourcemanager.store.class</name> <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value> </property> </configuration>
With the basic configuration done, create a tarball:
tar -zcvf hadoop-2.8.4.ready.tar.gz /main/hadoop-2.8.4
Then scp the tarball to the other 4 nodes:
scp -P 2121 hadoop-2.8.4.ready.tar.gz hadoop@node2:/main
On the other 4 nodes, extract it and rename the directory to /main/hadoop-2.8.4.
Then continue configuring the two masters, node1 and node2.
Add yarn.resourcemanager.ha.id to yarn-site.xml on node1 and node2 respectively:
It is similar to Zookeeper's myid.
<!-- node1 -->
<property>
  <name>yarn.resourcemanager.ha.id</name>
  <value>rm1</value>
</property>
<!-- node2 -->
<property>
  <name>yarn.resourcemanager.ha.id</name>
  <value>rm2</value>
</property>
Start the JournalNodes:
Start journalnode on the three slave nodes:
[hadoop@node5 hadoop-2.8.4]$ /main/hadoop-2.8.4/sbin/hadoop-daemon.sh start journalnode
starting journalnode, logging to /main/hadoop-2.8.4/logs/hadoop-hadoop-journalnode-node5.out
[hadoop@node5 hadoop-2.8.4]$ jps
2272 Jps
2219 JournalNode
Start the NameNode:
Format the namenode on one master (the journalnodes must be running before formatting):
Do not format it a second time, and do not format it again on another node, otherwise the namespaceID of the NN and DNs will differ and errors will follow!!!
/main/hadoop-2.8.4/bin/hdfs namenode -format
Start one namenode:
[hadoop@node1 hadoop]$ /main/hadoop-2.8.4/sbin/hadoop-daemon.sh start namenode
starting namenode, logging to /main/hadoop-2.8.4/logs/hadoop-hadoop-namenode-node1.out
[hadoop@node1 hadoop]$ jps
7536 Jps
7457 NameNode
[hadoop@node1 hadoop]$ ps -ef|grep namenode hadoop 7457 1 10 09:20 pts/4 00:00:08 /main/jdk1.8.0_171/bin/java -Dproc_namenode -Xmx1000m -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/main/hadoop-2.8.4/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/main/hadoop-2.8.4 -Dhadoop.id.str=hadoop -Dhadoop.root.logger=INFO,console -Djava.library.path=/main/hadoop-2.8.4/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Djava.net.preferIPv4Stack=true -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/main/hadoop-2.8.4/logs -Dhadoop.log.file=hadoop-hadoop-namenode-node1.log -Dhadoop.home.dir=/main/hadoop-2.8.4 -Dhadoop.id.str=hadoop -Dhadoop.root.logger=INFO,RFA -Djava.library.path=/main/hadoop-2.8.4/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Dhadoop.security.logger=INFO,RFAS -Dhdfs.audit.logger=INFO,NullAppender -Dhadoop.security.logger=INFO,RFAS -Dhdfs.audit.logger=INFO,NullAppender -Dhadoop.security.logger=INFO,RFAS -Dhdfs.audit.logger=INFO,NullAppender -Dhadoop.security.logger=INFO,RFAS org.apache.hadoop.hdfs.server.namenode.NameNode
At this point node1's HDFS web UI should be reachable, with state standby:
http://192.168.101.172:50070/
Then the other master first syncs the namenode metadata from the first master before starting:
This mainly syncs data/hdfs/namenode/ (including the namespaceID); otherwise the two nodes are inconsistent and report errors.
/main/hadoop-2.8.4/bin/hdfs namenode -bootstrapStandby
The log shows the files were downloaded successfully:
18/06/28 11:01:56 WARN common.Util: Path /main/hadoop-2.8.4/data/hdfs/namenode should be specified as a URI in configuration files. Please update hdfs configuration.
18/06/28 11:01:56 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
=====================================================
About to bootstrap Standby ID nn2 from:
           Nameservice ID: hjbdfs
        Other Namenode ID: nn1
  Other NN's HTTP address: http://node1:50070
  Other NN's IPC  address: node1/192.168.101.172:8020
             Namespace ID: 675265321
            Block pool ID: BP-237410497-192.168.101.172-1530153904905
               Cluster ID: CID-604da42a-d0a8-403b-b073-68c857c9b772
           Layout version: -63
       isUpgradeFinalized: true
=====================================================
Re-format filesystem in Storage Directory /main/hadoop-2.8.4/data/hdfs/namenode ? (Y or N) Y
18/06/28 11:02:06 INFO common.Storage: Storage directory /main/hadoop-2.8.4/data/hdfs/namenode has been successfully formatted.
18/06/28 11:02:06 WARN common.Util: Path /main/hadoop-2.8.4/data/hdfs/namenode should be specified as a URI in configuration files. Please update hdfs configuration.
18/06/28 11:02:06 WARN common.Util: Path /main/hadoop-2.8.4/data/hdfs/namenode should be specified as a URI in configuration files. Please update hdfs configuration.
18/06/28 11:02:06 INFO namenode.FSEditLog: Edit logging is async:true
18/06/28 11:02:06 INFO namenode.TransferFsImage: Opening connection to http://node1:50070/imagetransfer?getimage=1&txid=0&storageInfo=-63:675265321:1530153904905:CID-604da42a-d0a8-403b-b073-68c857c9b772&bootstrapstandby=true
18/06/28 11:02:06 INFO namenode.TransferFsImage: Image Transfer timeout configured to 60000 milliseconds
18/06/28 11:02:06 INFO namenode.TransferFsImage: Transfer took 0.01s at 0.00 KB/s
18/06/28 11:02:06 INFO namenode.TransferFsImage: Downloaded file fsimage.ckpt_0000000000000000000 size 323 bytes.
18/06/28 11:02:06 INFO util.ExitUtil: Exiting with status 0
18/06/28 11:02:06 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at node2/192.168.101.206
************************************************************/
Then start it:
/main/hadoop-2.8.4/sbin/hadoop-daemon.sh start namenode
Its web UI likewise shows standby.
Manually forcing one node to become the active master is problematic: it throws an EOFException and the namenode process dies:
[hadoop@node1 hadoop-2.8.4]$ /main/hadoop-2.8.4/bin/hdfs haadmin -transitionToActive nn1 --forcemanual
Have Zookeeper take over namenode failover automatically:
First shut down the whole cluster (leave Zookeeper running), then run bin/hdfs zkfc -formatZK to format the ZKFC.
Shut down:
/main/hadoop-2.8.4/sbin/stop-dfs.sh
Format the ZKFC:
/main/hadoop-2.8.4/bin/hdfs zkfc -formatZK
After it finishes, Zookeeper has an extra hadoop-ha znode:
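A quick way to confirm that znode, assuming Zookeeper is installed under /main/zookeeper as above:
/main/zookeeper/bin/zkCli.sh -server 192.168.101.173:2181
# inside the zk shell:
ls /              # hadoop-ha should now appear
ls /hadoop-ha     # should list the nameservice, hjbdfs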
Start the whole cluster:
/main/hadoop-2.8.4/sbin/start-dfs.sh
You can see this command starts the namenodes, datanodes and journalnodes in turn, then starts zkfc, which registers with Zookeeper:
[hadoop@node1 hadoop-2.8.4]$ /main/hadoop-2.8.4/sbin/start-dfs.sh
18/06/28 11:57:59 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [node1 node2]
node1: starting namenode, logging to /main/hadoop-2.8.4/logs/hadoop-hadoop-namenode-node1.out
node2: starting namenode, logging to /main/hadoop-2.8.4/logs/hadoop-hadoop-namenode-node2.out
node3: starting datanode, logging to /main/hadoop-2.8.4/logs/hadoop-hadoop-datanode-node3.out
node4: starting datanode, logging to /main/hadoop-2.8.4/logs/hadoop-hadoop-datanode-node4.out
node5: starting datanode, logging to /main/hadoop-2.8.4/logs/hadoop-hadoop-datanode-node5.out
Starting journal nodes [node3 node4 node5]
node3: starting journalnode, logging to /main/hadoop-2.8.4/logs/hadoop-hadoop-journalnode-node3.out
node4: starting journalnode, logging to /main/hadoop-2.8.4/logs/hadoop-hadoop-journalnode-node4.out
node5: starting journalnode, logging to /main/hadoop-2.8.4/logs/hadoop-hadoop-journalnode-node5.out
18/06/28 11:58:17 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting ZK Failover Controllers on NN hosts [node1 node2]
node1: starting zkfc, logging to /main/hadoop-2.8.4/logs/hadoop-hadoop-zkfc-node1.out
node2: starting zkfc, logging to /main/hadoop-2.8.4/logs/hadoop-hadoop-zkfc-node2.out
The services are now registered in Zookeeper.
Contents of the two znodes:
[zk: localhost:2181(CONNECTED) 10] get /hadoop-ha/hjbdfs/ActiveBreadCrumb
hjbdfsnn2node2 �>(�>
cZxid = 0x30000000a
ctime = Thu Jun 28 11:49:36 CST 2018
mZxid = 0x30000000a
mtime = Thu Jun 28 11:49:36 CST 2018
pZxid = 0x30000000a
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 26
numChildren = 0
[zk: localhost:2181(CONNECTED) 16] get /hadoop-ha/hjbdfs/ActiveStandbyElectorLock
hjbdfsnn1node1 �>(�>
cZxid = 0x3000001a7
ctime = Thu Jun 28 11:54:13 CST 2018
mZxid = 0x3000001a7
mtime = Thu Jun 28 11:54:13 CST 2018
pZxid = 0x3000001a7
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x163f1ef62ad008c
dataLength = 26
numChildren = 0
The NameNodes at this point:
[hadoop@node1 hadoop-2.8.4]$ jps
14885 NameNode
15191 DFSZKFailoverController
15321 Jps
[hadoop@node2 hadoop-2.8.4]$ jps
18850 NameNode
19059 Jps
18952 DFSZKFailoverController
All 3 DataNodes look the same:
[hadoop@node3 hadoop-2.8.4]$ jps
5409 DataNode
5586 Jps
5507 JournalNode
One namenode is now active and the other is standby.
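The same can be confirmed from the command line with the nn1/nn2 IDs defined in hdfs-site.xml (a hedged check, not part of the original steps):
/main/hadoop-2.8.4/bin/hdfs haadmin -getServiceState nn1   # e.g. active
/main/hadoop-2.8.4/bin/hdfs haadmin -getServiceState nn2   # e.g. standby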
The installation looks successful so far; next comes testing.
After a later restart the active node was node1, so the automatic NameNode failover test with Zookeeper starts from there.
Kill the NN process on the active node, node1:
[hadoop@node1 hadoop-2.8.4]$ jps
18850 NameNode
18952 DFSZKFailoverController
19103 Jps
The failover shows up in the hadoop-hadoop-zkfc-node2.log log:
2018-06-28 16:07:40,181 INFO org.apache.hadoop.ha.NodeFencer: ====== Beginning Service Fencing Process... ====== 2018-06-28 16:07:40,181 INFO org.apache.hadoop.ha.NodeFencer: Trying method 1/1: org.apache.hadoop.ha.SshFenceByTcpPort(hadoop:2121) 2018-06-28 16:07:40,220 INFO org.apache.hadoop.ha.SshFenceByTcpPort: Connecting to node1... 2018-06-28 16:07:40,222 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Connecting to node1 port 2121 2018-06-28 16:07:40,229 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Connection established 2018-06-28 16:07:40,236 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Remote version string: SSH-2.0-OpenSSH_5.3 2018-06-28 16:07:40,236 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Local version string: SSH-2.0-JSCH-0.1.54 2018-06-28 16:07:40,236 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: CheckCiphers: aes256-ctr,aes192-ctr,aes128-ctr,aes256-cbc,aes192-cbc,aes128-cbc,3des-ctr,arcfour,arcfour128,arcfour256 2018-06-28 16:07:40,635 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: CheckKexes: diffie-hellman-group14-sha1,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521 2018-06-28 16:07:40,725 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: CheckSignatures: ecdsa-sha2-nistp256,ecdsa-sha2-nistp384,ecdsa-sha2-nistp521 2018-06-28 16:07:40,729 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_KEXINIT sent 2018-06-28 16:07:40,729 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_KEXINIT received 2018-06-28 16:07:40,730 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: kex: server: diffie-hellman-group-exchange-sha256,diffie-hellman-group-exchange-sha1,diffie-hellman-group14-sha1,diffie-hellman-group1-sha1 2018-06-28 16:07:40,730 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: kex: server: ssh-rsa,ssh-dss 2018-06-28 16:07:40,730 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: kex: server: aes128-ctr,aes192-ctr,aes256-ctr,arcfour256,arcfour128,aes128-cbc,3des-cbc,blowfish-cbc,cast128-cbc,aes192-cbc,aes256-cbc,arcfour,rijndael-cbc@lysator.liu.se 2018-06-28 16:07:40,730 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: kex: server: aes128-ctr,aes192-ctr,aes256-ctr,arcfour256,arcfour128,aes128-cbc,3des-cbc,blowfish-cbc,cast128-cbc,aes192-cbc,aes256-cbc,arcfour,rijndael-cbc@lysator.liu.se 2018-06-28 16:07:40,730 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: kex: server: hmac-md5,hmac-sha1,umac-64@openssh.com,hmac-sha2-256,hmac-sha2-512,hmac-ripemd160,hmac-ripemd160@openssh.com,hmac-sha1-96,hmac-md5-96 2018-06-28 16:07:40,730 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: kex: server: hmac-md5,hmac-sha1,umac-64@openssh.com,hmac-sha2-256,hmac-sha2-512,hmac-ripemd160,hmac-ripemd160@openssh.com,hmac-sha1-96,hmac-md5-96 2018-06-28 16:07:40,730 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: kex: server: none,zlib@openssh.com 2018-06-28 16:07:40,731 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: kex: server: none,zlib@openssh.com 2018-06-28 16:07:40,731 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: kex: server: 2018-06-28 16:07:40,731 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: kex: server: 2018-06-28 16:07:40,731 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: kex: client: ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group14-sha1,diffie-hellman-group-exchange-sha256,diffie-hellman-group-exchange-sha1,diffie-hellman-group1-sha1 2018-06-28 16:07:40,731 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: kex: client: 
ssh-rsa,ssh-dss,ecdsa-sha2-nistp256,ecdsa-sha2-nistp384,ecdsa-sha2-nistp521 2018-06-28 16:07:40,731 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: kex: client: aes128-ctr,aes128-cbc,3des-ctr,3des-cbc,blowfish-cbc,aes192-ctr,aes192-cbc,aes256-ctr,aes256-cbc 2018-06-28 16:07:40,731 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: kex: client: aes128-ctr,aes128-cbc,3des-ctr,3des-cbc,blowfish-cbc,aes192-ctr,aes192-cbc,aes256-ctr,aes256-cbc 2018-06-28 16:07:40,731 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: kex: client: hmac-md5,hmac-sha1,hmac-sha2-256,hmac-sha1-96,hmac-md5-96 2018-06-28 16:07:40,731 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: kex: client: hmac-md5,hmac-sha1,hmac-sha2-256,hmac-sha1-96,hmac-md5-96 2018-06-28 16:07:40,731 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: kex: client: none 2018-06-28 16:07:40,732 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: kex: client: none 2018-06-28 16:07:40,732 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: kex: client: 2018-06-28 16:07:40,732 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: kex: client: 2018-06-28 16:07:40,732 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: kex: server->client aes128-ctr hmac-md5 none 2018-06-28 16:07:40,732 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: kex: client->server aes128-ctr hmac-md5 none 2018-06-28 16:07:40,769 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_KEXDH_INIT sent 2018-06-28 16:07:40,770 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: expecting SSH_MSG_KEXDH_REPLY 2018-06-28 16:07:40,804 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: ssh_rsa_verify: signature true 2018-06-28 16:07:40,811 WARN org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Permanently added 'node1' (RSA) to the list of known hosts. 2018-06-28 16:07:40,811 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_NEWKEYS sent 2018-06-28 16:07:40,811 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_NEWKEYS received 2018-06-28 16:07:40,817 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_SERVICE_REQUEST sent 2018-06-28 16:07:40,818 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_SERVICE_ACCEPT received 2018-06-28 16:07:40,820 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Authentications that can continue: gssapi-with-mic,publickey,keyboard-interactive,password 2018-06-28 16:07:40,820 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Next authentication method: gssapi-with-mic 2018-06-28 16:07:40,824 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Authentications that can continue: publickey,keyboard-interactive,password 2018-06-28 16:07:40,824 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Next authentication method: publickey 2018-06-28 16:07:40,902 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Authentication succeeded (publickey). 2018-06-28 16:07:40,902 INFO org.apache.hadoop.ha.SshFenceByTcpPort: Connected to node1 2018-06-28 16:07:40,902 INFO org.apache.hadoop.ha.SshFenceByTcpPort: Looking for process running on port 8020 2018-06-28 16:07:40,950 INFO org.apache.hadoop.ha.SshFenceByTcpPort: Indeterminate response from trying to kill service. Verifying whether it is running using nc... 2018-06-28 16:07:40,966 WARN org.apache.hadoop.ha.SshFenceByTcpPort: nc -z node1 8020 via ssh: bash: nc: command not found 2018-06-28 16:07:40,968 INFO org.apache.hadoop.ha.SshFenceByTcpPort: Verified that the service is down. 
2018-06-28 16:07:40,968 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Disconnecting from node1 port 2121 2018-06-28 16:07:40,972 INFO org.apache.hadoop.ha.NodeFencer: ====== Fencing successful by method org.apache.hadoop.ha.SshFenceByTcpPort(hadoop:2121) ====== 2018-06-28 16:07:40,972 INFO org.apache.hadoop.ha.ActiveStandbyElector: Writing znode /hadoop-ha/hjbdfs/ActiveBreadCrumb to indicate that the local node is the most recent active... 2018-06-28 16:07:40,973 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Caught an exception, leaving main loop due to Socket closed 2018-06-28 16:07:40,979 INFO org.apache.hadoop.ha.ZKFailoverController: Trying to make NameNode at node2/192.168.101.206:8020 active... 2018-06-28 16:07:41,805 INFO org.apache.hadoop.ha.ZKFailoverController: Successfully transitioned NameNode at node2/192.168.101.206:8020 to active state
Start Yarn to test resourcemanager HA. On node1:
[hadoop@node1 sbin]$ /main/hadoop-2.8.4/sbin/start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /main/hadoop-2.8.4/logs/yarn-hadoop-resourcemanager-node1.out
node3: starting nodemanager, logging to /main/hadoop-2.8.4/logs/yarn-hadoop-nodemanager-node3.out
node4: starting nodemanager, logging to /main/hadoop-2.8.4/logs/yarn-hadoop-nodemanager-node4.out
node5: starting nodemanager, logging to /main/hadoop-2.8.4/logs/yarn-hadoop-nodemanager-node5.out
Start the resourcemanager on node2:
[hadoop@node2 sbin]$ /main/hadoop-2.8.4/sbin/yarn-daemon.sh start resourcemanager
starting resourcemanager, logging to /main/hadoop-2.8.4/logs/yarn-hadoop-resourcemanager-node2.out
Pointing a browser at node2's resourcemanager address http://192.168.101.206:8088 automatically redirects to http://node1:8088, i.e. http://192.168.101.172:8088/cluster.
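The RM states can also be queried from the CLI using the rm1/rm2 IDs from yarn-site.xml (again just a hedged check):
/main/hadoop-2.8.4/bin/yarn rmadmin -getServiceState rm1   # e.g. active
/main/hadoop-2.8.4/bin/yarn rmadmin -getServiceState rm2   # e.g. standby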
Test HDFS file upload/download/delete:
I created word_i_have_a_dream.txt containing Martin Luther King's "I Have a Dream" speech in English.
[hadoop@node1 hadoop-2.8.4]$ /main/hadoop-2.8.4/bin/hadoop fs -put word_i_have_a_dream.txt /word.txt
18/06/28 16:39:28 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[hadoop@node1 hadoop-2.8.4]$ /main/hadoop-2.8.4/bin/hadoop fs -ls /word.txt
18/06/28 16:39:43 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
-rw-r--r-- 3 hadoop supergroup 4805 2018-06-28 16:39 /word.txt
[hadoop@node1 hadoop-2.8.4]$ /main/hadoop-2.8.4/bin/hadoop fs -rm /word.txt
18/06/28 16:39:59 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Deleted /word.txt
[hadoop@node1 hadoop-2.8.4]$ /main/hadoop-2.8.4/bin/hadoop fs -ls /word.txt
18/06/28 16:40:07 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
ls: `/word.txt': No such file or directory
Download is the get command; see the official docs or other posts such as https://www.cnblogs.com/lzfhope/p/6952869.html for more.
Run the classic wordcount test:
[hadoop@node1 hadoop-2.8.4]$ /main/hadoop-2.8.4/bin/hadoop jar /main/hadoop-2.8.4/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.4.jar wordcount /word.txt /wordoutput 18/06/28 16:50:33 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 18/06/28 16:50:36 INFO input.FileInputFormat: Total input files to process : 1 18/06/28 16:50:36 INFO mapreduce.JobSubmitter: number of splits:1 18/06/28 16:50:36 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1530173899165_0001 18/06/28 16:50:37 INFO impl.YarnClientImpl: Submitted application application_1530173899165_0001 18/06/28 16:50:37 INFO mapreduce.Job: The url to track the job: http://node1:8088/proxy/application_1530173899165_0001/ 18/06/28 16:50:37 INFO mapreduce.Job: Running job: job_1530173899165_0001 18/06/28 16:50:49 INFO mapreduce.Job: Job job_1530173899165_0001 running in uber mode : false 18/06/28 16:50:49 INFO mapreduce.Job: map 0% reduce 0% 18/06/28 16:50:58 INFO mapreduce.Job: map 100% reduce 0% 18/06/28 16:51:07 INFO mapreduce.Job: map 100% reduce 100% 18/06/28 16:51:08 INFO mapreduce.Job: Job job_1530173899165_0001 completed successfully 18/06/28 16:51:08 INFO mapreduce.Job: Counters: 49 File System Counters FILE: Number of bytes read=4659 FILE: Number of bytes written=330837 FILE: Number of read operations=0 FILE: Number of large read operations=0 FILE: Number of write operations=0 HDFS: Number of bytes read=4892 HDFS: Number of bytes written=3231 HDFS: Number of read operations=6 HDFS: Number of large read operations=0 HDFS: Number of write operations=2 Job Counters Launched map tasks=1 Launched reduce tasks=1 Data-local map tasks=1 Total time spent by all maps in occupied slots (ms)=6187 Total time spent by all reduces in occupied slots (ms)=6411 Total time spent by all map tasks (ms)=6187 Total time spent by all reduce tasks (ms)=6411 Total vcore-milliseconds taken by all map tasks=6187 Total vcore-milliseconds taken by all reduce tasks=6411 Total megabyte-milliseconds taken by all map tasks=6335488 Total megabyte-milliseconds taken by all reduce tasks=6564864 Map-Reduce Framework Map input records=32 Map output records=874 Map output bytes=8256 Map output materialized bytes=4659 Input split bytes=87 Combine input records=874 Combine output records=359 Reduce input groups=359 Reduce shuffle bytes=4659 Reduce input records=359 Reduce output records=359 Spilled Records=718 Shuffled Maps =1 Failed Shuffles=0 Merged Map outputs=1 GC time elapsed (ms)=194 CPU time spent (ms)=1860 Physical memory (bytes) snapshot=444248064 Virtual memory (bytes) snapshot=4178436096 Total committed heap usage (bytes)=317718528 Shuffle Errors BAD_ID=0 CONNECTION=0 IO_ERROR=0 WRONG_LENGTH=0 WRONG_MAP=0 WRONG_REDUCE=0 File Input Format Counters Bytes Read=4805 File Output Format Counters Bytes Written=3231
The output was generated under /wordoutput on HDFS:
[hadoop@node1 hadoop-2.8.4]$ /main/hadoop-2.8.4/bin/hadoop fs -ls /
18/06/28 16:52:32 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 3 items
drwx------   - hadoop supergroup          0 2018-06-28 16:50 /tmp
-rw-r--r--   3 hadoop supergroup       4805 2018-06-28 16:43 /word.txt
drwxr-xr-x   - hadoop supergroup          0 2018-06-28 16:51 /wordoutput
[hadoop@node1 hadoop-2.8.4]$ /main/hadoop-2.8.4/bin/hadoop fs -ls /wordoutput
18/06/28 16:52:49 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 2 items
-rw-r--r--   3 hadoop supergroup          0 2018-06-28 16:51 /wordoutput/_SUCCESS
-rw-r--r--   3 hadoop supergroup       3231 2018-06-28 16:51 /wordoutput/part-r-00000
[hadoop@node1 hadoop-2.8.4]$ /main/hadoop-2.8.4/bin/hadoop fs -ls /wordoutput/_SUCCESS
18/06/28 16:53:05 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
-rw-r--r--   3 hadoop supergroup          0 2018-06-28 16:51 /wordoutput/_SUCCESS
_SUCCESS has size 0, so the content is in part-r-00000; pull it down to local and take a look:
[hadoop@node1 hadoop-2.8.4]$ /main/hadoop-2.8.4/bin/hadoop fs -ls /wordoutput/part-r-00000
18/06/28 16:54:43 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
-rw-r--r--   3 hadoop supergroup       3231 2018-06-28 16:51 /wordoutput/part-r-00000
[hadoop@node1 hadoop-2.8.4]$ /main/hadoop-2.8.4/bin/hadoop fs -get /wordoutput/part-r-00000 word_success.txt
18/06/28 16:54:57 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[hadoop@node1 hadoop-2.8.4]$ vim word_success.txt
It contains the result we wanted.
Next, try killing the NN master to test HA.
Notes:
If you get "connection refused", check that the other master's namenode service is listening properly, that the firewall is off, and that its hosts file is consistent with the namenode entries in hdfs-site.xml.
If the config files use hostnames, every node must map all cluster nodes' IPs in its hosts file, and its own hostname must map to its LAN IP, not 127.0.0.1; otherwise its services bind to 127.0.0.1:xxxx and other nodes on the LAN cannot connect!!!
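A quick way to see which address the namenode RPC port is actually bound to (a sketch; 8020 is the RPC port from hdfs-site.xml):
# on the namenode host
netstat -tlnp | grep 8020
# a healthy binding shows the LAN IP (or 0.0.0.0), e.g. 192.168.101.172:8020, not 127.0.0.1:8020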
If you hit a problem and need to re-format the NameNode, clean out the old data on all nodes first, otherwise the old DataNodes will fail to start because their namespaceID no longer matches:
rm -rf /main/hadoop-2.8.4/data/journal/*
rm -rf /main/hadoop-2.8.4/data/hdfs/namenode/*
rm -rf /main/hadoop-2.8.4/data/hdfs/datanode/*
rm -rf /main/hadoop-2.8.4/logs/*
Again: if sshd is not on the default port 22, the sshfence method must be written as sshfence([[username][:port]]), e.g. sshfence(hadoop:2121); otherwise, when the active namenode dies, sshfence cannot log in to the other machine and automatic failover fails.
If anything goes wrong during installation, the *.log files under /main/hadoop-2.8.4/logs have detailed logs.
HBase installation
HMaster has no single point of failure: HBase can start multiple HMasters, and Zookeeper's Master Election mechanism guarantees one Master is always running. So for an HA HBase we only need to start two HMasters and let Zookeeper pick the active one.
Extract it to /main on all 5 machines:
tar -zvxf /main/soft/hbase-1.2.6.1-bin.tar.gz -C /main/
On top of the Hadoop setup, configure the HBASE_HOME environment variable and hbase-env.sh.
vim /etc/profile and set:
# java, already set earlier for hadoop
export JAVA_HOME=/main/jdk1.8.0_171
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=$JAVA_HOME/lib
export PATH=:$PATH:$JAVA_HOME/bin:$JRE_HOME/bin
# hbase
export HBASE_HOME=/main/hbase-1.2.6.1
export PATH=$HBASE_HOME/bin:$PATH
In HBase's hbase-env.sh:
Set JAVA_HOME, disable HBase's bundled zk in favor of the zk we installed ourselves, and set the SSH port:
export JAVA_HOME=/main/jdk1.8.0_171
export HBASE_MANAGES_ZK=false
export HBASE_SSH_OPTS="-p 2121"
Set hbase-site.xml.
The official docs describe the configuration files:
http://abloz.com/hbase/book.html
https://hbase.apache.org/2.0/book.html#example_config
https://hbase.apache.org/2.0/book.html#config.files
<?xml version="1.0"?> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?> <configuration> <!--命名空間--> <property> <name>hbase.rootdir</name>
<!--這裏要和hadoop的HDFS的servicename名字一致,不然會報錯!--> <value>hdfs://hjbdfs/hbase</value> <description>The directory shared by RegionServers.</description> </property> <property> <name>hbase.cluster.distributed</name> <value>true</value> </property> <property> <name>hbase.master.port</name> <value>16000</value> <description>The port the HBase Master should bind to.</description> </property> <property> <name>hbase.zookeeper.quorum</name> <value>192.168.101.173,192.168.101.183,192.168.101.193</value> <description>逗號分割的zk服務器地址 Comma separated list of servers in the ZooKeeper Quorum. For example, "host1.mydomain.com,host2.mydomain.com,host3.mydomain.com". By default this is set to localhost for local and pseudo-distributed modes of operation. For a fully-distributed setup, this should be set to a full list of ZooKeeper quorum servers. If HBASE_MANAGES_ZK is set in hbase-env.sh this is the list of servers which we will start/stop ZooKeeper on. </description> </property> <property> <name>hbase.zookeeper.property.clientPort</name> <value>2181</value> <description>Property from ZooKeeper's config zoo.cfg.The port at which the clients will connect.</description> </property> <property> <name>hbase.zookeeper.property.dataDir</name> <value>/main/zookeeper/data</value> <description>zk配置文件zoo.cfg中的data目錄地址 Property from ZooKeeper config zoo.cfg.The directory where the snapshot is stored.</description> </property> <property> <name>hbase.tmp.dir</name> <value>/main/hbase-1.2.6.1/hbase/tmp</value> </property> </configuration>
Copy Hadoop's configuration files into the HBase directory to tie the two together:
[hadoop@node1 hbase-1.2.6.1]$ cp /main/hadoop-2.8.4/etc/hadoop/core-site.xml /main/hbase-1.2.6.1/conf/
[hadoop@node1 hbase-1.2.6.1]$ cp /main/hadoop-2.8.4/etc/hadoop/hdfs-site.xml /main/hbase-1.2.6.1/conf/
vim regionservers
node3
node4
node5
From the official docs:
regionservers: A plain-text file containing a list of hosts which should run a RegionServer in your HBase cluster. By default this file contains the single entry localhost. It should contain a list of hostnames or IP addresses, one per line, and should only contain localhost if each node in your cluster will run a RegionServer on its localhost interface.
vim backup-masters
On node1, write node2 here; then when the cluster is started from node1, node2 becomes the backup Master.
Likewise write node1 on node2, so that later you can operate from either node.
node2
From the official docs:
backup-masters: Not present by default. A plain-text file which lists hosts on which the Master should start a backup Master process, one host per line.
Start HMaster from either master node:
[hadoop@node2 bin]$ /main/hbase-1.2.6.1/bin/start-hbase.sh
starting master, logging to /main/hbase-1.2.6.1/bin/../logs/hbase-hadoop-master-node2.out
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0
node5: starting regionserver, logging to /main/hbase-1.2.6.1/bin/../logs/hbase-hadoop-regionserver-node5.out
node3: starting regionserver, logging to /main/hbase-1.2.6.1/bin/../logs/hbase-hadoop-regionserver-node3.out
node4: starting regionserver, logging to /main/hbase-1.2.6.1/bin/../logs/hbase-hadoop-regionserver-node4.out
node3: Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0
node3: Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0
node4: Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0
node4: Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0
node5: Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0
node5: Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0
node1: starting master, logging to /main/hbase-1.2.6.1/bin/../logs/hbase-hadoop-master-node1.out
node1: Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0
node1: Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0
Running it on the first HMaster node looks like this:
[hadoop@node1 hbase-1.2.6.1]$ /main/hbase-1.2.6.1/bin/start-hbase.sh starting master, logging to /main/hbase-1.2.6.1/bin/../logs/hbase-hadoop-master-node1.out Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0 Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0 node3: starting regionserver, logging to /main/hbase-1.2.6.1/bin/../logs/hbase-hadoop-regionserver-node3.out node4: starting regionserver, logging to /main/hbase-1.2.6.1/bin/../logs/hbase-hadoop-regionserver-node4.out node5: starting regionserver, logging to /main/hbase-1.2.6.1/bin/../logs/hbase-hadoop-regionserver-node5.out node3: Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0 node3: Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0 node4: Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0 node4: Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0 node5: Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0 node5: Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0
The script SSHes to the RegionServers listed in the config and starts them; if the node itself is not a RegionServer, nothing else changes on it.
Now go to the second HMaster node and manually start another HMaster.
Processes on a RegionServer before and after starting:
[hadoop@node3 hbase-1.2.6.1]$ jps
7460 DataNode
8247 Jps
7562 JournalNode
7660 NodeManager
[hadoop@node3 hbase-1.2.6.1]$ vim conf/hbase-env.sh
[hadoop@node3 hbase-1.2.6.1]$ jps
7460 DataNode
8408 Jps
7562 JournalNode
8300 HRegionServer
7660 NodeManager
The standby master is node1:
The active master is node2:
Kill the process on node2 to test master failover:
[hadoop@node2 bin]$ jps
3809 Jps
1412 NameNode
3607 HMaster
1529 DFSZKFailoverController
[hadoop@node2 bin]$ kill 3607
[hadoop@node2 bin]$ jps
3891 Jps
1412 NameNode
1529 DFSZKFailoverController
node1 successfully became the active master:
The killed master can be brought back with ./hbase-daemon.sh start master.
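A hedged way to confirm which master is active and how many RegionServers are up, using the HBase shell non-interactively:
echo "status 'simple'" | /main/hbase-1.2.6.1/bin/hbase shell
# the output lists the active master, backup masters and the live/dead region servers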
After all of the above, node1 and node2 look like this:
[hadoop@node1 hbase-1.2.6.1]$ jps
31458 NameNode
31779 DFSZKFailoverController
5768 Jps
5482 HMaster
31871 ResourceManager
node3-5:
[hadoop@node3 hbase-1.2.6.1]$ jps
9824 Jps
9616 HRegionServer
7460 DataNode
7562 JournalNode
7660 NodeManager
Installing Spark:
On another cluster, Hadoop was installed in the same way as above:
192.168.210.114 node1 192.168.210.115 node2 192.168.210.116 node3 192.168.210.117 node4 192.168.210.134 node5 192.168.210.135 node6 192.168.210.136 node7 192.168.210.137 node8
Spark is installed on nodes 1-5. Spark depends on Hadoop,
so when downloading from http://spark.apache.org/downloads.html make sure to pick a build compatible with your Hadoop version.
You also need the matching Scala version: check http://spark.apache.org/docs/latest/ for the Scala version it is built against.
So we downloaded Scala 2.11.11, extracted it to /main and added it to PATH.
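A minimal sketch of that Scala step, assuming the tarball was placed in /main/soft like the other packages (the exact file name is an assumption):
tar -zvxf /main/soft/scala-2.11.11.tgz -C /main/
# append to /etc/profile
export SCALA_HOME=/main/scala-2.11.11
export PATH=$SCALA_HOME/bin:$PATH
source /etc/profile
scala -version    # should report Scala 2.11.11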
After downloading spark-2.3.1-bin-hadoop2.7.tgz and extracting it to /main:
Edit spark-env.sh and set it according to your environment:
#!/usr/bin/env bash
# This file is sourced when running various Spark programs.
export JAVA_HOME=/main/server/jdk1.8.0_11
export SCALA_HOME=/main/scala-2.11.11
export HADOOP_HOME=/main/hadoop-2.8.4
export HADOOP_CONF_DIR=/main/hadoop-2.8.4/etc/hadoop
export SPARK_WORKER_MEMORY=4g
export SPARK_EXECUTOR_MEMORY=4g
export SPARK_DRIVER_MEMORY=4G
export SPARK_WORKER_CORES=2
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=192.168.210.38:2181,192.168.210.58:2181,192.168.210.78:2181 -Dspark.deploy.zookeeper.dir=/spark"
export SPARK_SSH_OPTS="-p 2121"
vim slaves
node1
node2
node3
node4
node5
Now everything can be started: bring up the cluster first, then manually start a second master.
start-all.sh starts the local node as a master and then starts a Worker on every host listed in the slaves file:
[hadoop@node1 conf]$ /main/spark-2.3.1-bin-hadoop2.7/sbin/start-all.sh
starting org.apache.spark.deploy.master.Master, logging to /main/spark-2.3.1-bin-hadoop2.7/logs/spark-hadoop-org.apache.spark.deploy.master.Master-1-node1.out
node2: starting org.apache.spark.deploy.worker.Worker, logging to /main/spark-2.3.1-bin-hadoop2.7/logs/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-node2.out
node1: starting org.apache.spark.deploy.worker.Worker, logging to /main/spark-2.3.1-bin-hadoop2.7/logs/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-node1.out
node3: starting org.apache.spark.deploy.worker.Worker, logging to /main/spark-2.3.1-bin-hadoop2.7/logs/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-node3.out
node4: starting org.apache.spark.deploy.worker.Worker, logging to /main/spark-2.3.1-bin-hadoop2.7/logs/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-node4.out
node5: starting org.apache.spark.deploy.worker.Worker, logging to /main/spark-2.3.1-bin-hadoop2.7/logs/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-node5.out
[hadoop@node1 conf]$ jps
11299 Master
11411 Worker
5864 NameNode
6184 DFSZKFailoverController
11802 Jps
6301 ResourceManager
6926 HMaster
Every other machine now has a Worker process. For HA, start another Spark Master on node2:
[hadoop@node2 conf]$ jps
5536 Jps
2209 DFSZKFailoverController
2104 NameNode
2602 HMaster
5486 Worker
[hadoop@node2 conf]$ /main/spark-2.3.1-bin-hadoop2.7/sbin/start-master.sh
starting org.apache.spark.deploy.master.Master, logging to /main/spark-2.3.1-bin-hadoop2.7/logs/spark-hadoop-org.apache.spark.deploy.master.Master-1-node2.out
[hadoop@node2 conf]$ jps
5568 Master
2209 DFSZKFailoverController
2104 NameNode
2602 HMaster
5486 Worker
5631 Jps
Look at the two masters' web UIs:
The other one is empty (standby).
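As a smoke test of the HA standalone cluster (a sketch, not from the original run: the examples jar path and the default master port 7077 are assumptions), SparkPi can be submitted against both masters:
/main/spark-2.3.1-bin-hadoop2.7/bin/spark-submit \
    --class org.apache.spark.examples.SparkPi \
    --master spark://node1:7077,node2:7077 \
    /main/spark-2.3.1-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.3.1.jar 100
# the driver output should contain a line like "Pi is roughly 3.14..."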
Storm installation:
First, a diagram borrowed from elsewhere:
Download apache-storm-1.2.2.tar.gz, extract it to /main on nodes 6-8, and set:
export STORM_HOME=/main/apache-storm-1.2.2
vim storm.yaml as follows, identical on all 3 machines:
storm.zookeeper.servers:
    - "192.168.210.38"
    - "192.168.210.58"
    - "192.168.210.78"
storm.zookeeper.port: 2181
storm.local.dir: "/main/apache-storm-1.2.2/data"
nimbus.seeds: ["192.168.210.135"]
supervisor.slots.ports:
    - 6700
    - 6701
    - 6702
    - 6703
storm.health.check.dir: "healthchecks"
storm.health.check.timeout.ms: 5000
The directory in storm.local.dir must be created in advance. The number of ports under supervisor.slots.ports determines how many workers each supervisor machine runs; each worker listens on its own port for tasks.
Create the directory:
mkdir /main/apache-storm-1.2.2/data
The control node (there can be more than one; see the nimbus.seeds: ["192.168.210.135"] setting).
Start the control node first:
# start nimbus
nohup /main/apache-storm-1.2.2/bin/storm nimbus &
# start the nimbus UI
nohup /main/apache-storm-1.2.2/bin/storm ui &
# start a supervisor
nohup /main/apache-storm-1.2.2/bin/storm supervisor &
Finally, start supervisors on the other two machines:
nohup /main/apache-storm-1.2.2/bin/storm supervisor &
The control node's UI page shows all the supervisors.
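The same can be checked from the CLI (a hedged check; the list is empty until a topology is submitted):
# confirm nimbus is reachable and list running topologies
/main/apache-storm-1.2.2/bin/storm list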
Then download the Storm source from GitHub (or the src zip from the Apache Storm homepage) and switch to the branch matching your install, e.g. 1.1.x, 1.x or 2.x. At the moment 2.x is still a snapshot; since we installed 1.2.2, switch to the 1.x branch.
Especially with the GitHub source, be sure to run:
mvn clean install -DskipTests
The GitHub code may be newer than anything published, so its jars are not in the Maven repositories yet; the command above builds the Storm dependencies into your local Maven repository.
PS: do not use a third-party Maven mirror for this step, or it is very likely to fail!
After switching to 1.x, the latest starter is storm-starter-1.2.3-SNAPSHOT; test whether the 1.2.2 on the servers can run it:
Upload storm-starter-1.2.3-SNAPSHOT.jar to the server and run the demo:
[hadoop@node6 main]$ /main/apache-storm-1.2.2/bin/storm jar storm-starter-1.2.3-SNAPSHOT.jar org.apache.storm.starter.WordCountTopology word-count Running: /main/server/jdk1.8.0_11/bin/java -client -Ddaemon.name= -Dstorm.options= -Dstorm.home=/main/apache-storm-1.2.2 -Dstorm.log.dir=/main/apache-storm-1.2.2/logs -Djava.library.path=/usr/local/lib:/opt/local/lib:/usr/lib -Dstorm.conf.file= -cp /main/apache-storm-1.2.2/*:/main/apache-storm-1.2.2/lib/*:/main/apache-storm-1.2.2/extlib/*:storm-starter-1.2.3-SNAPSHOT.jar:/main/apache-storm-1.2.2/conf:/main/apache-storm-1.2.2/bin -Dstorm.jar=storm-starter-1.2.3-SNAPSHOT.jar -Dstorm.dependency.jars= -Dstorm.dependency.artifacts={} org.apache.storm.starter.WordCountTopology word-count 955 [main] WARN o.a.s.u.Utils - STORM-VERSION new 1.2.2 old null 991 [main] INFO o.a.s.StormSubmitter - Generated ZooKeeper secret payload for MD5-digest: -7683379793985025786:-5178094576454792625 1122 [main] INFO o.a.s.u.NimbusClient - Found leader nimbus : node6:6627 1182 [main] INFO o.a.s.s.a.AuthUtils - Got AutoCreds [] 1190 [main] INFO o.a.s.u.NimbusClient - Found leader nimbus : node6:6627 1250 [main] INFO o.a.s.StormSubmitter - Uploading dependencies - jars... 1251 [main] INFO o.a.s.StormSubmitter - Uploading dependencies - artifacts... 1251 [main] INFO o.a.s.StormSubmitter - Dependency Blob keys - jars : [] / artifacts : [] 1289 [main] INFO o.a.s.StormSubmitter - Uploading topology jar storm-starter-1.2.3-SNAPSHOT.jar to assigned location: /main/apache-storm-1.2.2/data/nimbus/inbox/stormjar-5ed677a5-9af1-4b5e-8467-83f637e00506.jar Start uploading file 'storm-starter-1.2.3-SNAPSHOT.jar' to '/main/apache-storm-1.2.2/data/nimbus/inbox/stormjar-5ed677a5-9af1-4b5e-8467-83f637e00506.jar' (106526828 bytes) [==================================================] 106526828 / 106526828 File 'storm-starter-1.2.3-SNAPSHOT.jar' uploaded to '/main/apache-storm-1.2.2/data/nimbus/inbox/stormjar-5ed677a5-9af1-4b5e-8467-83f637e00506.jar' (106526828 bytes) 2876 [main] INFO o.a.s.StormSubmitter - Successfully uploaded topology jar to assigned location: /main/apache-storm-1.2.2/data/nimbus/inbox/stormjar-5ed677a5-9af1-4b5e-8467-83f637e00506.jar 2876 [main] INFO o.a.s.StormSubmitter - Submitting topology word-count in distributed mode with conf {"storm.zookeeper.topology.auth.scheme":"digest","storm.zookeeper.topology.auth.payload":"-7683379793985025786:-5178094576454792625","topology.workers":3,"topology.debug":true} 2876 [main] WARN o.a.s.u.Utils - STORM-VERSION new 1.2.2 old 1.2.2 4091 [main] INFO o.a.s.StormSubmitter - Finished submitting topology: word-count
The submission finished; now check the UI:
The topology can also be killed:
[hadoop@node6 main]$ /main/apache-storm-1.2.2/bin/storm kill word-count Running: /main/server/jdk1.8.0_11/bin/java -client -Ddaemon.name= -Dstorm.options= -Dstorm.home=/main/apache-storm-1.2.2 -Dstorm.log.dir=/main/apache-storm-1.2.2/logs -Djava.library.path=/usr/local/lib:/opt/local/lib:/usr/lib -Dstorm.conf.file= -cp /main/apache-storm-1.2.2/*:/main/apache-storm-1.2.2/lib/*:/main/apache-storm-1.2.2/extlib/*:/main/apache-storm-1.2.2/extlib-daemon/*:/main/apache-storm-1.2.2/conf:/main/apache-storm-1.2.2/bin org.apache.storm.command.kill_topology word-count