Prepare the Linux environment
Change the hostname:
$ vim /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=hadoop001
Change the IP address:
# vim /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
HWADDR=♦♦♦♦♦♦♦♦♦♦♦♦♦
TYPE=Ethernet
UUID=♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦
ONBOOT=yes
NM_CONTROLLED=yes
BOOTPROTO=static
IPADDR=172.17.30.111
NETMASK=255.255.254.0
GATEWAY=172.17.30.1
DNS1=223.5.5.5
DNS2=223.6.6.6
Turn off the firewall:
Check the firewall status
service iptables status
Stop the firewall
service iptables stop
Check whether the firewall starts at boot
chkconfig iptables --list
Disable firewall startup at boot
chkconfig iptables off
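Note: the iptables/chkconfig commands above apply to CentOS 6-style systems. If your nodes run a systemd-based distribution with firewalld instead, the equivalent would be:
# systemctl stop firewalld
# systemctl disable firewalld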
Edit the hostname-to-IP mappings:
$ vim /etc/hosts
172.17.30.111 hadoop001
172.17.30.112 hadoop002
172.17.30.113 hadoop003
172.17.30.114 hadoop004
172.17.30.115 hadoop005
172.17.30.116 hadoop006
172.17.30.117 hadoop007
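Once /etc/hosts is in place on every node, a quick sanity check is to ping the other hosts by name (assuming they are already up), for example:
# ping -c 1 hadoop002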
Reboot the machine:
# reboot
Install the JDK
Extract the JDK:
# tar -zxvf jdk-7u79-linux-x64.tar.gz -C /opt/modules/
Add environment variables:
# vim /etc/profile
##JAVA
JAVA_HOME=/opt/modules/jdk1.7.0_79
JRE_HOME=/opt/modules/jdk1.7.0_79/jre
PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin
CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JRE_HOME/lib
export JAVA_HOME JRE_HOME PATH CLASSPATH
Reload the configuration:
# source /etc/profile
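To verify the JDK installation, check the version; it should report 1.7.0_79:
# java -version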
Install Hadoop 2.4.1
Extract hadoop-2.4.1:
# tar -zxvf hadoop-2.4.1.tar.gz -C /opt/modules/
Add environment variables:
# vim /etc/profile
##HADOOP
export HADOOP_HOME=/opt/modules/hadoop-2.4.1
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
Reload the configuration:
# source /etc/profile
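As a quick check that the Hadoop environment variables took effect, print the version:
# hadoop version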
Cluster plan:
Hostname    IP              Installed software          Running processes
hadoop001   172.17.30.111   jdk, hadoop                 NameNode, DFSZKFailoverController (zkfc)
hadoop002   172.17.30.112   jdk, hadoop                 NameNode, DFSZKFailoverController (zkfc)
hadoop003   172.17.30.113   jdk, hadoop                 ResourceManager
hadoop004   172.17.30.114   jdk, hadoop                 ResourceManager
hadoop005   172.17.30.115   jdk, hadoop, zookeeper      DataNode, NodeManager, JournalNode, QuorumPeerMain
hadoop006   172.17.30.116   jdk, hadoop, zookeeper      DataNode, NodeManager, JournalNode, QuorumPeerMain
hadoop007   172.17.30.117   jdk, hadoop, zookeeper      DataNode, NodeManager, JournalNode, QuorumPeerMain
Notes:
1. In Hadoop 2.0, HDFS typically runs with two NameNodes, one in the active state and one in standby. The active NameNode serves client requests; the standby does not, and only synchronizes the active NameNode's state so that a fast failover is possible if the active one fails.
Hadoop 2.0 officially provides two HDFS HA solutions, one based on NFS and one based on QJM. Here we use the simpler QJM. In this scheme the active and standby NameNodes share edit-log metadata through a group of JournalNodes; a write is considered successful once it has been persisted on a majority of the JournalNodes, which is why an odd number of JournalNodes is normally configured.
A ZooKeeper ensemble is also configured for ZKFC (DFSZKFailoverController) failover: when the active NameNode goes down, the standby NameNode is automatically switched to the active state.
2. hadoop-2.2.0 still had the problem that there was only one ResourceManager, a single point of failure. hadoop-2.4.1 solves this: there are two ResourceManagers, one active and one standby, and their state is coordinated through ZooKeeper.
Configure HDFS:
Edit hadoop-env.sh:
# vim hadoop-env.sh
export JAVA_HOME=/opt/modules/jdk1.7.0_79
Edit core-site.xml:
# vim core-site.xml
<configuration>
    <!-- Set the HDFS nameservice to ns1 -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://ns1</value>
    </property>
    <!-- Hadoop temporary directory -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/data/tmp</value>
    </property>
    <!-- ZooKeeper quorum address -->
    <property>
        <name>ha.zookeeper.quorum</name>
        <value>hadoop005:2181,hadoop006:2181,hadoop007:2181</value>
    </property>
</configuration>
Edit hdfs-site.xml:
# vim hdfs-site.xml
<configuration>
    <!-- The HDFS nameservice is ns1; it must match the value in core-site.xml -->
    <property>
        <name>dfs.nameservices</name>
        <value>ns1</value>
    </property>
    <!-- ns1 has two NameNodes: nn1 and nn2 -->
    <property>
        <name>dfs.ha.namenodes.ns1</name>
        <value>nn1,nn2</value>
    </property>
    <!-- RPC address of nn1 -->
    <property>
        <name>dfs.namenode.rpc-address.ns1.nn1</name>
        <value>hadoop001:9000</value>
    </property>
    <!-- HTTP address of nn1 -->
    <property>
        <name>dfs.namenode.http-address.ns1.nn1</name>
        <value>hadoop001:50070</value>
    </property>
    <!-- RPC address of nn2 -->
    <property>
        <name>dfs.namenode.rpc-address.ns1.nn2</name>
        <value>hadoop002:9000</value>
    </property>
    <!-- HTTP address of nn2 -->
    <property>
        <name>dfs.namenode.http-address.ns1.nn2</name>
        <value>hadoop002:50070</value>
    </property>
    <!-- Where the NameNode edit log is stored on the JournalNodes -->
    <property>
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://hadoop005:8485;hadoop006:8485;hadoop007:8485/ns1</value>
    </property>
    <!-- Where each JournalNode stores its data on local disk -->
    <property>
        <name>dfs.journalnode.edits.dir</name>
        <value>/opt/data/journaldata</value>
    </property>
    <!-- Enable automatic NameNode failover -->
    <property>
        <name>dfs.ha.automatic-failover.enabled</name>
        <value>true</value>
    </property>
    <!-- Failover proxy provider used by clients -->
    <property>
        <name>dfs.client.failover.proxy.provider.ns1</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>
    <!-- Fencing methods; multiple methods are separated by newlines, one per line -->
    <property>
        <name>dfs.ha.fencing.methods</name>
        <value>
            sshfence
            shell(/bin/true)
        </value>
    </property>
    <!-- sshfence requires passwordless SSH -->
    <property>
        <name>dfs.ha.fencing.ssh.private-key-files</name>
        <value>/root/.ssh/id_rsa</value>
    </property>
    <!-- Connect timeout for the sshfence mechanism -->
    <property>
        <name>dfs.ha.fencing.ssh.connect-timeout</name>
        <value>30000</value>
    </property>
</configuration>
Edit mapred-site.xml:
# cp mapred-site.xml.template mapred-site.xml
# vim mapred-site.xml
<configuration>
    <!-- Run MapReduce on YARN -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
Edit yarn-site.xml:
# vim yarn-site.xml
<configuration>
    <!-- Enable ResourceManager HA -->
    <property>
        <name>yarn.resourcemanager.ha.enabled</name>
        <value>true</value>
    </property>
    <!-- Cluster id of the RM pair -->
    <property>
        <name>yarn.resourcemanager.cluster-id</name>
        <value>yrc</value>
    </property>
    <!-- Logical names of the RMs -->
    <property>
        <name>yarn.resourcemanager.ha.rm-ids</name>
        <value>rm1,rm2</value>
    </property>
    <!-- Hosts of the two RMs -->
    <property>
        <name>yarn.resourcemanager.hostname.rm1</name>
        <value>hadoop003</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname.rm2</name>
        <value>hadoop004</value>
    </property>
    <!-- ZooKeeper ensemble address -->
    <property>
        <name>yarn.resourcemanager.zk-address</name>
        <value>hadoop005:2181,hadoop006:2181,hadoop007:2181</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>
Edit slaves (the slaves file lists the worker nodes. Because HDFS is started on hadoop001 and YARN on hadoop003, the slaves file on hadoop001 specifies where the DataNodes run, while the slaves file on hadoop003 specifies where the NodeManagers run):
# vim slaves
hadoop005
hadoop006
hadoop007
Configure passwordless SSH login:
Generate a key pair on hadoop001
# ssh-keygen -t rsa
Set up passwordless login from hadoop001 to hadoop002, hadoop003, hadoop004, hadoop005, hadoop006 and hadoop007
Copy the public key to every node, including hadoop001 itself
# ssh-copy-id hadoop001
# ssh-copy-id hadoop002
# ssh-copy-id hadoop003
# ssh-copy-id hadoop004
# ssh-copy-id hadoop005
# ssh-copy-id hadoop006
# ssh-copy-id hadoop007
Generate a key pair on hadoop003
# ssh-keygen -t rsa
Set up passwordless login from hadoop003 to hadoop004, hadoop005, hadoop006 and hadoop007
# ssh-copy-id hadoop004
# ssh-copy-id hadoop005
# ssh-copy-id hadoop006
# ssh-copy-id hadoop007
Note: passwordless SSH must also be configured between the two NameNodes.
Generate a key pair on hadoop002
# ssh-keygen -t rsa
Set up passwordless login from hadoop002 to hadoop001
# ssh-copy-id hadoop001
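To verify that the keys were distributed correctly, logging in from hadoop001 (and likewise from hadoop002 and hadoop003) to any node should no longer prompt for a password, for example:
# ssh hadoop002 hostname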
Copy the configured hadoop-2.4.1 to the other nodes:
# scp -r hadoop-2.4.1/ hadoop002:/opt/modules/
# scp -r hadoop-2.4.1/ hadoop003:/opt/modules/
# scp -r hadoop-2.4.1/ hadoop004:/opt/modules/
# scp -r hadoop-2.4.1/ hadoop005:/opt/modules/
# scp -r hadoop-2.4.1/ hadoop006:/opt/modules/
# scp -r hadoop-2.4.1/ hadoop007:/opt/modules/
Install and configure the ZooKeeper ensemble (on hadoop005)
Extract zookeeper:
# tar -zxvf zookeeper-3.4.5.tar.gz -C /opt/modules/
Add environment variables:
# vim /etc/profile
##ZOOKEEPER
export ZOOKEEPER_HOME=/opt/modules/zookeeper-3.4.5
export PATH=$PATH:$ZOOKEEPER_HOME/bin
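As with the earlier profile changes, reload /etc/profile so the ZooKeeper variables take effect:
# source /etc/profile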
Edit the configuration:
# pwd
/opt/modules/zookeeper-3.4.5/conf
# cp zoo_sample.cfg zoo.cfg
# vim zoo.cfg
Change: dataDir=/opt/modules/zookeeper-3.4.5/tmp
Append to the end of the configuration file:
server.1=hadoop005:2888:3888
server.2=hadoop006:2888:3888
server.3=hadoop007:2888:3888
Create the tmp directory
# mkdir /opt/modules/zookeeper-3.4.5/tmp
In the tmp directory, create a file named myid containing the text 1
# echo 1 > /opt/modules/zookeeper-3.4.5/tmp/myid
Example:
# cat /opt/modules/zookeeper-3.4.5/tmp/myid
1
Copy the configured zookeeper to the other nodes:
# scp -r zookeeper-3.4.5/ hadoop006:/opt/modules/
# scp -r zookeeper-3.4.5/ hadoop007:/opt/modules/
Note: update the contents of /opt/modules/zookeeper-3.4.5/tmp/myid on hadoop006 and hadoop007 accordingly
hadoop006:
# echo 2 > /opt/modules/zookeeper-3.4.5/tmp/myid
hadoop007:
# echo 3 > /opt/modules/zookeeper-3.4.5/tmp/myid
Note: the first time the cluster is started, follow the steps below strictly:
Start the ZooKeeper ensemble (run on hadoop005, hadoop006 and hadoop007)
$ zkServer.sh start
Check the status:
# zkServer.sh status    (one leader and two followers)
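The status output on each node includes a Mode line showing its role, for example (output abbreviated):
# zkServer.sh status
Mode: follower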
Start the JournalNodes (run on hadoop005, hadoop006 and hadoop007)
# hadoop-daemon.sh start journalnode
Run the jps command: if a JournalNode process is listed, the JournalNode started successfully
# jps
Example:
2308 QuorumPeerMain
2439 JournalNode
2486 Jps
Format HDFS:
# hdfs namenode -format
Formatting generates files under the directory set by hadoop.tmp.dir in core-site.xml (here /opt/data/tmp). Then copy /opt/data/tmp to /opt/data/ on hadoop002:
# scp -r tmp/ hadoop002:/opt/data/
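As an alternative to copying the metadata directory by hand, the standby can be initialized from hadoop002 with the following command (run it after the NameNode on hadoop001 has been started, e.g. with hadoop-daemon.sh start namenode):
# hdfs namenode -bootstrapStandby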
Format ZKFC:
# hdfs zkfc -formatZK
Start HDFS (run on hadoop001):
# start-dfs.sh
Start YARN (note: run this on hadoop003. The NameNode and ResourceManager are placed on separate machines for performance reasons, since both consume a lot of resources; because they are separated, they are started on different machines.):
# start-yarn.sh
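Note: in Hadoop 2.x, start-yarn.sh starts the ResourceManager only on the machine it is run on (plus the NodeManagers listed in slaves), so the standby ResourceManager on hadoop004 needs to be started separately:
# yarn-daemon.sh start resourcemanager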
Hadoop 2.4.1 is now fully configured. You can check it from a browser via the NameNode web UIs (port 50070, as configured above):
http://hadoop001:50070    NameNode 'hadoop001:9000' (active)
http://hadoop002:50070    NameNode 'hadoop002:9000' (standby)
Some commands for checking the cluster's working state:
# hdfs dfsadmin -report        show status information for each HDFS node
# hdfs haadmin -getServiceState nn1        get the HA state of a NameNode
# hadoop-daemon.sh start namenode        start a single NameNode process
# hadoop-daemon.sh start zkfc        start a single zkfc process
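Along the same lines, two extra checks (not part of the original steps, shown here for reference) confirm the ResourceManager HA state and the ZKFC data in ZooKeeper:
# yarn rmadmin -getServiceState rm1        get the HA state of a ResourceManager
# zkCli.sh -server hadoop005:2181        then run: ls /hadoop-ha/ns1 to inspect the failover znodes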
If only three hosts are available, the deployment can be planned as follows
hadoop001 zookeeper journalnode namenode zkfc resourcemanager datanode
hadoop002 zookeeper journalnode namenode zkfc resourcemanager datanode
hadoop003 zookeeper journalnode datanode