Hadoop 2.4.1 Cluster Setup

Prepare the Linux environment

Modify the hostname

$ vim /etc/sysconfig/network

NETWORKING=yes

HOSTNAME=hadoop001

 

Modify the IP address

# vim /etc/sysconfig/network-scripts/ifcfg-eth0

DEVICE=eth0

HWADDR=♦♦♦♦♦♦♦♦♦♦♦♦♦

TYPE=Ethernet

UUID=♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦

ONBOOT=yes

NM_CONTROLLED=yes

BOOTPROTO=static

IPADDR=172.17.30.111

NETMASK=255.255.254.0

GATEWAY=172.17.30.1

DNS1=223.5.5.5

DNS2=223.6.6.6

 

Turn off the firewall

Check the firewall status

         service iptables status

         Stop the firewall

         service iptables stop

         Check whether the firewall is enabled at boot

         chkconfig iptables --list

         Disable the firewall at boot

         chkconfig iptables off

 

Modify the hostname-to-IP mappings

$ vim /etc/hosts

172.17.30.111   hadoop001

172.17.30.112   hadoop002

172.17.30.113   hadoop003

172.17.30.114   hadoop004

172.17.30.115   hadoop005

172.17.30.116   hadoop006

172.17.30.117   hadoop007

 

Reboot the machine

# reboot

 

Install the JDK

Extract the JDK

# tar -zxvf jdk-7u79-linux-x64.tar.gz -C /opt/modules/

 

Add environment variables

# vim /etc/profile

##JAVA

JAVA_HOME=/opt/modules/jdk1.7.0_79

JRE_HOME=/opt/modules/jdk1.7.0_79/jre

PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin

CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JRE_HOME/lib

export JAVA_HOME JRE_HOME PATH CLASSPATH

 

Reload the configuration

# source /etc/profile
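
To confirm the JDK is picked up from the updated PATH, check the version; the output shown assumes the 7u79 build installed above:

# java -version

java version "1.7.0_79"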

 

Install Hadoop 2.4.1

Extract Hadoop 2.4.1

# tar -zxvf hadoop-2.4.1.tar.gz -C /opt/modules/

 

Add environment variables

# vim /etc/profile

##HADOOP

export HADOOP_HOME=/opt/modules/hadoop-2.4.1

export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

 

Reload the configuration

# source /etc/profile
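
To confirm the Hadoop binaries are on the PATH, print the version; it should report the release installed above:

# hadoop version

Hadoop 2.4.1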

 

Cluster plan:

         Hostname        IP                    Installed software        Running processes

         hadoop001       172.17.30.111         jdk, hadoop               NameNode, DFSZKFailoverController (zkfc)

         hadoop002       172.17.30.112         jdk, hadoop               NameNode, DFSZKFailoverController (zkfc)

         hadoop003       172.17.30.113         jdk, hadoop               ResourceManager

         hadoop004       172.17.30.114         jdk, hadoop               ResourceManager

         hadoop005       172.17.30.115         jdk, hadoop, zookeeper    DataNode, NodeManager, JournalNode, QuorumPeerMain

         hadoop006       172.17.30.116         jdk, hadoop, zookeeper    DataNode, NodeManager, JournalNode, QuorumPeerMain

         hadoop007       172.17.30.117         jdk, hadoop, zookeeper    DataNode, NodeManager, JournalNode, QuorumPeerMain

        

Notes:

         1. In Hadoop 2.x, HDFS HA generally uses two NameNodes: one in active state and one in standby state. The active NameNode serves client requests, while the standby NameNode does not; it only synchronizes the active NameNode's state so that a fast failover is possible when the active NameNode fails.

         Hadoop 2.x officially provides two HDFS HA solutions, one based on NFS and one based on QJM. Here we use the simpler QJM. In this scheme, the active and standby NameNodes share edit-log metadata through a group of JournalNodes; a write is considered successful as soon as it reaches a majority of the JournalNodes, so an odd number of JournalNodes is usually configured.

         A ZooKeeper ensemble is also configured for ZKFC (DFSZKFailoverController) automatic failover: when the active NameNode goes down, the standby NameNode is automatically promoted to active.

         2. hadoop-2.2.0 still had only a single ResourceManager and therefore a single point of failure. hadoop-2.4.1 fixes this with two ResourceManagers, one active and one standby, whose state is coordinated through ZooKeeper.
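
         Once the cluster is up, these HA roles can be verified from the command line. The commands below assume the logical names nn1/nn2 and rm1/rm2 configured later in this guide:

         # hdfs haadmin -getServiceState nn1

         active

         # hdfs haadmin -getServiceState nn2

         standby

         # yarn rmadmin -getServiceState rm1

         active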

 

 

Configure HDFS:

Modify hadoop-env.sh

# vim hadoop-env.sh

export JAVA_HOME=/opt/modules/jdk1.7.0_79

 

Modify core-site.xml

# vim core-site.xml

<configuration>

         <!-- Set the HDFS nameservice to ns1 -->

         <property>

                   <name>fs.defaultFS</name>

                   <value>hdfs://ns1</value>

         </property>

         <!-- Specify the Hadoop temporary directory -->

         <property>

                   <name>hadoop.tmp.dir</name>

                   <value>/opt/data/tmp</value>

         </property>

         <!-- Specify the ZooKeeper quorum addresses -->

         <property>

                   <name>ha.zookeeper.quorum</name>

                   <value>hadoop005:2181,hadoop006:2181,hadoop007:2181</value>

         </property>

</configuration>

Modify hdfs-site.xml

# vim hdfs-site.xml

<configuration>

         <!-- Set the HDFS nameservice to ns1; must match core-site.xml -->

         <property>

                   <name>dfs.nameservices</name>

                   <value>ns1</value>

         </property>

         <!-- ns1 has two NameNodes: nn1 and nn2 -->

         <property>

                   <name>dfs.ha.namenodes.ns1</name>

                   <value>nn1,nn2</value>

         </property>

         <!-- RPC address of nn1 -->

         <property>

                   <name>dfs.namenode.rpc-address.ns1.nn1</name>

                   <value>hadoop001:9000</value>

         </property>

         <!-- HTTP address of nn1 -->

         <property>

                   <name>dfs.namenode.http-address.ns1.nn1</name>

                   <value>hadoop001:50070</value>

         </property>

         <!-- RPC address of nn2 -->

         <property>

                   <name>dfs.namenode.rpc-address.ns1.nn2</name>

                   <value>hadoop002:9000</value>

         </property>

         <!-- HTTP address of nn2 -->

         <property>

                   <name>dfs.namenode.http-address.ns1.nn2</name>

                   <value>hadoop002:50070</value>

         </property>

         <!-- Location on the JournalNodes where the NameNode shared edit log is stored -->

         <property>

                   <name>dfs.namenode.shared.edits.dir</name>

                   <value>qjournal://hadoop005:8485;hadoop006:8485;hadoop007:8485/ns1</value>

         </property>

         <!-- Location on local disk where the JournalNodes store their data -->

         <property>

                   <name>dfs.journalnode.edits.dir</name>

                   <value>/opt/data/journaldata</value>

         </property>

         <!-- Enable automatic NameNode failover -->

         <property>

                   <name>dfs.ha.automatic-failover.enabled</name>

                   <value>true</value>

         </property>

         <!-- Failover proxy provider: how clients locate the active NameNode -->

         <property>

                   <name>dfs.client.failover.proxy.provider.ns1</name>

                   <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>

         </property>

         <!-- Fencing methods; multiple methods are separated by newlines, one method per line -->

         <property>

                   <name>dfs.ha.fencing.methods</name>

                   <value>

                            sshfence

                            shell(/bin/true)

                   </value>

         </property>

         <!-- The sshfence mechanism requires passwordless SSH -->

         <property>

                   <name>dfs.ha.fencing.ssh.private-key-files</name>

                   <value>/root/.ssh/id_rsa</value>

         </property>

         <!-- Timeout for the sshfence mechanism -->

         <property>

                   <name>dfs.ha.fencing.ssh.connect-timeout</name>

                   <value>30000</value>

         </property>

</configuration>

Modify mapred-site.xml

# cp mapred-site.xml.template mapred-site.xml

# vim mapred-site.xml

<configuration>

         <!-- Run the MapReduce framework on YARN -->

         <property>

                   <name>mapreduce.framework.name</name>

                   <value>yarn</value>

         </property>

</configuration>

Modify yarn-site.xml

# vim yarn-site.xml

<configuration>

         <!-- Enable ResourceManager HA -->

         <property>

            <name>yarn.resourcemanager.ha.enabled</name>

            <value>true</value>

         </property>

         <!-- Cluster id of the ResourceManagers -->

         <property>

            <name>yarn.resourcemanager.cluster-id</name>

            <value>yrc</value>

         </property>

         <!-- Logical names of the ResourceManagers -->

         <property>

            <name>yarn.resourcemanager.ha.rm-ids</name>

            <value>rm1,rm2</value>

         </property>

         <!-- Hostnames of the two ResourceManagers -->

         <property>

            <name>yarn.resourcemanager.hostname.rm1</name>

            <value>hadoop003</value>

         </property>

         <property>

            <name>yarn.resourcemanager.hostname.rm2</name>

            <value>hadoop004</value>

         </property>

         <!-- ZooKeeper quorum addresses -->

         <property>

            <name>yarn.resourcemanager.zk-address</name>

            <value>hadoop005:2181,hadoop006:2181,hadoop007:2181</value>

         </property>

         <property>

            <name>yarn.nodemanager.aux-services</name>

            <value>mapreduce_shuffle</value>

         </property>

</configuration>

Modify slaves (the slaves file lists the worker nodes. Because HDFS is started on hadoop001 and YARN is started on hadoop003, the slaves file on hadoop001 specifies where the DataNodes run, and the slaves file on hadoop003 specifies where the NodeManagers run):

# vim slaves

hadoop005

hadoop006

hadoop007

 

 

Configure passwordless SSH login:

Generate a key pair on hadoop001

# ssh-keygen -t rsa

Configure passwordless login from hadoop001 to hadoop002, hadoop003, hadoop004, hadoop005, hadoop006, and hadoop007

Copy the public key to the other nodes, including hadoop001 itself

# ssh-copy-id hadoop001

# ssh-copy-id hadoop002

# ssh-copy-id hadoop003

# ssh-copy-id hadoop004

# ssh-copy-id hadoop005

# ssh-copy-id hadoop006

# ssh-copy-id hadoop007
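
To confirm passwordless login works, run a remote command from hadoop001 and check that no password prompt appears (hadoop002 here is just an example; any of the nodes can be tested the same way):

# ssh hadoop002 hostname

hadoop002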

 

Generate a key pair on hadoop003

# ssh-keygen -t rsa

Configure passwordless login from hadoop003 to hadoop004, hadoop005, hadoop006, and hadoop007

# ssh-copy-id hadoop004

# ssh-copy-id hadoop005

# ssh-copy-id hadoop006

# ssh-copy-id hadoop007

 

Note: the two NameNodes must also be able to log in to each other over SSH without a password.

Generate a key pair on hadoop002

# ssh-keygen -t rsa

Configure passwordless login from hadoop002 to hadoop001

# ssh-copy-id hadoop001

 

 

Copy the configured Hadoop 2.4.1 to the other nodes

# scp -r hadoop-2.4.1/ hadoop002:/opt/modules/

# scp -r hadoop-2.4.1/ hadoop003:/opt/modules/

# scp -r hadoop-2.4.1/ hadoop004:/opt/modules/

# scp -r hadoop-2.4.1/ hadoop005:/opt/modules/

# scp -r hadoop-2.4.1/ hadoop006:/opt/modules/

# scp -r hadoop-2.4.1/ hadoop007:/opt/modules/

 

Install and configure the ZooKeeper cluster (on hadoop005)

Extract ZooKeeper

# tar -zxvf zookeeper-3.4.5.tar.gz -C /opt/modules/

 

Add environment variables

# vim /etc/profile

##ZOOKEEPER

export ZOOKEEPER_HOME=/opt/modules/zookeeper-3.4.5

export PATH=$PATH:$ZOOKEEPER_HOME/bin

 

Modify the configuration

# pwd

/opt/modules/zookeeper-3.4.5/conf

# cp zoo_sample.cfg zoo.cfg

# vim zoo.cfg

Change: dataDir=/opt/modules/zookeeper-3.4.5/tmp

Add at the end of the configuration file:

server.1=hadoop005:2888:3888

server.2=hadoop006:2888:3888

server.3=hadoop007:2888:3888

Create the tmp directory

# mkdir tmp

In the tmp directory, create a myid file containing the text 1

# echo 1 > myid

Example

# cat myid 

1
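
For reference, after these edits zoo.cfg should look roughly like the following; tickTime, initLimit, syncLimit, and clientPort are the unchanged defaults from zoo_sample.cfg:

tickTime=2000

initLimit=10

syncLimit=5

dataDir=/opt/modules/zookeeper-3.4.5/tmp

clientPort=2181

server.1=hadoop005:2888:3888

server.2=hadoop006:2888:3888

server.3=hadoop007:2888:3888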

 

Copy the configured ZooKeeper to the other nodes

# scp -r zookeeper-3.4.5/ hadoop006:/opt/modules/

# scp -r zookeeper-3.4.5/ hadoop007:/opt/modules/

Note: change the content of /opt/modules/zookeeper-3.4.5/tmp/myid on hadoop006 and hadoop007 accordingly

hadoop006:

# echo 2 > myid

hadoop007:

# echo 3 > myid

 

 

Note: the first time the cluster is started, follow the steps below strictly:

Start the ZooKeeper cluster (on hadoop005, hadoop006, and hadoop007)

$ zkServer.sh start

Check the status

# zkServer.sh status   (one leader and two followers)
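
Typical output on a follower node (exactly one of the three nodes should report Mode: leader instead):

# zkServer.sh status

Mode: follower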

 

Start the JournalNodes (run on hadoop005, hadoop006, and hadoop007)

# hadoop-daemon.sh start journalnode

Run the jps command: if a JournalNode process is listed, the JournalNode started successfully

# jps

Example

2308 QuorumPeerMain

2439 JournalNode

2486 Jps

 

Format HDFS

# hdfs namenode -format

Formatting creates files under the directory configured as hadoop.tmp.dir in core-site.xml, which is /opt/data/tmp here. Then copy /opt/data/tmp to /opt/data/ on hadoop002.

# scp -r tmp/ hadoop002:/opt/data/
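
Alternatively, instead of copying the tmp directory by hand, the standby NameNode can be initialized with the built-in bootstrap command; this assumes the NameNode on hadoop001 has already been started (hadoop-daemon.sh start namenode) and is run on hadoop002:

# hdfs namenode -bootstrapStandby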

 

Format ZKFC (initializes the HA state in ZooKeeper)

# hdfs zkfc -formatZK

 

Start HDFS (on hadoop001):

# start-dfs.sh

 

Start YARN (note: start it on hadoop003. The NameNode and ResourceManager are placed on separate machines for performance reasons: both consume a lot of resources, so they are separated and started on different machines):

# start-yarn.sh
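
Note: start-yarn.sh only starts the ResourceManager on the machine it is run from. If the standby ResourceManager on hadoop004 is not running afterwards (check with jps), start it there manually:

# yarn-daemon.sh start resourcemanager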

 

 

 

Hadoop 2.4.1 is now configured. It can be accessed from a browser:

http://hadoop001:50070

NameNode 'hadoop001:9000' (active)

http://hadoop002:50070

NameNode 'hadoop002:9000' (standby)

 

Some commands for checking the working state of the cluster:

# hdfs dfsadmin -report                       show status information for each HDFS node

# hdfs haadmin -getServiceState nn1           get the HA state of a NameNode

# hadoop-daemon.sh start namenode             start a single NameNode process

# hadoop-daemon.sh start zkfc                 start a single zkfc process
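
A simple end-to-end smoke test is to submit the bundled pi example job; the jar path below assumes the default layout of the Hadoop 2.4.1 distribution:

# hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar pi 5 10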

 

 

If there are only 3 hosts, the deployment can be planned as follows:

hadoop001        zookeeper    journalnode    namenode  zkfc    resourcemanager    datanode

hadoop002        zookeeper    journalnode    namenode  zkfc    resourcemanager    datanode

hadoop003        zookeeper    journalnode    datanode
