Hadoop: Upgrading CDH4.5 to CDH5, with NameNode and YARN HA in Practice

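CDH5 supports quite a few new features, so I decided to upgrade the current CDH4.5 cluster to CDH5. The deployment is still based on the existing CDH4.5 cluster: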

192.168.1.10    U-1  (Active) hadoop-yarn-resourcemanager  hadoop-hdfs-namenode hadoop-mapreduce-historyserver hadoop-yarn-proxyserver  hadoop-hdfs-zkfc
192.168.1.20    U-2  hadoop-yarn-nodemanager hadoop-hdfs-datanode hadoop-mapreduce  journalnode  zookeeper  zookeeper-server
192.168.1.30    U-3  hadoop-yarn-nodemanager hadoop-hdfs-datanode hadoop-mapreduce  journalnode  zookeeper  zookeeper-server
192.168.1.40    U-4  hadoop-yarn-nodemanager hadoop-hdfs-datanode hadoop-mapreduce  journalnode  zookeeper  zookeeper-server
192.168.1.50    U-5  hadoop-yarn-nodemanager hadoop-hdfs-datanode hadoop-mapreduce
192.168.1.70    U-7  (Standby) hadoop-yarn-resourcemanager  hadoop-hdfs-namenode  hadoop-hdfs-zkfc
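Note: because this is an upgrade from CDH4.5 to CDH5, the table above does not list every package the cluster runs. Some were already installed under CDH4.5, so the table only lists the packages that have to be (re)installed during the upgrade.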


The procedure is as follows:

1    Back Up Configuration Data and Stop Services

        1    Put the NameNode into safe mode and save the fsimage

su - hdfs
hdfs dfsadmin -safemode enter
hdfs dfsadmin -saveNamespace
exit

        2    Stop all Hadoop services on every host in the cluster

for x in `cd /etc/init.d ; ls hadoop-*` ; do sudo service $x stop ; done
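The loop above parses `ls` output and has to be run on every host. Below is a minimal sketch of the same step that uses a shell glob instead of `ls`, does nothing on hosts without `hadoop-*` init scripts, and only prints the commands when a hypothetical `DRY_RUN=1` is set. The optional directory argument exists only so the function can be tried against a scratch directory; on a real host leave it at the default `/etc/init.d`.

```shell
#!/usr/bin/env bash
# Stop every hadoop-* init script on this host.
# DRY_RUN and the directory argument are illustrative knobs, not CDH settings.

run() {
    # Print the command instead of executing it when DRY_RUN=1.
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

stop_hadoop_services() {
    local init_dir="${1:-/etc/init.d}" script
    for script in "$init_dir"/hadoop-*; do
        # If nothing matches, the glob stays literal; skip it.
        [ -e "$script" ] || continue
        run sudo service "$(basename "$script")" stop
    done
}
```

For example, `DRY_RUN=1 stop_hadoop_services` lists what would be stopped, and `stop_hadoop_services` on its own performs the stops.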

2    Back up the HDFS Metadata

        1    Find the value of dfs.namenode.name.dir

grep -C1 name.dir /etc/hadoop/conf/hdfs-site.xml

        2    Back up the directory that dfs.namenode.name.dir points to

tar czvf dfs.namenode.name.dir.tgz /data
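The `tar` command above hard-codes `/data`, which is this cluster's `dfs.namenode.name.dir`. Since the property may hold a comma-separated list, optionally with `file://` prefixes, here is a minimal sketch that archives each listed directory separately. The function name, the archive naming scheme and `DRY_RUN` are assumptions for illustration; pass it the value found with the `grep` above (or with `hdfs getconf -confKey dfs.namenode.name.dir`).

```shell
#!/usr/bin/env bash
# Back up every directory listed in dfs.namenode.name.dir.
# backup_name_dirs, its archive names and DRY_RUN are hypothetical.

run() {
    # Print the command instead of executing it when DRY_RUN=1.
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

backup_name_dirs() {
    local dirs="$1" stamp="${2:-$(date +%Y%m%d%H%M%S)}"
    local old_ifs="$IFS" dir name
    IFS=','                       # split the property value on commas
    for dir in $dirs; do
        IFS="$old_ifs"
        dir="${dir#file://}"      # drop an optional file:// scheme
        [ -n "$dir" ] || continue
        # /data -> namenode-data-<stamp>.tgz, /a/b -> namenode-a_b-<stamp>.tgz
        name=$(printf '%s' "${dir#/}" | tr '/' '_')
        run tar czf "namenode-${name}-${stamp}.tgz" "$dir"
    done
    IFS="$old_ifs"
}
```

On this cluster, `backup_name_dirs /data` is equivalent to the single `tar` command above, with a timestamp added to the archive name.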

3    Uninstall the CDH 4 Version of Hadoop

        1    Uninstall the Hadoop components

apt-get remove bigtop-utils bigtop-jsvc bigtop-tomcat sqoop2-client hue-common

        2    Remove the CDH4 repository files

mv /etc/apt/sources.list.d/cloudera-cdh4.list /root/

4    Download the Latest Version of CDH 5

        1    Download the CDH5 repository package

wget 'http://archive.cloudera.com/cdh5/one-click-install/precise/amd64/cdh5-repository_1.0_all.deb'

        2    Install the CDH5 repository and its signing key

dpkg -i cdh5-repository_1.0_all.deb
curl -s http://archive.cloudera.com/cdh5/ubuntu/precise/amd64/cdh/archive.key | apt-key add -
apt-get update

5    Install CDH 5 with YARN

        1    Install ZooKeeper

        2    Install the required components on each host

                1    Resource Manager host

apt-get install hadoop-yarn-resourcemanager

                2    NameNode host(s)

apt-get install hadoop-hdfs-namenode

                3    All cluster hosts except the Resource Manager

apt-get install hadoop-yarn-nodemanager hadoop-hdfs-datanode hadoop-mapreduce

                4    One host in the cluster (Active NameNode)

apt-get install hadoop-mapreduce-historyserver hadoop-yarn-proxyserver

                5    All client hosts

apt-get install hadoop-client
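To keep the per-role installs above consistent with the host table at the top of this post, the mapping from host to packages can be written down once. The sketch below is a hypothetical helper (`packages_for_host` is not a CDH tool) that encodes exactly that table, using the full package name `hadoop-hdfs-journalnode` for the "journalnode" entry, and prints the package list for a host.

```shell
#!/usr/bin/env bash
# Print the CDH5 packages to (re)install on a host, following the host
# table at the top of this post. Hypothetical helper; adjust for other clusters.

packages_for_host() {
    local pkgs
    case "$1" in
        U-1|U-7)
            # ResourceManager + NameNode + ZKFC (active on U-1, standby on U-7)
            pkgs="hadoop-yarn-resourcemanager hadoop-hdfs-namenode hadoop-hdfs-zkfc"
            # Only U-1 also runs the JobHistory server and the web app proxy
            if [ "$1" = U-1 ]; then
                pkgs="$pkgs hadoop-mapreduce-historyserver hadoop-yarn-proxyserver"
            fi
            ;;
        U-2|U-3|U-4)
            # Workers that also host a JournalNode and a ZooKeeper server
            pkgs="hadoop-yarn-nodemanager hadoop-hdfs-datanode hadoop-mapreduce hadoop-hdfs-journalnode zookeeper zookeeper-server"
            ;;
        U-5)
            # Plain worker
            pkgs="hadoop-yarn-nodemanager hadoop-hdfs-datanode hadoop-mapreduce"
            ;;
        *)
            echo "unknown host: $1" >&2
            return 1
            ;;
    esac
    echo "$pkgs"
}
```

Assuming each machine's hostname matches the table, `apt-get install $(packages_for_host "$(hostname)")` installs that host's set.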

6    Install CDH 5 with MRv1

        Since CDH5 now promotes YARN as the primary framework, we no longer use MRv1 and skip installing it.

7    In an HA Deployment, Upgrade and Start the Journal Nodes

        1    Install the JournalNodes

apt-get install hadoop-hdfs-journalnode

        2    Start the JournalNodes

service hadoop-hdfs-journalnode start
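The HA metadata upgrade in the next step needs the JournalNodes to be reachable, so it can help to wait until each one accepts connections on port 8485 (the port used in `dfs.namenode.shared.edits.dir` later in this post). A minimal sketch relying on bash's `/dev/tcp` redirection follows; the function name and retry count are assumptions. A host that silently drops packets can make a single attempt block until the kernel's TCP connect timeout.

```shell
#!/usr/bin/env bash
# Return 0 once host:port accepts a TCP connection, 1 after N failed tries.
# Uses bash's /dev/tcp pseudo-device; plain sh does not provide it.

wait_for_port() {
    local host="$1" port="$2" tries="${3:-30}" i=1
    while [ "$i" -le "$tries" ]; do
        # The subshell opens the connection and closes it again on exit.
        if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
            return 0
        fi
        # Pause between attempts, but not after the last one.
        if [ "$i" -lt "$tries" ]; then
            sleep 1
        fi
        i=$((i + 1))
    done
    return 1
}

# Hypothetical usage with this cluster's JournalNode hosts:
# for jn in U-2 U-3 U-4; do
#     wait_for_port "$jn" 8485 || echo "JournalNode $jn not reachable" >&2
# done
```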

8    Upgrade the HDFS Metadata

        The upgrade procedure differs between HA and non-HA deployments. Our CDH4.5 cluster was already running in HA mode, so we follow the HA procedure.

        1    Run on the active NameNode

service hadoop-hdfs-namenode upgrade

        2    Re-sync and restart the standby NameNode

su - hdfs
hdfs namenode -bootstrapStandby
exit
service hadoop-hdfs-namenode start

        3    Start the DataNodes

service hadoop-hdfs-datanode start

        4    Check the version
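One way to check is the first line of `hadoop version`, which for CDH packages carries a `-cdh` suffix (such as `Hadoop 2.3.0-cdh5.0.0`; the numbers depend on the release installed). The hypothetical helper below only inspects that line, so it can be tried without a cluster.

```shell
#!/usr/bin/env bash
# Succeed if a `hadoop version` header line reports a CDH5 build.
# is_cdh5_version is a hypothetical helper name.

is_cdh5_version() {
    case "$1" in
        "Hadoop "*-cdh5.*) return 0 ;;
        *) return 1 ;;
    esac
}

# On each upgraded host:
# is_cdh5_version "$(hadoop version | head -n 1)" || echo "not running CDH5" >&2
```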


9    Start YARN

        1    Create the required directories in HDFS

su - hdfs
hadoop fs -mkdir -p /user/history
hadoop fs -chmod -R 1777 /user/history
hadoop fs -chown yarn /user/history
hadoop fs -mkdir -p /var/log/hadoop-yarn
hadoop fs -chown yarn:mapred /var/log/hadoop-yarn
hadoop fs -ls -R /
exit

        2    Start the corresponding services on each host in the cluster

service hadoop-yarn-resourcemanager start
service hadoop-yarn-nodemanager start
service hadoop-mapreduce-historyserver start

10   Configure NameNode HA

        1    NameNode HA is set up exactly as in the CDH4.5 deployment. The only change is to replace mapreduce.shuffle with mapreduce_shuffle in yarn-site.xml.

        2    Verify
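From the command line, the HA state of each NameNode can be queried with `hdfs haadmin -getServiceState <serviceId>`, using the IDs from `dfs.ha.namenodes.mycluster` (`U-1`, `U-7`); after a healthy start exactly one should report `active` and the other `standby`. The sketch keeps the comparison in a small hypothetical function, so it can be reused for the ResourceManagers with `yarn rmadmin -getServiceState` once YARN HA is configured in the next section.

```shell
#!/usr/bin/env bash
# Succeed only if one of two HA states is "active" and the other "standby".
# check_ha_pair is a hypothetical helper name.

check_ha_pair() {
    case "$1/$2" in
        active/standby|standby/active) return 0 ;;
        *) return 1 ;;
    esac
}

# On the cluster (service IDs from hdfs-site.xml / yarn-site.xml):
# check_ha_pair "$(sudo -u hdfs hdfs haadmin -getServiceState U-1)" \
#               "$(sudo -u hdfs hdfs haadmin -getServiceState U-7)" \
#     && echo "NameNode HA OK"
# check_ha_pair "$(yarn rmadmin -getServiceState U-1)" \
#               "$(yarn rmadmin -getServiceState U-7)" \
#     && echo "ResourceManager HA OK"
```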


11    Configure YARN HA

        1    Stop all YARN daemons

service hadoop-yarn-nodemanager stop
service hadoop-yarn-resourcemanager stop
service hadoop-mapreduce-historyserver stop

        2    Update the configuration used by the ResourceManagers, NodeManagers and clients

                The following is the configuration on U-1. core-site.xml, hdfs-site.xml and mapred-site.xml need no changes; only yarn-site.xml has to be modified.

                core-site.xml

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://mycluster/</value>
  </property>

  <property>
    <name>ha.zookeeper.quorum</name>
    <value>U-2:2181,U-3:2181,U-4:2181</value>
  </property>

</configuration>

                hdfs-site.xml

<configuration>
  <property>
     <name>dfs.permissions.superusergroup</name>
     <value>hadoop</value>
  </property>

  <property>
     <name>dfs.namenode.name.dir</name>
     <value>/data</value>
  </property>

  <property>
     <name>dfs.datanode.data.dir</name>
     <value>/data01,/data02</value>
  </property>

  <property>
     <name>dfs.nameservices</name>
     <value>mycluster</value>
  </property>

<!--  HA Config  -->
  <property>
      <name>dfs.ha.namenodes.mycluster</name>
      <value>U-1,U-7</value>
  </property>

  <property>
      <name>dfs.namenode.rpc-address.mycluster.U-1</name>
      <value>U-1:8020</value>
  </property>

  <property>
      <name>dfs.namenode.rpc-address.mycluster.U-7</name>
      <value>U-7:8020</value>
  </property>

  <property>
      <name>dfs.namenode.http-address.mycluster.U-1</name>
      <value>U-1:50070</value>
  </property>

  <property>
      <name>dfs.namenode.http-address.mycluster.U-7</name>
      <value>U-7:50070</value>
  </property>

  <property>
      <name>dfs.namenode.shared.edits.dir</name>
      <value>qjournal://U-2:8485;U-3:8485;U-4:8485/mycluster</value>
  </property>

  <property>
      <name>dfs.journalnode.edits.dir</name>
      <value>/jdata</value>
  </property>

  <property>
      <name>dfs.client.failover.proxy.provider.mycluster</name>
      <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>

  <property>
      <name>dfs.ha.fencing.methods</name>
      <value>sshfence</value>
  </property>

  <property>
      <name>dfs.ha.fencing.ssh.private-key-files</name>
      <value>/var/lib/hadoop-hdfs/.ssh/id_rsa</value>
  </property>

  <property>
      <name>dfs.ha.automatic-failover.enabled</name>
      <value>true</value>
  </property>

</configuration>

                mapred-site.xml

<configuration>
 
<property>
 <name>mapreduce.framework.name</name>
 <value>yarn</value>
</property>

<property>
 <name>mapreduce.jobhistory.address</name>
 <value>U-1:10020</value>
</property>
<property>
 <name>mapreduce.jobhistory.webapp.address</name>
 <value>U-1:19888</value>
</property>

</configuration>

                yarn-site.xml

<configuration>
<!-- Resource Manager Configs -->
  <property>
    <name>yarn.resourcemanager.connect.retry-interval.ms</name>
    <value>2000</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.automatic-failover.embedded</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>yarn-rm-cluster</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>U-1,U-7</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.id</name>
    <value>U-1</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.recovery.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.store.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
  </property>


  <property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>U-2:2181,U-3:2181,U-4:2181</value>
  </property>

  <property>
    <name>yarn.resourcemanager.zk.state-store.address</name>
    <value>U-1:2181</value>
  </property>
  <property>
    <name>yarn.app.mapreduce.am.scheduler.connection.wait.interval-ms</name>
    <value>5000</value>
  </property>

  <!-- RM1 configs -->
  <property>
    <name>yarn.resourcemanager.address.U-1</name>
    <value>U-1:23140</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address.U-1</name>
    <value>U-1:23130</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.https.address.U-1</name>
    <value>U-1:23189</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address.U-1</name>
    <value>U-1:23188</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address.U-1</name>
    <value>U-1:23125</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address.U-1</name>
    <value>U-1:23141</value>
  </property>

  <!-- RM2 configs -->
  <property>
    <name>yarn.resourcemanager.address.U-7</name>
    <value>U-7:23140</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address.U-7</name>
    <value>U-7:23130</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.https.address.U-7</name>
    <value>U-7:23189</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address.U-7</name>
    <value>U-7:23188</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address.U-7</name>
    <value>U-7:23125</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address.U-7</name>
    <value>U-7:23141</value>
  </property>

<!-- Node Manager Configs -->
  <property>
    <description>Address where the localizer IPC is.</description>
    <name>yarn.nodemanager.localizer.address</name>
    <value>0.0.0.0:23344</value>
  </property>
  <property>
    <description>NM Webapp address.</description>
    <name>yarn.nodemanager.webapp.address</name>
    <value>0.0.0.0:23999</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/yarn/local</value>
  </property>
  <property>
    <name>yarn.nodemanager.log-dirs</name>
    <value>/yarn/log</value>
  </property>
  <property>
    <name>mapreduce.shuffle.port</name>
    <value>23080</value>
  </property>
</configuration>
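With two ResourceManagers, every per-RM address property has to be defined once for each ID in `yarn.resourcemanager.ha.rm-ids`, and a missing one is easy to overlook when editing by hand. A rough grep-based check is sketched below; the function name and the list of six suffixes (taken from the RM1/RM2 blocks above) are this sketch's assumptions, and it is not an XML parser: it expects one `<name>` element per line, as in the file above.

```shell
#!/usr/bin/env bash
# Report per-RM address properties missing from a yarn-site.xml.
# Returns 1 if anything is missing. Hypothetical helper name.

check_rm_addresses() {
    local file="$1" id prop missing=0
    shift
    for id in "$@"; do
        for prop in address scheduler.address webapp.address \
                    webapp.https.address resource-tracker.address admin.address; do
            if ! grep -q "<name>yarn\.resourcemanager\.${prop}\.${id}</name>" "$file"; then
                echo "missing: yarn.resourcemanager.${prop}.${id}"
                missing=1
            fi
        done
    done
    return "$missing"
}

# check_rm_addresses /etc/hadoop/conf/yarn-site.xml U-1 U-7
```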

            Note: after copying yarn-site.xml to U-7, change the value of yarn.resourcemanager.ha.id in U-7's yarn-site.xml to U-7; otherwise the ResourceManager will not start.
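This per-host edit can be scripted when copying the file. The sketch below rewrites only the `<value>` line that directly follows `<name>yarn.resourcemanager.ha.id</name>`, which matches the layout of the yarn-site.xml above (name and value on consecutive lines). The function name, the temporary path and the `scp` usage line are assumptions.

```shell
#!/usr/bin/env bash
# Print a copy of yarn-site.xml with yarn.resourcemanager.ha.id set to a new
# RM id. Assumes <name> and <value> sit on consecutive lines, as above.
# set_rm_ha_id is a hypothetical helper name.

set_rm_ha_id() {
    local src="$1" id="$2"
    # On the ha.id <name> line: print it, read the next line (n),
    # and replace whatever <value> it holds.
    sed -e '/<name>yarn\.resourcemanager\.ha\.id<\/name>/{' \
        -e 'n' \
        -e "s|<value>[^<]*</value>|<value>${id}</value>|" \
        -e '}' \
        "$src"
}

# Hypothetical usage on U-1:
# set_rm_ha_id /etc/hadoop/conf/yarn-site.xml U-7 > /tmp/yarn-site.U-7.xml
# scp /tmp/yarn-site.U-7.xml U-7:/etc/hadoop/conf/yarn-site.xml
```

Only the `ha.id` value changes; `yarn.resourcemanager.ha.rm-ids` keeps listing both `U-1,U-7`.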

        3    Start all YARN daemons

service hadoop-yarn-resourcemanager start
service hadoop-yarn-nodemanager start

        4    Verify


                Huh, what is this problem? It says the corresponding ZKFC address could not be found?



Today I experimented with YARN HA again and found the following explanation on the official mailing list:

Right now, RM HA does not use ZKFC. So, we can not use this command "yarn rmadmin -failover rm1 rm2" now.

If you use the default HA configuration, you set up a Automatic RM HA. In order to failover manually, you have two options:

set up manual RM HA by set the configuration "yarn.resourcemanager.ha.automatic-failover.enabled" as false. Then you can use command "yarn rmadmin -transitionToActive rm1", "yarn rmadmin -transitionToStandby rm2" to control which rm goes to active by yourself.

If you really want to experiment the manual failover when automatic failover enabled, you can use command "yarn rmadmin -transitionToActive --forcemanual rm2"

Thanks
        So it turned out I was simply going about it the wrong way...


        References: https://issues.apache.org/jira/browse/YARN-3006

                 https://issues.apache.org/jira/browse/YARN-1177
