Upgrade summary:
1. dfsadmin -upgradeProgress status does not exist in CDH 5.2.0, but it does in 4.6.0 (see the source of org.apache.hadoop.hdfs.tools.DFSAdmin),
so during the upgrade you cannot use it to check the upgrade status.
The rollingUpgrade option is the opposite: it does not exist in 4.6.0 but is present in 5.2.0, where it can be used for a rolling upgrade.
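For quick reference, a minimal sketch of the per-version status commands (assuming the stock CLI shipped with these releases; verify against your install):
# CDH 4.6.0 (Hadoop 2.0): poll the classic upgrade status
hadoop dfsadmin -upgradeProgress status
# CDH 5.2.0 (Hadoop 2.5): the rolling-upgrade interface replaces it
hdfs dfsadmin -rollingUpgrade query     # show current rolling-upgrade status
hdfs dfsadmin -rollingUpgrade prepare   # create a rollback fsimage before upgrading
hdfs dfsadmin -rollingUpgrade finalize  # finalize once the new version checks out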
2. When running the upgrade on CDH 5.2.0, the command invoked on the NN is:
hadoop-daemon.sh start namenode -upgrade
which ultimately invokes the org.apache.hadoop.hdfs.server.namenode.NameNode class with the upgrade option.
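If you want to watch the upgrade output directly instead of going through the daemon script, the same startup option can be passed to the launcher in the foreground; a minimal sketch, assuming the stock hdfs script is on the PATH:
hdfs namenode -upgrade    # runs the NameNode in the foreground with the upgrade option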
3. Some YARN settings changed; the following two parameters affect whether the NM starts properly.
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce.shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
must be changed to:
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
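After changing the value and restarting, a quick sanity check that the NMs registered with the RM again (a minimal sketch using the stock yarn CLI):
yarn node -list    # lists the NodeManagers currently in RUNNING state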
4. Impala 1.1.1 has compatibility problems with CDH 5.2.0.
CDH 5.x switched to protocol buffers (PB) for its RPC, so Impala needs to be upgraded to 2.0.0 (Impala availability/stability/performance still needs to be tested).
Rollback summary:
1. The rollback must be run on the 4.6.0 version.
When rollback, finalize, or upgrade is run on 4.6.0, it checks whether HA mode is configured; if the HA configuration has not been removed, it fails with the following error:
14/11/19 15:25:47 FATAL namenode.NameNode: Exception in namenode join
org.apache.hadoop.HadoopIllegalArgumentException: Invalid startup option. Cannot perform DFS upgrade with HA enabled.
        at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1130)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1241)
The main method of the NameNode class creates a NameNode instance via createNameNode:
public static void main(String argv[]) throws Exception {
  if (DFSUtil.parseHelpArgument(argv, NameNode.USAGE, System.out, true)) {
    System.exit(0);
  }
  try {
    StringUtils.startupShutdownMessage(NameNode.class, argv, LOG);
    NameNode namenode = createNameNode(argv, null);
    if (namenode != null)
      namenode.join();
  } catch (Throwable e) {
    LOG.fatal("Exception in namenode join", e);
    terminate(1, e);
  }
}
In createNameNode, the following code detects an HA configuration; 5.2.0 no longer has this restriction:
if (HAUtil.isHAEnabled(conf, DFSUtil.getNamenodeNameServiceId(conf)) &&
    (startOpt == StartupOption.UPGRADE ||
     startOpt == StartupOption.ROLLBACK ||
     startOpt == StartupOption.FINALIZE)) {
  throw new HadoopIllegalArgumentException("Invalid startup option. " +
      "Cannot perform DFS upgrade with HA enabled.");
}
Two checks are involved here:
1) The isHAEnabled method of org.apache.hadoop.hdfs.HAUtil:
public static boolean isHAEnabled(Configuration conf, String nsId) {
  Map<String, Map<String, InetSocketAddress>> addresses =
      DFSUtil.getHaNnRpcAddresses(conf);
  if (addresses == null) return false;
  Map<String, InetSocketAddress> nnMap = addresses.get(nsId);
  return nnMap != null && nnMap.size() > 1;
}
This in turn calls the following org.apache.hadoop.hdfs.DFSUtil methods:
getHaNnRpcAddresses/getAddresses/getNameServiceIds/getAddressesForNameserviceId/getNameNodeIds
They parse the dfs.nameservices/dfs.ha.namenodes.xxxx/dfs.namenode.rpc-address.xxxx settings to build the mapping from each nameservice ID to its NN RPC addresses
(a Map<String, Map<String, InetSocketAddress>>)
and check the size of each map value: if dfs.ha.namenodes.x lists more than one namenode, the cluster counts as HA. So it is enough to change the configuration to the following:
<property>
  <name>dfs.ha.namenodes.bipcluster</name>
  <value>nn1</value>
</property>
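To confirm what the NN will actually load, the stock hdfs getconf tool can be used (a minimal sketch; bipcluster is the nameservice from the configuration above):
hdfs getconf -confKey dfs.nameservices                 # should print bipcluster
hdfs getconf -confKey dfs.ha.namenodes.bipcluster      # should now list only nn1; two or more IDs is what makes isHAEnabled() return true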
2) If the JN shared edits dir is still configured, it fails with the following error:
14/11/19 16:47:32 FATAL namenode.NameNode: Exception in namenode join
java.io.IOException: Invalid configuration: a shared edits dir must not be specified if HA is not enabled.
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:576)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:513)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:403)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:445)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:621)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:606)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1177)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1241)
14/11/19 16:47:32 INFO util.ExitUtil: Exiting with status 1
The error is thrown in the FSNamesystem constructor:
final boolean persistBlocks = conf.getBoolean(DFS_PERSIST_BLOCKS_KEY,
    DFS_PERSIST_BLOCKS_DEFAULT); // defaults to false
// block allocation has to be persisted in HA using a shared edits directory
// so that the standby has up-to-date namespace information
String nameserviceId = DFSUtil.getNamenodeNameServiceId(conf);
this.haEnabled = HAUtil.isHAEnabled(conf, nameserviceId);
this.persistBlocks = persistBlocks || (haEnabled && HAUtil.usesSharedEditsDir(conf));

// Sanity check the HA-related config.
if (nameserviceId != null) {
  LOG.info("Determined nameservice ID: " + nameserviceId); // Determined nameservice ID: bipcluster
}
LOG.info("HA Enabled: " + haEnabled); // HA Enabled: false
if (!haEnabled && HAUtil.usesSharedEditsDir(conf)) { // the exception is thrown here
  LOG.warn("Configured NNs:\n" + DFSUtil.nnAddressesAsString(conf));
  // Configured NNs: Nameservice <bipcluster>: NN ID nn1 => xxxx:8020
  throw new IOException("Invalid configuration: a shared edits dir " +
      "must not be specified if HA is not enabled.");
}
The HAUtil.usesSharedEditsDir method:
public static boolean usesSharedEditsDir(Configuration conf) {
  return null != conf.get(DFS_NAMENODE_SHARED_EDITS_DIR_KEY);
}
It checks the JN edits dir setting: if dfs.namenode.shared.edits.dir is set, the exception is thrown.
Removing the following setting fixes it:
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://xxxxx/bipcluster</value>
</property>
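A quick way to confirm the key is really gone from the configuration the NN reads (a minimal sketch; the conf path is an assumption, adjust it to your install):
grep -A1 'dfs.namenode.shared.edits.dir' /etc/hadoop/conf/hdfs-site.xml    # should print nothing once the property is removed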
2. For the rollback, first change the HA configuration to a non-HA configuration, then perform the rollback.
After the rollback, redo HA.
The steps to redo HA are roughly as follows (a consolidated sketch of the whole sequence follows the list):
1) Shut down the whole (non-HA) cluster, change the configuration back to the HA configuration, and back up the data directories of the original standby NN and the JNs.
2) Delete the old JN data and start the JNs on their own:
./hadoop-daemon.sh start journalnode
3) On the active NN, run:
hdfs namenode -initializeSharedEdits
The initializeSharedEdits command initializes the JournalNodes from the NameNode and shares the edits files onto them.
4) Start the active NN:
./hadoop-daemon.sh start namenode
5) On the standby NN, run:
hadoop-daemon.sh start namenode -bootstrapStandby
hadoop-daemon.sh start namenode
This syncs the metadata and starts the standby namenode.
6) Start all the DNs:
./hadoop-daemons.sh start datanode
7) Transition nn1 to active:
hdfs haadmin -transitionToActive nn1
hdfs haadmin -getServiceState nn1
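Putting the steps together, a minimal sketch of the whole sequence (the commands are the ones from the steps above; run each block on the host named in the comment, and note nn1 comes from dfs.ha.namenodes.bipcluster):
# 2) on every JournalNode host, after wiping the old JN data directory
./hadoop-daemon.sh start journalnode
# 3) + 4) on the active NN: push the edits to the JNs, then start the NN
hdfs namenode -initializeSharedEdits
./hadoop-daemon.sh start namenode
# 5) on the standby NN: sync the metadata, then start it
hadoop-daemon.sh start namenode -bootstrapStandby
hadoop-daemon.sh start namenode
# 6) start all DataNodes, then 7) promote nn1 and verify
./hadoop-daemons.sh start datanode
hdfs haadmin -transitionToActive nn1
hdfs haadmin -getServiceState nn1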
3. The start-dfs.sh script has a bug: when -rollback is passed, only the DNs get rolled back; the NN does not.
diff ../../hadoop-2.5.0-cdh5.2.0/sbin/start-dfs.sh start-dfs.sh
50c50
< nameStartOpt="$nameStartOpt $@"
---
> nameStartOpt="$nameStartOpts $@"
Alternatively, you can roll back the NN directly with:
sh -x ./hadoop-daemon.sh start namenode -rollback
Note that the DNs still need to be rolled back as well.
4. If the rollback or upgrade fails, the previously backed-up metadata can be copied back to recover.
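For completeness, a minimal sketch of such a backup and restore, assuming the NN metadata lives in /data/dfs/name (a hypothetical path; use whatever dfs.namenode.name.dir points to) and the NN is stopped in both cases:
# before the upgrade: keep a copy of the metadata directory
tar czf /backup/nn-meta-20141119.tar.gz -C /data/dfs name    # /backup is a hypothetical location
# after a failed upgrade/rollback: put the copy back
rm -rf /data/dfs/name
tar xzf /backup/nn-meta-20141119.tar.gz -C /data/dfs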