Symptom: at first, the NameNode log kept flooding with the following error:
2014-01-27 17:55:59,388 WARN resources.ExceptionHandler (ExceptionHandler.java:toResponse(92)) - INTERNAL_SERVER_ERROR
What happened next was similar to the case described in "Hadoop Operations Notes: NameNode Fails to Start Normally After an Abnormal Shutdown".
This, too, is a bug in Hadoop 2.1.0-beta ("testNamenodeRestart fails with NullPointerException in trunk"):
This is actually due to a bug in the NN. The http services are started before the image is loaded, the edits are processed, and the rpc server is started. During image loading and edits processing, webhdfs will NPE on the rpc server.
Since the NameNode could not be started, the only option was to rebuild the Standby. The steps are as follows:
1. First, on the Active NameNode, run the following commands, then manually back up the entire name directory:
# Stop the automatic failover controller (ZKFC)
hadoop-daemon.sh stop zkfc
# Enter safe mode
hdfs dfsadmin -safemode enter
# Flush the edit log into the fsimage
hdfs dfsadmin -saveNamespace
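Step 1 can be wrapped in a small POSIX shell function so that the backup is only taken after saveNamespace has succeeded. This is a minimal sketch, not a tested tool: the function names, the HADOOP_DAEMON/HDFS override variables and the backup location are hypothetical; only the three Hadoop commands come from the step above.

```shell
#!/bin/sh
# Hypothetical override hooks (not Hadoop settings); they default to the
# real commands used in step 1.
HADOOP_DAEMON=${HADOOP_DAEMON:-hadoop-daemon.sh}
HDFS=${HDFS:-hdfs}

# backup_dir SRC DEST_PARENT: tar SRC into DEST_PARENT as a timestamped
# .tar.gz and print the archive path.
backup_dir() {
    src=$1
    dest_parent=$2
    stamp=$(date +%Y%m%d%H%M%S)
    archive="$dest_parent/$(basename "$src")-$stamp.tar.gz"
    mkdir -p "$dest_parent" || return 1
    tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || return 1
    printf '%s\n' "$archive"
}

# prepare_active NAME_DIR BACKUP_PARENT: stop ZKFC, enter safe mode, flush
# the edit log into the fsimage, then back up the name directory.
# Any failing command aborts before the backup is taken.
prepare_active() {
    name_dir=$1
    backup_parent=$2
    $HADOOP_DAEMON stop zkfc || return 1
    $HDFS dfsadmin -safemode enter || return 1
    $HDFS dfsadmin -saveNamespace || return 1
    backup_dir "$name_dir" "$backup_parent"
}
```

Usage would look like `prepare_active /data/hadoop/name /data/backup` (the backup path is hypothetical). Like the manual steps, the sketch leaves the Active in safe mode with ZKFC stopped.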
2. Then, on the Standby, first back up the entire name and journal directories, then run:
hadoop-daemon.sh stop zkfc
hdfs namenode -bootstrapStandby
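Step 2 on the Standby can be sketched the same way: back up both directories, stop ZKFC, then run bootstrapStandby while keeping a copy of its output for inspection. The function name, the HADOOP_DAEMON/HDFS hooks and the log-file argument are hypothetical; the journal directory path is not given in this post and must be supplied.

```shell
#!/bin/sh
# Hypothetical override hooks; they default to the real commands.
HADOOP_DAEMON=${HADOOP_DAEMON:-hadoop-daemon.sh}
HDFS=${HDFS:-hdfs}

# bootstrap_standby NAME_DIR JOURNAL_DIR BACKUP_PARENT LOGFILE
bootstrap_standby() {
    name_dir=$1
    journal_dir=$2
    backup_parent=$3
    logfile=$4
    stamp=$(date +%Y%m%d%H%M%S)
    mkdir -p "$backup_parent" || return 1
    # Back up both directories before bootstrapStandby can rewrite them.
    for d in "$name_dir" "$journal_dir"; do
        tar -czf "$backup_parent/$(basename "$d")-$stamp.tar.gz" \
            -C "$(dirname "$d")" "$(basename "$d")" || return 1
    done
    $HADOOP_DAEMON stop zkfc || return 1
    # Show the bootstrap output and keep a copy in LOGFILE. In a POSIX sh
    # pipeline the exit status is tee's, so judge the result from the log.
    $HDFS namenode -bootstrapStandby 2>&1 | tee "$logfile"
}
```

For example, `bootstrap_standby /data/hadoop/name /data/hadoop/journal /data/backup bootstrap.log`, where the journal and backup paths are placeholders.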
If it fails with an error like this:
FATAL ha.BootstrapStandby: Unable to read transaction ids 10-100 from the configured shared edits storage qjournal://1.1.1.1:8485;1.1.1.2:8485/sec-hdfs-cluster. Please copy these logs into the shared edits storage or call saveNamespace on the active node.
Error: Gap in transactions. Expected to be able to read up until at least txid 10 but unable to find any edit logs containing txid 10
then copy the entire name directory from the Active to the Standby and start the NameNode directly:
# On the Active:
scp -r /data/hadoop/name/ $standby_ip:/data/hadoop
# On the Standby:
hadoop-daemon.sh start namenode
3. Note: do NOT run "bootstrapStandby" at this point; otherwise it will re-format the name directory you just copied over and wipe it.
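The fallback decision (copy the Active's name directory only when bootstrapStandby hit the transaction gap, then start the NameNode without re-running bootstrapStandby) can be scripted roughly as follows. This is a sketch under stated assumptions: the function names and the SCP/HADOOP_DAEMON hooks are hypothetical; it assumes the bootstrapStandby output was saved to a file (for example with `hdfs namenode -bootstrapStandby 2>&1 | tee bootstrap.log`) and that the name directory has the same path on both nodes. Unlike the steps above, which push the directory from the Active, it pulls the directory from the Standby, which assumes the Standby can SSH to the Active.

```shell
#!/bin/sh
# Hypothetical override hooks; they default to the real commands.
HADOOP_DAEMON=${HADOOP_DAEMON:-hadoop-daemon.sh}
SCP=${SCP:-scp}

# bootstrap_hit_gap LOGFILE: exit 0 if a saved bootstrapStandby log shows
# the "Unable to read transaction ids" / "Gap in transactions" failure.
bootstrap_hit_gap() {
    grep -q -e 'Gap in transactions' -e 'Unable to read transaction ids' "$1"
}

# recover_standby LOGFILE ACTIVE_HOST NAME_DIR: run on the Standby.
recover_standby() {
    logfile=$1
    active_host=$2
    name_dir=$3
    if ! bootstrap_hit_gap "$logfile"; then
        echo "no transaction gap in $logfile; not copying" >&2
        return 1
    fi
    # Pull the Active's name directory into the parent of NAME_DIR.
    $SCP -r "$active_host:$name_dir" "$(dirname "$name_dir")" || return 1
    # Start the NameNode directly; bootstrapStandby is deliberately NOT run.
    $HADOOP_DAEMON start namenode
}
```

For example, `recover_standby bootstrap.log $active_ip /data/hadoop/name`, where `$active_ip` is a placeholder for the Active's address.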
References: