Hadoop HA: Rebuilding the Standby NameNode

Symptom: at first, the NameNode log kept flooding with the following error:

2014-01-27 17:55:59,388 WARN  resources.ExceptionHandler (ExceptionHandler.java:toResponse(92)) - INTERNAL_SERVER_ERROR

What happened afterwards is similar to this article; see "Hadoop運維筆記 之 Namenode異常中止後沒法正常啓動" (Hadoop Ops Notes: NameNode cannot start normally after an abnormal shutdown).

This is the same bug reported against Hadoop 2.1.0-beta (testNamenodeRestart fails with NullPointerException in trunk):

This is actually due to a bug in the NN. The http services are started before the image is loaded, the edits are processed, and the rpc server is started. During image loading and edits processing, webhdfs will NPE on the rpc server.
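While the fsimage and edit log are still being loaded, webhdfs requests fail with the 500 above, but the NameNode's JMX servlet usually still responds, so it can be used to watch startup progress. A minimal sketch, assuming the default NameNode HTTP port 50070 and the 2.x StartupProgress bean name:

# Poll the NN startup-progress MBean while the image/edits are loading
# (port 50070 and the bean name are assumptions for this Hadoop version)
curl -s 'http://<active_nn>:50070/jmx?qry=Hadoop:service=NameNode,name=StartupProgress'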

 

Since the NameNode could not be brought up, the only option was to rebuild the Standby. The steps are as follows:

1. First, run the following commands on the Active NameNode, then manually back up the entire name directory (a backup sketch follows the commands below):

# Stop the automatic failover controller (ZKFC)
hadoop-daemon.sh stop zkfc

# Enter safe mode
hdfs dfsadmin -safemode enter

# Flush the edit log into the fsimage
hdfs dfsadmin -saveNamespace
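For the manual backup of the name directory, a minimal sketch; /data/hadoop/name is the path used later in this post, while the backup destination is an assumption:

# Archive the Active NameNode's metadata directory (destination is an assumed path)
tar czf /data/backup/nn-name-$(date +%Y%m%d).tar.gz -C /data/hadoop name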

2. Then on the Standby, first back up the entire name and journal directories (see the backup sketch after the commands below), then run:

hadoop-daemon.sh stop zkfc
hdfs namenode -bootstrapStandby
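For the Standby-side backup mentioned above, a minimal sketch; the journal path is an assumption and should match your dfs.journalnode.edits.dir:

# Archive both metadata directories on the Standby before rebuilding it
tar czf /data/backup/standby-name-$(date +%Y%m%d).tar.gz -C /data/hadoop name
tar czf /data/backup/standby-journal-$(date +%Y%m%d).tar.gz -C /data/hadoop journal   # assumed journal path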

If you hit an error like:

FATAL ha.BootstrapStandby: Unable to read transaction ids 10-100 from the configured shared edits storage qjournal://1.1.1.1:8485;1.1.1.2:8485/sec-hdfs-cluster. Please copy these logs into the shared edits storage or call saveNamespace on the active node.
Error: Gap in transactions. Expected to be able to read up until at least txid 10 but unable to find any edit logs containing txid 10

then copy the entire name directory from the Active to the Standby and simply start the NameNode:

scp -r /data/hadoop/name/ $standby_ip:/data/hadoop
hadoop-daemon.sh start namenode

3. Note: there is no need to run "bootstrapStandby" at this point; doing so would re-format and wipe the name directory you just copied over.
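Once the Standby NameNode has rejoined, it is worth confirming the HA state and then undoing the changes made in step 1. A minimal sketch, assuming the HA service IDs are nn1 and nn2 (adjust to your dfs.ha.namenodes.* values):

# Check which NameNode is active and which is standby (nn1/nn2 are assumed service IDs)
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2

# Leave safe mode on the Active, then restart the failover controller on both NameNodes
hdfs dfsadmin -safemode leave
hadoop-daemon.sh start zkfc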

