One fix for HMaster dying after HBase starts

Date: 2019-11-13



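While using HBase recently, I found that the list command in the HBase shell failed. Checking the processes with jps showed that HMaster had died. After confirming that Hadoop itself was healthy, I looked at the HMaster log and found the following error: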

2015-02-17 05:46:15,212 DEBUG [master:master:60000] lock.ZKInterProcessLockBase: Released /hbase/table-lock/hbase:namespace/write-master:600000000000004
2015-02-17 05:46:15,212 FATAL [master:master:60000] master.HMaster: Master server abort: loaded coprocessors are: []
2015-02-17 05:46:15,213 FATAL [master:master:60000] master.HMaster: Unhandled exception. Starting shutdown.
        org.apache.hadoop.hbase.TableExistsException: hbase:namespace
        at org.apache.hadoop.hbase.master.handler.CreateTableHandler.prepare(CreateTableHandler.java:120)
        at org.apache.hadoop.hbase.master.TableNamespaceManager.createNamespaceTable(TableNamespaceManager.java:232)
        at org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:86)
        at org.apache.hadoop.hbase.master.HMaster.initNamespace(HMaster.java:1049)
        at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:913)
        at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:606)
        at java.lang.Thread.run(Unknown Source)
2015-02-17 05:46:15,214 INFO  [master:master:60000] master.HMaster: Aborting
2015-02-17 05:46:15,214 INFO  [master,60000,1424180766819-BalancerChore] balancer.BalancerChore: master,60000,1424180766819-BalancerChore exiting
2015-02-17 05:46:15,215 INFO  [master,60000,1424180766819-ClusterStatusChore] balancer.ClusterStatusChore: master,60000,1424180766819-ClusterStatusChore exiting
2015-02-17 05:46:15,215 INFO  [CatalogJanitor-master:60000] master.CatalogJanitor: CatalogJanitor-master:60000 exiting
2015-02-17 05:46:15,216 DEBUG [master:master:60000] master.HMaster: Stopping service threads
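
The diagnosis described above (check jps for HMaster, then look for fatal entries in the master log) can be sketched as two small shell helpers. This is a minimal sketch and not part of the original post: the function names hmaster_running and fatal_lines are hypothetical, and the log path in the usage note is a placeholder that depends on your installation.

```shell
# Succeed if an HMaster process appears in jps-style output read from stdin.
# jps prints one "<pid> <main class>" pair per line.
hmaster_running() {
    awk '$2 == "HMaster" { found = 1 } END { exit !found }'
}

# Print every FATAL entry in the given log file, prefixed with its line number.
fatal_lines() {
    grep -n ' FATAL ' "$1"
}
```

A possible use is `jps | hmaster_running || fatal_lines /path/to/hbase-master.log`, which prints the fatal log entries only when no HMaster process is found.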

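Two FATAL entries appear (the second and third lines of the log), which indicate a serious error. My intuition was that the problem was related to ZooKeeper. After trying several approaches, I finally found a working solution, which comes from the user polaris on Stack Overflow (the original URL is given at the end for anyone interested).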

Four steps resolve the problem:

1. Stop the HBase cluster.

2. Run HBase's offline meta repair command:

hbase org.apache.hadoop.hbase.util.hbck.OfflineMetaRepair

3. Delete the stale HBase data that already exists in ZooKeeper.

Open the ZooKeeper client. Note that the ZooKeeper ensemble must be running when you do this:

/opt/zookeeper/bin/zkCli.sh

Use ls / to list the data nodes at the ZooKeeper root.

Use rmr /hbase to recursively delete the HBase data in ZooKeeper (on ZooKeeper 3.5 and later, use deleteall /hbase).

4. Restart the HBase cluster. The cluster returns to normal.
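
The four steps above can be sketched as a single shell function. This is a hedged sketch under stated assumptions, not the original author's script: HBASE_HOME and ZK_HOME are placeholder install paths, the helper names run and recover_hmaster are hypothetical, and the ZooKeeper delete uses rmr, the recursive delete of the ZooKeeper 3.4 CLI (deleteall replaces it in 3.5 and later). By default it only prints the commands so they can be reviewed first.

```shell
# Assumed install locations; override them in the environment as needed.
HBASE_HOME="${HBASE_HOME:-/opt/hbase}"
ZK_HOME="${ZK_HOME:-/opt/zookeeper}"
# DRY_RUN=1 (the default) prints each command instead of executing it.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Stop HBase, rebuild meta offline, clear the HBase znode, restart HBase.
# Each step runs only if the previous one succeeded.
recover_hmaster() {
    run "$HBASE_HOME/bin/stop-hbase.sh" &&
    run "$HBASE_HOME/bin/hbase" org.apache.hadoop.hbase.util.hbck.OfflineMetaRepair &&
    run "$ZK_HOME/bin/zkCli.sh" rmr /hbase &&
    run "$HBASE_HOME/bin/start-hbase.sh"
}
```

Calling `recover_hmaster` prints the four commands in order; `DRY_RUN=0 recover_hmaster` executes them. Chaining with && stops the sequence at the first failing step, so the ZooKeeper data is not deleted if the cluster failed to stop or the repair failed.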

Reflection:

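After solving the problem, I kept wondering why the cluster had suddenly ended up in this state, and eventually worked it out. During earlier cluster testing I had deployed a ZooKeeper node on the master node (which had not hosted one before), and later removed it again so that the total number of ZooKeeper nodes would not be even. This is probably what left the HBase data already stored in ZooKeeper in a bad state, which is why clearing the HBase data from ZooKeeper fixes the problem.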
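
The reason for avoiding an even number of ZooKeeper nodes can be shown with a little arithmetic: ZooKeeper needs a majority of the ensemble to be up, so adding a fourth server raises the quorum size without raising the number of failures the ensemble can survive. The sketch below is illustrative only, and the function names zk_quorum and zk_tolerated_failures are hypothetical.

```shell
# Majority quorum for an ensemble of n ZooKeeper servers: floor(n/2) + 1.
zk_quorum() {
    echo $(( $1 / 2 + 1 ))
}

# Number of server failures the ensemble survives while still holding a quorum.
zk_tolerated_failures() {
    echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 3 4 5; do
    echo "servers=$n quorum=$(zk_quorum "$n") tolerated_failures=$(zk_tolerated_failures "$n")"
done
```

With 3 servers the ensemble tolerates 1 failure, with 4 it still tolerates only 1, and with 5 it tolerates 2.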

Original URL: http://stackoverflow.com/questions/28563167/hbase-master-not-starting-correctly
