Hadoop + ZK + HBase Environment Setup

Hadoop 環境搭建

References:

http://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-common/ClusterSetup.html

http://hadoop.apache.org/docs/r2.4.1/hadoop-yarn/hadoop-yarn-common/yarn-default.xml

http://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml

Download the 2.4.1 binary package, extract it, and fill in each configuration file per the links above. At startup you may hit the error "Unable to load realm info from SCDynamicStore". Fix it by adding the lines below to hadoop-env.sh (the same problem appears when configuring HBase; adding the same lines to hbase-env.sh fixes it there too).

Add to hadoop-env.sh (and hbase-env.sh):

export JAVA_HOME="/System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home"
export HBASE_OPTS="-Djava.security.krb5.realm=OX.AC.UK -Djava.security.krb5.kdc=kdc0.ox.ac.uk:kdc1.ox.ac.uk"
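
On newer macOS releases the hard-coded Apple JDK path above may not exist; /usr/libexec/java_home can resolve it instead (a sketch, assuming a macOS machine where that tool and a 1.6 JDK are present):

# Resolve JAVA_HOME dynamically instead of hard-coding the Apple JDK path
# (assumes /usr/libexec/java_home exists, as on stock macOS)
export JAVA_HOME="$(/usr/libexec/java_home -v 1.6)"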

Finally, write your own start and stop scripts.

hadoop-start.sh

 

#!/bin/bash

HADOOP_PREFIX="/Users/zhenweiliu/Work/Software/hadoop-2.4.1"
HADOOP_YARN_HOME="/Users/zhenweiliu/Work/Software/hadoop-2.4.1"
HADOOP_CONF_DIR="/Users/zhenweiliu/Work/Software/hadoop-2.4.1/etc/hadoop"
cluster_name="hadoop_cat"

# Format a new distributed filesystem
if [ "$1" == "format" ]; then
    $HADOOP_PREFIX/bin/hdfs namenode -format $cluster_name
fi

# Start the HDFS with the following command, run on the designated NameNode:
$HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs start namenode

# Run a script to start DataNodes on all slaves:
$HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs start datanode

# Start the YARN with the following command, run on the designated ResourceManager:
$HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR start resourcemanager

# Run a script to start NodeManagers on all slaves:
$HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR start nodemanager

# Start a standalone WebAppProxy server. If multiple servers are used with load balancing it should be run on each of them:
$HADOOP_YARN_HOME/sbin/yarn-daemon.sh start proxyserver --config $HADOOP_CONF_DIR

# Start the MapReduce JobHistory Server with the following command, run on the designated server:
$HADOOP_PREFIX/sbin/mr-jobhistory-daemon.sh start historyserver --config $HADOOP_CONF_DIR

 

hadoop-stop.sh

#!/bin/bash

HADOOP_PREFIX="/Users/zhenweiliu/Work/Software/hadoop-2.4.1"
HADOOP_YARN_HOME="/Users/zhenweiliu/Work/Software/hadoop-2.4.1"
HADOOP_CONF_DIR="/Users/zhenweiliu/Work/Software/hadoop-2.4.1/etc/hadoop"
cluster_name="hadoop_cat"

# Stop the NameNode with the following command, run on the designated NameNode:
$HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs stop namenode

# Run a script to stop DataNodes on all slaves:
$HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs stop datanode

# Stop the ResourceManager with the following command, run on the designated ResourceManager:
$HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR stop resourcemanager

# Run a script to stop NodeManagers on all slaves:
$HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR stop nodemanager

# Stop the WebAppProxy server. If multiple servers are used with load balancing it should be run on each of them:
$HADOOP_YARN_HOME/sbin/yarn-daemon.sh stop proxyserver --config $HADOOP_CONF_DIR

# Stop the MapReduce JobHistory Server with the following command, run on the designated server:
$HADOOP_PREFIX/sbin/mr-jobhistory-daemon.sh stop historyserver --config $HADOOP_CONF_DIR

hadoop-restart.sh

#!/bin/bash
./hadoop-stop.sh
./hadoop-start.sh

 

Finally, here are the Hadoop configuration files I needed to change.

core-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
      <name>fs.defaultFS</name>
      <value>hdfs://localhost:9000</value>
  </property>
  <property>
      <name>io.file.buffer.size</name>
      <value>131072</value>
  </property>
</configuration>
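
To confirm the file is actually picked up, hdfs getconf can echo back an effective value (a quick check, reusing the $HADOOP_PREFIX from the scripts above):

# Should print hdfs://localhost:9000 if core-site.xml is being read
$HADOOP_PREFIX/bin/hdfs getconf -confKey fs.defaultFS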

 

hdfs-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>

    <!-- NameNode Configurations -->
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:///Users/zhenweiliu/Work/Software/hadoop-2.4.1/name</value>
    </property>
    <property>
        <name>dfs.blocksize</name>
        <value>67108864</value>
    </property>
    <property>
        <name>dfs.namenode.handler.count</name>
        <value>100</value>
    </property>

    <!-- DataNode Configurations -->
    <property>
        <name>dfs.datanode.max.xcievers</name>
        <value>4096</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:///Users/zhenweiliu/Work/Software/hadoop-2.4.1/data</value>
    </property>
</configuration>

 

 

yarn-site.xml

<?xml version="1.0"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<configuration>

    <!-- ResourceManager and NodeManager Configurations -->
    <property>
        <name>yarn.acl.enable</name>
        <value>false</value>
    </property>

    <!-- ResourceManager Configurations -->
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>localhost:9001</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>localhost:9002</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>localhost:9003</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>localhost:9004</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>localhost:9005</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
    </property>
    <property>
        <name>yarn.scheduler.minimum-allocation-mb</name>
        <value>1024</value>
    </property>
    <property>
        <name>yarn.scheduler.maximum-allocation-mb</name>
        <value>8192</value>
    </property>

    <!-- NodeManager Configurations -->
    <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>8192</value>
    </property>
    <property>
        <name>yarn.nodemanager.vmem-pmem-ratio</name>
        <value>2.1</value>
    </property>
    <property>
        <name>yarn.nodemanager.local-dirs</name>
        <value>${hadoop.tmp.dir}/nm-local-dir</value>
    </property>
    <property>
        <name>yarn.nodemanager.log-dirs</name>
        <value>${yarn.log.dir}/userlogs</value>
    </property>
    <property>
        <name>yarn.nodemanager.log.retain-seconds</name>
        <value>10800</value>
    </property>
    <property>
        <name>yarn.nodemanager.remote-app-log-dir</name>
        <value>/logs</value>
    </property>
    <property>
        <name>yarn.nodemanager.remote-app-log-dir-suffix</name>
        <value>logs</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>

    <!-- History Server Configurations -->
    <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>-1</value>
    </property>
    <property>
        <name>yarn.log-aggregation.retain-check-interval-seconds</name>
        <value>-1</value>
    </property>
</configuration>
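
Once the ResourceManager and NodeManager are up, yarn node -list is a quick way to confirm the NodeManager registered (using the $HADOOP_YARN_HOME from the scripts above):

# Expect one RUNNING node in this single-machine setup
$HADOOP_YARN_HOME/bin/yarn node -list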

 

mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>

    <!-- Configurations for MapReduce Applications -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.map.memory.mb</name>
        <value>1536</value>
    </property>
    <property>
        <name>mapreduce.map.java.opts</name>
        <value>-Xmx1024M</value>
    </property>
    <property>
        <name>mapreduce.reduce.memory.mb</name>
        <value>3072</value>
    </property>
    <property>
        <name>mapreduce.reduce.java.opts</name>
        <value>-Xmx2560M</value>
    </property>
    <property>
        <name>mapreduce.task.io.sort.mb</name>
        <value>512</value>
    </property>
    <property>
        <name>mapreduce.task.io.sort.factor</name>
        <value>100</value>
    </property>
    <property>
        <name>mapreduce.reduce.shuffle.parallelcopies</name>
        <value>50</value>
    </property>

    <!-- Configurations for MapReduce JobHistory Server -->
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>localhost:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>localhost:19888</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.intermediate-done-dir</name>
        <value>file:///Users/zhenweiliu/Work/Software/hadoop-2.4.1/mr-history/tmp</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.done-dir</name>
        <value>file:///Users/zhenweiliu/Work/Software/hadoop-2.4.1/mr-history/done</value>
    </property>

</configuration>
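
With all four files in place, running one of the bundled example jobs is a convenient end-to-end check (a sketch; the examples jar path below is the standard 2.4.1 distribution layout):

# Run the bundled pi estimator: 2 maps, 5 samples each.
# A successful run exercises HDFS, YARN and the JobHistory server together.
$HADOOP_PREFIX/bin/hadoop jar \
    $HADOOP_PREFIX/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar pi 2 5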

 

 

ZK Pseudo-Distributed Setup

Make three copies of the ZK installation directory, named

zookeeper-3.4.5-1 

zookeeper-3.4.5-2 

zookeeper-3.4.5-3

zoo.cfg under each ZK directory is configured as follows.

zookeeper-3.4.5-1/zoo.cfg

tickTime=2000
initLimit=10
syncLimit=5
dataDir=/Users/zhenweiliu/Work/Software/zookeeper/zookeeper-3.4.5-1/data
dataLogDir=/Users/zhenweiliu/Work/Software/zookeeper/zookeeper-3.4.5-1/logs
clientPort=2181
server.1=127.0.0.1:2888:3888
server.2=127.0.0.1:2889:3889
server.3=127.0.0.1:2890:3890

zookeeper-3.4.5-2/zoo.cfg 

 

tickTime=2000
initLimit=10
syncLimit=5
dataDir=/Users/zhenweiliu/Work/Software/zookeeper/zookeeper-3.4.5-2/data
dataLogDir=/Users/zhenweiliu/Work/Software/zookeeper/zookeeper-3.4.5-2/logs
clientPort=2182
server.1=127.0.0.1:2888:3888
server.2=127.0.0.1:2889:3889
server.3=127.0.0.1:2890:3890

 

zookeeper-3.4.5-3/zoo.cfg 

tickTime=2000
initLimit=10
syncLimit=5
dataDir=/Users/zhenweiliu/Work/Software/zookeeper/zookeeper-3.4.5-3/data
dataLogDir=/Users/zhenweiliu/Work/Software/zookeeper/zookeeper-3.4.5-3/logs
clientPort=2183
server.1=127.0.0.1:2888:3888
server.2=127.0.0.1:2889:3889
server.3=127.0.0.1:2890:3890

Then create a file named myid under each instance's data directory, containing the single character 1, 2 or 3 respectively. For example:

zookeeper-3.4.5-1/data/myid

1
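
The copies and myid files can also be created in one go (a sketch, assuming the same base directory as in the zoo.cfg files above and a pristine zookeeper-3.4.5 directory to copy from):

#!/bin/bash
# Make the three instance copies and write each one's myid.
# (Each copy's zoo.cfg still needs its own clientPort, as shown above.)
BASE_DIR="/Users/zhenweiliu/Work/Software/zookeeper/zookeeper-3.4.5"
for no in $(seq 1 3)
do
    cp -r "$BASE_DIR" "$BASE_DIR-$no"
    mkdir -p "$BASE_DIR-$no/data"
    echo $no > "$BASE_DIR-$no/data/myid"
done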

Finally, batch start and stop scripts:

startZkCluster.sh

#!/bin/bash

BASE_DIR="/Users/zhenweiliu/Work/Software/zookeeper/zookeeper-3.4.5"
BIN_EXEC="bin/zkServer.sh start"

for no in $(seq 1 3)
do
    $BASE_DIR"-"$no/$BIN_EXEC
done

stopZkCluster.sh

#!/bin/bash

BASE_DIR="/Users/zhenweiliu/Work/Software/zookeeper/zookeeper-3.4.5"
BIN_EXEC="bin/zkServer.sh stop"

for no in $(seq 1 3)
do
    $BASE_DIR"-"$no/$BIN_EXEC
done

restartZkCluster.sh

#!/bin/bash

./stopZkCluster.sh
./startZkCluster.sh
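
A matching status script (same layout assumptions as the scripts above) reports whether each instance came up as leader or follower:

#!/bin/bash
# statusZkCluster.sh -- print each instance's mode via zkServer.sh status
BASE_DIR="/Users/zhenweiliu/Work/Software/zookeeper/zookeeper-3.4.5"
BIN_EXEC="bin/zkServer.sh status"

for no in $(seq 1 3)
do
    $BASE_DIR"-"$no/$BIN_EXEC
done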

 

HBase

References:

http://abloz.com/hbase/book.html

HBase actually ships with a bundled ZK. If you do not explicitly configure an external ZK, it uses the bundled one, which starts and stops together with HBase.

Explicitly enable the bundled ZK in hbase-env.sh:

export HBASE_MANAGES_ZK=true

hbase-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
/**
 *
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
-->
<configuration>
      <!--
    <property>
        <name>hbase.rootdir</name>
        <value>file:///Users/zhenweiliu/Work/Software/hbase-0.98.3-hadoop2/hbase</value>
    </property>
    -->
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://localhost:9000/hbase</value>
    <description>The directory shared by RegionServers.</description>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
    <description>The replication count for HLog and HFile storage. Should not be greater than HDFS datanode count.</description>
  </property>
  <property>
      <name>hbase.zookeeper.quorum</name>
      <value>localhost</value>
  </property>
  <property>
      <name>hbase.zookeeper.property.dataDir</name>
      <value>/Users/zhenweiliu/Work/Software/hbase-0.98.3-hadoop2/zookeeper</value>
  </property>
    <property>
        <name>hbase.zookeeper.property.clientPort</name>
        <value>2222</value>
        <description>Property from ZooKeeper's config zoo.cfg.
        The port at which the clients will connect.
        </description>
    </property>
  <property>
      <name>hbase.cluster.distributed</name>
      <value>true</value>
  </property>
</configuration>

Finally, start HBase:

./start-hbase.sh
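
To verify the startup, ZooKeeper's four-letter ruok command and the HBase shell both give quick signals (assuming the clientPort 2222 configured above and that nc is installed):

# ZK should answer "imok" on the clientPort set in hbase-site.xml
echo ruok | nc localhost 2222

# The shell's status command should report the master and region servers
echo "status" | ./hbase shell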

 

System Parameters

HBase also needs high process and open-file limits, so ulimit must be raised. On my Mac I added an /etc/launchd.conf file containing:

limit maxfiles 16384 16384
limit maxproc 2048 2048

And added to /etc/profile:

ulimit -n 16384
ulimit -u 2048
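
The new limits only apply to fresh sessions (and launchd.conf requires a reboot); a quick check afterwards:

# Expect 16384 and 2048 respectively once the limits have taken effect
ulimit -n
ulimit -u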

If HBase reports:

2014-07-14 23:00:48,342 WARN  [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

ERROR: org.apache.hadoop.hbase.ipc.ServerNotRunningYetException: Server is not running yet
    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:90)
    at org.apache.hadoop.hbase.ipc.FifoRpcScheduler$1.run(FifoRpcScheduler.java:73)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
    at java.lang.Thread.run(Thread.java:695)

Checking the HBase master log shows:

2014-07-14 23:31:51,270 INFO  [master:192.168.126.8:60000] util.FSUtils: Waiting for dfs to exit safe mode...

Exit Hadoop safe mode:

bin/hdfs dfsadmin -safemode leave
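
Whether it worked can be confirmed before retrying:

# Should print "Safe mode is OFF"
bin/hdfs dfsadmin -safemode get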

The master log then reports:

2014-07-14 23:32:22,238 WARN  [master:192.168.126.8:60000] hdfs.DFSClient: DFS Read
org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-1761102757-192.168.126.8-1404787541755:blk_1073741825_1001 file=/hbase/hbase.version

Check HDFS:

./hdfs fsck / -files -blocks
14/07/14 23:36:32 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Connecting to namenode via http://localhost:50070
FSCK started by zhenweiliu (auth:SIMPLE) from /127.0.0.1 for path / at Mon Jul 14 23:36:33 CST 2014
.
/hbase/WALs/192.168.126.8,60020,1404917152583-splitting/192.168.126.8%2C60020%2C1404917152583.1404917158940: CORRUPT blockpool BP-1761102757-192.168.126.8-1404787541755 block blk_1073741842

/hbase/WALs/192.168.126.8,60020,1404917152583-splitting/192.168.126.8%2C60020%2C1404917152583.1404917158940: MISSING 1 blocks of total size 17 B..
/hbase/WALs/192.168.126.8,60020,1404917152583-splitting/192.168.126.8%2C60020%2C1404917152583.1404917167188.meta: CORRUPT blockpool BP-1761102757-192.168.126.8-1404787541755 block blk_1073741843

/hbase/WALs/192.168.126.8,60020,1404917152583-splitting/192.168.126.8%2C60020%2C1404917152583.1404917167188.meta: MISSING 1 blocks of total size 401 B..
/hbase/data/hbase/meta/.tabledesc/.tableinfo.0000000001: CORRUPT blockpool BP-1761102757-192.168.126.8-1404787541755 block blk_1073741829

/hbase/data/hbase/meta/.tabledesc/.tableinfo.0000000001: MISSING 1 blocks of total size 372 B..
/hbase/data/hbase/meta/1588230740/.regioninfo: CORRUPT blockpool BP-1761102757-192.168.126.8-1404787541755 block blk_1073741827

/hbase/data/hbase/meta/1588230740/.regioninfo: MISSING 1 blocks of total size 30 B..
/hbase/data/hbase/meta/1588230740/info/e63bf8b1e649450895c36f28fb88da98: CORRUPT blockpool BP-1761102757-192.168.126.8-1404787541755 block blk_1073741836

/hbase/data/hbase/meta/1588230740/info/e63bf8b1e649450895c36f28fb88da98: MISSING 1 blocks of total size 1340 B..
/hbase/data/hbase/meta/1588230740/oldWALs/hlog.1404787632739: CORRUPT blockpool BP-1761102757-192.168.126.8-1404787541755 block blk_1073741828

/hbase/data/hbase/meta/1588230740/oldWALs/hlog.1404787632739: MISSING 1 blocks of total size 17 B..
/hbase/data/hbase/namespace/.tabledesc/.tableinfo.0000000001: CORRUPT blockpool BP-1761102757-192.168.126.8-1404787541755 block blk_1073741832

/hbase/data/hbase/namespace/.tabledesc/.tableinfo.0000000001: MISSING 1 blocks of total size 286 B..
/hbase/data/hbase/namespace/a3fbb84530e05cab6319257d03975e6b/.regioninfo: CORRUPT blockpool BP-1761102757-192.168.126.8-1404787541755 block blk_1073741833

/hbase/data/hbase/namespace/a3fbb84530e05cab6319257d03975e6b/.regioninfo: MISSING 1 blocks of total size 40 B..
/hbase/data/hbase/namespace/a3fbb84530e05cab6319257d03975e6b/info/770eb1a6dc76458fb97e9213edb80b72: CORRUPT blockpool BP-1761102757-192.168.126.8-1404787541755 block blk_1073741837

/hbase/data/hbase/namespace/a3fbb84530e05cab6319257d03975e6b/info/770eb1a6dc76458fb97e9213edb80b72: MISSING 1 blocks of total size 1045 B..
/hbase/hbase.id: CORRUPT blockpool BP-1761102757-192.168.126.8-1404787541755 block blk_1073741826

/hbase/hbase.id: MISSING 1 blocks of total size 42 B..
/hbase/hbase.version: CORRUPT blockpool BP-1761102757-192.168.126.8-1404787541755 block blk_1073741825

/hbase/hbase.version: MISSING 1 blocks of total size 7 B.Status: CORRUPT
 Total size:    3597 B
 Total dirs:    21
 Total files:    11
 Total symlinks:        0
 Total blocks (validated):    11 (avg. block size 327 B)
  ********************************
  CORRUPT FILES:    11
  MISSING BLOCKS:    11
  MISSING SIZE:        3597 B
  CORRUPT BLOCKS:     11
  ********************************
 Minimally replicated blocks:    0 (0.0 %)
 Over-replicated blocks:    0 (0.0 %)
 Under-replicated blocks:    0 (0.0 %)
 Mis-replicated blocks:        0 (0.0 %)
 Default replication factor:    3
 Average block replication:    0.0
 Corrupt blocks:        11
 Missing replicas:        0
 Number of data-nodes:        1
 Number of racks:        1
FSCK ended at Mon Jul 14 23:36:33 CST 2014 in 15 milliseconds


The filesystem under path '/' is CORRUPT

Run fsck with -delete to remove the corrupt files:

./hdfs fsck -delete
14/07/14 23:41:45 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Connecting to namenode via http://localhost:50070
FSCK started by zhenweiliu (auth:SIMPLE) from /127.0.0.1 for path / at Mon Jul 14 23:41:46 CST 2014
Status: HEALTHY
 Total size:    0 B
 Total dirs:    21
 Total files:    0
 Total symlinks:        0
 Total blocks (validated):    0
 Minimally replicated blocks:    0
 Over-replicated blocks:    0
 Under-replicated blocks:    0
 Mis-replicated blocks:        0
 Default replication factor:    3
 Average block replication:    0.0
 Corrupt blocks:        0
 Missing replicas:        0
 Number of data-nodes:        1
 Number of racks:        1
FSCK ended at Mon Jul 14 23:41:46 CST 2014 in 4 milliseconds


The filesystem under path '/' is HEALTHY

 

At this point HBase has died. The master log shows:

2014-07-14 23:48:53,788 FATAL [master:192.168.126.8:60000] master.HMaster: Unhandled exception. Starting shutdown.
org.apache.hadoop.hbase.util.FileSystemVersionException: HBase file layout needs to be upgraded.  You have version null and I want version 8.  Is your hbase.rootdir valid?  If so, you may need to run 'hbase hbck -fixVersionFile'.
    at org.apache.hadoop.hbase.util.FSUtils.checkVersion(FSUtils.java:602)
    at org.apache.hadoop.hbase.master.MasterFileSystem.checkRootDir(MasterFileSystem.java:456)
    at org.apache.hadoop.hbase.master.MasterFileSystem.createInitialFileSystemLayout(MasterFileSystem.java:147)
    at org.apache.hadoop.hbase.master.MasterFileSystem.<init>(MasterFileSystem.java:128)
    at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:802)
    at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:615)
    at java.lang.Thread.run(Thread.java:695)

 

Remove the /hbase directory in HDFS so HBase can rebuild it:

bin/hadoop fs -rm -r /hbase

 

The HBase master then reports:

2014-07-14 23:56:33,999 INFO  [master:192.168.126.8:60000] catalog.CatalogTracker: Failed verification of hbase:meta,,1 at address=192.168.126.8,60020,1405352769509, exception=org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: Region hbase:meta,,1 is not online on 192.168.126.8,60020,1405353371628
    at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:2683)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:4117)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionInfo(HRegionServer.java:3494)
    at org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:20036)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2012)
    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)
    at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:168)
    at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:39)
    at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:111)
    at java.lang.Thread.run(Thread.java:695)

 

Remove the stale region server znode:

bin/hbase zkcli
rmr /hbase/meta-region-server
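
Whether the znode is gone can be checked from the same CLI before restarting:

# Inside bin/hbase zkcli: meta-region-server should no longer be listed
ls /hbase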

 

Restart HBase once more; problem solved.

 

Important HBase Parameters

These parameters are configured in hbase-site.xml.

1. zookeeper.session.timeout

The default is 3 minutes. This means that when a server dies, the Master needs at least 3 minutes to notice the failure and begin recovery. You may want to lower the timeout so the Master notices sooner. Before you do, make sure your JVM GC settings are under control, otherwise a single long GC pause will blow the session timeout. (When a RegionServer is stuck in a long GC, you probably want it restarted and recovered anyway.)

To change it, edit hbase-site.xml, deploy the configuration to the whole cluster, and restart.

The reason the default is set so high is to avoid answering the same newbie question on the forums over and over: "Why did my RegionServer die during a massive data import?" Usually the cause is a long GC on an untuned JVM. The reasoning goes: someone new to HBase cannot be expected to know all of this, and there is no point crushing their confidence; once they are familiar with the system, they can lower this parameter themselves.
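
For example, lowering the timeout to one minute would look like this in hbase-site.xml (a sketch; the value is in milliseconds, and 60000 is only illustrative):

<property>
    <name>zookeeper.session.timeout</name>
    <value>60000</value>
    <description>ZK session timeout in ms; keep it above the longest expected GC pause.</description>
</property>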

2. hbase.regionserver.handler.count

This setting determines the number of threads that handle user requests. The default of 10 is deliberately low, to keep RegionServers from falling over when users combine a large write buffer with many concurrent clients. The rule of thumb: lower it when the payload per request is large (in the MBs: big puts, scans with large caching), and raise it when payloads are small (gets, small puts, ICVs, deletes).

When client payloads are small, it is safe to set this as high as the maximum number of clients. A typical example is a cluster serving a website: puts are generally not buffered, and most operations are gets.

The danger of raising it is that buffering all the incoming Put data puts heavy pressure on memory and can even lead to OutOfMemory. A RegionServer running low on memory triggers frequent GCs and becomes visibly stalled (the memory held by in-flight request payloads cannot be reclaimed no matter how many GC passes run). After a while the whole cluster suffers as well, because every request routed to that region slows down, which compounds the problem.

To get a feel for whether you have too many or too few handlers, enable RPC-level logging on a single RegionServer (see Section 12.2.2.1, "Enabling RPC-level logging") and watch the tail of the log (the request queue also consumes memory).
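
As with the timeout, the setting goes in hbase-site.xml (a sketch; 30 is only an illustrative middle ground between small- and large-payload workloads):

<property>
    <name>hbase.regionserver.handler.count</name>
    <value>30</value>
    <description>Number of RPC handler threads per RegionServer.</description>
</property>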
