An Analysis of HBase Region Merging

1. Overview

The basic unit of a table in HBase is the Region. When you work with a table through the HBase API, the data you interact with is also presented in the form of Regions. A table can consist of many Regions, and in this post I would like to share some problems around merging Regions, together with their solutions.

2. Content

Before analyzing how Regions are merged, let's first look at the Region architecture, as shown in the figure below:

From the figure, we can summarize the following points:

  • HRegion: a Region can contain multiple Stores;
  • Store: each Store contains one MemStore and several StoreFiles;
  • StoreFile: where table data is actually stored; HFile is the file format of table data on HDFS (see the sketch below).
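
On disk, the HDFS layout mirrors this hierarchy: table, then Region (identified by its encoded name), then column family, then HFiles. A minimal sketch, assuming the default hbase.rootdir of /hbase and the ip_login table that appears later in this post:

# List the HFiles of column family '_d' in one Region of the 'ip_login' table
hdfs dfs -ls /hbase/data/default/ip_login/d0d7d881bb802592c09d305e47ae70a5/_d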

If you want to inspect an HFile, HBase provides a command for that:

hbase hfile -p -f /hbase/data/default/ip_login/d0d7d881bb802592c09d305e47ae70a5/_d/7ec738167e9f4d4386316e5e702c8d3d

The output of the command looks like the figure below:
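
The same tool has a few other useful switches besides -p; as a sketch (flag names from the HFile pretty-printer: -m prints the HFile's meta block, -s prints key/value statistics):

# Print metadata and stats instead of the full key/value dump
hbase hfile -m -s -f /hbase/data/default/ip_login/d0d7d881bb802592c09d305e47ae70a5/_d/7ec738167e9f4d4386316e5e702c8d3d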

2.1 Why Regions Need to Be Merged

So why do we need to merge Regions at all? It starts with how Regions split. As data is continuously written to a Region and the Region reaches its split threshold (controlled by the property hbase.hregion.max.filesize, 10GB by default), it is split into two new Regions. As the volume of business data keeps growing, Regions keep splitting, and the number of Regions grows ever larger.
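
The threshold can be raised cluster-wide in hbase-site.xml, or overridden per table from the HBase shell. A minimal sketch of the per-table override (MAX_FILESIZE is in bytes; 32212254720, i.e. 30GB, is an illustrative value, and ip_login is the sample table from above):

hbase> alter 'ip_login', MAX_FILESIZE => '32212254720'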

The more Regions a business table has, the heavier the pressure on the cluster during reads and writes, or when a compaction runs on that table. From a statistic I collected on a production cluster: once a business table reached 9000+ Regions, every compaction of the table noticeably increased the cluster load, which indirectly affected application reads and writes as well. When one table has too many Regions, the total number of Regions in the cluster inevitably grows too, and after load balancing each RegionServer ends up carrying more Regions.

In such a situation it is well worth merging Regions. For example, if the current split threshold is set to 30GB, we can merge all Regions of 10GB or less. That reduces the Region count of each business table, which lowers the total Region count of the cluster and relieves the Region pressure on each RegionServer.
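
Before merging, we need to know which Regions are actually small. The script in section 2.2.1 below reads the sizes from the cluster status; as a rough cross-check, here is a sketch that sums each Region directory on HDFS (again assuming the default /hbase root and the sample ip_login table):

# Per-Region totals for the table; the smallest directories are merge candidates
hdfs dfs -du -h /hbase/data/default/ip_login | sort -h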

2.2 How to Merge Regions

So how do we actually merge Regions? HBase provides a command for merging Regions, used as follows:

# Merge two adjacent Regions
hbase> merge_region 'ENCODED_REGIONNAME', 'ENCODED_REGIONNAME'
# Force-merge two Regions
hbase> merge_region 'ENCODED_REGIONNAME', 'ENCODED_REGIONNAME', true
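
The arguments are encoded Region names, i.e. the trailing hash component of the full Region name (d0d7d881bb802592c09d305e47ae70a5 in the HFile path above is one). You can read them off the table's page in the HBase web UI or, assuming a shell from HBase 1.4 or later, list them directly:

hbase> list_regions 'ip_login'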

However, this approach has a limitation: it merges only two Regions at a time. If there are thousands of Regions to merge, it is simply not practical.

2.2.1 Batch Merging

There is a way to merge in batches, namely a script (merge_small_regions.rb) implemented as follows:

# Test Mode:
#
# hbase org.jruby.Main merge_small_regions.rb namespace.tablename <skip_size> <batch_regions> <merge?>
#
# Non Test - ie actually do the merge:
#
# hbase org.jruby.Main merge_small_regions.rb namespace.tablename <skip_size> <batch_regions> merge
#
# Note: Please replace namespace.tablename with your namespace and table, eg NS1.MyTable. This value is case sensitive.

require 'digest'
require 'java'
java_import org.apache.hadoop.hbase.HBaseConfiguration
java_import org.apache.hadoop.hbase.client.HBaseAdmin
java_import org.apache.hadoop.hbase.TableName
java_import org.apache.hadoop.hbase.HRegionInfo
java_import org.apache.hadoop.hbase.client.Connection
java_import org.apache.hadoop.hbase.client.ConnectionFactory
java_import org.apache.hadoop.hbase.client.Table
java_import org.apache.hadoop.hbase.util.Bytes

def list_bigger_regions(admin, table, low_size)
  cluster_status = admin.getClusterStatus()
  biggers = []
  cluster_status.getServers.each do |s|
    cluster_status.getLoad(s).getRegionsLoad.each do |r|
      # Iterating the region-load map in JRuby yields two-element
      # [region_name, RegionLoad] pairs, so r[1] is the RegionLoad

      # Filter out any regions that don't belong to the requested table
      next unless r[1].get_name_as_string =~ /#{table}\,/
      if r[1].getStorefileSizeMB() > low_size
        if r[1].get_name_as_string =~ /\.([^\.]+)\.$/
          biggers.push $1
        else
          raise "Failed to get the encoded name for #{r[1].get_name_as_string}"
        end
      end
    end
  end
  biggers
end

# Handle command line parameters
table_name = ARGV[0]
low_size = 1024
if ARGV[1].to_i >= low_size
  low_size = ARGV[1].to_i
end

# Guard against a missing or zero batch argument, which would
# otherwise set the batch size to 0 and merge nothing
limit_batch = 1000
if ARGV[2].to_i > 0 && ARGV[2].to_i <= limit_batch
  limit_batch = ARGV[2].to_i
end
do_merge = false
if ARGV[3] == 'merge'
  do_merge = true
end

config = HBaseConfiguration.create()
connection = ConnectionFactory.createConnection(config)
admin = HBaseAdmin.new(connection)

bigger_regions = list_bigger_regions(admin, table_name, low_size)
regions = admin.getTableRegions(Bytes.toBytes(table_name))

puts "Total Table Regions: #{regions.length}"
puts "Total bigger regions: #{bigger_regions.length}"

filtered_regions = regions.reject do |r|
  bigger_regions.include?(r.get_encoded_name)
end

puts "Total regions to consider for Merge: #{filtered_regions.length}"

filtered_regions_limit = filtered_regions

if filtered_regions.length < 2
  puts "There are not enough regions to merge"
  exit 0
end

if filtered_regions.length > limit_batch
  filtered_regions_limit = filtered_regions[0, limit_batch]
  puts "But we will merge only #{filtered_regions_limit.length} regions because of the limit parameter!"
end


r1, r2 = nil
filtered_regions_limit.each do |r|
  if r1.nil?
    r1 = r
    next
  end
  if r2.nil?
    r2 = r
  end
  # Skip any region that is in the middle of a split
  if r1.is_split()
    puts "Skipping #{r1.get_encoded_name} because it is splitting!"
    r1 = r2
    r2 = nil
    next
  end
  if r2.is_split()
    puts "Skipping #{r2.get_encoded_name} because it is splitting!"
    r2 = nil
    next
  end
  if HRegionInfo.are_adjacent(r1, r2)
    # Only merge regions that are adjacent
    puts "#{r1.get_encoded_name} is adjacent to #{r2.get_encoded_name}"
    if do_merge
      admin.mergeRegions(r1.getEncodedNameAsBytes, r2.getEncodedNameAsBytes, false)
      puts "Successfully merged #{r1.get_encoded_name} with #{r2.get_encoded_name}"
      sleep 2
    end
    r1, r2 = nil
  else
    puts "Regions are not adjacent; dropping the first one and continuing from #{r2.get_encoded_name}"
    r1 = r2
    r2 = nil
  end
end
admin.close

By default the script merges Regions smaller than 1GB, at most 1000 at a time. If we want to merge Regions smaller than 10GB, at most 4000 at a time, we can wrap it in the following script (merging-region.sh):

#! /bin/bash

num=$1

echo "[`date "+%Y-%m-%d %H:%M:%S"`] INFO : RegionServer Start Merging..."
if [ ! -n "$num" ]; then
    echo "[`date "+%Y-%m-%d %H:%M:%S"`] INFO : Default Merging 10 Times."
    num=10
elif [[ $num == *[!0-9]* ]]; then
    echo "[`date "+%Y-%m-%d %H:%M:%S"`] INFO : Input [$num] Times Must Be Number."
    exit 1
else
    echo "[`date "+%Y-%m-%d %H:%M:%S"`] INFO : User-Defined Merging [$num] Times."
fi

for (( i=1; i<=$num; i++ ))
do
    echo "[`date "+%Y-%m-%d %H:%M:%S"`] INFO : Merging [$i] Times,Total [$num] Times."
    hbase org.jruby.Main merge_small_regions.rb namespace.tablename 10240  4000 merge
    sleep 5
done

In merging-region.sh, a parameter controls how many times the batch-merge script is run in a loop. In practice, a single batch merge may still leave many Regions behind (new Regions may have been created in the meantime), in which case we can use merging-region.sh to run the batch merge several times in a row:

# Loops 10 times by default; here we run it 5 times
sh merging-region.sh 5

2.3 What If a Permanent RIT Appears While Merging Regions

What should we do if a permanent RIT (Region-In-Transition) appears during a Region merge? I ran into exactly this in production: during a batch merge, a Region got stuck permanently in the MERGING_NEW state. This does not affect the cluster's ability to serve normally, but if some node in the cluster is restarted, the Regions on that RegionServer may never be rebalanced, because HBase does not run Region load balancing while a Region is in transition; even running the balancer command manually has no effect.
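
Regions stuck in transition are listed in the "Regions in Transition" section of the Master web UI. As a sketch, the hbck tool of HBase 1.x can also report them from the command line:

# Consistency check; the summary includes the number of regions in transition
hbase hbck -details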

If this RIT is left unresolved and more HBase nodes are restarted one after another, the Regions of the whole cluster end up severely unbalanced, which is fatal for cluster performance. After searching the HBase JIRA, I found that this permanent MERGING_NEW RIT is triggered by the bug tracked in HBASE-17682, and the corresponding patch needs to be applied to fix it. In essence, the HBase source code never checks for the MERGING_NEW state in its branching logic, so such a Region falls straight through to the else branch. The original code is as follows:

for (RegionState state : regionsInTransition.values()) {
  HRegionInfo hri = state.getRegion();
  if (assignedRegions.contains(hri)) {
    // Region is open on this region server, but in transition.
    // This region must be moving away from this server, or splitting/merging.
    // SSH will handle it, either skip assigning, or re-assign.
    LOG.info("Transitioning " + state + " will be handled by ServerCrashProcedure for " + sn);
  } else if (sn.equals(state.getServerName())) {
    // Region is in transition on this region server, and this
    // region is not open on this server. So the region must be
    // moving to this server from another one (i.e. opening or
    // pending open on this server, was open on another one.
    // Offline state is also kind of pending open if the region is in
    // transition. The region could be in failed_close state too if we have
    // tried several times to open it while this region server is not reachable)
    if (state.isPendingOpenOrOpening() || state.isFailedClose() || state.isOffline()) {
      LOG.info("Found region in " + state +
          " to be reassigned by ServerCrashProcedure for " + sn);
      rits.add(hri);
    } else if (state.isSplittingNew()) {
      regionsToCleanIfNoMetaEntry.add(state.getRegion());
    } else {
      LOG.warn("THIS SHOULD NOT HAPPEN: unexpected " + state);
    }
  }
}

The patched code looks like this:

for (RegionState state : regionsInTransition.values()) {
  HRegionInfo hri = state.getRegion();
  if (assignedRegions.contains(hri)) {
    // Region is open on this region server, but in transition.
    // This region must be moving away from this server, or splitting/merging.
    // SSH will handle it, either skip assigning, or re-assign.
    LOG.info("Transitioning " + state + " will be handled by ServerCrashProcedure for " + sn);
  } else if (sn.equals(state.getServerName())) {
    // Region is in transition on this region server, and this
    // region is not open on this server. So the region must be
    // moving to this server from another one (i.e. opening or
    // pending open on this server, was open on another one.
    // Offline state is also kind of pending open if the region is in
    // transition. The region could be in failed_close state too if we have
    // tried several times to open it while this region server is not reachable)
    if (state.isPendingOpenOrOpening() || state.isFailedClose() || state.isOffline()) {
      LOG.info("Found region in " + state +
          " to be reassigned by ServerCrashProcedure for " + sn);
      rits.add(hri);
    } else if (isOneOfStates(state, State.SPLITTING_NEW, State.MERGING_NEW)) {
      // The check now covers MERGING_NEW as well as SPLITTING_NEW, so a
      // region stuck in MERGING_NEW no longer falls into the else branch
      regionsToCleanIfNoMetaEntry.add(state.getRegion());
    } else {
      LOG.warn("THIS SHOULD NOT HAPPEN: unexpected " + state);
    }
  }
}

There is a catch, though: the JIRA issue only tells us to fix the bug by applying the patch. In a real production environment, facing this RIT, we cannot afford to stop the cluster for long and disrupt application reads and writes. So, is there a temporary remedy that clears the current permanent MERGING_NEW RIT first and leaves the patch and version upgrade for later?

There is. After analyzing the merge workflow, I found that when HBase merges Regions, it first creates the new Region in an initial MERGING_NEW state. The whole merge flow is shown below:

As the flow chart shows, MERGING_NEW is an initial state that lives only in the active Master's memory; a Master in Backup state has no record of the new Region's MERGING_NEW state. We can therefore clear this permanent RIT temporarily by switching the HBase Master from active to backup. Since HBase is a highly available cluster, a Master switchover is transparent to user applications. So, for a permanent RIT in the MERGING_NEW state, an active/backup Master switchover serves as a temporary workaround; afterwards we can fix the bug properly by applying the patch and upgrading the HBase version.
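
A minimal sketch of the switchover, assuming the cluster runs at least one backup Master and that $HBASE_HOME points at the installation (run on the currently active Master host):

# Stop the active HMaster; a backup Master takes over and rebuilds its
# in-memory state without the stale MERGING_NEW entry
$HBASE_HOME/bin/hbase-daemon.sh stop master

# After confirming on the web UI that the backup is now active,
# bring the old Master back as the new backup
$HBASE_HOME/bin/hbase-daemon.sh start master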

3. Summary

RIT problems are fairly common in HBase. When you hit one, start by analyzing the cause calmly: check the Master logs, read the RIT description on the HBase web UI carefully, inspect the Regions with the hbck command, check the HDFS blocks with fsck, and so on. Once the root cause is clear, apply the right remedy: hypothesize boldly, verify carefully.

4. Closing Remarks

That is all I want to share in this post. If you run into any problems while studying this topic, you are welcome to join the discussion group or send me an email, and I will do my best to answer. Good luck to us all!

By the way, I have published a book, 《Hadoop大數據挖掘從入門到進階實戰》 (Hadoop Big Data Mining: From Beginner to Advanced Practice). Friends and readers who are interested can buy it through the purchase link in the announcement board. Thank you all for your support.
