The Road to Learning Big Data: HBase

Hadoop: HBase

1. Introduction to HBase

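HBase is an open-source, distributed, multi-version, column-oriented NoSQL database for semi-structured data that offers high-performance random reads and writes. It can run directly on the local file system or on Hadoop's HDFS. For better data reliability and system robustness, and to make full use of HBase's ability to handle big data, HDFS is the safer choice of storage layer.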

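Logically, the data stored in HBase looks like one very large table whose columns can be added dynamically as needed. In addition, each cell (the position determined by a row and a column) can hold multiple versions of its data, distinguished by timestamp. As the figure below shows, HBase provides storage to the layers below it and computation to the layers above it. Hadoop's MapReduce model can also run on top of HBase to process large-scale data in parallel, which is central to its performance: it combines data storage with parallel computation.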

[Figure omitted]

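HBase and HDFS

| HDFS | HBase |
| --- | --- |
| HDFS is a distributed file system suited to storing very large files. | HBase is a database built on top of HDFS. |
| HDFS does not support fast lookup of individual records. | HBase provides fast lookups in large tables. |
| It is geared to high-latency batch processing. | It provides low-latency access to single rows among billions of records (random access). |
| Its data can only be accessed sequentially. | HBase keeps its data in indexed files on HDFS, which allows random access and fast lookups. |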

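2. HBase table structure

An HBase table has the following characteristics:

  • Large: a table can have hundreds of millions of rows and millions of columns.
  • Column-oriented: storage and access control are organized by column (family), and column families are retrieved independently.
  • Sparse: empty (null) columns take up no storage space, so tables can be designed to be very sparse.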
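
The sparseness point can be illustrated with a toy model in Python. This is only a sketch of the idea, not HBase's storage engine; the `put` and `stored_cells` helpers are hypothetical names invented for the example.

```python
# Toy model only: a sparse table as nested dicts.
# Nothing here is an HBase API; the helper names are illustrative.
table = {}

def put(row, column, value):
    """Store one cell; columns that are never written simply do not exist."""
    table.setdefault(row, {})[column] = value

def stored_cells():
    """Count the cells that actually occupy space."""
    return sum(len(cols) for cols in table.values())

put("zhangsan", "info:age", "24")
put("zhangsan", "address:city", "beijing")
put("lisi", "info:birthday", "1998-09-09")

# Every column name that appears anywhere in the table.
all_columns = sorted({c for cols in table.values() for c in cols})

dense_size = len(table) * len(all_columns)  # what a fixed grid would need
print(all_columns)
print(stored_cells(), "cells stored; a dense grid would need", dense_size)
```

Row "lisi" has no `info:age` or `address:city` entry at all, so those missing columns cost nothing in this model.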

HBase stores data in tables. A table consists of rows and columns, and the columns are grouped into column families. Below is the logical view of an HBase table:

[Figure omitted: logical view of an HBase table]

As shown in the shell client:

> scan 'member'
ROW                   COLUMN+CELL
 lisi                 column=address:, timestamp=1567757931802, value=sichuan
 lisi                 column=info:, timestamp=1567757982455, value=info2
 lisi                 column=info:love, timestamp=1567758039091, value=movie
 lisi                 column=school:, timestamp=1567758005941, value=xinhua
 zhangsan             column=address:city, timestamp=1567755403595, value=beijing
 zhangsan             column=info:, timestamp=1567755827530, value=info1
 zhangsan             column=info:age, timestamp=1567756662127, value=26
 zhangsan             column=info:birthday, timestamp=1567755398376, value=1993-11-20
 zhangsan             column=info:country, timestamp=1567755402535, value=china
 zhangsan             column=school:, timestamp=1567757294341, value=shiyan
2 row(s)
Took 0.0945 seconds

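The components are described in turn below:

  • Row key: the primary key used to retrieve records, similar to the key in a key-value store. There are only three ways to access rows in an HBase table:

    • by a single row key;
    • by a range of row keys;
    • by a full table scan.
  • Column family: every column in an HBase table belongs to a column family. Column families are part of the table schema and must be defined before the table is used; columns are not, and can be added at any time when inserting data. In the example above, info, address and school are column families, while info:age and info:love are columns.
  • Cell: the unit uniquely identified by row key, column and timestamp, which holds the actual data. Cell data is untyped and stored entirely as raw bytes.
  • Timestamp: each cell keeps multiple versions of the same data, indexed by timestamp. To avoid the management burden (storage and indexing) of too many versions, HBase offers two ways to reclaim old versions: keep the last n versions, or keep the versions from a recent period (for example, the last seven days). Both can be configured per column family.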
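
The three row-key access paths listed above can be sketched in Python. This is a toy model under the assumption that rows are kept ordered by row key; the `SortedRows` class, its methods, and the extra row `wangwu` are hypothetical and exist only for illustration.

```python
import bisect

# Toy model only: rows kept sorted by row key (compared as bytes).
# SortedRows is an illustrative name, not an HBase class.
class SortedRows:
    def __init__(self):
        self._keys = []   # row keys in sorted order
        self._rows = {}   # row key -> {column: value}

    def put(self, row_key, column, value):
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key][column] = value

    def get(self, row_key):
        """1) Access by a single row key."""
        return self._rows.get(row_key)

    def scan(self, start=None, stop=None):
        """2) Range scan over [start, stop); 3) full scan if both are None."""
        lo = 0 if start is None else bisect.bisect_left(self._keys, start)
        hi = len(self._keys) if stop is None else bisect.bisect_left(self._keys, stop)
        for key in self._keys[lo:hi]:
            yield key, self._rows[key]

rows = SortedRows()
rows.put(b"zhangsan", "info:age", "24")
rows.put(b"lisi", "info:birthday", "1998-09-09")
rows.put(b"wangwu", "address:city", "beijing")   # hypothetical extra row

print(rows.get(b"lisi"))                          # single row key
print([k for k, _ in rows.scan(b"l", b"x")])      # row key range
print([k for k, _ in rows.scan()])                # full table scan
```

The range is start-inclusive and stop-exclusive, so scanning from `b"l"` to `b"x"` returns `lisi` and `wangwu` but not `zhangsan`.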

3. Installing and running HBase

wget http://apache.01link.hk/hbase/2.2.0/hbase-2.2.0-bin.tar.gz 
tar -zxvf hbase-2.2.0-bin.tar.gz 
cd hbase-2.2.0
vim conf/hbase-site.xml
<configuration>
    <property>
        <name>hbase.rootdir</name>
        <value>file:///tmp/hbase-${user.name}/hbase</value>
    </property>
</configuration>
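# Run in standalone mode, which uses local file storage and does not depend on Hadoop
./bin/start-hbase.sh
# Check the process
jps
9758 HMaster
# After a successful start, the HBase web UI is available at http://localhost:16010
# Stop the HBase service
./bin/stop-hbase.sh

# Enter the HBase shell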
./bin/hbase shell
HBase Shell
Use "help" to get list of supported commands.
Use "exit" to quit this interactive shell.
For Reference, please visit: http://hbase.apache.org/2.0/book.html#shell
Version 2.2.0, rUnknown, Tue Jun 11 04:30:30 UTC 2019
Took 0.0128 seconds                                                                                                                                                                              
hbase(main):001:0>

4. Shell DDL operations

# Create a table
> create 'member','member_id','address','info'
Created table member
Took 1.6592 seconds                                                                                                                                                                              
=> Hbase::Table - member

# List all tables
> list
TABLE                                                                                                                                                                                            
member                                                                                                                                                                                           
1 row(s)
Took 0.1501 seconds                                                                                                                                                                              
=> ["member"]

# Describe a table
> describe 'member'
Table member is ENABLED
member
COLUMN FAMILIES DESCRIPTION
{NAME => 'address', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536'}

{NAME => 'info', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536'}

{NAME => 'member_id', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536'}

3 row(s)

QUOTAS                                                                                                                                                                                           
0 row(s)
Took 0.6478 seconds

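# Delete a column family (commands involved: alter, disable, enable)
> alter 'member',{NAME=>'member_id',METHOD=>'delete'}
# Running describe again now shows only two column families

# To drop a table, disable it first
> disable 'member'
> drop 'member'

# Check whether a table exists
> exists 'member'

# Check whether a table is enabled
> is_enabled 'member'

# Check whether a table is disabled
> is_disabled 'member'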

5. Shell DML operations

# Insert data
> put 'member','zhangsan','info:age','24'
> put 'member','zhangsan','info:birthday','1993-11-20'
> put 'member','zhangsan','info:country','china'
> put 'member','zhangsan','address:city','beijing'
> put 'member','lisi','info:birthday','1998-09-09'
> put 'member','lisi','info:favotite','movie'
> put 'member','lisi','address:city','beijing'

# Get all data for one row key
> get 'member','zhangsan'
COLUMN                CELL
 address:city         timestamp=1567754003312, value=beijing
 info:age             timestamp=1567753903167, value=24
 info:birthday        timestamp=1567753950339, value=1993-11-20
 info:country         timestamp=1567753964169, value=china
1 row(s)
Took 0.1351 seconds
# Get all data in one column family for one row key
> get 'member','zhangsan','info'
COLUMN                CELL
 info:age             timestamp=1567753903167, value=24
 info:birthday        timestamp=1567753950339, value=1993-11-20
 info:country         timestamp=1567753964169, value=china
1 row(s)
Took 0.0455 seconds
# Get a single column of one column family for one row key
> get 'member','zhangsan','info:age'
COLUMN                CELL
 info:age             timestamp=1567753903167, value=24
1 row(s)
Took 0.0364 seconds

# Update a record
> put 'member','zhangsan','info:age','25'
> get 'member','zhangsan','info:age'
COLUMN                CELL
 info:age             timestamp=1567754315161, value=25
1 row(s)
Took 0.0491 seconds
# Get a specific version of the data by timestamp
> get 'member','zhangsan',{COLUMN=>'info:age',TIMESTAMP=>1567753903167}
COLUMN                CELL
 info:age             timestamp=1567753903167, value=24
1 row(s)
Took 0.0342 seconds

# Full table scan
> scan 'member'
ROW                   COLUMN+CELL
 lisi                 column=address:city, timestamp=1567754078391, value=beijing
 lisi                 column=info:birthday, timestamp=1567754038812, value=1998-09-09
 lisi                 column=info:favotite, timestamp=1567754057750, value=movie
 zhangsan             column=address:city, timestamp=1567754003312, value=beijing
 zhangsan             column=info:age, timestamp=1567754315161, value=25
 zhangsan             column=info:birthday, timestamp=1567753950339, value=1993-11-20
 zhangsan             column=info:country, timestamp=1567753964169, value=china
2 row(s)
Took 0.1000 seconds

# Delete a specific column
> delete 'member','zhangsan','info:age'
# Interesting: if there are two versions of the data, only the newest one is deleted, so the next query returns the previous version
> get 'member','zhangsan','info:age'
COLUMN                CELL
 info:age             timestamp=1567753903167, value=24
1 row(s)
Took 0.0454 seconds
# Running delete again removes the version that is now current
> delete 'member','zhangsan','info:age'
> get 'member','zhangsan','info:age'
COLUMN                CELL
0 row(s)
Took 0.0166 seconds

# Delete an entire row
> deleteall 'member','lisi'
Took 0.0235 seconds

# Count the rows in a table
> count 'member'
1 row(s)
Took 0.3753 seconds
=> 1

# Add an 'info:age' column to the row 'zhangsan' and increment it as a counter
> incr 'member','zhangsan','info:age'
COUNTER VALUE = 1
Took 0.0948 seconds
> get 'member','zhangsan','info:age'
COLUMN                CELL
 info:age             timestamp=1567755056584, value=\x00\x00\x00\x00\x00\x00\x00\x01
1 row(s)
Took 0.0504 seconds
> incr 'member','zhangsan','info:age'
COUNTER VALUE = 2
Took 0.0211 seconds
> get 'member','zhangsan','info:age'
COLUMN                CELL
 info:age             timestamp=1567755133527, value=\x00\x00\x00\x00\x00\x00\x00\x02
1 row(s)
Took 0.0479 seconds
# Get the current value of the counter
> get_counter 'member','zhangsan','info:age'
COUNTER VALUE = 2
Took 0.0145 seconds
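
The `value=\x00\x00\x00\x00\x00\x00\x00\x01` shown by `get` is the counter stored as an 8-byte big-endian integer, which is why it does not print as a readable number. A small Python sketch of that byte layout follows; the `encode_counter` and `decode_counter` helpers are hypothetical names, not part of any HBase client.

```python
import struct

def encode_counter(n):
    """Pack an integer as an 8-byte big-endian signed long."""
    return struct.pack(">q", n)

def decode_counter(raw):
    """Unpack 8 big-endian bytes back into an integer."""
    return struct.unpack(">q", raw)[0]

print(encode_counter(1))   # b'\x00\x00\x00\x00\x00\x00\x00\x01'
print(decode_counter(b"\x00\x00\x00\x00\x00\x00\x00\x02"))   # 2

# By contrast, a value written with put '...','24' is just the text "24":
print("24".encode("ascii"))   # b'24'
```

This also suggests why a counter column and a column written as text with `put` should not be mixed: the two byte layouts are different.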

# Empty the entire table
> truncate 'member'
Truncating 'member' table (it may take a while):
Disabling table...
Truncating table...
Took 2.1687 seconds

# To view multiple versions of the data, first alter the table schema: by default only one version is kept, so here the number of retained versions is set to 3
> alter 'member',{NAME=>'info',VERSIONS=>3}
> put 'member','zhangsan','info:age','26'
> scan 'member',{COLUMN=>'info:age',VERSIONS=>3}
ROW                   COLUMN+CELL
 zhangsan             column=info:age, timestamp=1567756662127, value=26
 zhangsan             column=info:age, timestamp=1567756297089, value=25
1 row(s)
Took 0.0361 seconds
> get 'member','zhangsan',{COLUMN=>'info',VERSIONS=>3}
COLUMN                CELL
 info:                timestamp=1567755827530, value=info1
 info:age             timestamp=1567756662127, value=26
 info:age             timestamp=1567756297089, value=25
 info:birthday        timestamp=1567755398376, value=1993-11-20
 info:country         timestamp=1567755402535, value=china
1 row(s)
Took 0.0622 seconds
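
The two version-retention policies (keep the last n versions, or keep versions from a recent period) can be sketched in Python. This is a toy model of the visible result only, not of how HBase actually prunes data; the `prune` function and its parameters are hypothetical, and the timestamps are borrowed from the shell sessions above purely for illustration. HBase's TTL attribute is expressed in seconds, while this sketch uses milliseconds to match the cell timestamps.

```python
import time

# Toy model only: one cell's versions as {timestamp_ms: value}.
def prune(versions, max_versions=None, ttl_ms=None, now_ms=None):
    """Return the (timestamp, value) pairs that survive, newest first."""
    now_ms = int(time.time() * 1000) if now_ms is None else now_ms
    kept = sorted(versions.items(), reverse=True)   # newest timestamp first
    if ttl_ms is not None:
        # Policy 2: drop versions older than the time-to-live.
        kept = [(ts, v) for ts, v in kept if now_ms - ts <= ttl_ms]
    if max_versions is not None:
        # Policy 1: keep only the newest n versions.
        kept = kept[:max_versions]
    return kept

cell = {1567753903167: "24", 1567756297089: "25", 1567756662127: "26"}

print(prune(cell, max_versions=1))    # only the newest version
print(prune(cell, max_versions=3))    # all three, newest first
# Ten-minute TTL evaluated one second after the newest write:
print(prune(cell, ttl_ms=600_000, now_ms=1567756663127))
```

With `max_versions=1` only the value 26 remains, which mirrors the default `VERSIONS => '1'` seen in the `describe` output.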

6. Problems encountered

Problem 1:

Running the HBase shell fails with an error:

./bin/hbase shell
2019-09-06 11:03:21,079 WARN  [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
HBase Shell
Use "help" to get list of supported commands.
Use "exit" to quit this interactive shell.
For Reference, please visit: http://hbase.apache.org/2.0/book.html#shell
Version 2.2.0, rUnknown, Tue Jun 11 04:30:30 UTC 2019
Took 0.0080 seconds                                                                                                                                                                              
NotImplementedError: fstat unimplemented unsupported or native support failed to load; see http://wiki.jruby.org/Native-Libraries
  initialize at org/jruby/RubyIO.java:1013
        open at org/jruby/RubyIO.java:1154
  initialize at uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/irb/input-method.rb:141
  initialize at uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/irb/context.rb:70
  initialize at uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/irb.rb:426
  initialize at /home/wangjun/software/hbase-2.2.0/lib/ruby/irb/hirb.rb:47
       start at /home/wangjun/software/hbase-2.2.0/bin/../bin/hirb.rb:207
      <main> at /home/wangjun/software/hbase-2.2.0/bin/../bin/hirb.rb:219

Solution:

The warning "Unable to load native-hadoop library for your platform... using builtin-java classes where applicable" can be resolved by editing conf/hbase-env.sh and adding:

export LD_LIBRARY_PATH=${hadoop_home}/lib/native:$LD_LIBRARY_PATH

Here ${hadoop_home} stands for your Hadoop installation path.

For the error "NotImplementedError: fstat unimplemented unsupported or native support failed to load", the fix is:

sudo apt-get install jruby -y
sudo apt-get install asciidoctor -y

References:

https://www.cnblogs.com/gaope...

https://blog.csdn.net/scutshu...
