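HBase is an open-source, distributed, multi-versioned, column-oriented NoSQL database for semi-structured data that offers high-performance random reads and writes. It can run directly on the local file system or on Hadoop's HDFS. To improve data reliability and system robustness, and to make use of HBase's ability to handle large data sets, HDFS is the safer choice for the storage layer.
Logically, the data stored in HBase looks like one very large table whose columns can be added dynamically as needed. In addition, each cell (the position determined by a row and a column) can hold multiple versions of its data, distinguished by timestamp. As the figure below shows, HBase also provides storage on its lower side and computation on its upper side. Hadoop's MapReduce model can run on top of HBase to process large data sets in parallel, which is central to its performance: it combines data storage with parallel computation.
HBase and HDFS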
HDFS | HBase |
---|---|
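HDFS is a distributed file system suited to storing large files. | HBase is a database built on top of HDFS. |
HDFS does not support fast lookup of individual records. | HBase provides fast lookups in large tables. |
It offers high-latency batch processing. | It offers low-latency access to single rows among billions of records (random access). |
Its data can only be accessed sequentially. | HBase keeps its data in indexed files on HDFS, which allows fast lookups and random access. |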
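HBase tables have the following characteristics:
HBase stores data in tables. A table consists of rows and columns, and the columns are grouped into column families. Below is the logical view of an HBase table:
As displayed in the shell client: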

    > scan 'member'
    ROW          COLUMN+CELL
     lisi        column=address:, timestamp=1567757931802, value=sichuan
     lisi        column=info:, timestamp=1567757982455, value=info2
     lisi        column=info:love, timestamp=1567758039091, value=movie
     lisi        column=school:, timestamp=1567758005941, value=xinhua
     zhangsan    column=address:city, timestamp=1567755403595, value=beijing
     zhangsan    column=info:, timestamp=1567755827530, value=info1
     zhangsan    column=info:age, timestamp=1567756662127, value=26
     zhangsan    column=info:birthday, timestamp=1567755398376, value=1993-11-20
     zhangsan    column=info:country, timestamp=1567755402535, value=china
     zhangsan    column=school:, timestamp=1567757294341, value=shiyan
    2 row(s)
    Took 0.0945 seconds
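
The timestamp attached to each cell is, by default, the write time in milliseconds since the Unix epoch. The following is a minimal sketch for reading such a value as a date; `ms_to_utc` is a hypothetical helper name, and the `-d @SECONDS` form assumes GNU `date`.

```shell
# Convert an HBase cell timestamp (milliseconds since the Unix epoch)
# to a UTC date string. Assumes GNU date, which accepts -d @SECONDS.
ms_to_utc() {
  ms=$1
  # Integer division drops the millisecond part.
  date -u -d "@$((ms / 1000))" '+%Y-%m-%d %H:%M:%S'
}
```

For example, `ms_to_utc 1567757931802` converts the timestamp of the first cell in the scan output.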
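The following sections describe these structures in turn:
Row key: the primary key used to retrieve records, similar to the key in a key-value structure. There are only three ways to access rows in an HBase table: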

    wget http://apache.01link.hk/hbase/2.2.0/hbase-2.2.0-bin.tar.gz
    tar -zxvf hbase-2.2.0-bin.tar.gz
    cd hbase-2.2.0
    vim conf/hbase-site.xml

    <configuration>
      <property>
        <name>hbase.rootdir</name>
        <value>file:///tmp/hbase-${user.name}/hbase</value>
      </property>
    </configuration>

    # Run in standalone mode, which uses local file storage and does not depend on Hadoop
    ./bin/start-hbase.sh

    # Check the process
    jps
    9758 HMaster

    # After a successful start, the HBase web UI is available at http://localhost:16010

    # Stop the HBase service
    ./bin/stop-hbase.sh

    # Enter the HBase shell
    ./bin/hbase shell
    HBase Shell
    Use "help" to get list of supported commands.
    Use "exit" to quit this interactive shell.
    For Reference, please visit: http://hbase.apache.org/2.0/book.html#shell
    Version 2.2.0, rUnknown, Tue Jun 11 04:30:30 UTC 2019
    Took 0.0128 seconds
    hbase(main):001:0>
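
If you prefer not to edit the file interactively in vim, the same configuration can be written from a script. This is only a sketch: `write_hbase_site` is a hypothetical helper name, and it overwrites any existing `hbase-site.xml` in the directory you pass in. Note the quoted heredoc delimiter, which keeps the shell from trying to expand `${user.name}`; that placeholder is meant to be resolved by HBase's configuration loading, not by the shell.

```shell
# Write a minimal standalone hbase-site.xml into the given conf directory.
# write_hbase_site is a hypothetical helper name, not part of HBase.
write_hbase_site() {
  conf_dir=$1
  # The quoted 'EOF' delimiter disables shell expansion inside the heredoc,
  # so ${user.name} reaches the file unchanged.
  cat > "$conf_dir/hbase-site.xml" <<'EOF'
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>file:///tmp/hbase-${user.name}/hbase</value>
  </property>
</configuration>
EOF
}
```

From the unpacked `hbase-2.2.0` directory, it would be called as `write_hbase_site conf`.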

    # Create a table
    > create 'member','member_id','address','info'
    Created table member
    Took 1.6592 seconds
    => Hbase::Table - member

    # List all tables
    > list
    TABLE
    member
    1 row(s)
    Took 0.1501 seconds
    => ["member"]

    # Describe a table
    > describe 'member'
    Table member is ENABLED
    member
    COLUMN FAMILIES DESCRIPTION
    {NAME => 'address', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536'}
    {NAME => 'info', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536'}
    {NAME => 'member_id', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536'}
    3 row(s)

    QUOTAS
    0 row(s)
    Took 0.6478 seconds

    # Delete a column family (alter; disable and enable the table around it if required)
    > alter 'member',{NAME=>'member_id',METHOD=>'delete'}
    # describe now shows only two column families

    # Drop a table; it must be disabled first
    > disable 'member'
    > drop 'member'

    # Check whether the table exists
    > exists 'member'

    # Check whether the table is enabled
    > is_enabled 'member'

    # Check whether the table is disabled
    > is_disabled 'member'

    # Insert data
    > put 'member','zhangsan','info:age','24'
    > put 'member','zhangsan','info:birthday','1993-11-20'
    > put 'member','zhangsan','info:country','china'
    > put 'member','zhangsan','address:city','beijing'
    > put 'member','lisi','info:birthday','1998-09-09'
    > put 'member','lisi','info:favotite','movie'
    > put 'member','lisi','address:city','beijing'

    # Get all data for one row
    > get 'member','zhangsan'
    COLUMN            CELL
     address:city     timestamp=1567754003312, value=beijing
     info:age         timestamp=1567753903167, value=24
     info:birthday    timestamp=1567753950339, value=1993-11-20
     info:country     timestamp=1567753964169, value=china
    1 row(s)
    Took 0.1351 seconds

    # Get all data for one row and one column family
    > get 'member','zhangsan','info'
    COLUMN            CELL
     info:age         timestamp=1567753903167, value=24
     info:birthday    timestamp=1567753950339, value=1993-11-20
     info:country     timestamp=1567753964169, value=china
    1 row(s)
    Took 0.0455 seconds

    # Get the data for one row and one column within a column family
    > get 'member','zhangsan','info:age'
    COLUMN            CELL
     info:age         timestamp=1567753903167, value=24
    1 row(s)
    Took 0.0364 seconds

    # Update a record
    > put 'member','zhangsan','info:age','25'
    > get 'member','zhangsan','info:age'
    COLUMN            CELL
     info:age         timestamp=1567754315161, value=25
    1 row(s)
    Took 0.0491 seconds

    # Get a specific version of the data by timestamp
    > get 'member','zhangsan',{COLUMN=>'info:age',TIMESTAMP=>1567753903167}
    COLUMN            CELL
     info:age         timestamp=1567753903167, value=24
    1 row(s)
    Took 0.0342 seconds

    # Full table scan
    > scan 'member'
    ROW          COLUMN+CELL
     lisi        column=address:city, timestamp=1567754078391, value=beijing
     lisi        column=info:birthday, timestamp=1567754038812, value=1998-09-09
     lisi        column=info:favotite, timestamp=1567754057750, value=movie
     zhangsan    column=address:city, timestamp=1567754003312, value=beijing
     zhangsan    column=info:age, timestamp=1567754315161, value=25
     zhangsan    column=info:birthday, timestamp=1567753950339, value=1993-11-20
     zhangsan    column=info:country, timestamp=1567753964169, value=china
    2 row(s)
    Took 0.1000 seconds

    # Delete a specific column
    > delete 'member','zhangsan','info:age'
    # Interesting: if two versions of the data exist, only the latest one is deleted,
    # so the next query returns the previous version
    > get 'member','zhangsan','info:age'
    COLUMN            CELL
     info:age         timestamp=1567753903167, value=24
    1 row(s)
    Took 0.0454 seconds

    # Running delete again removes the version that is now current
    > delete 'member','zhangsan','info:age'
    > get 'member','zhangsan','info:age'
    COLUMN            CELL
    0 row(s)
    Took 0.0166 seconds

    # Delete an entire row
    > deleteall 'member','lisi'
    Took 0.0235 seconds

    # Count the rows in the table
    > count 'member'
    1 row(s)
    Took 0.3753 seconds
    => 1

    # Add an 'info:age' column to row "zhangsan" and increment it as a counter
    > incr 'member','zhangsan','info:age'
    COUNTER VALUE = 1
    Took 0.0948 seconds
    > get 'member','zhangsan','info:age'
    COLUMN            CELL
     info:age         timestamp=1567755056584, value=\x00\x00\x00\x00\x00\x00\x00\x01
    1 row(s)
    Took 0.0504 seconds
    > incr 'member','zhangsan','info:age'
    COUNTER VALUE = 2
    Took 0.0211 seconds
    > get 'member','zhangsan','info:age'
    COLUMN            CELL
     info:age         timestamp=1567755133527, value=\x00\x00\x00\x00\x00\x00\x00\x02
    1 row(s)
    Took 0.0479 seconds

    # Get the current counter value
    > get_counter 'member','zhangsan','info:age'
    COUNTER VALUE = 2
    Took 0.0145 seconds

    # Empty the whole table
    > truncate 'member'
    Truncating 'member' table (it may take a while):
    Disabling table...
    Truncating table...
    Took 2.1687 seconds

    # To view multiple versions, first alter the table schema, because only one
    # version is kept by default; here the number of retained versions is set to 3
    > alter 'member',{NAME=>'info',VERSIONS=>3}
    > put 'member','zhangsan','info:age','26'
    > scan 'member',{COLUMN=>'info:age',VERSIONS=>3}
    ROW          COLUMN+CELL
     zhangsan    column=info:age, timestamp=1567756662127, value=26
     zhangsan    column=info:age, timestamp=1567756297089, value=25
    1 row(s)
    Took 0.0361 seconds
    > get 'member','zhangsan',{COLUMN=>'info',VERSIONS=>3}
    COLUMN            CELL
     info:            timestamp=1567755827530, value=info1
     info:age         timestamp=1567756662127, value=26
     info:age         timestamp=1567756297089, value=25
     info:birthday    timestamp=1567755398376, value=1993-11-20
     info:country     timestamp=1567755402535, value=china
    1 row(s)
    Took 0.0622 seconds
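
In the counter example, `get` prints the raw cell bytes while `get_counter` prints a number. That is because a counter is stored as an 8-byte big-endian integer. A minimal sketch for decoding the escaped form by hand follows; `decode_counter` is a hypothetical helper name, and it assumes every byte is shown in `\xNN` form, which holds for the small values above but not for values whose bytes happen to be printable characters.

```shell
# Decode the escaped 8-byte value that `get` prints for a counter cell
# into a decimal number. decode_counter is a hypothetical helper name.
# Assumes every byte appears in \xNN form.
decode_counter() {
  # Strip each "\x" prefix, leaving 16 hex digits in big-endian order.
  hex=$(printf '%s' "$1" | sed 's/\\x//g')
  # printf accepts a 0x-prefixed constant and prints it in decimal.
  printf '%d\n' "0x$hex"
}
```

Pass the value in single quotes so the shell leaves the backslashes alone, for example `decode_counter '\x00\x00\x00\x00\x00\x00\x00\x02'`.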
### 6. Problems encountered
Problem 1:
Running hbase shell fails with an error:

    ./bin/hbase shell
    2019-09-06 11:03:21,079 WARN [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    HBase Shell
    Use "help" to get list of supported commands.
    Use "exit" to quit this interactive shell.
    For Reference, please visit: http://hbase.apache.org/2.0/book.html#shell
    Version 2.2.0, rUnknown, Tue Jun 11 04:30:30 UTC 2019
    Took 0.0080 seconds
    NotImplementedError: fstat unimplemented unsupported or native support failed to load; see http://wiki.jruby.org/Native-Libraries
      initialize at org/jruby/RubyIO.java:1013
      open at org/jruby/RubyIO.java:1154
      initialize at uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/irb/input-method.rb:141
      initialize at uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/irb/context.rb:70
      initialize at uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/irb.rb:426
      initialize at /home/wangjun/software/hbase-2.2.0/lib/ruby/irb/hirb.rb:47
      start at /home/wangjun/software/hbase-2.2.0/bin/../bin/hirb.rb:207
      <main> at /home/wangjun/software/hbase-2.2.0/bin/../bin/hirb.rb:219
Solution:
Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
To clear this warning, edit conf/hbase-env.sh and add:

    export LD_LIBRARY_PATH=${hadoop_home}/lib/native:$LD_LIBRARY_PATH

Here ${hadoop_home} stands for your Hadoop installation path.
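
The same change can be applied from a script so that running it twice does not add the line twice. This is a sketch under assumptions: `add_native_path` is a hypothetical helper name, and the Hadoop path is whatever you pass as the second argument.

```shell
# Append the native-library path to hbase-env.sh, but only once.
# add_native_path is a hypothetical helper; both arguments come from the caller.
add_native_path() {
  env_file=$1
  hadoop_home=$2
  # \$LD_LIBRARY_PATH stays literal so it is expanded when hbase-env.sh runs.
  line="export LD_LIBRARY_PATH=$hadoop_home/lib/native:\$LD_LIBRARY_PATH"
  # -q quiet, -x whole-line match, -F fixed string: skip if already present.
  grep -qxF "$line" "$env_file" 2>/dev/null || printf '%s\n' "$line" >> "$env_file"
}
```

For instance, `add_native_path conf/hbase-env.sh /opt/hadoop`, where `/opt/hadoop` is only an example location.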
NotImplementedError: fstat unimplemented unsupported or native support failed to load
The fix for this problem:

    sudo apt-get install jruby -y
    sudo apt-get install asciidoctor -y
References: