HBase scan command reference: prefix, fuzzy, and regex queries

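Sample data for the HBase scan examples (source: https://java-er.com/blog/hbase-scan-all-command/)
stu: the students table
Column family base: basic student information such as name and height
Column family score: grades
Row key c1_s1: c1 is the class, s1 is the student number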

create 'stu','base','score'
put 'stu','c1_s1','base:name','jack'
put 'stu','c1_s2','base:name','jack2'
put 'stu','c1_s3','base:name','jack3'
put 'stu','c1_s4','base:name','jack4'
put 'stu','c2_s1','base:name','tom1'
put 'stu','c2_s2','base:name','tom2'
put 'stu','c2_s2','base:weight','70kg'
put 'stu','c2_s3','base:name','tom3'
put 'stu','c2_s3','base:weight','85kg'
put 'stu','c2_s3','base:height','1.70m'

Tip: how to write query results to a file
echo "scan 'stu',{LIMIT=>1}" | ./hbase shell > a.txt

1. HBase scan: scan the whole table, returning only specific columns

hbase(main):028:0> scan 'stu',{COLUMNS => ['base:weight','base:height']}
ROW                                  COLUMN+CELL
 c2_s2                               column=base:weight, timestamp=1588154167692, value=70kg
 c2_s3                               column=base:height, timestamp=1588154125060, value=1.70m
 c2_s3                               column=base:weight, timestamp=1588154124202, value=85kg
2 row(s)
Took 0.0113 seconds
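Here is a minimal in-memory Python sketch of what `COLUMNS` does. It models the sample data as hypothetical `(row, family:qualifier, timestamp, value)` tuples. This is an illustration of the semantics, not the HBase client API. It also shows why the shell reports 2 row(s) for 3 cells: the shell counts rows, not cells.

```python
# Hypothetical model of the 'stu' sample data: (row, family:qualifier, timestamp, value).
CELLS = [
    ("c1_s1", "base:name", 1588153968060, "jack"),
    ("c1_s2", "base:name", 1588153968114, "jack2"),
    ("c1_s3", "base:name", 1588153968207, "jack3"),
    ("c1_s4", "base:name", 1588153968258, "jack4"),
    ("c2_s1", "base:name", 1588153968324, "tom1"),
    ("c2_s2", "base:name", 1588153968367, "tom2"),
    ("c2_s2", "base:weight", 1588154167692, "70kg"),
    ("c2_s3", "base:height", 1588154125060, "1.70m"),
    ("c2_s3", "base:name", 1588153968409, "tom3"),
    ("c2_s3", "base:weight", 1588154124202, "85kg"),
]


def scan_columns(cells, columns):
    """Keep only cells whose family:qualifier is listed, like COLUMNS => [...]."""
    wanted = set(columns)
    return [cell for cell in cells if cell[1] in wanted]


def row_count(cells):
    """The shell's 'N row(s)' counts distinct row keys, not cells."""
    return len({cell[0] for cell in cells})


result = scan_columns(CELLS, ["base:weight", "base:height"])
print(len(result), "cells in", row_count(result), "rows")  # 3 cells in 2 rows
```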

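2. HBase TIMERANGE: scan data within a time range (closed at the start, open at the end)
Note: cells whose timestamp equals the start time are included; cells whose timestamp equals the end time are excluded.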

hbase(main):028:0> scan 'stu',{TIMERANGE=>[1588153968060,1588153968207]}
ROW                                  COLUMN+CELL
 c1_s1                               column=base:name, timestamp=1588153968060, value=jack
 c1_s2                               column=base:name, timestamp=1588153968114, value=jack2
2 row(s)
Took 0.0108 seconds
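The half-open rule can be modelled in a few lines of Python. This sketch uses a subset of the sample data as hypothetical tuples. `c1_s3` is left out of the result because its timestamp sits exactly on the excluded upper bound.

```python
# Hypothetical model: (row, column, timestamp, value); subset of the sample data.
CELLS = [
    ("c1_s1", "base:name", 1588153968060, "jack"),
    ("c1_s2", "base:name", 1588153968114, "jack2"),
    ("c1_s3", "base:name", 1588153968207, "jack3"),
    ("c1_s4", "base:name", 1588153968258, "jack4"),
]


def scan_timerange(cells, min_ts, max_ts):
    """TIMERANGE => [min_ts, max_ts] is half-open: min_ts <= ts < max_ts."""
    return [cell for cell in cells if min_ts <= cell[2] < max_ts]


rows = [cell[0] for cell in scan_timerange(CELLS, 1588153968060, 1588153968207)]
print(rows)  # ['c1_s1', 'c1_s2']
```

To include `c1_s3`, the upper bound has to be at least one millisecond past its timestamp.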

3. HBase: scan a row-key range with STARTROW and STOPROW
Note: the row whose key equals STARTROW is included; the row whose key equals STOPROW is excluded.

hbase(main):028:0> scan 'stu',{STARTROW=>'c1_s1',STOPROW=>'c1_s3'}
ROW                                  COLUMN+CELL
 c1_s1                               column=base:name, timestamp=1588153968060, value=jack
 c1_s2                               column=base:name, timestamp=1588153968114, value=jack2
2 row(s)
Took 0.0092 seconds
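Row keys are compared byte by byte, not numerically. A minimal Python model makes this visible. The key `c1_s10` is hypothetical and does not exist in the table. Byte order places it between `c1_s1` and `c1_s2`, so it would fall inside the same range. This is why numeric parts of row keys are often zero-padded.

```python
ROWS = ["c1_s1", "c1_s2", "c1_s3", "c1_s4", "c2_s1", "c2_s2", "c2_s3"]


def scan_rows(rows, start_row, stop_row):
    """STARTROW is inclusive, STOPROW exclusive; keys compare byte by byte."""
    start, stop = start_row.encode(), stop_row.encode()
    ordered = sorted(rows, key=str.encode)  # rows are kept in byte order
    return [row for row in ordered if start <= row.encode() < stop]


print(scan_rows(ROWS, "c1_s1", "c1_s3"))  # ['c1_s1', 'c1_s2']

# "c1_s10" is a hypothetical key added only for this comparison.
print(scan_rows(ROWS + ["c1_s10"], "c1_s1", "c1_s3"))  # ['c1_s1', 'c1_s10', 'c1_s2']
```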

4. HBase: reverse the result order, alone or combined with a time range (REVERSED)

Scan the whole table in reverse row-key order

scan 'stu', {REVERSED => TRUE}

Reverse a time-range scan

hbase(main):009:0> scan 'stu',{TIMERANGE=>[1588153968060,1588153968207],REVERSED => TRUE}
ROW                                  COLUMN+CELL
 c1_s2                               column=base:name, timestamp=1588153968114, value=jack2
 c1_s1                               column=base:name, timestamp=1588153968060, value=jack
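As a rough Python model, REVERSED applies the same half-open time filter and then flips the row order. One assumption here is not confirmed by this article: cells inside a single row keep their ascending column order in a reversed scan. The model gets this from Python's stable sort, which stays stable even with `reverse=True`.

```python
# Hypothetical model: (row, column, timestamp, value); subset of the sample data.
CELLS = [
    ("c1_s1", "base:name", 1588153968060, "jack"),
    ("c1_s2", "base:name", 1588153968114, "jack2"),
    ("c1_s3", "base:name", 1588153968207, "jack3"),
    ("c2_s3", "base:height", 1588154125060, "1.70m"),
    ("c2_s3", "base:name", 1588153968409, "tom3"),
]


def scan(cells, timerange=None, reversed_=False):
    """Apply an optional half-open TIMERANGE, then order rows by key."""
    if timerange is not None:
        lo, hi = timerange
        cells = [cell for cell in cells if lo <= cell[2] < hi]
    # Stable sort: cells inside one row keep their column order (assumption),
    # only the row order flips when reversed_ is True.
    return sorted(cells, key=lambda cell: cell[0].encode(), reverse=reversed_)


rows = [c[0] for c in scan(CELLS, (1588153968060, 1588153968207), reversed_=True)]
print(rows)  # ['c1_s2', 'c1_s1']
```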

5. HBase: return scan metrics with ALL_METRICS or METRICS

hbase(main):011:0> scan 'stu',{ALL_METRICS => true}
ROW                                  COLUMN+CELL
 c1_s1                               column=base:name, timestamp=1588153968060, value=jack
 c1_s2                               column=base:name, timestamp=1588153968114, value=jack2
 c1_s3                               column=base:name, timestamp=1588153968207, value=jack3
 c1_s4                               column=base:name, timestamp=1588153968258, value=jack4
 c2_s1                               column=base:name, timestamp=1588153968324, value=tom1
 c2_s2                               column=base:name, timestamp=1588153968367, value=tom2
 c2_s2                               column=base:weight, timestamp=1588154167692, value=70kg
 c2_s3                               column=base:height, timestamp=1588154125060, value=1.70m
 c2_s3                               column=base:name, timestamp=1588153968409, value=tom3
 c2_s3                               column=base:weight, timestamp=1588154124202, value=85kg
7 row(s)
 
METRIC                               VALUE
 BYTES_IN_REMOTE_RESULTS             0
 BYTES_IN_RESULTS                    420
 MILLIS_BETWEEN_NEXTS                66
 NOT_SERVING_REGION_EXCEPTION        0
 REGIONS_SCANNED                     1
 REMOTE_RPC_CALLS                    0
 REMOTE_RPC_RETRIES                  0
 ROWS_FILTERED                       0
 ROWS_SCANNED                        7
 RPC_CALLS                           1
 RPC_RETRIES                         0
scan 'stu',{METRICS => ['ROWS_SCANNED','RPC_CALLS']}
ROW                                  COLUMN+CELL
 c1_s1                               column=base:name, timestamp=1588153968060, value=jack
 c1_s2                               column=base:name, timestamp=1588153968114, value=jack2
 c1_s3                               column=base:name, timestamp=1588153968207, value=jack3
 c1_s4                               column=base:name, timestamp=1588153968258, value=jack4
 c2_s1                               column=base:name, timestamp=1588153968324, value=tom1
 c2_s2                               column=base:name, timestamp=1588153968367, value=tom2
 c2_s2                               column=base:weight, timestamp=1588154167692, value=70kg
 c2_s3                               column=base:height, timestamp=1588154125060, value=1.70m
 c2_s3                               column=base:name, timestamp=1588153968409, value=tom3
 c2_s3                               column=base:weight, timestamp=1588154124202, value=85kg
7 row(s)
 
METRIC                               VALUE
 ROWS_SCANNED                        7
 RPC_CALLS                           1

Took 0.0476 seconds

6. HBase: query rows whose row key starts with a given prefix

hbase(main):014:0> scan 'stu',{ROWPREFIXFILTER => 'c1'}
ROW                                  COLUMN+CELL
 c1_s1                               column=base:name, timestamp=1588153968060, value=jack
 c1_s2                               column=base:name, timestamp=1588153968114, value=jack2
 c1_s3                               column=base:name, timestamp=1588153968207, value=jack3
 c1_s4                               column=base:name, timestamp=1588153968258, value=jack4
4 row(s)
hbase(main):016:0> scan 'stu',{FILTER => "PrefixFilter('c1')"}
ROW                                  COLUMN+CELL
 c1_s1                               column=base:name, timestamp=1588153968060, value=jack
 c1_s2                               column=base:name, timestamp=1588153968114, value=jack2
 c1_s3                               column=base:name, timestamp=1588153968207, value=jack3
 c1_s4                               column=base:name, timestamp=1588153968258, value=jack4
4 row(s)
Took 0.0181 seconds
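The two commands return the same rows, but my understanding is that they work differently:

- The shell turns `ROWPREFIXFILTER` into `Scan.setRowPrefixFilter`. That call converts the prefix into a start row plus a stop row (the prefix with its last byte incremented), so only that key range is read.
- `PrefixFilter` is a server-side filter applied to the rows the scan visits.

Below is a small Python sketch of the start/stop computation, assuming that behaviour. The function names are hypothetical.

```python
def prefix_stop_row(prefix: bytes) -> bytes:
    """Smallest key greater than every key that starts with `prefix`."""
    trimmed = prefix.rstrip(b"\xff")  # 0xFF bytes cannot be incremented
    if not trimmed:
        return b""  # empty stop row: scan to the end of the table
    return trimmed[:-1] + bytes([trimmed[-1] + 1])


def scan_prefix(rows, prefix):
    start = prefix.encode()
    stop = prefix_stop_row(start)
    ordered = sorted(rows, key=str.encode)
    return [r for r in ordered if r.encode() >= start and (not stop or r.encode() < stop)]


ROWS = ["c1_s1", "c1_s2", "c1_s3", "c1_s4", "c2_s1", "c2_s2", "c2_s3"]
print(prefix_stop_row(b"c1"))   # b'c2'
print(scan_prefix(ROWS, "c1"))  # ['c1_s1', 'c1_s2', 'c1_s3', 'c1_s4']
```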

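7. Search by column: QualifierFilter

QualifierFilter matches on column qualifiers (column names). It can select one exact column or a range of columns:

- `binary:` compares against the exact bytes of the argument.
- `substring:` matches when the qualifier contains the argument.

Note that the qualifiers in this table are name, weight, and height. Since none of them contains jack, `'substring:jack'` matches nothing and the combined filter below returns no rows. To match cell values such as jack, use ValueFilter (section 9).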

scan 'stu',{FILTER => "(QualifierFilter (<,'binary:name')) AND (QualifierFilter (=,'substring:jack'))"}
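Here is a minimal Python model of the two comparators applied to column qualifiers. It assumes `binary:` is a plain byte-wise comparison and `substring:` is a case-insensitive containment check, which is my understanding of `BinaryComparator` and `SubstringComparator`. On this table:

- The `<` condition keeps only `height`.
- The substring condition keeps nothing.
- The AND of the two is therefore empty.

```python
# Hypothetical model: (row, family, qualifier, value); subset of the sample data.
CELLS = [
    ("c1_s1", "base", "name", "jack"),
    ("c2_s2", "base", "name", "tom2"),
    ("c2_s2", "base", "weight", "70kg"),
    ("c2_s3", "base", "height", "1.70m"),
    ("c2_s3", "base", "name", "tom3"),
    ("c2_s3", "base", "weight", "85kg"),
]


def binary_less(qualifier, operand):
    """QualifierFilter(<, 'binary:operand'): byte-wise qualifier < operand."""
    return qualifier.encode() < operand.encode()


def substring_equal(qualifier, operand):
    """QualifierFilter(=, 'substring:operand'): case-insensitive containment."""
    return operand.lower() in qualifier.lower()


first = [c for c in CELLS if binary_less(c[2], "name")]
both = [c for c in first if substring_equal(c[2], "jack")]
print([c[2] for c in first])  # ['height']
print(both)                   # []
```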

8. Search by column-name prefix: ColumnPrefixFilter

hbase(main):012:0> scan 'stu',{FILTER=>"ColumnPrefixFilter('na') AND (ValueFilter(=,'substring:1') OR ValueFilter(=,'substring:3'))"}
ROW                                  COLUMN+CELL
 c1_s3                               column=base:name, timestamp=1588153968207, value=jack3
 c2_s1                               column=base:name, timestamp=1588153968324, value=tom1
 c2_s3                               column=base:name, timestamp=1588153968409, value=tom3
3 row(s)
Took 0.0075 seconds
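The combined filter is an AND of a qualifier-prefix test and an OR of two value tests. A short Python model over hypothetical tuples reproduces the three rows. Note that `1.70m` contains "1" but is dropped because its column is `height`, not a `na...` column.

```python
# Hypothetical model: (row, qualifier, value); subset of the sample data.
CELLS = [
    ("c1_s1", "name", "jack"),
    ("c1_s2", "name", "jack2"),
    ("c1_s3", "name", "jack3"),
    ("c2_s1", "name", "tom1"),
    ("c2_s3", "height", "1.70m"),
    ("c2_s3", "name", "tom3"),
]


def matches(qualifier, value):
    # ColumnPrefixFilter('na') AND (ValueFilter(=,'substring:1') OR ValueFilter(=,'substring:3'))
    return qualifier.startswith("na") and ("1" in value or "3" in value)


result = [row for row, q, v in CELLS if matches(q, v)]
print(result)  # ['c1_s3', 'c2_s1', 'c2_s3']
```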

9. Search by value, either an exact value or a range of values: ValueFilter

hbase(main):018:0> scan 'stu',{FILTER=>"ValueFilter(=,'binary:jack')"}
ROW                                  COLUMN+CELL
 c1_s1                               column=base:name, timestamp=1588153968060, value=jack
 1 row(s)
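With `binary:`, ValueFilter compares the whole value byte by byte:

- `=` only matches exact values, so jack matches but jack2 does not.
- `<`, `>=` and the other comparison operators select a byte-ordered range.

This is a minimal Python model that assumes plain byte-wise comparison. The `>=` query on `'tom'` is a hypothetical example that was not run in the shell above.

```python
# Hypothetical model: (row, qualifier, value); the sample data's cell values.
CELLS = [
    ("c1_s1", "name", "jack"), ("c1_s2", "name", "jack2"),
    ("c1_s3", "name", "jack3"), ("c1_s4", "name", "jack4"),
    ("c2_s1", "name", "tom1"), ("c2_s2", "name", "tom2"),
    ("c2_s2", "weight", "70kg"), ("c2_s3", "height", "1.70m"),
    ("c2_s3", "name", "tom3"), ("c2_s3", "weight", "85kg"),
]
OPS = {
    "=": lambda a, b: a == b,
    "<": lambda a, b: a < b,
    ">=": lambda a, b: a >= b,
}


def value_filter(cells, op, operand):
    """ValueFilter(op, 'binary:operand'): compare cell value bytes to operand bytes."""
    target = operand.encode()
    return [c for c in cells if OPS[op](c[2].encode(), target)]


print([c[0] for c in value_filter(CELLS, "=", "jack")])  # ['c1_s1']
print([c[2] for c in value_filter(CELLS, ">=", "tom")])  # ['tom1', 'tom2', 'tom3']
```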

10. Search by timestamp: TimestampsFilter

hbase(main):022:0> scan 'stu',{FILTER => "TimestampsFilter(1588153968060,1588153968207)"}
ROW                                  COLUMN+CELL
 c1_s1                               column=base:name, timestamp=1588153968060, value=jack
 c1_s3                               column=base:name, timestamp=1588153968207, value=jack3
2 row(s)
Took 0.0151 seconds

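This returns only the cells whose timestamp is exactly 1588153968060 or 1588153968207. Unlike TIMERANGE, each listed timestamp is matched individually rather than treated as a range boundary.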
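The difference from TIMERANGE fits in two list comprehensions over hypothetical `(row, timestamp)` pairs. `c1_s2` (timestamp ...114) lies between the two values, so the range query returns it but TimestampsFilter does not.

```python
CELLS = [
    ("c1_s1", 1588153968060), ("c1_s2", 1588153968114),
    ("c1_s3", 1588153968207), ("c1_s4", 1588153968258),
]
WANTED = {1588153968060, 1588153968207}

exact = [row for row, ts in CELLS if ts in WANTED]                          # TimestampsFilter
ranged = [row for row, ts in CELLS if 1588153968060 <= ts < 1588153968207]  # TIMERANGE
print(exact)   # ['c1_s1', 'c1_s3']
print(ranged)  # ['c1_s1', 'c1_s2']
```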

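11. RAW instructs the scanner to return all cells, including delete markers and deleted cells that have not yet been collected. This option cannot be combined with requesting specific columns. It is disabled by default.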

hbase(main):024:0> scan 'stu',{RAW => true,VERSIONS => 2}
ROW                                  COLUMN+CELL
 c1_s1                               column=base:name, timestamp=1588153968060, value=jack
 c1_s2                               column=base:name, timestamp=1588153968114, value=jack2
 c1_s3                               column=base:name, timestamp=1588153968207, value=jack3
 c1_s4                               column=base:name, timestamp=1588153968258, value=jack4
 c2_s1                               column=base:name, timestamp=1588153968324, value=tom1
 c2_s2                               column=base:name, timestamp=1588153968367, value=tom2
 c2_s2                               column=base:weight, timestamp=1588154167692, value=70kg
 c2_s3                               column=base:height, timestamp=1588154125060, value=1.70m
 c2_s3                               column=base:name, timestamp=1588153968409, value=tom3
 c2_s3                               column=base:weight, timestamp=1588154124202, value=85kg
7 row(s)
Took 0.0346 seconds

Now delete one cell:

delete 'stu','c1_s4','base:name'
 
hbase(main):027:0> scan 'stu',{RAW => true,VERSIONS => 2}
ROW                                  COLUMN+CELL
 c1_s1                               column=base:name, timestamp=1588153968060, value=jack
 c1_s2                               column=base:name, timestamp=1588153968114, value=jack2
 c1_s3                               column=base:name, timestamp=1588153968207, value=jack3
 c1_s4                               column=base:name, timestamp=1588153968258, type=Delete
 c1_s4                               column=base:name, timestamp=1588153968258, value=jack4
 c2_s1                               column=base:name, timestamp=1588153968324, value=tom1
 c2_s2                               column=base:name, timestamp=1588153968367, value=tom2
 c2_s2                               column=base:weight, timestamp=1588154167692, value=70kg
 c2_s3                               column=base:height, timestamp=1588154125060, value=1.70m
 c2_s3                               column=base:name, timestamp=1588153968409, value=tom3
 c2_s3                               column=base:weight, timestamp=1588154124202, value=85kg
7 row(s)
Took 0.0189 seconds

The output now shows the delete marker (type=Delete) alongside the deleted cell it masks.
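A minimal Python model of why a normal scan and a RAW scan now differ. It assumes a `type=Delete` marker hides only the Put on the same column with exactly the same timestamp. A DeleteColumn marker, which also hides older versions, is not modelled. The model reproduces the counts in this article: the later regex scan in section 16 shows 6 rows, while the RAW scan above still shows 7.

```python
# Hypothetical model: (row, column, timestamp, type, value); name cells only.
CELLS = [
    ("c1_s1", "base:name", 1588153968060, "Put", "jack"),
    ("c1_s2", "base:name", 1588153968114, "Put", "jack2"),
    ("c1_s3", "base:name", 1588153968207, "Put", "jack3"),
    ("c1_s4", "base:name", 1588153968258, "Delete", None),
    ("c1_s4", "base:name", 1588153968258, "Put", "jack4"),
    ("c2_s1", "base:name", 1588153968324, "Put", "tom1"),
    ("c2_s2", "base:name", 1588153968367, "Put", "tom2"),
    ("c2_s3", "base:name", 1588153968409, "Put", "tom3"),
]


def scan(cells, raw=False):
    if raw:
        return list(cells)  # RAW: markers and masked cells come back as-is
    # Collect (row, column, timestamp) keys covered by a Delete marker.
    deleted = {(r, col, ts) for r, col, ts, kind, _ in cells if kind == "Delete"}
    return [c for c in cells if c[3] == "Put" and c[:3] not in deleted]


def row_count(cells):
    return len({c[0] for c in cells})


print(row_count(scan(CELLS)), row_count(scan(CELLS, raw=True)))  # 6 7
```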

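12. FirstKeyOnlyFilter
A row can contain multiple columns, and the same column of the same row can hold multiple versions. FirstKeyOnlyFilter returns only the first cell of each row, that is, the first version of the first column.
KeyOnlyFilter: returns only the key of each cell, with an empty value.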

hbase(main):038:0> scan 'stu',{FILTER => "FirstKeyOnlyFilter() AND ValueFilter(=,'binary:jack2') AND KeyOnlyFilter()"}
ROW                                  COLUMN+CELL
 c1_s2                               column=base:name, timestamp=1588153968114, value=
1 row(s)
Took 0.0083 seconds
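A small Python model of this filter chain, applying the filters in the order they are written. It assumes cells are already sorted by row and then column, as HBase returns them. Note that the first cell of `c2_s3` is `base:height`, because "height" sorts before "name". That is why `c2_s2` and `c2_s3` can never match a value test that only their `name` column would satisfy.

```python
from itertools import groupby

# Hypothetical model: (row, column, timestamp, value), sorted by (row, column).
CELLS = [
    ("c1_s1", "base:name", 1588153968060, "jack"),
    ("c1_s2", "base:name", 1588153968114, "jack2"),
    ("c2_s2", "base:name", 1588153968367, "tom2"),
    ("c2_s2", "base:weight", 1588154167692, "70kg"),
    ("c2_s3", "base:height", 1588154125060, "1.70m"),
    ("c2_s3", "base:name", 1588153968409, "tom3"),
]


def first_key_only(cells):
    """FirstKeyOnlyFilter: keep only the first cell of every row."""
    return [next(group) for _, group in groupby(cells, key=lambda c: c[0])]


def key_only(cells):
    """KeyOnlyFilter: keep the key parts, blank out the value."""
    return [(row, col, ts, "") for row, col, ts, _ in cells]


firsts = first_key_only(CELLS)
result = key_only([c for c in firsts if c[3] == "jack2"])
print(result)  # [('c1_s2', 'base:name', 1588153968114, '')]
```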

13. Limit the result to two rows (LIMIT counts rows, not columns)

hbase(main):040:0> scan 'stu', {LIMIT => 2}
ROW                                  COLUMN+CELL
 c1_s1                               column=base:name, timestamp=1588153968060, value=jack
 c1_s2                               column=base:name, timestamp=1588153968114, value=jack2
2 row(s)
Took 0.0077 seconds
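Because LIMIT counts rows, a two-row limit can still return many cells. In this hypothetical Python model, the first call returns 2 cells from two single-cell rows. The second call, starting at `c2_s2`, returns 5 cells that belong to only 2 rows.

```python
# Hypothetical model: (row, column, value); subset of the sample data.
CELLS = [
    ("c1_s1", "base:name", "jack"),
    ("c1_s2", "base:name", "jack2"),
    ("c2_s2", "base:name", "tom2"),
    ("c2_s2", "base:weight", "70kg"),
    ("c2_s3", "base:height", "1.70m"),
    ("c2_s3", "base:name", "tom3"),
    ("c2_s3", "base:weight", "85kg"),
]


def scan_limit(cells, limit):
    """LIMIT => n: stop after n distinct rows, keeping every cell of those rows."""
    seen, out = [], []
    for cell in cells:
        if cell[0] not in seen:
            if len(seen) == limit:
                break
            seen.append(cell[0])
        out.append(cell)
    return out


print(len(scan_limit(CELLS, 2)))      # 2
print(len(scan_limit(CELLS[2:], 2)))  # 5
```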

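14. Importing Java classes

Column pagination filter: ColumnPaginationFilter paginates over the columns within each row. You set the number of columns to return (limit) and the number of columns to skip (offset).

Syntax: ColumnPaginationFilter.new(limit, offset)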

hbase(main):002:0> import org.apache.hadoop.hbase.filter.ColumnPaginationFilter
=> [Java::OrgApacheHadoopHbaseFilter::ColumnPaginationFilter]
 
hbase(main):040:0> scan 'stu', {FILTER =>ColumnPaginationFilter.new(3, 1)}
ROW                                  COLUMN+CELL
 c2_s2                               column=base:weight, timestamp=1588154167692, value=70kg
 c2_s3                               column=base:name, timestamp=1588153968409, value=tom3
 c2_s3                               column=base:weight, timestamp=1588154124202, value=85kg
2 row(s)
Took 0.0154 seconds
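The pagination is applied per row, which explains the result above. With offset 1:

- Single-column rows such as `c1_s1` lose their only column and disappear.
- `c2_s2` keeps `weight`.
- `c2_s3` keeps `name` and `weight`.

This Python model assumes one version per column, which holds for this sample data.

```python
from itertools import groupby

# Hypothetical model: (row, column, value), sorted by (row, column).
CELLS = [
    ("c1_s1", "base:name", "jack"),
    ("c2_s1", "base:name", "tom1"),
    ("c2_s2", "base:name", "tom2"),
    ("c2_s2", "base:weight", "70kg"),
    ("c2_s3", "base:height", "1.70m"),
    ("c2_s3", "base:name", "tom3"),
    ("c2_s3", "base:weight", "85kg"),
]


def column_pagination(cells, limit, offset):
    """Per row: skip `offset` columns, then return at most `limit` columns."""
    out = []
    for _, group in groupby(cells, key=lambda c: c[0]):
        out.extend(list(group)[offset:offset + limit])
    return out


page = column_pagination(CELLS, 3, 1)
for row, col, value in page:
    print(row, col, value)
```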

15. Find rows whose row key contains s2

hbase(main):013:0> import org.apache.hadoop.hbase.filter.CompareFilter
=> [Java::OrgApacheHadoopHbaseFilter::CompareFilter]
hbase(main):015:0> import org.apache.hadoop.hbase.filter.SubstringComparator
=> [Java::OrgApacheHadoopHbaseFilter::SubstringComparator]
hbase(main):016:0> import org.apache.hadoop.hbase.filter.RowFilter
=> [Java::OrgApacheHadoopHbaseFilter::RowFilter]
 
hbase(main):017:0> scan 'stu',{FILTER => RowFilter.new(CompareFilter::CompareOp.valueOf('EQUAL'),SubstringComparator.new('s2'))}
ROW                                  COLUMN+CELL
 c1_s2                               column=base:name, timestamp=1588153968114, value=jack2
 c2_s2                               column=base:name, timestamp=1588153968367, value=tom2
 c2_s2                               column=base:weight, timestamp=1588154167692, value=70kg
2 row(s)
Took 0.0427 seconds
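A minimal Python model of `RowFilter` with `SubstringComparator`. It relies on the comparator being case-insensitive, which matches my reading of its documentation. Because the substring can appear anywhere in the key, this kind of filter cannot narrow the scanned key range the way STARTROW/STOPROW or ROWPREFIXFILTER can. Every row the scan visits has to be checked.

```python
ROWS = ["c1_s1", "c1_s2", "c1_s3", "c2_s1", "c2_s2", "c2_s3"]


def row_substring(rows, needle):
    """RowFilter(EQUAL, SubstringComparator(needle)): case-insensitive 'contains'."""
    needle = needle.lower()
    return [row for row in rows if needle in row.lower()]


print(row_substring(ROWS, "s2"))  # ['c1_s2', 'c2_s2']
print(row_substring(ROWS, "S2"))  # ['c1_s2', 'c2_s2']
```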

16. Regular-expression queries

import org.apache.hadoop.hbase.filter.RegexStringComparator
import org.apache.hadoop.hbase.filter.CompareFilter
import org.apache.hadoop.hbase.filter.SubstringComparator
import org.apache.hadoop.hbase.filter.RowFilter

Just paste the four lines above into the shell:

hbase(main):018:0> import org.apache.hadoop.hbase.filter.RegexStringComparator
=> [Java::OrgApacheHadoopHbaseFilter::RegexStringComparator]
hbase(main):019:0> import org.apache.hadoop.hbase.filter.CompareFilter
=> [Java::OrgApacheHadoopHbaseFilter::CompareFilter]
hbase(main):020:0> import org.apache.hadoop.hbase.filter.SubstringComparator
=> [Java::OrgApacheHadoopHbaseFilter::SubstringComparator]
hbase(main):021:0> import org.apache.hadoop.hbase.filter.RowFilter
=> [Java::OrgApacheHadoopHbaseFilter::RowFilter]
hbase(main):027:0> scan 'stu', {FILTER => RowFilter.new(CompareFilter::CompareOp.valueOf('EQUAL'),RegexStringComparator.new('^c\d+_[a-z]\d+$'))}
ROW                                  COLUMN+CELL
 c1_s1                               column=base:name, timestamp=1588153968060, value=jack
 c1_s2                               column=base:name, timestamp=1588153968114, value=jack2
 c1_s3                               column=base:name, timestamp=1588153968207, value=jack3
 c2_s1                               column=base:name, timestamp=1588153968324, value=tom1
 c2_s2                               column=base:name, timestamp=1588153968367, value=tom2
 c2_s2                               column=base:weight, timestamp=1588154167692, value=70kg
 c2_s3                               column=base:height, timestamp=1588154125060, value=1.70m
 c2_s3                               column=base:name, timestamp=1588153968409, value=tom3
 c2_s3                               column=base:weight, timestamp=1588154124202, value=85kg
6 row(s)
Took 0.0385 seconds

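Every row key in the table matches this pattern, so the result looks no different from a plain scan. (c1_s4 is absent only because it was deleted in section 11.) A more selective pattern makes the filter's effect visible: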

hbase(main):036:0> put 'stu','c3_s55','base:name','Lucy'
hbase(main):037:0> scan 'stu', {FILTER => RowFilter.new(CompareFilter::CompareOp.valueOf('EQUAL'),RegexStringComparator.new('^c\d+_s55$'))}
ROW                                  COLUMN+CELL
 c3_s55                              column=base:name, timestamp=1588162870203, value=Lucy
1 row(s)
Took 0.0082 seconds
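The anchors `^` and `$` matter. As far as I know, `RegexStringComparator` uses Java's `Matcher.find()`, which matches anywhere in the key unless the pattern is anchored. Python's `re.search` behaves the same way, so it serves as a stand-in in this sketch. The key `c3_s550` is hypothetical and used only to show the effect of dropping the anchors.

```python
import re

ROWS = ["c1_s1", "c1_s2", "c1_s3", "c2_s1", "c2_s2", "c2_s3", "c3_s55"]


def row_regex(rows, pattern):
    """Keep rows where the pattern is found anywhere, like Java's Matcher.find()."""
    compiled = re.compile(pattern)
    return [row for row in rows if compiled.search(row)]


print(row_regex(ROWS, r"^c\d+_[a-z]\d+$"))  # all seven keys
print(row_regex(ROWS, r"^c\d+_s55$"))       # ['c3_s55']

# "c3_s550" is hypothetical: without anchors, "s55" also matches it.
print(row_regex(ROWS + ["c3_s550"], r"s55"))         # ['c3_s55', 'c3_s550']
print(row_regex(ROWS + ["c3_s550"], r"^c\d+_s55$"))  # ['c3_s55']
```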