[TOC]
As middleware between the application layer and HBase, Phoenix has several characteristics that give it a distinct advantage for simple queries over large data volumes.
Phoenix is generally accessed in one of the following three ways.
Ambari actually ships with Phoenix, so installation is simple: just tick the checkbox and click install.
However, with Ambari 2.4.2 (Phoenix 4.7.0, HBase 1.1.2), connecting with sqlline.py reports an error:
Class org.apache.phoenix.coprocessor.MetaDataEndpointImpl cannot be loaded Set hbase.table.sanity.checks to false at conf or table descriptor if you want to bypass sanity checks
Add the configuration as the error message suggests and restart HBase.
The newer Ambari 2.7.3 (Phoenix 5.0.0, HBase 2.0.0) does not have this problem.
This article uses version 4.7.0 for its examples, connecting with `phoenix-sqlline ${zookeeper}`.
Only selected operations are demonstrated below.
CREATE TABLE IF NOT EXISTS ljktest (ID VARCHAR PRIMARY KEY,NAME VARCHAR,AGE TINYINT);
CREATE TABLE IF NOT EXISTS "ljktest" (ID VARCHAR PRIMARY KEY,NAME VARCHAR,AGE TINYINT);
CREATE TABLE LJKTEST2 (ID INTEGER NOT NULL,AGE TINYINT NOT NULL,NAME VARCHAR,CONSTRAINT PK PRIMARY KEY(ID, AGE)) TTL = 86400;
Check in hbase-shell whether the table was created. By default, table names on the HBase side are uppercase; to make names case-sensitive, quote them. Phoenix also attaches a number of coprocessors to the table. If no column family is specified, the default column family is named 0.
```
hbase(main):008:0> desc 'LJKTEST'
Table LJKTEST is ENABLED
LJKTEST, {TABLE_ATTRIBUTES => {
  coprocessor$1 => '|org.apache.phoenix.coprocessor.ScanRegionObserver|805306366|',
  coprocessor$2 => '|org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver|805306366|',
  coprocessor$3 => '|org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver|805306366|',
  coprocessor$4 => '|org.apache.phoenix.coprocessor.ServerCachingEndpointImpl|805306366|',
  coprocessor$5 => '|org.apache.phoenix.hbase.index.Indexer|805306366|index.builder=org.apache.phoenix.index.PhoenixIndexBuilder,org.apache.hadoop.hbase.index.codec.class=org.apache.phoenix.index.PhoenixIndexCodec'}
COLUMN FAMILIES DESCRIPTION
{NAME => '0', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false',
 KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'FAST_DIFF',
 TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'NONE',
 CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false',
 PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536'}
1 row(s)
Took 0.1066 seconds
```
CREATE VIEW LJK_TEST(ROWKEY VARCHAR PRIMARY KEY,"mycf"."name" VARCHAR);
CREATE TABLE LJK_TEST (ROWKEY VARCHAR PRIMARY KEY, "mycf"."name" VARCHAR) COLUMN_ENCODED_BYTES=0;
A simple insert:
UPSERT INTO LJKTEST VALUES('0001','LINJIKAI',18);
The corresponding result seen in hbase-shell:
```
hbase(main):010:0> scan 'LJKTEST'
ROW     COLUMN+CELL
 0001   column=0:\x00\x00\x00\x00, timestamp=1557719617275, value=x
 0001   column=0:\x80\x0B, timestamp=1557719617275, value=LINJIKAI
 0001   column=0:\x80\x0C, timestamp=1557719617275, value=\x92
1 row(s)
Took 0.0079 seconds
```
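The `value=\x92` stored for AGE above is explained by Phoenix's order-preserving integer encoding: the sign bit of the value is flipped so that unsigned byte comparison agrees with numeric order. A minimal sketch of that idea for a TINYINT (an illustration of the encoding rule, not Phoenix's actual PTinyint code):

```python
def encode_tinyint(v: int) -> bytes:
    """Order-preserving encoding of a signed 8-bit integer: flipping the
    sign bit makes lexicographic byte order equal numeric order."""
    assert -128 <= v <= 127
    return bytes([(v & 0xFF) ^ 0x80])

# AGE 18 from the UPSERT above serializes to 0x92, matching value=\x92 in the scan.
print(encode_tinyint(18))  # b'\x92'
```

This is why HBase can range-scan Phoenix-encoded numeric row keys correctly even though it only compares raw bytes.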
UPSERT INTO LJKTEST(ID,NAME) VALUES('0002','張三');
Duplicate-key upsert strategies
Updating a column:
UPSERT INTO LJKTEST VALUES('0003','李四',19) ON DUPLICATE KEY UPDATE AGE = AGE + 1;
The effect: when a row with the same key already exists, the age is incremented by one. As the output below shows, 李四's age ends up at 20.
```
0: jdbc:phoenix:> UPSERT INTO LJKTEST VALUES('0003','李四',19) ON DUPLICATE KEY UPDATE AGE = AGE + 1;
1 row affected (0.011 seconds)
0: jdbc:phoenix:> SELECT * FROM LJKTEST;
+-------+-----------+-------+
|  ID   |   NAME    |  AGE  |
+-------+-----------+-------+
| 0001  | LINJIKAI  | 18    |
| 0002  | 張三      | null  |
| 0003  | 李四      | 20    |
+-------+-----------+-------+
3 rows selected (0.026 seconds)
```
No update:
UPSERT INTO LJKTEST VALUES('0003','李四',19) ON DUPLICATE KEY IGNORE;
This form does not change 李四's age, because a row with that key already exists.
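The three behaviors (plain overwrite, ON DUPLICATE KEY UPDATE, ON DUPLICATE KEY IGNORE) can be summarized with a toy in-memory model of the server-side duplicate-key check (a sketch of the semantics only, not Phoenix's implementation):

```python
def upsert(table, key, row, on_dup=None):
    """Toy model of UPSERT ... [ON DUPLICATE KEY UPDATE | ON DUPLICATE KEY IGNORE]."""
    if key not in table:
        table[key] = dict(row)   # no conflict: plain insert
    elif on_dup == "IGNORE":
        pass                     # key exists: leave the existing row untouched
    elif callable(on_dup):
        on_dup(table[key])       # key exists: apply the UPDATE expression
    else:
        table[key] = dict(row)   # default UPSERT: overwrite

t = {}
bump_age = lambda r: r.update(AGE=r['AGE'] + 1)                  # models AGE = AGE + 1
upsert(t, '0003', {'NAME': '李四', 'AGE': 19}, on_dup=bump_age)  # inserts with AGE 19
upsert(t, '0003', {'NAME': '李四', 'AGE': 19}, on_dup=bump_age)  # key exists: AGE -> 20
upsert(t, '0003', {'NAME': '李四', 'AGE': 99}, on_dup="IGNORE")  # key exists: no change
print(t['0003']['AGE'])  # 20
```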
Phoenix is especially powerful because it provides covered indexes: once an index entry is found, there is no need to go back to the primary table. Instead, the data we care about is bundled into the index row itself, saving read-time overhead.
Create a table to test the secondary-index mechanism; the functional-index examples below also build their indexes on this table:
CREATE TABLE LJKTEST (ID CHAR(4) NOT NULL PRIMARY KEY,AGE UNSIGNED_TINYINT,NAME VARCHAR,COMPANY VARCHAR,SCHOOL VARCHAR);
And insert some data:
```sql
UPSERT INTO LJKTEST VALUES('0001',18,'張三','張三公司','張三學校');
UPSERT INTO LJKTEST VALUES('0002',19,'李四','李四公司','李四學校');
UPSERT INTO LJKTEST VALUES('0003',20,'王五','王五公司','王五學校');
UPSERT INTO LJKTEST VALUES('0004',21,'趙六','趙六公司','趙六學校');
```
Create a multi-column covered index:
CREATE INDEX COVER_LJK_INDEX ON LJKTEST(COMPANY,SCHOOL) INCLUDE (NAME);
Examining the query plans:
You can see that a query on the school column alone cannot use the index key prefix and falls back to a full scan.
```
0: jdbc:phoenix:> EXPLAIN SELECT NAME FROM LJKTEST WHERE SCHOOL ='張三學校';
+---------------------------------------------------------------------------+------------------+
|                                   PLAN                                    |  EST_BYTES_READ  |
+---------------------------------------------------------------------------+------------------+
| CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER COVER_LJK_INDEX  | null             |
|     SERVER FILTER BY "SCHOOL" = '張三學校'                                | null             |
+---------------------------------------------------------------------------+------------------+
```
But a query on the company column does use the key prefix and becomes a range scan.
```
0: jdbc:phoenix:> EXPLAIN SELECT NAME FROM LJKTEST WHERE COMPANY ='張三公司';
+-------------------------------------------------------------------------------------+--------+
|                                        PLAN                                         | EST_BY |
+-------------------------------------------------------------------------------------+--------+
| CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN RANGE SCAN OVER COVER_LJK_INDEX ['張三公司'] | null |
+-------------------------------------------------------------------------------------+--------+
```
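The difference between the two plans is ordinary composite-key prefix matching: the index rows are sorted by (COMPANY, SCHOOL), so a predicate on COMPANY alone selects a contiguous key range, while a predicate on SCHOOL alone forces every index row to be checked. A small sketch of that idea over the four rows above (a model of the principle, not Phoenix internals):

```python
import bisect

# Index rows sorted by the composite key (COMPANY, SCHOOL), as in COVER_LJK_INDEX.
rows = sorted([
    ('張三公司', '張三學校', '張三'),
    ('李四公司', '李四學校', '李四'),
    ('王五公司', '王五學校', '王五'),
    ('趙六公司', '趙六學校', '趙六'),
])

# COMPANY predicate: binary search narrows to a contiguous slice -- a RANGE SCAN.
lo = bisect.bisect_left(rows, ('張三公司',))
hi = bisect.bisect_right(rows, ('張三公司', '\U0010FFFF'))
range_hits = rows[lo:hi]

# SCHOOL predicate: no usable key prefix, so every row is examined -- a FULL SCAN.
full_hits = [r for r in rows if r[1] == '張三學校']

print(range_hits == full_hits)  # True -- same answer, very different cost at scale
```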
Next, create a single-column covered index and compare:
CREATE INDEX COVER_LJK_INDEX_COMPANY ON LJKTEST(COMPANY) INCLUDE (NAME);
```
0: jdbc:phoenix:> EXPLAIN SELECT NAME FROM LJKTEST WHERE COMPANY ='張三公司';
+----------------------------------------------------------------------------------------------+
|                                             PLAN                                             |
+----------------------------------------------------------------------------------------------+
| CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN RANGE SCAN OVER COVER_LJK_INDEX_COMPANY ['張三公司'] |
+----------------------------------------------------------------------------------------------+
1 row selected (0.028 seconds)
```
When * is used as the select list, the plan goes back to a FULL SCAN.
```
0: jdbc:phoenix:> EXPLAIN SELECT /*+ INDEX(LJKTEST COVER_LJK_INDEX_COMPANY)*/ * FROM LJKTEST WHERE COMPANY ='張三公司';
+----------------------------------------------------------------------------------------------+
|                                             PLAN                                             |
+----------------------------------------------------------------------------------------------+
| CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER LJKTEST                             |
|     SKIP-SCAN-JOIN TABLE 0                                                                   |
|         CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN RANGE SCAN OVER COVER_LJK_INDEX_COMPANY_ON |
|             SERVER FILTER BY FIRST KEY ONLY                                                  |
|     DYNAMIC SERVER FILTER BY "LJKTEST.ID" IN ($34.$36)                                       |
+----------------------------------------------------------------------------------------------+
```
At this point you need to use a hint to specify the index:
```
0: jdbc:phoenix:> EXPLAIN SELECT /*+INDEX(LJKTEST COVER_LJK_INDEX_COMPANY)*/* FROM LJKTEST WHERE COMPANY='張三公司';
+------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
|                                              PLAN                                              | EST_BYTES_READ  | EST_ROWS_READ  | EST_INFO_TS  |
+------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER LJKTEST                               | null            | null           | null         |
|     SKIP-SCAN-JOIN TABLE 0                                                                     | null            | null           | null         |
|         CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN RANGE SCAN OVER COVER_LJK_INDEX_COMPANY ['張三公司'] | null      | null           | null         |
|             SERVER FILTER BY FIRST KEY ONLY                                                    | null            | null           | null         |
|     DYNAMIC SERVER FILTER BY "LJKTEST.ID" IN ($10.$12)                                         | null            | null           | null         |
+------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
5 rows selected (0.046 seconds)
```
CREATE INDEX SCHOOL_WITH_COMPANY ON LJKTEST(COMPANY||' '||SCHOOL);
A local index requires the LOCAL keyword; all of the indexes above are global indexes.
CREATE LOCAL INDEX COVER_LJK_INDEX_COMPANY ON LJKTEST(COMPANY) INCLUDE (NAME);
Local indexes suit write-heavy, read-light workloads: the index data is stored in the data table itself, which is more intrusive.
Global indexes suit read-heavy, write-light workloads: the index data is stored in a separate table.
By default, an index is populated synchronously during the CREATE INDEX call. Depending on the current size of the data table, this may not be feasible. As of 4.5, the index can be populated asynchronously by including the ASYNC keyword in the CREATE INDEX DDL statement:
CREATE INDEX INDEX_LJKTEST_AGE_ASYNC ON LJKTEST(AGE) INCLUDE(SCHOOL) ASYNC;
This is only the first step; you must also launch the MapReduce job that populates the index table, from the HBase command line:
hbase org.apache.phoenix.mapreduce.index.IndexTool --data-table LJKTEST --index-table INDEX_LJKTEST_AGE_ASYNC --output-path ASYNC_IDX_HFILES
Only when the MapReduce job completes is the index activated and used in queries.
The output-path option specifies the HDFS directory the HFiles are written to.
For tables whose data is written once and never updated in place, certain optimizations can reduce the write-time overhead of incremental index maintenance. This is common for time-series data such as logs or events, where once a row is written it is never updated. To take advantage of these optimizations, declare the table immutable by adding the IMMUTABLE_ROWS=true property to the DDL statement:
CREATE TABLE LJKTEST_IMMU (ROWKEY VARCHAR PRIMARY KEY, NAME VARCHAR,AGE VARCHAR) IMMUTABLE_ROWS=true;
CREATE INDEX INDEX_LJKTEST_IMMU ON LJKTEST_IMMU(NAME) INCLUDE(AGE);
Testing the behavior of an immutable-table index:
Even when a row with the same rowkey is upserted again, queries through Phoenix behave as though a new row was appended.
```sql
UPSERT INTO LJKTEST_IMMU VALUES('1','LILEI','18');
UPSERT INTO LJKTEST_IMMU VALUES('1','HANGMEIMEI','18');
```
```
0: jdbc:phoenix:dn1> select * from LJKTEST_IMMU;
+---------+-------------+------+
| ROWKEY  |    NAME     | AGE  |
+---------+-------------+------+
| 1       | HANGMEIMEI  | 18   |
| 1       | LILEI       | 18   |
+---------+-------------+------+
2 rows selected (0.078 seconds)
0: jdbc:phoenix:dn1> SELECT * FROM INDEX_LJKTEST_IMMU;
+-------------+----------+--------+
|   0:NAME    | :ROWKEY  | 0:AGE  |
+-------------+----------+--------+
| HANGMEIMEI  | 1        | 18     |
| LILEI       | 1        | 18     |
+-------------+----------+--------+
2 rows selected (0.021 seconds)
0: jdbc:phoenix:dn1> SELECT * FROM LJKTEST_IMMU WHERE ROWKEY='1';
+---------+-------------+------+
| ROWKEY  |    NAME     | AGE  |
+---------+-------------+------+
| 1       | HANGMEIMEI  | 18   |
+---------+-------------+------+
1 row selected (0.017 seconds)
0: jdbc:phoenix:dn1> SELECT * FROM LJKTEST_IMMU WHERE NAME='LILEI';
+---------+--------+------+
| ROWKEY  |  NAME  | AGE  |
+---------+--------+------+
| 1       | LILEI  | 18   |
+---------+--------+------+
1 row selected (0.024 seconds)
0: jdbc:phoenix:dn1> SELECT * FROM LJKTEST_IMMU WHERE AGE='18';
+---------+-------------+------+
| ROWKEY  |    NAME     | AGE  |
+---------+-------------+------+
| 1       | HANGMEIMEI  | 18   |
| 1       | LILEI       | 18   |
+---------+-------------+------+
2 rows selected (0.027 seconds)
```
In the HBase table itself, the usual rule applies: the same rowkey overwrites the corresponding columns.
```
hbase(main):002:0> scan 'LJKTEST_IMMU'
ROW    COLUMN+CELL
 1     column=0:AGE, timestamp=1563241706845, value=18
 1     column=0:NAME, timestamp=1563241706845, value=HANGMEIMEI
 1     column=0:_0, timestamp=1563241706845, value=x
1 row(s) in 0.0590 seconds
```
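Why does this happen? With IMMUTABLE_ROWS=true, Phoenix skips the read-old-row and delete-old-index-entry steps of incremental index maintenance, so a second upsert of the same rowkey leaves the stale index entry in place, while HBase itself still overwrites the data row. A toy model of that trade-off (an illustration of the behavior observed above, not Phoenix's code):

```python
data, index = {}, {}

def upsert_immutable(rowkey, name, age):
    # Data table: HBase semantics, the last write for a rowkey wins.
    data[rowkey] = {'NAME': name, 'AGE': age}
    # Immutable-rows optimization: the old index entry is NOT looked up and
    # deleted, because rows are assumed to be written once and never updated.
    index[(name, rowkey)] = {'AGE': age}

upsert_immutable('1', 'LILEI', '18')
upsert_immutable('1', 'HANGMEIMEI', '18')  # violates the immutability assumption
print(len(data), len(index))  # 1 2 -> one data row, but two index rows remain
```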
<font color="red">As of Phoenix 4.12</font>, there is a tool that runs a MapReduce job to verify that an index table is valid against its data table. The only way to find orphaned rows in either table is to scan all rows of one table and look up the corresponding row in the other, so the tool can run with either the data table or the index table as the "source" and the other as the "target". The tool writes all invalid rows it finds to a file or to the output table PHOENIX_INDEX_SCRUTINY. An invalid row is a source row that has no corresponding row in the target table, or that has an incorrect value (i.e., a covered column value) in the target table.
hbase org.apache.phoenix.mapreduce.index.IndexScrutinyTool -dt my_table -it my_index -o
or
HADOOP_CLASSPATH=$(hbase mapredcp) hadoop jar phoenix-<version>-server.jar org.apache.phoenix.mapreduce.index.IndexScrutinyTool -dt my_table -it my_index -o
Out of the box, indexing is quite fast. To optimize for a specific environment and workload, however, several properties can be tuned.
Property | Description | Default |
---|---|---|
index.builder.threads.max | Number of threads used to build index updates from primary table updates. | 10 |
index.builder.threads.keepalivetime | Time (in seconds) after which idle threads in the builder thread pool expire. | 60 |
index.writer.threads.max | Number of threads to use when writing to the target index tables. | 10 |
index.writer.threads.keepalivetime | Time (in seconds) after which idle threads in the writer thread pool expire. | 60 |
hbase.htable.threads.max | Number of threads each index HTable may use for writes. | 2,147,483,647 |
hbase.htable.threads.keepalivetime | Time (in seconds) after which idle threads in an HTable's thread pool expire. | 60 |
index.tablefactory.cache.size | Number of index HTables to keep in the cache. | 10 |
org.apache.phoenix.regionserver.index.priority.min | Value specifying the bottom (inclusive) of the range in which index priorities lie. | 1000 |
org.apache.phoenix.regionserver.index.priority.max | Value specifying the top (exclusive) of the range in which index priorities lie. | 1050 |
org.apache.phoenix.regionserver.index.handler.count | Number of threads to use when serving index write requests for global index maintenance. | 30 |
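These properties are set in hbase-site.xml on the region servers; for example (the values below are purely illustrative, not recommendations):

```xml
<!-- hbase-site.xml on each region server; illustrative values only -->
<property>
  <name>index.builder.threads.max</name>
  <value>20</value>
</property>
<property>
  <name>index.writer.threads.max</name>
  <value>20</value>
</property>
```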
```
19/05/14 11:23:40 WARN iterate.BaseResultIterators: Unable to find parent table "LJKTEST" of table "COVER_LJK_INDEX_COMPANY_ONLY" to determine USE_STATS_FOR_PARALLELIZATION
org.apache.phoenix.schema.TableNotFoundException: ERROR 1012 (42M03): Table undefined. tableName=LJKTEST
```
If creating an index table fails, modify hbase-site.xml according to the error message and restart HBase.
Error: ERROR 1029 (42Y88): Mutable secondary indexes must have the hbase.regionserver.wal.codec property set to org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec in the hbase-sites.xml of every region server. tableName=COVER_LJK_INDEX (state=42Y88,code=1029)
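The fix is exactly what the error text asks for: add the property it names to hbase-site.xml on every region server, then restart HBase:

```xml
<property>
  <name>hbase.regionserver.wal.codec</name>
  <value>org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec</value>
</property>
```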