Phoenix began as an open-source project at Salesforce and later became a top-level project of the Apache Software Foundation.
Phoenix is a SQL layer built on top of HBase that lets us use standard JDBC APIs, rather than the HBase client APIs, to create tables, insert data, and query HBase data.
put the SQL back in NoSQL
Phoenix is written entirely in Java and ships as an embeddable JDBC driver for HBase. The Phoenix query engine transforms a SQL query into one or more HBase scans and orchestrates their execution to produce a standard JDBC result set. By using the HBase API, coprocessors, and custom filters directly, it delivers millisecond-level latency for simple queries and second-level latency for queries over millions of rows.
There are many tools for querying HBase, such as Hive, Tez, Impala, Spark SQL, and Phoenix.
Phoenix lets us write less code than using the raw APIs ourselves, while delivering better performance, in the following ways:
Companies using Phoenix
ARRAY Type. Standard JDBC array types are supported.
Sequences. CREATE/DROP SEQUENCE, NEXT VALUE FOR, and CURRENT VALUE FOR are all implemented.
Multi-tenancy. Different tenants can create mutually independent views over the same physical HBase table.
Views. Multiple views can be created over the same physical HBase table.
Apache Pig Loader. A Pig loader lets data processed through Pig take advantage of Phoenix's performance.
Derived Tables. A derived table can be defined by a SELECT clause inside a FROM clause.
Local Indexing. Described later.
Tracing. Described later.
Subqueries. Standalone and correlated subqueries are supported in WHERE and FROM clauses.
Semi/anti joins. Supported via the standard [NOT] IN and [NOT] EXISTS keywords.
Optimize foreign key joins. Foreign-key joins are optimized using the skip-scan filter.
Statistics Collection. Table statistics are collected to improve query parallelism.
Many-to-many joins. Joins in which both sides are too large to fit in memory are supported.
Map-reduce Integration. Map-reduce integration is supported.
Functional Indexes. Described later.
User Defined Functions. Described later.
Asynchronous Index Population. An index can be populated asynchronously via a Map-reduce job.
Time series Optimization. Queries over time-series data are optimized.
Transaction Support. Described later.
DISTINCT Query Optimization. Uses search logic to dramatically improve the performance of SELECT DISTINCT and COUNT DISTINCT queries.
Local Index Improvements. Local indexes have been reworked; described later.
Hive Integration. Hive can be used within Phoenix to support joins between two large tables.
Namespace Mapping. Phoenix schemas map to HBase namespaces, strengthening isolation between different schemas.
This feature is still in beta rather than generally available. Through integration with Tephra, Phoenix supports ACID transactions. Tephra, also an Apache project, is a transaction manager that provides globally consistent transactions on top of distributed data stores such as HBase. HBase itself offers strong consistency at the row and region level; Tephra adds cross-region and cross-table consistency to support scalability.
To enable transaction support in Phoenix, the following steps are required:
First, enable transactions in the client-side hbase-site.xml:
<property>
  <name>phoenix.transactions.enabled</name>
  <value>true</value>
</property>
Then, in the server-side hbase-site.xml, set the snapshot directory and the transaction timeout:
<property>
  <name>data.tx.snapshot.dir</name>
  <value>/tmp/tephra/snapshots</value>
</property>
<property>
  <name>data.tx.timeout</name>
  <value>60</value>
  <description>Set the transaction timeout (time after which open transactions become invalid) to a reasonable value.</description>
</property>
Finally, start the Tephra transaction manager that ships with Phoenix:
./bin/tephra
With this configuration in place, Phoenix supports transactions, but tables are still non-transactional by default. To make a table transactional it must be declared explicitly, as follows:
CREATE TABLE my_table (k BIGINT PRIMARY KEY, v VARCHAR) TRANSACTIONAL=true;
That is, TRANSACTIONAL=true is appended to the end of the CREATE TABLE statement.
An existing table can also be altered to be transactional. Note that this is a one-way change: a transactional table cannot be converted back to non-transactional, so proceed with care.
ALTER TABLE my_other_table SET TRANSACTIONAL=true;
Phoenix has supported user-defined functions since version 4.4.0.
Users can create temporary or permanent user-defined functions, which can then be invoked just like built-in functions in SELECT, UPSERT, and DELETE statements. A temporary function is tied to a specific session or connection and is not visible to other sessions or connections. The metadata of permanent functions is stored in a system table called SYSTEM.FUNCTION and is visible to every session and connection.
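A minimal sketch of both flavors (the class name and jar path are illustrative placeholders; the configuration described below must be in place first):
-- permanent: registered in SYSTEM.FUNCTION, visible to all connections
CREATE FUNCTION my_reverse(varchar) RETURNS varchar AS 'com.mypackage.MyReverseFunction' USING JAR 'hdfs:/localhost:8080/hbase/lib/myjar.jar';
-- temporary: visible only to the current session/connection
CREATE TEMPORARY FUNCTION my_reverse(varchar) RETURNS varchar AS 'com.mypackage.MyReverseFunction';
-- once created, a UDF is invoked like any built-in function
SELECT my_reverse(name) FROM my_table;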
To enable UDFs, the following properties need to be configured in the client-side hbase-site.xml:
<property>
  <name>phoenix.functions.allowUserDefinedFunctions</name>
  <value>true</value>
</property>
<property>
  <name>fs.hdfs.impl</name>
  <value>org.apache.hadoop.hdfs.DistributedFileSystem</value>
</property>
<property>
  <name>hbase.rootdir</name>
  <value>${hbase.tmp.dir}/hbase</value>
  <description>The directory shared by region servers and into which HBase persists.
    The URL should be 'fully-qualified' to include the filesystem scheme. For example,
    to specify the HDFS directory '/hbase' where the HDFS instance's namenode is running
    at namenode.example.org on port 9000, set this value to:
    hdfs://namenode.example.org:9000/hbase. By default, we write to whatever
    ${hbase.tmp.dir} is set to -- usually /tmp -- so change this configuration or else
    all data will be lost on machine restart.</description>
</property>
<property>
  <name>hbase.dynamic.jars.dir</name>
  <value>${hbase.rootdir}/lib</value>
  <description>The directory from which the custom udf jars can be loaded dynamically
    by the phoenix client/region server without the need to restart. However, an already
    loaded udf class would not be un-loaded. See HBASE-1936 for more details.</description>
</property>
The last two settings must match the corresponding configuration on the HBase server side.
With the configuration above in place, the following must also be set when establishing a JDBC connection:
Properties props = new Properties();
props.setProperty("phoenix.functions.allowUserDefinedFunctions", "true");
Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost", props);
The following optional configuration is used to copy jars from HDFS to the local filesystem for dynamic class loading:
<property>
  <name>hbase.local.dir</name>
  <value>${hbase.tmp.dir}/local/</value>
  <description>Directory on the local filesystem to be used as a local storage.</description>
</property>
In HBase there is only a single index: the lexicographically sorted rowkey. Lookups by rowkey are fast, but any query that does not go through the rowkey falls back to a filter-driven full-table scan, which greatly degrades retrieval performance. Phoenix provides secondary indexing to handle queries that filter on conditions other than the rowkey.
A covered index can satisfy a query entirely from the index itself, so the index must contain every column the query touches (both the SELECT list and the WHERE clause).
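A minimal sketch with made-up names; the INCLUDE clause copies non-indexed columns into the index so the query below never touches the data table:
CREATE INDEX cov_idx ON my_table (v1) INCLUDE (v2);
-- both the WHERE column (v1) and the SELECT column (v2) live in the index
SELECT v2 FROM my_table WHERE v1 = 'foo';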
Phoenix 4.3 and later support functional indexes: rather than being limited to columns, an index can be created on an arbitrary expression, and when a query uses that expression the result is returned directly from the index.
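A sketch along the lines of the functional-index example in the Phoenix documentation (table and columns are illustrative):
CREATE INDEX upper_name_idx ON emp (UPPER(first_name || ' ' || last_name));
-- a query written with the same expression can be answered from the index
SELECT emp_id FROM emp WHERE UPPER(first_name || ' ' || last_name) = 'JOHN DOE';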
Global indexing suits read-heavy, write-light workloads.
With global indexing, writes incur significant overhead, because every update to the data table (DELETE, UPSERT VALUES, and UPSERT SELECT) triggers an update of the index table, and index tables are distributed across different data nodes; the cross-node data transfer is costly. On reads, Phoenix chooses the index table where it reduces query time. By default, if a queried column is not part of the index, the index table is not used, so such queries gain no speedup.
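A minimal sketch with placeholder names; the second query needs a hint because v2 is not part of the index:
CREATE INDEX my_global_idx ON my_table (v1);
-- chosen automatically: every referenced column is in the index
SELECT v1 FROM my_table WHERE v1 = 'foo';
-- v2 is not covered, so the index is only used with an explicit hint
SELECT /*+ INDEX(my_table my_global_idx) */ v2 FROM my_table WHERE v1 = 'foo';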
Local indexing suits write-heavy workloads.
As with global indexing, Phoenix automatically decides at query time whether to use the index. With local indexing, index data is stored on the same servers as the table data, avoiding the overhead of writing index entries to index tables on other servers. Unlike global indexing, a local index is used even when the queried columns are not all part of the index, which speeds up such queries. All local index data for a table is stored in a single, separate, shared table.
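Syntactically the only difference from a global index is the LOCAL keyword (names are placeholders):
CREATE LOCAL INDEX my_local_idx ON my_table (v1);
-- unlike a global index, this is used even though v2 is not in the index
SELECT v2 FROM my_table WHERE v1 = 'foo';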
UPDATE STATISTICS refreshes the statistics of a table to improve query performance.
Starting with version 4.6, Phoenix provides a way to map HBase's native row timestamp to a Phoenix column. This takes full advantage of HBase's optimizations around the time ranges of store files, as well as Phoenix's built-in query optimizations.
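A sketch modeled on the ROW_TIMESTAMP syntax (table and columns are made up); the designated primary-key column is backed directly by the HBase cell timestamp:
CREATE TABLE metrics (
    created_date DATE NOT NULL,
    metric_id CHAR(15) NOT NULL,
    metric_value BIGINT
    CONSTRAINT pk PRIMARY KEY (created_date ROW_TIMESTAMP, metric_id)
);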
Phoenix supports paged queries:
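A minimal sketch of the usual pattern, a row value constructor over the primary key plus LIMIT (table and columns are made up):
-- fetch the next 20 rows, resuming after the last row of the previous page
SELECT title, author, isbn FROM book
    WHERE (title, author, isbn) > (?, ?, ?)
    ORDER BY title, author, isbn
    LIMIT 20;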
If the row key is monotonically increasing, HBase's sequential writes concentrate load on a single region server, creating hotspots. Phoenix's salted tables solve this region-server hotspotting problem, and can also improve performance during range scans.
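Salting is declared when the table is created; a minimal sketch with made-up names (SALT_BUCKETS may range from 1 to 256):
-- rows are spread across 16 pre-split buckets by a one-byte hash prefix on the row key
CREATE TABLE salted_log (event_time DATE NOT NULL PRIMARY KEY, payload VARCHAR) SALT_BUCKETS = 16;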
Standard SQL view syntax is now supported in Phoenix, enabling multiple virtual tables over the same underlying physical HBase table.
Data access isolation is achieved by connecting with different tenant-specific connections.
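A sketch with made-up names: the first primary-key column identifies the tenant, and a connection opened with the TenantId property (e.g. jdbc:phoenix:localhost;TenantId=acme) sees only that tenant's rows:
CREATE TABLE base.event (
    tenant_id VARCHAR NOT NULL,
    event_id VARCHAR NOT NULL,
    payload VARCHAR
    CONSTRAINT pk PRIMARY KEY (tenant_id, event_id)
) MULTI_TENANT = true;
-- from the tenant-specific connection, the tenant can carve out its own view:
CREATE VIEW acme_event AS SELECT * FROM base.event;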
Since Phoenix 1.2, specifying columns dynamically is supported by allowing column definitions to be included in parentheses after the table name in the FROM clause of a SELECT statement. Although this is not standard SQL, it is useful for leveraging the late-binding ability of HBase.
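A sketch patterned on the dynamic-columns example in the Phoenix documentation (table and column names are illustrative); lastGCTime is not part of the table's schema and is bound only for this query:
SELECT eventType, eventTime, lastGCTime
    FROM EventLog (lastGCTime TIME)
    WHERE eventType = 'OOM' AND lastGCTime < TO_TIME('2005-10-10 10:00:00');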
CSV data can be loaded into Phoenix tables in two ways: 1. via the psql command, single-threaded, suitable for small data volumes; 2. via the MapReduce-based bulk load tool, suitable for large data volumes.
Phoenix 4.4 introduced a standalone query server that serves connections from thin clients.
Starting with version 4.1, Phoenix added tracing, which lets users see every step taken from the client through to HBase for each query or upsert.
Phoenix exposes a variety of metrics that reveal what happens inside the Phoenix client while different SQL statements execute. These metrics are collected in the client JVM in two ways:
[Figure: Phoenix Architecture]
[Figure: Phoenix's position in the stack]
Phoenix maps HBase's data model into the relational world.
[Figure: Data Model]
The supported commands are as follows:
Example:
SELECT * FROM TEST LIMIT 1000;
SELECT * FROM TEST LIMIT 1000 OFFSET 100;
SELECT full_name FROM SALES_PERSON WHERE ranking >= 5.0
    UNION ALL SELECT reviewer_name FROM CUSTOMER_REVIEW WHERE score >= 8.0
Example:
UPSERT INTO TEST VALUES('foo','bar',3);
UPSERT INTO TEST(NAME,ID) VALUES('foo',123);
Example:
UPSERT INTO test.targetTable(col1, col2) SELECT col3, col4 FROM test.sourceTable WHERE col5 < 100
UPSERT INTO foo SELECT * FROM bar;
Example:
DELETE FROM TEST;
DELETE FROM TEST WHERE ID=123;
DELETE FROM TEST WHERE NAME LIKE 'foo%';
Example:
CREATE TABLE my_schema.my_table ( id BIGINT not null primary key, date DATE )
CREATE TABLE my_table ( id INTEGER not null primary key desc, date DATE not null,
    m.db_utilization DECIMAL, i.db_utilization )
    m.DATA_BLOCK_ENCODING='DIFF'
CREATE TABLE stats.prod_metrics ( host char(50) not null, created_date date not null,
    txn_count bigint CONSTRAINT pk PRIMARY KEY (host, created_date) )
CREATE TABLE IF NOT EXISTS "my_case_sensitive_table"
    ( "id" char(10) not null primary key, "value" integer )
    DATA_BLOCK_ENCODING='NONE', VERSIONS=5, MAX_FILESIZE=2000000 split on (?, ?, ?)
CREATE TABLE IF NOT EXISTS my_schema.my_table ( org_id CHAR(15), entity_id CHAR(15),
    payload binary(1000), CONSTRAINT pk PRIMARY KEY (org_id, entity_id) )
    TTL=86400
Example:
DROP TABLE my_schema.my_table;
DROP TABLE IF EXISTS my_table;
DROP TABLE my_schema.my_table CASCADE;
Example:
CREATE FUNCTION my_reverse(varchar) returns varchar as 'com.mypackage.MyReverseFunction' using jar 'hdfs:/localhost:8080/hbase/lib/myjar.jar'
CREATE FUNCTION my_reverse(varchar) returns varchar as 'com.mypackage.MyReverseFunction'
CREATE FUNCTION my_increment(integer, integer constant defaultvalue='10') returns integer as 'com.mypackage.MyIncrementFunction' using jar '/hbase/lib/myincrement.jar'
CREATE TEMPORARY FUNCTION my_reverse(varchar) returns varchar as 'com.mypackage.MyReverseFunction' using jar 'hdfs:/localhost:8080/hbase/lib/myjar.jar'
Example:
DROP FUNCTION IF EXISTS my_reverse
DROP FUNCTION my_reverse
Example:
CREATE VIEW "my_hbase_table" ( k VARCHAR primary key, "v" UNSIGNED_LONG ) default_column_family='a';
CREATE VIEW my_view ( new_col SMALLINT ) AS SELECT * FROM my_table WHERE k = 100;
CREATE VIEW my_view_on_view AS SELECT * FROM my_view WHERE new_col > 70;
Example:
DROP VIEW my_view
DROP VIEW IF EXISTS my_schema.my_view
DROP VIEW IF EXISTS my_schema.my_view CASCADE
Example:
CREATE SEQUENCE my_sequence;
CREATE SEQUENCE my_sequence START WITH -1000
CREATE SEQUENCE my_sequence INCREMENT BY 10
CREATE SEQUENCE my_schema.my_sequence START 0 CACHE 10
Example:
DROP SEQUENCE my_sequence
DROP SEQUENCE IF EXISTS my_schema.my_sequence
Example:
ALTER TABLE my_schema.my_table ADD d.dept_id char(10) VERSIONS=10
ALTER TABLE my_table ADD dept_name char(50), parent_id char(15) null primary key
ALTER TABLE my_table DROP COLUMN d.dept_id, parent_id;
ALTER VIEW my_view DROP COLUMN new_col;
ALTER TABLE my_table SET IMMUTABLE_ROWS=true, DISABLE_WAL=true;
Example:
CREATE INDEX my_idx ON sales.opportunity(last_updated_date DESC)
CREATE INDEX my_idx ON log.event(created_date DESC) INCLUDE (name, payload) SALT_BUCKETS=10
CREATE INDEX IF NOT EXISTS my_comp_idx ON server_metrics ( gc_time DESC, created_date DESC )
    DATA_BLOCK_ENCODING='NONE', VERSIONS=?, MAX_FILESIZE=2000000 split on (?, ?, ?)
CREATE INDEX my_idx ON sales.opportunity(UPPER(contact_name))
Example:
DROP INDEX my_idx ON sales.opportunity
DROP INDEX IF EXISTS my_idx ON server_metrics
Example:
ALTER INDEX my_idx ON sales.opportunity DISABLE
ALTER INDEX IF EXISTS my_idx ON server_metrics REBUILD
Example:
EXPLAIN SELECT NAME, COUNT(*) FROM TEST GROUP BY NAME HAVING COUNT(*) > 2;
EXPLAIN SELECT entity_id FROM CORE.CUSTOM_ENTITY_DATA
    WHERE organization_id='00D300000000XHP' AND SUBSTR(entity_id,1,3) = '002'
    AND created_date < CURRENT_DATE()-1;
Example:
UPDATE STATISTICS my_table
UPDATE STATISTICS my_schema.my_table INDEX
UPDATE STATISTICS my_index
UPDATE STATISTICS my_table COLUMNS
UPDATE STATISTICS my_table SET phoenix.stats.guidepost.width=50000000
Example:
CREATE SCHEMA IF NOT EXISTS my_schema
CREATE SCHEMA my_schema
Example:
USE my_schema
USE DEFAULT
Example:
DROP SCHEMA IF EXISTS my_schema
DROP SCHEMA my_schema
Download and unpack the latest phoenix-[version]-bin.tar package.
Copy phoenix-[version]-server.jar into the HBase lib directory on every region server and on the master node.
Restart HBase.
Add phoenix-[version]-client.jar to the classpath of every Phoenix client.
To run interactive SQL statements from the command line:
1. Change to the bin directory
2. Run the following command
$ sqlline.py localhost
To run an SQL script from the command line:
$ sqlline.py localhost ../examples/stock_symbol.sql
SQuirrel is a client used to connect to Phoenix.
The SQuirrel installation steps are as follows:
1. Remove any prior phoenix-[oldversion]-client.jar from the lib directory of SQuirrel, and copy phoenix-[newversion]-client.jar into the lib directory (newversion should be compatible with the version of the Phoenix server jar used with your HBase installation).
2. Start SQuirrel and add a new driver (Drivers -> New Driver).
3. In the Add Driver dialog box, set Name to Phoenix, and set the Example URL to jdbc:phoenix:localhost.
4. Type "org.apache.phoenix.jdbc.PhoenixDriver" into the Class Name textbox and click OK to close the dialog.
5. Switch to the Alias tab and create a new alias (Aliases -> New Alias).
6. In the dialog box, set Name to any name, Driver to Phoenix, and User Name and Password to anything.
7. Construct the URL as jdbc:phoenix:<zookeeper quorum server>. For example, to connect to a local HBase use: jdbc:phoenix:localhost
8. Press Test (which should succeed if everything is set up correctly), then press OK to close.
9. Double-click the newly created Phoenix alias and click Connect. You are now ready to run SQL queries against Phoenix.
Pherf is a tool for performance and functional testing through Phoenix. It can generate highly customized datasets and measure SQL performance against them.
Pherf is built as part of Phoenix's Maven build, and can be built with two different profiles:
This profile builds Pherf such that it can run alongside an existing cluster. The dependencies are pulled from the HBase classpath.
This profile builds all of Pherf's dependencies into a single standalone jar. The dependencies are pulled from the versions specified in Phoenix's pom.
mvn clean package -DskipTests
mvn clean package -P standalone -DskipTests
After building Pherf with the Maven commands above, a zip file is produced in the module's target directory.
Unzip it and configure env.sh, then run Pherf, for example:
./pherf.sh -drop all -l -q -z localhost -schemaFile .*user_defined_schema.sql -scenarioFile .*user_defined_scenario.xml
$./pherf.sh -listFiles
$./pherf.sh -drop all -l -q -z localhost
-h Help
-l Apply schema and load data
-q Executes multi-threaded query sets and writes results
-z [quorum] Zookeeper quorum
-m Enable monitor for statistics
-monitorFrequency [frequency in ms] Frequency at which the monitor will snapshot stats to the log file.
-drop [pattern] Regex to drop all tables with schema name PHERF. Example, drop Event tables: -drop .*(EVENT).* Drop all: -drop .* or -drop all
-scenarioFile Regex or file name of a specific scenario file to run.
-schemaFile Regex or file name of a specific schema file to run.
-export Exports query results to CSV files in CSV_EXPORT directory
-diff Compares results with previously exported results
-hint Executes all queries with the specified hint. Example: SMALL
-rowCountOverride [number of rows] Specify the number of rows to be upserted rather than using the row count specified in the schema
Results are written into the results directory in real time; the .jpg files can be opened to visualize them as the run progresses.
Run unit tests: mvn test -DZK_QUORUM=localhost
Run a specific method: mvn -Dtest=ClassName#methodName test
More to come...
Phoenix follows the philosophy of bringing the computation to the data by using:
Coprocessors
to perform operations on the server side, minimizing client/server data transfer
Custom filters
to prune data as close to the source as possible; in addition, to minimize any startup cost, Phoenix uses native HBase APIs rather than going through the Map/Reduce framework
8.2.1.1 Phoenix vs Hive (running over HDFS and HBase)
Query: select count(1) from table over 10M and 100M rows. Data is 5 narrow columns. Number of Region Servers: 4 (HBase heap: 10GB, Processor: 6 cores @ 3.3GHz Xeon)
8.2.1.2 Phoenix vs Impala (running over HBase)
Query: select count(1) from table over 1M and 5M rows. Data is 3 narrow columns. Number of Region Server: 1 (Virtual Machine, HBase heap: 2GB, Processor: 2 cores @ 3.3GHz Xeon)
Latest Automated Performance Run | Automated Performance Runs History
Author: Jeffbond. Link: https://www.jianshu.com/p/d862337247b1. Source: Jianshu (简书). Copyright belongs to the author; for reproduction in any form, please contact the author for authorization and cite the source.