Goal
Map HBase tables into Hive
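1. Start Hive with HBase support
With the Hive and HBase services running, start the Hive CLI with the HBase storage handler and its dependency JARs on the auxiliary path (this step may not be necessary):
$HIVE_HOME/bin/hive --auxpath $HIVE_HOME/lib/hive-hbase-handler-1.1.0-cdh5.7.1.jar,$HIVE_HOME/lib/hbase-common-1.2.0-cdh5.7.1.jar,$HIVE_HOME/lib/zookeeper-3.4.5-cdh5.7.1.jar,$HIVE_HOME/lib/guava-14.0.1.jar --hiveconf hbase.master=dwrj5123:60000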
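The --auxpath value is just a comma-separated list of JAR paths, so it can be assembled and sanity-checked before the CLI is started. A minimal Python sketch, assuming the CDH 5.7.1 JAR names from the command above (they differ on other versions); build_hive_cmd and missing_jars are hypothetical helpers, and the /opt/hive fallback for HIVE_HOME is an assumption.

```python
import os
import shlex

# JAR names taken from the command above (CDH 5.7.1 builds); adjust for your cluster.
AUX_JARS = [
    "hive-hbase-handler-1.1.0-cdh5.7.1.jar",
    "hbase-common-1.2.0-cdh5.7.1.jar",
    "zookeeper-3.4.5-cdh5.7.1.jar",
    "guava-14.0.1.jar",
]


def build_hive_cmd(hive_home, jars, hbase_master=None):
    """Return the argv list for starting the Hive CLI with the HBase handler (hypothetical helper)."""
    lib = os.path.join(hive_home, "lib")
    aux = ",".join(os.path.join(lib, jar) for jar in jars)
    cmd = [os.path.join(hive_home, "bin", "hive"), "--auxpath", aux]
    if hbase_master:
        # Optional, as noted in the text: point Hive at the HBase master.
        cmd += ["--hiveconf", "hbase.master=" + hbase_master]
    return cmd


def missing_jars(cmd):
    """List the --auxpath entries that do not exist as files on this machine."""
    aux = cmd[cmd.index("--auxpath") + 1]
    return [path for path in aux.split(",") if not os.path.isfile(path)]


if __name__ == "__main__":
    # /opt/hive is an assumed fallback when HIVE_HOME is not set.
    cmd = build_hive_cmd(os.environ.get("HIVE_HOME", "/opt/hive"), AUX_JARS, "dwrj5123:60000")
    print(shlex.join(cmd))
    print("missing:", missing_jars(cmd))
```

Passing the returned list to subprocess.run(cmd) would start the CLI; keeping the command as a list avoids shell-quoting problems.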
2. Inspect the table structure in HBase
(1) Inspect jinan:SI3U_AC06_TEMP
describe 'jinan:SI3U_AC06_TEMP'
Table jinan:SI3U_AC06_TEMP is ENABLED
jinan:SI3U_AC06_TEMP
COLUMN FAMILIES DESCRIPTION
{NAME => 'AC06_TEMP', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
1 row(s) in 0.1960 seconds
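describe lists only column families and their settings, not column qualifiers, because HBase keeps no per-column schema. A small Python sketch that extracts the family names and properties from such output; parse_describe is a hypothetical helper that assumes each family's {...} block is captured intact. A value wrapped onto the next line, as happens when output is copied from a narrow terminal, is tolerated.

```python
import re

# describe output; the TTL value is deliberately wrapped onto a second line,
# as happens when the text is copied from a narrow terminal.
DESCRIBE_OUTPUT = """\
Table jinan:SI3U_AC06_TEMP is ENABLED
jinan:SI3U_AC06_TEMP
COLUMN FAMILIES DESCRIPTION
{NAME => 'AC06_TEMP', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => '
FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
1 row(s) in 0.1960 seconds
"""

FAMILY_BLOCK = re.compile(r"\{([^}]*)\}")     # one {...} block per column family
PROPERTY = re.compile(r"(\w+) => '([^']*)'")  # KEY => 'value' pairs inside a block


def parse_describe(text):
    """Map each column family name to its properties (hypothetical helper)."""
    families = {}
    for block in FAMILY_BLOCK.findall(text):
        # strip() repairs values wrapped onto the next line, e.g. TTL => '\nFOREVER'.
        props = {key: value.strip() for key, value in PROPERTY.findall(block)}
        families[props.pop("NAME")] = props
    return families


print(parse_describe(DESCRIBE_OUTPUT))
```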
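Here jinan:SI3U_AC06_TEMP is the table name and AC06_TEMP is its column family. Querying the table's data shows that it contains at least the following columns: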
AC06_TEMP:AAC001,
AC06_TEMP:AAE140,
AC06_TEMP:AAE149,
AC06_TEMP:BAA044,
AC06_TEMP:BAA035,
AC06_TEMP:BAA036,
AC06_TEMP:AAE034
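Since HBase stores qualifiers per cell rather than in the table schema, the column list has to be gathered from the data itself, for example with scan 'jinan:SI3U_AC06_TEMP', {LIMIT => 100} in the HBase shell. A Python sketch that collects the distinct family:qualifier names from scan output; the sample lines, row keys and values are hypothetical, and collect_columns is an illustrative helper. Only the scanned rows are inspected, so qualifiers that appear only in other rows can be missed.

```python
import re

# Hypothetical lines in the shape printed by: scan 'jinan:SI3U_AC06_TEMP', {LIMIT => 100}
SCAN_OUTPUT = """\
ROW                COLUMN+CELL
 row-0001          column=AC06_TEMP:AAC001, timestamp=1500000000000, value=x
 row-0001          column=AC06_TEMP:AAE140, timestamp=1500000000000, value=x
 row-0002          column=AC06_TEMP:AAC001, timestamp=1500000000000, value=x
 row-0002          column=AC06_TEMP:BAA035, timestamp=1500000000000, value=x
2 row(s) in 0.0200 seconds
"""

# Captures family and qualifier from "column=FAMILY:QUALIFIER,".
CELL = re.compile(r"column=([^:,\s]+):([^,\s]+),")


def collect_columns(scan_text):
    """Return the distinct family:qualifier names seen in the scanned cells, in first-seen order."""
    seen = {}
    for family, qualifier in CELL.findall(scan_text):
        seen.setdefault(f"{family}:{qualifier}", None)
    return list(seen)


print(collect_columns(SCAN_OUTPUT))
```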
(2) Inspect jinan:SI3U_AC01
hbase(main):003:0> describe 'jinan:SI3U_AC01'
Table jinan:SI3U_AC01 is ENABLED
jinan:SI3U_AC01
COLUMN FAMILIES DESCRIPTION
{NAME => 'AC01', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE'
MPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE =>
1 row(s) in 0.1850 seconds
Querying the data of table jinan:SI3U_AC01 shows that it contains at least the following columns:
AC01:AAC001,
AC01:AAC003,
AC01:AAA109
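The discovered qualifiers are what the Hive mapping in step 3 needs: hbase.columns.mapping starts with :key (the HBase row key) and continues with family:qualifier entries, which Hive pairs with its own columns by position. A Python sketch that builds the CREATE EXTERNAL TABLE statement from such a list; build_mapping_ddl is a hypothetical helper, and declaring every column as string unless a type is given is an assumption (no types are stated for the AC01 columns).

```python
def build_mapping_ddl(hive_table, hbase_table, family, columns, types=None):
    """Build a CREATE EXTERNAL TABLE statement mapping a Hive table onto an HBase table.

    columns: qualifiers in `family`, mapped in order after the row key.
    types:   optional {column: hive_type}; unlisted columns default to string (assumption).
    """
    types = types or {}
    # :key maps the HBase row key to the first Hive column, named key here.
    mapping = ",".join([":key"] + [f"{family}:{col}" for col in columns])
    hive_cols = ", ".join(["key string"] + [f"{col} {types.get(col, 'string')}" for col in columns])
    return (
        f"CREATE EXTERNAL TABLE {hive_table}({hive_cols})\n"
        "STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'\n"
        f'WITH SERDEPROPERTIES ("hbase.columns.mapping" = "{mapping}")\n'
        f'TBLPROPERTIES ("hbase.table.name" = "{hbase_table}");'
    )


print(build_mapping_ddl("jinan_SI3U_AC01", "jinan:SI3U_AC01", "AC01", ["AAC001", "AAC003", "AAA109"]))
```

For jinan:SI3U_AC01 this yields the mapping string :key,AC01:AAC001,AC01:AAC003,AC01:AAA109 and the Hive columns key, AAC001, AAC003 and AAA109.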
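3. Create Hive tables mapped to HBase
(1) Map Hive table jinan_SI3U_AC01 to HBase table jinan:SI3U_AC01:
Explanation: jinan_SI3U_AC01 is the table name in Hive, and jinan:SI3U_AC01 is the HBase table to be mapped.
":key,AC01:AAC001,AC01:AAC003,AC01:AAA109" lists the columns to map: :key is the HBase row key, AC01 is the column family, and multiple columns are separated by commas.
(2) Map Hive table jinan_SI3U_AC06_TEMP to HBase table jinan:SI3U_AC06_TEMP: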
CREATE EXTERNAL TABLE jinan_SI3U_AC06_TEMP (
  key string, AAC001 string, AAE140 string, AAE149 string, BAA044 string,
  BAA035 decimal(19,4), BAA036 decimal(19,4), AAE034 TIMESTAMP
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,AC06_TEMP:AAC001,AC06_TEMP:AAE140,AC06_TEMP:AAE149,AC06_TEMP:BAA044,AC06_TEMP:BAA035,AC06_TEMP:BAA036,AC06_TEMP:AAE034")
TBLPROPERTIES ("hbase.table.name" = "jinan:SI3U_AC06_TEMP");
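With the storage handler's default string encoding, HBase cells hold text and Hive parses that text into the declared column types when the table is read; in general a cell that cannot be parsed comes back as NULL rather than failing the query. A rough Python emulation of what the decimal(19,4) and TIMESTAMP columns above expect; it approximates rather than reproduces Hive's SerDe rules (half-up rounding is assumed), and to_hive_decimal and to_hive_timestamp are hypothetical names.

```python
from datetime import datetime
from decimal import Decimal, InvalidOperation, ROUND_HALF_UP


def to_hive_decimal(text, precision=19, scale=4):
    """Approximate how a string cell becomes decimal(precision, scale); None stands for NULL."""
    try:
        value = Decimal(text.strip())
    except InvalidOperation:
        return None  # not a number: read as NULL
    if not value.is_finite():
        return None
    limit = Decimal(10) ** (precision - scale)  # 15 integer digits for decimal(19,4)
    if abs(value) >= limit:
        return None  # does not fit the declared precision
    # Round to the declared scale (half-up assumed).
    value = value.quantize(Decimal(1).scaleb(-scale), rounding=ROUND_HALF_UP)
    return value if abs(value) < limit else None


def to_hive_timestamp(text):
    """Approximate parsing of a string cell into TIMESTAMP ('yyyy-MM-dd HH:mm:ss[.fffffffff]')."""
    main, _, fraction = text.strip().partition(".")
    try:
        ts = datetime.strptime(main, "%Y-%m-%d %H:%M:%S")
    except ValueError:
        return None  # e.g. a date without a time part
    if fraction:
        if not fraction.isdigit() or len(fraction) > 9:
            return None
        # Hive keeps up to nanoseconds; Python's datetime stops at microseconds.
        ts = ts.replace(microsecond=int(fraction[:6].ljust(6, "0")))
    return ts


print(to_hive_decimal("12.34565"), to_hive_timestamp("2014-06-01 08:30:00"))
```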
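Creating a Hive view
In Hive, a view can combine data from several tables into one, which makes the data easier to query and use. The example below uses the two tables mapped above, jinan_SI3U_AC01 and jinan_SI3U_AC06_TEMP.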
CREATE VIEW fact_view (AAC001, AAC003, AAA109, AAE140, AAE149, BAA044, BAA035, BAA036, AAE034) AS
SELECT a.AAC001, a.AAC003, a.AAA109, b.AAE140, b.AAE149, b.BAA044, b.BAA035, b.BAA036, b.AAE034
FROM jinan_SI3U_AC01 a
RIGHT JOIN jinan_SI3U_AC06_TEMP b ON a.aac001 = b.aac001;
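A RIGHT JOIN keeps every row of the right-hand table (jinan_SI3U_AC06_TEMP, aliased b) and fills the a columns with NULL when no jinan_SI3U_AC01 row has a matching AAC001. A Python sketch that mimics this on small hypothetical rows (all values are made up), using None for NULL; right_join_view is an illustrative helper, not part of Hive.

```python
# Hypothetical sample rows (None stands for NULL).
AC01_ROWS = [
    {"AAC001": "p1", "AAC003": "name-1", "AAA109": "x"},
]
AC06_ROWS = [
    {"AAC001": "p1", "AAE140": "110", "BAA035": "100.0000"},
    {"AAC001": "p2", "AAE140": "110", "BAA035": "50.0000"},
]
A_COLUMNS = ("AAC001", "AAC003", "AAA109")  # the a.* columns selected by the view


def right_join_view(ac01_rows, ac06_rows):
    """Mimic fact_view: jinan_SI3U_AC01 a RIGHT JOIN jinan_SI3U_AC06_TEMP b ON a.AAC001 = b.AAC001."""
    by_key = {}
    for a in ac01_rows:
        by_key.setdefault(a["AAC001"], []).append(a)
    null_a = dict.fromkeys(A_COLUMNS)  # all-NULL a side for unmatched b rows
    view = []
    for b in ac06_rows:
        # NULL = NULL is not true in SQL, so a NULL key never finds a match.
        matches = by_key.get(b["AAC001"], []) if b["AAC001"] is not None else []
        for a in matches or [null_a]:
            row = {col: a[col] for col in A_COLUMNS}
            # b's remaining columns, as in the view's select list.
            row.update({k: v for k, v in b.items() if k != "AAC001"})
            view.append(row)
    return view


for row in right_join_view(AC01_ROWS, AC06_ROWS):
    print(row)
```

The second row shows a side effect of selecting a.AAC001: its AAC001 is NULL although the fact row carries p2. Selecting b.AAC001 instead would keep the key for every fact row.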
Table name | Columns | View name
jinan_SI3U_AC01 | AAC001 AAC003 AAA109 | fact_view
jinan_SI3U_AC06_TEMP | AAE140 AAE149 BAA044 BAA035 BAA036 AAE034 | fact_view
Running SELECT * FROM fact_view; returns valid data.
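Using Hive views in Kylin
Kylin can build cubes from Hive views; the process is the same as with Hive tables. Once the cube has been built, it can be queried, for example: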
SELECT SUM(BAA035)
FROM FACT_VIEW LEFT JOIN DATE_VIEW ON FACT_VIEW.aae034 = DATE_VIEW.start_date
WHERE DATE_VIEW.start_date > '2014-05-01' AND DATE_VIEW.start_date < '2015-01-01';
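What the query computes can be sketched in Python over hypothetical rows: join each fact row to DATE_VIEW on AAE034 = start_date, keep dates strictly between the two bounds (> and < exclude both), and sum BAA035. Because the WHERE clause filters on DATE_VIEW.start_date, fact rows without a matching date are dropped either way, so a left join and an inner join give the same sum here. The sample rows are made up, DATE_VIEW's contents are an assumption (its definition is not shown), and start_date is assumed unique.

```python
from datetime import date
from decimal import Decimal

# Hypothetical fact_view rows: (AAE034, BAA035); None stands for NULL.
FACT_ROWS = [
    (date(2014, 5, 1), Decimal("10.0000")),   # on the lower bound: excluded by >
    (date(2014, 6, 15), Decimal("20.5000")),
    (date(2014, 12, 31), Decimal("30.0000")),
    (date(2015, 1, 1), Decimal("40.0000")),   # on the upper bound: excluded by <
    (date(2014, 7, 1), None),                 # NULL measure: ignored by SUM
]
# Hypothetical DATE_VIEW.start_date values (assumed unique, so a set suffices).
DATE_VIEW = {date(2014, 5, 1), date(2014, 6, 15), date(2014, 7, 1), date(2014, 12, 31), date(2015, 1, 1)}


def sum_baa035(facts, start_dates, lower, upper):
    """SUM(BAA035) over facts joined to DATE_VIEW on AAE034 = start_date, with lower < start_date < upper."""
    total = None  # SQL SUM over no non-NULL values is NULL
    for aae034, baa035 in facts:
        if aae034 in start_dates and lower < aae034 < upper and baa035 is not None:
            total = baa035 if total is None else total + baa035
    return total


print(sum_baa035(FACT_ROWS, DATE_VIEW, date(2014, 5, 1), date(2015, 1, 1)))
```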
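Summary
With a one-time mapping, Hive can query the current data in HBase directly, and data can also be inserted from Hive tables into HBase. Building views combines data from several Hive tables into one view, which makes the data easier to use. Because the Hive tables are external mappings and a view stores only its query, none of this copies the HBase data or takes extra storage space. The drawback is that the process has no data-cleansing step, so data conflicts may occur.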