1. Integrating Hive with HBase
Hive is frequently used together with HBase, with HBase serving as Hive's storage layer, so integrating the two is particularly important. By reading HBase data through Hive, you can run HQL queries and inserts against HBase tables, and even complex queries such as Join and Union. This capability was introduced in Hive 0.6.0. The integration works by having the two systems communicate through their public APIs, relying mainly on the classes inside hive-hbase-handler-*.jar. Operating on HBase tables through Hive is chiefly a matter of convenience: the HiveQL engine runs on MapReduce, so performance leaves much to be desired.
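For example, once a mapped table exists it can be joined against native Hive tables with ordinary HQL. A minimal sketch, assuming the HBase-backed table htest(key int, value string) created in step (3) below and a hypothetical native table native_tbl(id int, name string):

hive> SELECT h.key, h.value, n.name
    > FROM htest h JOIN native_tbl n ON (h.key = n.id);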
Steps:
(1) Copy the HBase-related jars into hive/lib:
[hadoop@bus-stable hive]$ cp /opt/hbase/lib/hbase-protocol-1.4.5.jar /opt/hive/lib/
[hadoop@bus-stable hive]$ cp /opt/hbase/lib/hbase-server-1.4.5.jar /opt/hive/lib/
[hadoop@bus-stable hive]$ cp /opt/hbase/lib/hbase-client-1.4.5.jar /opt/hive/lib/
[hadoop@bus-stable hive]$ cp /opt/hbase/lib/hbase-common-1.4.5.jar /opt/hive/lib/
[hadoop@bus-stable hive]$ cp /opt/hbase/lib/hbase-common-1.4.5-tests.jar /opt/hive/lib/
[hadoop@bus-stable hive]$
(2) Reference HBase in the hive-site.xml file by adding the following:
[hadoop@bus-stable hive]$ vim /opt/hive/conf/hive-site.xml
<property>
    <name>hive.aux.jars.path</name>
    <value>
        file:///opt/hive/lib/hive-hbase-handler-2.3.3.jar,
        file:///opt/hive/lib/hbase-protocol-1.4.5.jar,
        file:///opt/hive/lib/hbase-server-1.4.5.jar,
        file:///opt/hive/lib/hbase-client-1.4.5.jar,
        file:///opt/hive/lib/hbase-common-1.4.5.jar,
        file:///opt/hive/lib/hbase-common-1.4.5-tests.jar,
        file:///opt/hive/lib/zookeeper-3.4.6.jar,
        file:///opt/hive/lib/guava-14.0.1.jar
    </value>
    <description>The location of the plugin jars that contain implementations of user defined functions and serdes.</description>
</property>
<property>
    <name>hbase.zookeeper.quorum</name>
    <value>open-stable,permission-stable,sp-stable</value>
</property>
<property>
    <name>dfs.permissions.enabled</name>
    <value>false</value>
</property>
(3) Start Hive:
[hadoop@bus-stable hive]$ hive -hiveconf hbase.master=oversea-stable:60000
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/apache-hive-2.3.3-bin/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop-2.9.1/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Logging initialized using configuration in jar:file:/opt/apache-hive-2.3.3-bin/lib/hive-common-2.3.3.jar!/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive> create table htest(key int,value string) stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' with serdeproperties('hbase.columns.mapping'=':key,f:value') tblproperties('hbase.table.name'='htest');
OK
Time taken: 9.376 seconds
hive> show databases;
OK
default
inspiry
Time taken: 0.121 seconds, Fetched: 2 row(s)
hive> show tables;
OK
htest
Time taken: 0.047 seconds, Fetched: 1 row(s)
hive> select * from htest;
OK
Time taken: 1.967 seconds
hive>
(4) Verify the data in HBase:
[hadoop@oversea-stable opt]$ hbase shell
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hbase-1.4.5/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop-2.9.1/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
HBase Shell
Use "help" to get list of supported commands.
Use "exit" to quit this interactive shell.
Version 1.4.5, rca99a9466415dc4cfc095df33efb45cb82fe5480, Wed Jun 13 15:13:00 EDT 2018

hbase(main):001:0> list
TABLE
htest
1 row(s) in 0.2970 seconds

=> ["htest"]
hbase(main):002:0> scan "htest"
ROW          COLUMN+CELL
0 row(s) in 0.1410 seconds

hbase(main):003:0>
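The scan is empty because nothing has been written yet. As a quick round-trip sketch (not part of the original output): a row put in from the HBase side should be visible immediately through the mapped Hive table, since htest maps :key to key and f:value to value:

hbase(main):004:0> put 'htest', '1', 'f:value', 'hello'

hive> select * from htest;   -- should now return one row: 1  hello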
2. Importing External Data
(1) The data file is as follows:
[hadoop@bus-stable ~]$ cat score.csv
hive,85
hbase,90
hadoop,92
flume,89
kafka,95
spark,80
storm,70
[hadoop@bus-stable ~]$ hadoop fs -put score.csv /data/score.csv
[hadoop@bus-stable ~]$ hadoop fs -ls /data/
Found 2 items
-rw-r--r-- 3 hadoop supergroup 88822 2018-06-15 10:32 /data/notepad.txt
-rw-r--r-- 3 hadoop supergroup 70 2018-06-26 15:59 /data/score.csv
[hadoop@bus-stable ~]$
(2) Create an external table
Create a Hive external table over the existing data on HDFS:
hive> create external table if not exists course.testcourse(cname string,score int) row format delimited fields terminated by ',' stored as textfile location '/data';
OK
Time taken: 0.282 seconds
hive> show databases;
OK
course
default
inspiry
Time taken: 0.013 seconds, Fetched: 3 row(s)
hive> use course;
OK
Time taken: 0.021 seconds
hive> show tables;
OK
testcourse
Time taken: 0.036 seconds, Fetched: 1 row(s)
hive> select * from testcourse ;
OK
hive 85
hbase 90
hadoop 92
flume 89
kafka 95
spark 80
storm 70
Time taken: 2.272 seconds, Fetched: 7 row(s)
hive>
3. Creating an HBase Table with HQL
Use an HQL statement to create a Hive table that points to HBase. The syntax is as follows:
CREATE TABLE tbl_name(key int, value string)                      -- the table name in Hive
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'      -- the storage handler to use
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")   -- declare the column family and column mapping
TBLPROPERTIES ("hbase.table.name" = "tbl_name",
               "hbase.mapred.output.outputtable" = "iteblog");

hbase.table.name declares the HBase table name; it is optional and defaults to the Hive table name. hbase.mapred.output.outputtable specifies the table that inserted data is written to; you need to set it if you plan to insert data into the table later.
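Each Hive column after the first is matched positionally to a family:qualifier entry in hbase.columns.mapping, so a table can span several column families. A hedged sketch with made-up table and column names:

CREATE TABLE course.hbase_multi(key int, name string, score int)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,info:name,cf:score")
TBLPROPERTIES ("hbase.table.name" = "hbase_multi",
               "hbase.mapred.output.outputtable" = "hbase_multi");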
(1) The create statement is as follows:
hive> create table course.hbase_testcourse(cname string,score int)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES("hbase.columns.mapping" = ":key,cf:score")
TBLPROPERTIES("hbase.table.name" = "hbase_testcourse","hbase.mapred.output.outputtable" = "hbase_testcourse");
OK
Time taken: 3.745 seconds
hive> show databases;
OK
course
default
inspiry
Time taken: 0.019 seconds, Fetched: 3 row(s)
hive> use course;
OK
Time taken: 0.02 seconds
hive> show tables;
OK
hbase_testcourse
testcourse
Time taken: 0.025 seconds, Fetched: 2 row(s)
hive> select * from hbase_testcourse;
OK
Time taken: 1.883 seconds
hive>
(2) Once the internal table has been created, you can use Hive's insert overwrite to load data from another table into HBase:
hive> insert overwrite table course.hbase_testcourse select cname,score from course.testcourse;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = hadoop_20180626170540_c7eecb8d-2925-4ad2-be7f-237d9815d1cb
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1529932626564_0002, Tracking URL = http://oversea-stable:8088/proxy/application_1529932626564_0002/
Kill Command = /opt/hadoop/bin/hadoop job -kill job_1529932626564_0002
Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 0
2018-06-26 17:06:02,793 Stage-3 map = 0%, reduce = 0%
2018-06-26 17:06:14,126 Stage-3 map = 100%, reduce = 0%, Cumulative CPU 6.12 sec
MapReduce Total cumulative CPU time: 6 seconds 120 msec
Ended Job = job_1529932626564_0002
MapReduce Jobs Launched:
Stage-Stage-3: Map: 1 Cumulative CPU: 6.12 sec HDFS Read: 4224 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 6 seconds 120 msec
OK
Time taken: 41.489 seconds
hive> select * from hbase_testcourse;
OK
flume 89
hadoop 92
hbase 90
hive 85
kafka 95
spark 80
storm 70
Time taken: 0.201 seconds, Fetched: 7 row(s)
hive>
(3) Verify in HBase:
hbase(main):011:0> list
TABLE
hbase_testcourse
htest
2 row(s) in 0.0110 seconds

=> ["hbase_testcourse", "htest"]
hbase(main):012:0> scan "hbase_testcourse"
ROW      COLUMN+CELL
 flume   column=cf:score, timestamp=1530003973026, value=89
 hadoop  column=cf:score, timestamp=1530003973026, value=92
 hbase   column=cf:score, timestamp=1530003973026, value=90
 hive    column=cf:score, timestamp=1530003973026, value=85
 kafka   column=cf:score, timestamp=1530003973026, value=95
 spark   column=cf:score, timestamp=1530003973026, value=80
 storm   column=cf:score, timestamp=1530003973026, value=70
7 row(s) in 0.0760 seconds

hbase(main):013:0>
4. Mapping an Existing HBase Table in Hive
(1) Create a table in HBase; enter the HBase Shell client and run the create command:
hbase(main):036:0> create 'hbase_test',{ NAME => 'cf'}
0 row(s) in 2.2830 seconds

=> Hbase::Table - hbase_test
(2) Insert data:
hbase(main):037:0> put 'hbase_test','hadoop','cf:score', '95'
0 row(s) in 0.1110 seconds

hbase(main):038:0> put 'hbase_test','storm','cf:score', '96'
0 row(s) in 0.0120 seconds

hbase(main):039:0> put 'hbase_test','spark','cf:score', '97'
0 row(s) in 0.0110 seconds
(3) View the data:
hbase(main):041:0> scan "hbase_test"
ROW      COLUMN+CELL
 hadoop  column=cf:score, timestamp=1530004351399, value=95
 spark   column=cf:score, timestamp=1530004365368, value=97
 storm   column=cf:score, timestamp=1530004359169, value=96
3 row(s) in 0.0220 seconds

hbase(main):042:0>
(4) Enter the Hive Shell client and create the external table course.hbase_test; the statement is shown below:
hive> create external table course.hbase_test(cname string,score int)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES("hbase.columns.mapping" = ":key,cf:score")
TBLPROPERTIES("hbase.table.name" = "hbase_test", "hbase.mapred.output.outputtable" = "hbase_test");
OK
Time taken: 0.221 seconds
hive> show tables;
OK
hbase_test
hbase_testcourse
testcourse
Time taken: 0.024 seconds, Fetched: 3 row(s)

Note: creating an external table is almost identical to creating an internal one; the only difference is that an internal table uses create table, while an external table uses create external table.

View the data in Hive:

hive> select * from hbase_test;
OK
hadoop 95
spark 97
storm 96
Time taken: 0.22 seconds, Fetched: 3 row(s)
hive>
This Hive table is an external table, so dropping it does not delete the data in the underlying HBase table. A few points worth noting (a short sketch of the drop behaviour follows these notes):
a) When creating or mapping a table, if :key is not specified, the first column defaults to the row key.
b) A Hive table mapped onto HBase has no notion of timestamps; by default the latest version of each value is returned.
c) Since HBase carries no data type information, values are converted to String when they are stored.
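A sketch of that drop behaviour (illustrative commands; run them only if the mapping is no longer needed):

hive> drop table course.hbase_test;    -- external: removes only the Hive-side metadata

hbase(main):043:0> scan "hbase_test"   # the three rows put earlier are still present

By contrast, dropping a managed (internal) handler table such as hbase_testcourse deletes the underlying HBase table as well.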
5. Connecting to Hive from Java to Operate on HBase
pom.xml
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>cn.itcast.hbase</groupId>
    <artifactId>hbase</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <dependencies>
        <!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-client -->
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>2.6.4</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>2.6.4</version>
        </dependency>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.12</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.apache.hbase/hbase-client -->
        <dependency>
            <groupId>org.apache.hbase</groupId>
            <artifactId>hbase-client</artifactId>
            <version>1.4.0</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.apache.hbase/hbase-server -->
        <dependency>
            <groupId>org.apache.hbase</groupId>
            <artifactId>hbase-server</artifactId>
            <version>1.4.0</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.apache.hive/hive-jdbc -->
        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-jdbc</artifactId>
            <version>1.2.1</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.apache.hive/hive-metastore -->
        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-metastore</artifactId>
            <version>1.2.1</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.apache.hive/hive-exec -->
        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-exec</artifactId>
            <version>1.2.1</version>
        </dependency>
    </dependencies>
</project>
Hive_Hbase.java
package cn.itcast.bigdata.hbase;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class Hive_Hbase {
    public static void main(String[] args) {
        try {
            // Register the HiveServer2 JDBC driver
            Class.forName("org.apache.hive.jdbc.HiveDriver");
            // Connect to HiveServer2 on hadoop1:10000, database shizhan02, as user hadoop
            Connection connection = DriverManager.getConnection(
                    "jdbc:hive2://hadoop1:10000/shizhan02", "hadoop", "");
            Statement statement = connection.createStatement();
            // Query the Hive table that is mapped onto HBase
            String sql = "SELECT * FROM hive_hbase_table_kv";
            ResultSet res = statement.executeQuery(sql);
            while (res.next()) {
                // Print the second column (the value) of each row
                System.out.println(res.getString(2));
            }
        } catch (ClassNotFoundException | SQLException e) {
            e.printStackTrace();
        }
    }
}
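Before running this class, HiveServer2 must be listening on the host and port in the JDBC URL; note that hadoop1, port 10000, the database shizhan02, and the table hive_hbase_table_kv belong to this example's own environment and need adjusting to your cluster. HiveServer2 can be started with, for example:

[hadoop@bus-stable hive]$ hive --service hiveserver2 &

Also note that the pom above pins Hive 1.2.1 client libraries while the walkthrough cluster runs Hive 2.3.3; keeping the hive-jdbc version in step with the cluster avoids protocol mismatches.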