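Overview:
A data warehouse tool built on Hadoop. It maps structured data files to database tables and uses SQL-like statements to quickly implement simple MapReduce statistics.
Components:
(1) User interfaces: mainly the CLI, beeline, and hiveserver2 clients (Thrift clients); they accept user tasks.
(2) Metadata storage: table schemas and other metadata are stored in a relational database; clients obtain metadata by accessing the metastore service.
(3) Interpreter, compiler, optimizer and executor: translate HQL into jobs.
(4) Hadoop: data is stored in HDFS, and queries are converted into MapReduce jobs.
Installation:
Hive installations differ mainly in where the metadata is stored. Depending on the metadata store, there are the following setups: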
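I. Embedded Metastore mode (embedded)
Notes:
This mode stores metadata in an embedded Derby database, which serves a single process only, so multiple clients cannot be started (note: in testing, only one CLI could run at a time, although through hiveserver2 several beeline sessions could be opened). It cannot handle concurrent multi-user access and is not suitable for multi-user use.
By default Derby persists the metadata in a metastore_db directory created in whatever directory the hive command is run from; changing this location is recommended.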
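Prerequisites:
Hadoop installation is omitted. This walkthrough uses Hadoop in pseudo-distributed mode; make sure the Hadoop daemons are running (the jps listing in step 5 below shows NameNode, DataNode, ResourceManager and NodeManager).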
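To confirm the daemons are up before continuing, a small helper can scan the output of jps. This is a minimal sketch: the function name missing_daemons is hypothetical, and the daemon list is assumed from the jps listing later in this walkthrough.

```shell
# Hypothetical helper: reads `jps` output on stdin and prints each required daemon
# that does not appear in it. The daemon list is assumed from this article's jps output.
missing_daemons() {
  local listing
  listing=$(cat)
  for d in NameNode DataNode ResourceManager NodeManager; do
    # -w stops "NameNode" from matching inside "SecondaryNameNode"
    printf '%s\n' "$listing" | grep -qw "$d" || echo "$d"
  done
}
```

Usage: `jps | missing_daemons`; no output means all four daemons were found.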
Installation:
1) Extract: tar -xvf hive-0.12.0-cdh5.0.1.tar.gz
2) Set the environment variables: vi ~/.bashrc
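The lines added to ~/.bashrc are not shown. A minimal sketch, assuming the install path that appears in the logs below (/home/zero/hive/hive-0.12.0-cdh5.0.1); adjust it to your own layout:

```shell
# Append Hive's environment variables to ~/.bashrc.
# The install path is an assumption taken from this article's logs.
# The quoted 'EOF' keeps $HIVE_HOME and $PATH unexpanded, so they resolve when the shell starts.
cat >> ~/.bashrc <<'EOF'
export HIVE_HOME=/home/zero/hive/hive-0.12.0-cdh5.0.1
export PATH=$HIVE_HOME/bin:$PATH
EOF
```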
source ~/.bashrc
3) Edit the configuration files (in $HIVE_HOME/conf):
(1) Edit hive-env.sh to specify the Hadoop location
cp hive-env.sh.template hive-env.sh
vi hive-env.sh
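The hive-env.sh content is likewise not shown. A sketch that sets HADOOP_HOME, assuming the Hadoop path seen in the SLF4J lines of the beeline log below (/home/zero/hadoop/hadoop-2.3.0-cdh5.0.1):

```shell
# Run in $HIVE_HOME/conf after copying the template.
# The Hadoop path is an assumption taken from this article's logs.
cat >> hive-env.sh <<'EOF'
export HADOOP_HOME=/home/zero/hadoop/hadoop-2.3.0-cdh5.0.1
EOF
```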
(2) Edit hive-site.xml to specify the warehouse directory, the metadata storage location, etc.:
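The hive-site.xml for this mode is not shown either. The sketch below uses the standard property names hive.metastore.warehouse.dir and javax.jdo.option.ConnectionURL; the warehouse path (Hive's default) and the Derby database location are assumptions. Giving Derby an absolute databaseName is what stops metastore_db from being created in the current working directory.

```shell
# Write a minimal embedded-mode hive-site.xml (a sketch; both paths are assumptions).
CONF_DIR="${HIVE_HOME:-.}/conf"
mkdir -p "$CONF_DIR"
cat > "$CONF_DIR/hive-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <!-- HDFS directory that holds table data -->
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
  </property>
  <!-- Fixed Derby location, independent of the directory hive is started from -->
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:derby:;databaseName=/home/zero/hive/metastore_db;create=true</value>
  </property>
</configuration>
EOF
```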
(3) Edit hive-log4j.properties to specify the log output path
cp hive-log4j.properties.template hive-log4j.properties
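A sketch that points hive.log.dir at a fixed directory. The logs directory under $HIVE_HOME is an assumption, and the in-place edit relies on GNU sed (as on CentOS):

```shell
# Run in $HIVE_HOME/conf after copying the template. LOG_DIR is an assumed location.
LOG_DIR="${HIVE_HOME:-$PWD}/logs"
mkdir -p "$LOG_DIR"
PROPS=hive-log4j.properties
touch "$PROPS"
if grep -q '^hive.log.dir=' "$PROPS"; then
  # Replace the existing value in place (GNU sed)
  sed -i "s|^hive.log.dir=.*|hive.log.dir=$LOG_DIR|" "$PROPS"
else
  # No such key yet: append it
  echo "hive.log.dir=$LOG_DIR" >> "$PROPS"
fi
```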
4) Start the CLI
------------------------------------------------------------------------------------------------------
[zero@CentOS-StandAlone conf]$ hive
......
hive> create table test (id int);
OK
Time taken: 0.419 seconds
hive> show tables;
OK
test
Time taken: 0.064 seconds, Fetched: 1 row(s)
------------------------------------------------------------------------------------------------------
Note: only one CLI session can use the metastore at a time; starting a second session fails with an error connecting to the metastore:
------------------------------------------------------------------------------------------------------
[zero@CentOS-StandAlone ~]$ hive
......
hive> show tables;
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
hive>
------------------------------------------------------------------------------------------------------
5) Use beeline
(1) Start hiveserver2 in the background: nohup hive --service hiveserver2 &
------------------------------------------------------------------------------------------------------
[zero@CentOS-StandAlone ~]$ nohup hive --service hiveserver2 &
[1] 28026
[zero@CentOS-StandAlone ~]$ nohup: ignoring input and appending output to `nohup.out'
^C
[zero@CentOS-StandAlone ~]$ jps -lm
28096 sun.tools.jps.Jps -lm
28026 org.apache.hadoop.util.RunJar /home/zero/hive/hive-0.12.0-cdh5.0.1/lib/hive-service-0.12.0-cdh5.0.1.jar org.apache.hive.service.server.HiveServer2
10594 org.apache.zookeeper.server.quorum.QuorumPeerMain /home/zero/zookeeper/zookeeper-3.4.5-cdh5.0.1/bin/../conf/zoo.cfg
4835 org.apache.hadoop.hdfs.server.namenode.NameNode
5239 org.apache.hadoop.yarn.server.nodemanager.NodeManager
5010 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager
4889 org.apache.hadoop.hdfs.server.datanode.DataNode
------------------------------------------------------------------------------------------------------
Note: beeline depends on the Thrift service provided by hiveserver2, so hiveserver2 must be running.
hiveserver2 listens on port 10000 by default.
(2) Start beeline:
------------------------------------------------------------------------------------------------------
[zero@CentOS-StandAlone conf]$ beeline
Beeline version 0.12.0-cdh5.0.1 by Apache Hive
beeline> !connect jdbc:hive2://localhost:10000
scan complete in 6ms
Connecting to jdbc:hive2://localhost:10000
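Enter username for jdbc:hive2://localhost:10000: zero   (note: enter the Hadoop user here, otherwise the session has no HDFS permissions; there is no password, so just press Enter at the next prompt)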
Enter password for jdbc:hive2://localhost:10000:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/zero/hadoop/hadoop-2.3.0-cdh5.0.1/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/zero/hive/hive-0.12.0-cdh5.0.1/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Connected to: Apache Hive (version 0.12.0-cdh5.0.1)
Driver: Hive JDBC (version 0.12.0-cdh5.0.1)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://localhost:10000> show tables;
+-----------+
| tab_name |
+-----------+
| test |
+-----------+
1 row selected (2.679 seconds)
------------------------------------------------------------------------------------------------------
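Note: beeline connects to hiveserver2, and hiveserver2 accesses the metastore through its API. Several beeline sessions appear to work side by side without problems, because there is only a single hiveserver2 service.
II. Local Metastore mode (local)
Notes:
The main difference from embedded mode is that the database is no longer embedded in the Hive service but deployed separately. The Hive service accesses the metadata over JDBC, so multiple services can access it at the same time.
Installation: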
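1) Install MySQL (omitted)
2) Install the MySQL connector: download mysql-connector-java-5.1.31.jar and put it in $HIVE_HOME/lib
3) Create the database and user
(1) Create the database and load the metastore schema: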
------------------------------------------------------------------------------------------------------
[zero@CentOS-StandAlone ~]$ mysql -u root -p -h 127.0.0.1
Enter password:
......
mysql> CREATE DATABASE metastore;
Query OK, 1 row affected (0.11 sec)
mysql> USE metastore;
Database changed
mysql> SOURCE /home/zero/hive/hive-0.12.0-cdh5.0.1/scripts/metastore/upgrade/mysql/hive-schema-0.12.0.mysql.sql;
------------------------------------------------------------------------------------------------------
(2) Create the hive user and grant privileges
------------------------------------------------------------------------------------------------------
mysql> CREATE USER 'hive'@'%' IDENTIFIED BY '1234_qwer';
Query OK, 0 rows affected (0.05 sec)
mysql> CREATE USER 'hive'@'127.0.0.1' IDENTIFIED BY '1234_qwer';
Query OK, 0 rows affected (0.04 sec)
mysql> CREATE USER 'hive'@'CentOS-StandAlone' IDENTIFIED BY '1234_qwer';
Query OK, 0 rows affected (0.01 sec)
mysql> REVOKE ALL PRIVILEGES, GRANT OPTION FROM 'hive'@'%';
Query OK, 0 rows affected (0.00 sec)
mysql> GRANT SELECT,INSERT,UPDATE,DELETE,LOCK TABLES,EXECUTE ON metastore.* TO 'hive'@'%';
Query OK, 0 rows affected (0.03 sec)
mysql> FLUSH PRIVILEGES;
Query OK, 0 rows affected (0.02 sec)
------------------------------------------------------------------------------------------------------
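In the session above, only 'hive'@'%' receives privileges. MySQL matches a connection to the most specific host entry, so a login from 127.0.0.1 is likely to be authenticated as 'hive'@'127.0.0.1', which has no privileges on metastore. A sketch that emits the same GRANT for all three accounts created above (the function name hive_grants is hypothetical):

```shell
# Hypothetical helper: prints the GRANT from the session above for every host
# variant of the hive account created there, followed by FLUSH PRIVILEGES.
hive_grants() {
  for host in '%' '127.0.0.1' 'CentOS-StandAlone'; do
    printf "GRANT SELECT,INSERT,UPDATE,DELETE,LOCK TABLES,EXECUTE ON metastore.* TO 'hive'@'%s';\n" "$host"
  done
  echo 'FLUSH PRIVILEGES;'
}
```

Usage: `hive_grants | mysql -u root -p -h 127.0.0.1`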
4) Extract: same as above
5) Set the environment variables: same as above
6) Edit the configuration files:
(1) Edit hive-env.sh: same as above
(2) Edit hive-site.xml
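A sketch of the local-mode hive-site.xml. The database name, user and password come from the MySQL steps above; the host 127.0.0.1 and port 3306 (MySQL's default) are assumptions. com.mysql.jdbc.Driver is the driver class of Connector/J 5.1. Leaving hive.metastore.uris unset keeps the metastore in local mode.

```shell
# Write a local-mode hive-site.xml pointing the metastore at MySQL (a sketch).
CONF_DIR="${HIVE_HOME:-.}/conf"
mkdir -p "$CONF_DIR"
cat > "$CONF_DIR/hive-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://127.0.0.1:3306/metastore</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>1234_qwer</value>
  </property>
</configuration>
EOF
```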
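7) Start and use Hive as above
III. Remote Metastore mode (remote)
Notes:
In remote mode, the metastore service, which was embedded in the Hive service, is split out and runs on its own, and the Hive services access the metastore over Thrift. This mode makes it possible to control connections to the database, since only the metastore service connects to it.
Deployment plan:
(1) Metadata server: runs the metastore service and MySQL
(2) hiveserver server: runs the hiveserver2 service, which accesses the metastore over Thrift
(3) Client server: hosts the Hive client scripts; clients can use the CLI, beeline, or access hiveserver2 directly over Thrift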
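Installation:
The installation is the same as in local mode; just add the metastore address (the hive.metastore.uris property) to the local-mode hive-site.xml and deploy it on each of the servers.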
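A sketch that adds hive.metastore.uris to an existing hive-site.xml. metastore-host is a placeholder for the metadata server's host name, and 9083 is the metastore service's default port. The edit uses GNU sed and should be run only once, since a second run would add a duplicate property.

```shell
# Add hive.metastore.uris just before </configuration> (GNU sed).
# "metastore-host" is a placeholder; replace it with your metadata server.
CONF="${HIVE_HOME:-.}/conf/hive-site.xml"
mkdir -p "$(dirname "$CONF")"
# Create a bare file if none exists yet, so the edit below has something to work on
[ -f "$CONF" ] || printf '<configuration>\n</configuration>\n' > "$CONF"
PROP='  <property>\n    <name>hive.metastore.uris</name>\n    <value>thrift://metastore-host:9083</value>\n  </property>'
sed -i "s|</configuration>|${PROP}\n</configuration>|" "$CONF"
```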
Startup:
(1) Metadata server:
service mysql start
nohup hive --service metastore &
(2) hiveserver server:
nohup hive --service hiveserver2 &
(3) Client server:
Script file: showtables.sql (the run below was executed with the file named test.sql)
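The script body is not shown, but the beeline session below reveals its four lines. A reconstruction under that assumption (user zero, empty password):

```shell
# Recreate the four-line beeline script; the quoted 'EOF' writes every line verbatim.
# Line 3 is intentionally empty: it answers the password prompt.
cat > showtables.sql <<'EOF'
!connect jdbc:hive2://localhost:10000
zero

show tables;
EOF
```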
The script's lines are:
Line 1: the connect command
Line 2: the username
Line 3: the password
Line 4: the command to run
Run it:
------------------------------------------------------------------------------------------------------
[zero@CentOS-StandAlone test]$ beeline -f test.sql
Beeline version 0.12.0-cdh5.0.1 by Apache Hive
beeline> !connect jdbc:hive2://localhost:10000
scan complete in 5ms
Connecting to jdbc:hive2://localhost:10000
Enter username for jdbc:hive2://localhost:10000: zero
Enter password for jdbc:hive2://localhost:10000:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/zero/hadoop/hadoop-2.3.0-cdh5.0.1/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/zero/hive/hive-0.12.0-cdh5.0.1/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Connected to: Apache Hive (version 0.12.0-cdh5.0.1)
Driver: Hive JDBC (version 0.12.0-cdh5.0.1)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://localhost:10000> show tables;
+-----------+
| tab_name |
+-----------+
| test |
| test2 |
+-----------+
2 rows selected (0.583 seconds)
0: jdbc:hive2://localhost:10000> Closing: org.apache.hive.jdbc.HiveConnection
------------------------------------------------------------------------------------------------------