1. Introduction
Hive is a data warehouse tool built on top of Hadoop. It maps structured data files onto database tables and provides full SQL query support by translating SQL statements into MapReduce jobs. Its main appeal is the low learning curve: simple MapReduce-style statistics can be expressed in SQL-like statements without writing a dedicated MapReduce application, which makes it a good fit for statistical analysis in a data warehouse.
The Hive/HBase integration works by letting the two systems communicate through their public APIs, relying mainly on the hive_hbase-handler.jar utility class, roughly as illustrated in the figure.
2. Hive Project Overview
Hive configuration files
•hive-site.xml: the main Hive configuration file
•hive-env.sh: Hive's runtime environment file
•hive-default.xml.template: template for the default configuration
•hive-env.sh.template: default template for hive-env.sh
•hive-exec-log4j.properties.template: default exec log4j configuration
•hive-log4j.properties.template: default log4j configuration
hive-site.xml
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
<description>username to use against metastore database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>test</value>
<description>password to use against metastore database</description>
</property>
hive-env.sh
•Set the path to Hive's configuration directory:
•export HIVE_CONF_DIR=your path
•Set the Hadoop installation path:
•HADOOP_HOME=your hadoop home
A minimal example is sketched below.
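For example, a minimal hive-env.sh could look like this (both paths are assumptions; substitute your own install locations):
# example paths only; adjust to your environment
export HIVE_CONF_DIR=/home/hadoop/hive-0.12.0/conf
export HADOOP_HOME=/home/hadoop/hadoop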
We describe the installation separately for each way of storing the metastore data.
3. Installing with the Derby Database
Hadoop cluster setup: http://blog.csdn.net/hguisu/article/details/723739
HBase installation and configuration: http://blog.csdn.net/hguisu/article/details/7244413
The latest Hive release at the time of writing is 0.12. First download hive-0.12.0.tar.gz from http://mirror.bit.edu.cn/apache/hive/hive-0.12.0/. Note, however, that this release is basically built against hadoop1.3 and hbase0.94 (if you install hadoop2.X, you need to change the corresponding parts).
tar zxvf hive-0.12.0.tar.gz
cd hive-0.12.0
Copy protobuf.**.jar and zookeeper-3.4.5.jar into hive/lib.
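For example (a sketch: the exact protobuf version and the source directory depend on your HBase distribution, assumed here to live under $HBASE_HOME/lib):
# copy the HBase-side dependencies into Hive's lib directory
cp $HBASE_HOME/lib/protobuf-java-*.jar /home/hadoop/hive-0.12.0/lib/
cp $HBASE_HOME/lib/zookeeper-3.4.5.jar /home/hadoop/hive-0.12.0/lib/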
Then edit conf/hive-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<!-- Hive Execution Parameters -->
<property>
<name>hive.exec.reducers.bytes.per.reducer</name>
<value>1000000000</value>
<description>size per reducer. The default is 1G, i.e. if the input size is 10G, it will use 10 reducers.</description>
</property>
<property>
<name>hive.exec.reducers.max</name>
<value>999</value>
<description>max number of reducers will be used. If the one specified in the configuration parameter mapred.reduce.tasks is negative, hive will use this one as the max number of reducers when automatically determining the number of reducers.</description>
</property>
<property>
<name>hive.exec.scratchdir</name>
<value>/hive/scratchdir</value>
<description>Scratch space for Hive jobs</description>
</property>
<property>
<name>hive.exec.local.scratchdir</name>
<value>/tmp/${user.name}</value>
<description>Local scratch space for Hive jobs</description>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:derby:;databaseName=metastore_db;create=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>org.apache.derby.jdbc.EmbeddedDriver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.PersistenceManagerFactoryClass</name>
<value>org.datanucleus.api.jdo.JDOPersistenceManagerFactory</value>
<description>class implementing the jdo persistence</description>
</property>
<property>
<name>javax.jdo.option.DetachAllOnCommit</name>
<value>true</value>
<description>detaches all objects from session so that they can be used after transaction is committed</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>APP</value>
<description>username to use against metastore database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>mine</value>
<description>password to use against metastore database</description>
</property>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/hive/warehousedir</value>
<description>location of default database for the warehouse</description>
</property>
<property>
<name>hive.aux.jars.path</name>
<value>
file:///home/hadoop/hive-0.12.0/lib/hive-ant-0.13.0-SNAPSHOT.jar,
file:///home/hadoop/hive-0.12.0/lib/protobuf-java-2.4.1.jar,
file:///home/hadoop/hive-0.12.0/lib/hbase-client-0.96.0-hadoop2.jar,
file:///home/hadoop/hive-0.12.0/lib/hbase-common-0.96.0-hadoop2.jar,
file:///home/hadoop/hive-0.12.0/lib/zookeeper-3.4.5.jar,
file:///home/hadoop/hive-0.12.0/lib/guava-11.0.2.jar
</value>
</property>
</configuration>
1) Single-node startup:
#bin/hive -hiveconf hbase.master=master:60000
2) Cluster startup:
#bin/hive -hiveconf hbase.zookeeper.quorum=node1,node2,node3
If hive.aux.jars.path is not configured in hive-site.xml, you can start Hive as follows:
bin/hive --auxpath /usr/local/hive/lib/hive-hbase-handler-0.96.0.jar,/usr/local/hive/lib/hbase-0.96.jar,/usr/local/hive/lib/zookeeper-3.3.2.jar -hiveconf hbase.zookeeper.quorum=node1,node2,node3
Note: with Derby storage, running hive creates a derby file and a metastore_db directory in the current working directory. The drawback of this mode is that only one Hive client at a time can use the database from the same directory; any other client fails with an error.
4. Installing with a MySQL Metastore
Install MySQL
• On Ubuntu, install with apt-get:
• sudo apt-get install mysql-server
• Create the hive metastore database:
• create database hivemeta
• Create the hive user and grant it privileges (a quick login check follows below):
• grant all on hivemeta.* to 'hive'@'%' identified by 'hive';
• flush privileges;
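A quick sanity check that the new account works (a sketch; assumes the mysql client is installed on the Hive node):
# should list hivemeta among the databases visible to the hive user
mysql -h 192.168.1.214 -u hive -phive -e 'show databases;'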
After that we only need to edit hive-site.xml. A client that talks to a remote metastore needs just the following hive-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/hive/warehousedir</value>
</property>
<property>
<name>hive.metastore.local</name>
<value>false</value>
</property>
<property>
<name>hive.metastore.uris</name>
<value>thrift://192.168.1.214:9083</value>
</property>
</configuration>
On the node that runs the metastore service itself, hive-site.xml looks like this:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>hive.exec.scratchdir</name>
<value>/hive/scratchdir</value>
<description>Scratch space for Hive jobs</description>
</property>
<property>
<name>hive.exec.local.scratchdir</name>
<value>/tmp/${user.name}</value>
<description>Local scratch space for Hive jobs</description>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://192.168.1.214:3306/hivemeta?createDatabaseIfNotExist=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
<description>username to use against metastore database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>hive</value>
<description>password to use against metastore database</description>
</property>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/hive/warehousedir</value>
<description>location of default database for the warehouse</description>
</property>
<property>
<name>hive.aux.jars.path</name>
<value>
file:///home/hadoop/hive-0.12.0/lib/hive-ant-0.13.0-SNAPSHOT.jar,
file:///home/hadoop/hive-0.12.0/lib/protobuf-java-2.4.1.jar,
file:///home/hadoop/hive-0.12.0/lib/hbase-client-0.96.0-hadoop2.jar,
file:///home/hadoop/hive-0.12.0/lib/hbase-common-0.96.0-hadoop2.jar,
file:///home/hadoop/hive-0.12.0/lib/zookeeper-3.4.5.jar,
file:///home/hadoop/hive-0.12.0/lib/guava-11.0.2.jar
</value>
</property>
<property>
<name>hive.metastore.uris</name>
<value>thrift://192.168.1.214:9083</value>
</property>
</configuration>
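Since hive.metastore.uris points at a Thrift metastore service, that service must be running on the server node before any client connects. A minimal sketch (the metastore listens on port 9083 by default):
# start the metastore service in the background on the server node
bin/hive --service metastore &
# clients then simply run bin/hive and reach it over thrift://192.168.1.214:9083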
5. Integrating with HBase
All the tables we created while testing above were local Hive tables, not tables mapped to HBase. Now let's get back to the HBase integration.
CREATE TABLE hbase_table_1(key int, value string) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val") TBLPROPERTIES ("hbase.table.name" = "xyz");
hbase.table.name: the name of the table in HBase
hbase.columns.mapping: the mapping to HBase column families
The table is visible on the HBase side as well, and rows added on either side show up on the other in real time.
You can log in to HBase to inspect the data:
#bin/hbase shell
hbase(main):001:0> describe 'xyz'
hbase(main):002:0> scan 'xyz'
hbase(main):003:0> put 'xyz','100','cf1:val','www.360buy.com'
At this point the row just inserted in HBase is visible from Hive.
Load data into hbase_table_1 with SQL, using the pokes demo table (see below):
hive> INSERT OVERWRITE TABLE hbase_table_1 SELECT * FROM pokes WHERE foo=86;
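pokes here is assumed to be the two-column demo table from the Hive getting-started guide; if it does not exist yet, a sketch to create and populate it from the sample data shipped with Hive:
# create the demo source table and load the bundled sample file
bin/hive -e 'CREATE TABLE pokes (foo INT, bar STRING);'
bin/hive -e "LOAD DATA LOCAL INPATH './examples/files/kv1.txt' OVERWRITE INTO TABLE pokes;"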
Use CREATE EXTERNAL TABLE to map an existing HBase table:
CREATE EXTERNAL TABLE hbase_table_2(key int, value string) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" = "cf1:val") TBLPROPERTIES("hbase.table.name" = "some_existing_table");
Reference: http://wiki.apache.org/hadoop/Hive/HBaseIntegration
6. Troubleshooting
Running show tables from bin/hive fails with:
Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
If you installed with the Derby metastore, check whether
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/hive/warehousedir</value>
<description>location of default database for the warehouse</description>
</property>
is configured correctly, or whether
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:derby:;databaseName=metastore_db;create=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
can be accessed with the right permissions.
If you configured a MySQL metastore, check the permissions by running:
bin/hive -hiveconf hive.root.logger=DEBUG,console
Running show tables will then reveal errors such as: java.sql.SQLException: Access denied for user 'hive'@'××××8' (using password: YES).
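If you see the access-denied error, re-issue the grant on the MySQL server for the host Hive connects from (a sketch; adjust the database name, host pattern, and password to your setup):
# run on the MySQL server as root
mysql -u root -p -e "grant all on hivemeta.* to 'hive'@'%' identified by 'hive'; flush privileges;"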
Running:
CREATE TABLE hbase_table_1(key int, value string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
TBLPROPERTIES ("hbase.table.name" = "xyz");
fails with:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:org.apache.hadoop.hbase.MasterNotRunningException: Retried 10 times
This error is caused by a conflict between the HBase jars you pulled in and the hbase jar bundled with Hive: delete hbase-0.94.×××.jar from hive/lib and the problem goes away.
At the same time, also move the hive-0.12**.jar package out of the way. A sketch of the cleanup follows.
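A sketch of the cleanup (wildcards because the exact versions vary; move the jars aside rather than deleting them, in case you need to roll back):
# park the conflicting jars outside hive/lib
mkdir -p /home/hadoop/hive-lib-backup
mv /home/hadoop/hive-0.12.0/lib/hbase-0.94*.jar /home/hadoop/hive-lib-backup/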
Running:
hive>select uid from user limit 100;
java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
Fix: edit $HIVE_HOME/conf/hive-env.sh and add:
export HADOOP_HOME=<your hadoop installation directory>
7. Accessing Hive over Thrift (with a PHP client)
Prerequisites for connecting to Hive from PHP:
wget http://mirror.bjtu.edu.cn/apache//thrift/0.9.1/thrift-0.9.1.tar.gz
tar -xzf thrift-0.9.1.tar.gz
If you build from a source checkout, you first have to run ./bootstrap.sh to generate the configure script; the tarball we downloaded already contains configure. (See the README:)
If you are building from the first time out of the source repository, you will
need to generate the configure scripts. (This is not necessary if you
downloaded a tarball.) From the top directory, do:
./bootstrap.sh
./configure
1. Install Thrift. Installation steps:
# ./configure --without-ruby
(skip the Ruby bindings)
make; make install
If libevent and libevent-devel are not installed, install these dependencies first: yum -y install libevent libevent-devel
Thrift itself is a tool for generating client and server code; we don't need that part here.
Once installed, start the Hive Thrift service:
# ./hive --service hiveserver >/dev/null 2>/dev/null &
Check whether hiveserver's default port 10000 is open; if it is, the server started successfully. There is an introduction on the official wiki: https://cwiki.apache.org/confluence/display/Hive/HiveServer
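A quick way to verify that something is listening on port 10000 (assumes netstat is available):
netstat -nlt | grep 10000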
HiveServer is an optional service that allows a remote client to submit requests to Hive, using a variety of programming languages, and retrieve results. HiveServer is built on Apache Thrift (http://thrift.apache.org/), therefore it is sometimes called the Thrift server, although this can lead to confusion because a newer service named HiveServer2 is also built on Thrift.
Thrift's interface definition language (IDL) file for HiveServer is hive_service.thrift, which is installed in $HIVE_HOME/service/if/.
Once Hive has been built using steps in Getting Started, the Thrift server can be started by running the following:
$ build/dist/bin/hive --service hiveserver --help
usage: hiveserver
 -h,--help                       Print help information
    --hiveconf <property=value>  Use value for given property
    --maxWorkerThreads <arg>     maximum number of worker threads, default:2147483647
    --minWorkerThreads <arg>     minimum number of worker threads, default:100
 -p <port>                       Hive Server port number, default:10000
 -v,--verbose                    Verbose mode
$ bin/hive --service hiveserver
Download the PHP client package:
The hive-0.12 release does ship with a PHP lib, but in testing that package throws PHP syntax errors; the namespace name is actually empty.
I have uploaded a working PHP client package: http://download.csdn.net/detail/hguisu/6913673 (original source: http://download.csdn.net/detail/jiedushi/3409880)
PHP client code for connecting to Hive:
<?php
// Path to the Thrift dependencies needed to connect to Hive
ini_set('display_errors', 1);
error_reporting(E_ALL);
$GLOBALS['THRIFT_ROOT'] = dirname(__FILE__) . "/";
// Load the required files for connecting to Hive
require_once $GLOBALS['THRIFT_ROOT'] . 'packages/hive_service/ThriftHive.php';
require_once $GLOBALS['THRIFT_ROOT'] . 'transport/TSocket.php';
require_once $GLOBALS['THRIFT_ROOT'] . 'protocol/TBinaryProtocol.php';

// Set up the transport/protocol/client
$transport = new TSocket('192.168.1.214', 10000);
$protocol = new TBinaryProtocol($transport);
//$protocol = new TBinaryProtocolAccelerated($transport);
$client = new ThriftHiveClient($protocol);
$transport->open();

// Run queries, metadata calls, etc.
$client->execute('show tables');
var_dump($client->fetchAll());

$transport->close();
?>
Open http://localhost/Thrift/test.php in a browser to see the query results.
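The same script can also be run from the command line (assuming the PHP CLI is installed and the file sits under the web root's Thrift/ directory):
php Thrift/test.php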