Integrating SequoiaDB with Hive

Deploying SequoiaDB with Hadoop

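The physical deployment of SequoiaDB and Hadoop is shown in the figure below. The deployment recommendations are:

- Deploy SequoiaDB and Hadoop on the same physical machines, to reduce network data transfer between Hadoop and SequoiaDB;

- Deploy one coordinator node and multiple data nodes on every physical machine; for the catalog nodes, pick any three physical machines and deploy one catalog node on each.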

[Figure: physical deployment of SequoiaDB and Hadoop]

Hive versions supported by SequoiaDB

- Hive 0.11.0

- Hive 0.10.0

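Configuration

  1. Install and configure the Hadoop/Hive environment, then start Hadoop;
  2. Copy the two files hadoop/hive-sequoiadb.jar and java/sdbdriver.jar from the SequoiaDB installation directory (/opt/sequoiadb by default) into the lib directory of the Hive installation (hive/lib);
  3. Edit conf/hive-site.xml under the Hive installation directory (if it does not exist, copy $HIVE_HOME/conf/hive-default.xml.template to hive-site.xml) and add the following properties (assuming Hive is installed in /opt/hive):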

<property>
   <name>hive.aux.jars.path</name>
   <value>file:///opt/hive/lib/hive-sequoiadb.jar,file:///opt/hive/lib/sdbdriver.jar</value>
   <description>Sequoiadb store handler jar file</description>
</property>

<property>
   <name>hive.auto.convert.join</name>
   <value>false</value>
</property>
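The jar copy in step 2 can be scripted. The following is only a minimal sketch, assuming the default locations mentioned above (/opt/sequoiadb and /opt/hive); the function name install_sdb_jars and the SDB_HOME variable are illustrative and are not part of SequoiaDB or Hive.

```shell
#!/bin/sh
# Assumed default locations taken from the steps above; override as needed.
SDB_HOME="${SDB_HOME:-/opt/sequoiadb}"
HIVE_HOME="${HIVE_HOME:-/opt/hive}"

# Hypothetical helper: copy the storage handler jar and the Java driver jar
# from a SequoiaDB install ($1) into the lib directory of a Hive install ($2).
install_sdb_jars() {
    src="$1"
    dest="$2"
    for jar in hadoop/hive-sequoiadb.jar java/sdbdriver.jar; do
        if [ ! -f "$src/$jar" ]; then
            echo "missing: $src/$jar" >&2
            return 1
        fi
        cp "$src/$jar" "$dest/lib/" || return 1
    done
}

# Only attempt the copy when both installations are present on this machine.
if [ -d "$SDB_HOME" ] && [ -d "$HIVE_HOME/lib" ]; then
    install_sdb_jars "$SDB_HOME" "$HIVE_HOME"
fi
```

The file names copied here must match the paths listed in hive.aux.jars.path, otherwise Hive will not find the storage handler.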

Usage

Creating a SequoiaDB-backed table:

Start the hive shell and run the following command to create the table:

hive> create external table sdb_tab(id INT, name STRING, value DOUBLE) stored by "com.sequoiadb.hive.SdbHiveStorageHandler" tblproperties("sdb.address" = "localhost:50000");

OK

Time taken: 0.386 seconds

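Where:

sdb.address specifies the IP address and port of the SequoiaDB coordinator node; if there are multiple coordinator nodes, list all of them, separated by commas.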
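To illustrate the comma-separated form of sdb.address, the sketch below joins several coordinator addresses and prints the matching create statement. The host names sdbserver1 and sdbserver2 and the helper names join_addresses and make_create_table are hypothetical; only the statement layout is taken from the example above.

```shell
#!/bin/sh
# Hypothetical helper: join all arguments with commas, the separator
# described above for sdb.address.
join_addresses() {
    ( IFS=,; printf '%s\n' "$*" )
}

# Hypothetical helper: print the create statement for a given address list.
make_create_table() {
    printf 'create external table sdb_tab(id INT, name STRING, value DOUBLE) stored by "com.sequoiadb.hive.SdbHiveStorageHandler" tblproperties("sdb.address" = "%s");\n' "$1"
}

# sdbserver1 and sdbserver2 are placeholder coordinator hosts.
addresses=$(join_addresses sdbserver1:50000 sdbserver2:50000)
make_create_table "$addresses"
```

The printed statement can be pasted into the hive shell as-is.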

Importing data from an HDFS file into the SequoiaDB table:

hive> insert overwrite table sdb_tab select * from hdfs_tab;

Total MapReduce jobs = 1

Launching Job 1 out of 1

Number of reduce tasks is set to 0 since there’s no reduce operator

Starting Job = job_201310172156_0010, Tracking URL = http://bl465-5:50030/jobdetails.jsp?jobid=job_201310172156_0010

Kill Command = /opt/hadoop-hive/hadoop-1.2.1/libexec/../bin/hadoop job  -kill job_201310172156_0010

Hadoop job information for Stage-0: number of mappers: 1; number of reducers: 0

2013-10-18 04:44:47,733 Stage-0 map = 0%,  reduce = 0%

2013-10-18 04:44:49,763 Stage-0 map = 100%,  reduce = 0%, Cumulative CPU 1.85 sec

2013-10-18 04:44:50,777 Stage-0 map = 100%,  reduce = 0%, Cumulative CPU 1.85 sec

2013-10-18 04:44:51,795 Stage-0 map = 100%,  reduce = 100%, Cumulative CPU 1.85 sec

MapReduce Total cumulative CPU time: 1 seconds 850 msec

Ended Job = job_201310172156_0010

10 Rows loaded to sdb_tab

MapReduce Jobs Launched:

Job 0: Map: 1   Cumulative CPU: 1.85 sec   HDFS Read: 2301 HDFS Write: 0 SUCCESS

Total MapReduce CPU Time Spent: 1 seconds 850 msec

OK

Time taken: 12.201 seconds

Note: before importing data into the SequoiaDB table, make sure the HDFS-backed table hdfs_tab has been created and loaded with data.
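One possible way to prepare hdfs_tab is sketched below. It writes a small HiveQL script that creates a plain text table with the same columns as sdb_tab and loads a local file into it. The input file /tmp/hdfs_tab.csv, the comma delimiter, and the script name are assumptions made for illustration; the original does not say how hdfs_tab was built.

```shell
#!/bin/sh
# Write the HiveQL to a script file; the location and name are arbitrary.
hql_file="${TMPDIR:-/tmp}/prepare_hdfs_tab.hql"

cat > "$hql_file" <<'EOF'
-- Columns mirror sdb_tab so that "select *" lines up with the target table.
create table hdfs_tab(id INT, name STRING, value DOUBLE)
row format delimited fields terminated by ','
stored as textfile;

-- /tmp/hdfs_tab.csv is a hypothetical comma-separated input file.
load data local inpath '/tmp/hdfs_tab.csv' overwrite into table hdfs_tab;
EOF

echo "wrote $hql_file"
# On a machine with Hive installed, run it with:
#   hive -f "$hql_file"
```

Once the script has been run, the insert overwrite statement shown above can read from hdfs_tab.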

Querying data:

hive> select * from new_tab;

OK

0       false   0.0     ALGERIA

1       true    1.0     ARGENTINA

2       true    1.0     BRAZIL

3       true    1.0     CANADA

4       true    4.0     EGYPT

5       false   0.0     ETHIOPIA

6       true    3.0     FRANCE

7       true    3.0     GERMANY

8       true    2.0     INDIA

9       true    2.0     INDONESIA

Time taken: 0.306 seconds, Fetched: 10 row(s)
