Connecting to and Accessing Hive from Java

Date: 2019-11-09


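Today I read a good article about accessing Hive from Java. I happen to need exactly this, so I am sharing it here so that more people can learn from it and apply it.

Many thanks to the original author for the summary.

Original post: https://www.jianshu.com/p/4ef28607fc04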

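Hive Built-in Services and Using HiveServer2

Built-in services

Running hive --service help prints help for the built-in services. The Service List in its output enumerates the many services that Hive supports.

The most useful ones are described below: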

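(1) cli

cli stands for Command Line Interface. It is Hive's command-line interface, the most commonly used service and the default one, and it can be used directly from the command line.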

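(2) hiveserver

This runs Hive as a server exposing a Thrift service, so that clients written in many different languages can communicate with it. The HiveServer service must be started before clients can connect. The port the server listens on can be set through the HIVE_PORT environment variable; the default is 10000.
The service is started with a command such as hive --service hiveserver -p 10002, where -p also specifies the listening port.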

(3) hwi

hwi stands for Hive Web Interface. It is Hive's web interface, a web-based alternative to the Hive CLI.

(4) jar

The Hive equivalent of hadoop jar: a convenient way to run Java applications whose classpath must contain both Hadoop and Hive classes.

(5) metastore

By default the metastore runs in the same process as the Hive service. With this service the metastore runs as a separate process, and the port it listens on can be set through METASTORE_PORT.

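Three ways to start Hive

Hive shell mode

bin/hive or bin/hive --service cli

Hive web interface mode

bin/hive --service hwi &, where & runs the command in the background. After starting the hwi service in the background, run jps to list the Java processes; an additional RunJar process shows that hwi started successfully.

This mode provides browser access to Hive, which does not seem especially useful. The address is http://huatec01:9999/hwi/

[Screenshots omitted: hwi startup output and the browser view.]

Hive remote service mode (port 10000)

bin/hive --service hiveserver2 &

Use this mode when programs written in Java, Python and so on access Hive through drivers such as JDBC. It is the mode programmers need most.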

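HiveServer and HiveServer2

Introduction to HiveServer2

Both HiveServer and HiveServer2 let remote clients written in a variety of programming languages, such as Java and Python, submit requests to Hive and retrieve the results, so that clients can work with the data in Hive without starting the CLI.

The official documentation states: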

HiveServer is scheduled to be removed from Hive releases starting Hive 0.15. See HIVE-6977. Please switch over to HiveServer2.

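HiveServer is no longer supported as of Hive 0.15 (my Hive version is 2.1.1), but it is still worth describing here. In fact, the Service List shown earlier does not include hiveserver.

Trying bin/hive --service hiveserver anyway produces the log message Service hiveserver not found.

Both HiveServer and HiveServer2 are based on Thrift, but HiveServer is sometimes called the Thrift server, whereas HiveServer2 is not. Given that HiveServer already existed, why was HiveServer2 needed?

The reason is that HiveServer cannot handle concurrent requests from more than one client. This limitation comes from the Thrift interface that HiveServer uses and cannot be fixed by changing HiveServer's code. HiveServer was therefore rewritten as HiveServer2 in Hive 0.11.0, which solved the problem. HiveServer2 supports multi-client concurrency and authentication, and offers better support for open API clients such as JDBC and ODBC.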

Differences between HiveServer and HiveServer2

The JDBC differences between HiveServer and HiveServer2:

HiveServer version    Connection URL                 Driver class
HiveServer2           jdbc:hive2://<host>:<port>     org.apache.hive.jdbc.HiveDriver
HiveServer            jdbc:hive://<host>:<port>      org.apache.hadoop.hive.jdbc.HiveDriver

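Configuring HiveServer2

HiveServer2 is configured in hive-site.xml. The relevant parameters are:

hive.server2.thrift.min.worker.threads – minimum number of worker threads, default 5.
hive.server2.thrift.max.worker.threads – maximum number of worker threads, default 500.
hive.server2.thrift.port – TCP port to listen on, default 10000.
hive.server2.thrift.bind.host – host the TCP interface binds to, default localhost.

To change one of them, search hive-site.xml for the property name, for example "hive.server2.thrift.min.worker.threads", and edit its value. Searching is recommended because the file's property definitions run to 5358 lines. [Screenshot of the edited property omitted.]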
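Instead of searching the file by hand, a property can also be looked up programmatically. The following is a minimal sketch that uses only the JDK's XML parser; the class name HiveSiteReader and the inline sample XML are illustrative assumptions and not part of Hive.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper (not a Hive class): reads <property><name>/<value>
// pairs from Hadoop-style configuration XML such as hive-site.xml.
final class HiveSiteReader {
    private HiveSiteReader() {}

    // Parses the XML text and returns name -> value in document order.
    static Map<String, String> parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList props = doc.getElementsByTagName("property");
            Map<String, String> result = new LinkedHashMap<>();
            for (int i = 0; i < props.getLength(); i++) {
                Element p = (Element) props.item(i);
                String name = text(p, "name");
                if (name != null) {
                    result.put(name, text(p, "value"));
                }
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a valid configuration XML", e);
        }
    }

    // Trimmed text of the first descendant element with the given tag, or null.
    private static String text(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        // Inline sample; a real run would pass the contents of hive-site.xml.
        String xml = "<configuration>"
                + "<property><name>hive.server2.thrift.port</name><value>10000</value></property>"
                + "<property><name>hive.server2.thrift.bind.host</name><value>huatec01</value></property>"
                + "</configuration>";
        Map<String, String> conf = parse(xml);
        System.out.println(conf.get("hive.server2.thrift.port"));
        System.out.println(conf.get("hive.server2.thrift.bind.host"));
    }
}
```

To use it on a real installation, read the file into a string first, for example with java.nio.file.Files.readAllBytes, and pass the text to parse.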

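Since Hive 0.13.0, HiveServer2 can also transport messages over HTTP, which is particularly useful when a proxy sits between the client and the server. The HTTP-related parameters are:

hive.server2.transport.mode – default binary (TCP); the alternative is http.
hive.server2.thrift.http.port – HTTP port to listen on, default 10001.
hive.server2.thrift.http.path – name of the service endpoint, default cliservice.
hive.server2.thrift.http.min.worker.threads – minimum number of worker threads in the server pool, default 5.
hive.server2.thrift.http.max.worker.threads – maximum number of worker threads in the server pool, default 500.

These can be located and edited in the same way.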
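A client has to match the server's transport mode in its JDBC URL. The sketch below builds the URL for both modes, assuming the transportMode and httpPath URL parameters that the HiveServer2 JDBC driver documents for HTTP mode. The class name HiveUrls is hypothetical, and the host and ports are simply the values used in this article.

```java
// Hypothetical helper (not part of Hive): builds HiveServer2 JDBC URLs.
final class HiveUrls {
    private HiveUrls() {}

    // Binary (TCP) transport, for hive.server2.transport.mode=binary.
    static String binary(String host, int port, String database) {
        return "jdbc:hive2://" + host + ":" + port + "/" + database;
    }

    // HTTP transport, for hive.server2.transport.mode=http.
    // httpPath should equal the server's hive.server2.thrift.http.path.
    static String http(String host, int port, String database, String httpPath) {
        return binary(host, port, database)
                + ";transportMode=http;httpPath=" + httpPath;
    }

    public static void main(String[] args) {
        System.out.println(binary("huatec01", 10000, "default"));
        System.out.println(http("huatec01", 10001, "default", "cliservice"));
    }
}
```

Keeping URL construction in one place makes it easier to switch a client between the two modes when the server configuration changes.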

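Starting HiveServer2

HiveServer2 can be started in two ways. One is hive --service hiveserver2, described above; the other, more concise, is simply hiveserver2.

Here we use the second way. [Screenshot of the startup output omitted.]

Once started, hiveserver2 runs in the foreground. Open a new SSH session and run jps; the additional RunJar process is the HiveServer2 service.

Run hive --service hiveserver2 -H or hive --service hiveserver2 --help to view the help text.

By default (hive.server2.enable.doAs is true), HiveServer2 executes each query as the user who submitted it. If hive.server2.enable.doAs is set to false, queries run as the user that runs the hiveserver2 process. To prevent memory leaks in unsecure mode, file system caching can be disabled by setting the following parameters to true:

fs.hdfs.impl.disable.cache – disables the HDFS file system cache, default false.
fs.file.impl.disable.cache – disables the local file system cache, default false.

The HiveServer2 web page is at http://huatec01:10002. [Screenshot omitted.]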

Configuring and using HiveServer2

Configuring the listening port and bind host

<property>
    <name>hive.server2.thrift.port</name>
    <value>10000</value>
    <description>Port number of HiveServer2 Thrift interface when hive.server2.transport.mode is 'binary'.</description>
  </property>
  <property>
    <name>hive.server2.thrift.bind.host</name>
    <value>huatec01</value>
    <description>Bind host on which to run the HiveServer2 Thrift service.</description>
  </property>

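The first property can keep its default. For the second, change the host name to the node on which Hive is installed.

Configuring impersonation

With impersonation enabled, HiveServer2 executes statements as the user who submitted them. If the property is set to false, statements are executed as the admin user that started the HiveServer2 daemon.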

<property>
    <name>hive.server2.enable.doAs</name>
    <value>true</value>
    <description>
      Setting this property to true will have HiveServer2 execute
      Hive operations as the user making the calls to it.
    </description>
  </property>

We set the value to true.

HiveServer2 node configuration

HiveServer2 no longer needs the hive.metastore.local setting. Configure hive.metastore.uris instead: if its value is empty the metastore is local, otherwise it is remote.

<property>
    <name>hive.metastore.uris</name>
    <value/>
    <description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
  </property>

It is empty by default, which means the metastore is local; the default is fine here.

For a remote metastore, use something like the following:

<property>
    <name>hive.metastore.uris</name>
    <value>thrift://xxx.xxx.xxx.xxx:9083</value>
</property>

ZooKeeper configuration

<property>
    <name>hive.support.concurrency</name>
    <value>true</value>
    <description>
      Whether Hive supports concurrency control or not. 
      A ZooKeeper instance must be up and running when using zookeeper Hive lock manager 
    </description>
  </property>
 <property>
    <name>hive.zookeeper.quorum</name>
    <value>huatec03:2181,huatec04:2181,huatec05:2181</value>
    <description>
      List of ZooKeeper servers to talk to. This is needed for: 
      1. Read/write locks - when hive.lock.manager is set to 
      org.apache.hadoop.hive.ql.lockmgr.zookeeper.ZooKeeperHiveLockManager, 
      2. When HiveServer2 supports service discovery via Zookeeper.
      3. For delegation token storage if zookeeper store is used, if
      hive.cluster.delegation.token.store.zookeeper.connectString is not set
      4. LLAP daemon registry service
    </description>
  </property>

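The first property enables concurrency support; the second lists the ZooKeeper ensemble.

Note: if hive.zookeeper.quorum is not configured, Hive QL requests cannot be executed concurrently and data anomalies can result.

HiveServer2 Web UI configuration

The Web UI is available only from Hive 2.0 onward; earlier versions do not have it.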

<property>
    <name>hive.server2.webui.host</name>
    <value>0.0.0.0</value>
    <description>The host address the HiveServer2 WebUI will listen on</description>
  </property>
  <property>
    <name>hive.server2.webui.port</name>
    <value>10002</value>
    <description>The port the HiveServer2 WebUI will listen on. This can be set to 0 or a negative integer to disable the web UI</description>
  </property>

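The defaults are fine. The HiveServer2 Web UI is then reachable in a browser at http://huatec01:10002, as tried earlier.

Starting the services

Start the metastore:

bin/hive --service metastore &

Start hiveserver2:

bin/hive --service hiveserver2 &

Web UI: http://huatec01:10002

Controlling HiveServer2 from the beeline console

The metastore and hiveserver2 must be running first.

Then start beeline:

bin/beeline

Connect to HiveServer2 (the JDBC URL points at the HiveServer2 port, not at the metastore):

!connect jdbc:hive2://huatec01:10000 root root

[Screenshot omitted: beeline output showing a successful connection.]

beeline error 1

beeline fails to connect to hiveserver2 and reports: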

org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException): User: master is not allowed to impersonate hive (state=,code=0)

Solution:

Stop the Hadoop cluster.
Edit core-site.xml and add the following:

<property>
      <name>hadoop.proxyuser.hadoop.groups</name>
      <value>root</value>
      <description>Allow the superuser oozie to impersonate any members of the group group1 and group2</description>
 </property>
 
 <property>
      <name>hadoop.proxyuser.hadoop.hosts</name>
      <value>huatec01,127.0.0.1,localhost</value>
      <description>The superuser can connect only from host1 and host2 to impersonate a user</description>
  </property>

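Make this change in core-site.xml on every node. The name segment after hadoop.proxyuser. (hadoop in this example) should be the user that runs HiveServer2, which is the first user named in the error message.

Restart the Hadoop cluster.
Start the metastore and hiveserver2, then reconnect to hiveserver2.

beeline error 2

beeline connects to hiveserver2 successfully, but executing a SQL statement fails with: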

0: jdbc:hive2://huatec01:10000> show databases;
Error: java.io.IOException: java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: ${system:user.name%7D (state=,code=0)

Solution:

Change the value of hive.exec.local.scratchdir in hive-site.xml, replacing ${system:user.name} with ${user.name}:

<property>
    <name>hive.exec.local.scratchdir</name>
    <value>/huatec/apache-hive-2.1.1-bin/tmp/${user.name}</value>
    <description>Local scratch space for Hive jobs</description>
  </property>

Reconnect to hiveserver2 with beeline and run the SQL statement again. [Screenshot of the successful query omitted.]
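The wording of error 2 can be reproduced with the JDK alone, which helps explain why an unresolved ${system:user.name} is rejected. The sketch below assumes, as my reading of the error rather than something the original article states, that the path parser treats the text before the first colon as a URI scheme, which leaves a relative path behind it. The class name ScratchDirUriDemo is illustrative.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Illustrative only: shows how java.net.URI reacts to a scheme followed by
// a relative path, the situation the unresolved placeholder appears to cause.
final class ScratchDirUriDemo {
    private ScratchDirUriDemo() {}

    // Returns "ok" if the URI can be built, otherwise the exception message.
    static String tryBuild(String scheme, String path) {
        try {
            new URI(scheme, null, path, null, null);
            return "ok";
        } catch (URISyntaxException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // "${system:user.name}" split at its first colon (assumed behaviour).
        System.out.println(tryBuild("${system", "user.name}"));
        // A resolved, absolute scratch directory has no such problem.
        System.out.println(tryBuild(null, "/huatec/apache-hive-2.1.1-bin/tmp/root"));
    }
}
```

The first call fails with a "Relative path in absolute URI" message, while the second succeeds, which is consistent with the fix of using a placeholder that is actually substituted.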

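Accessing Hive from Java code

Programs written in Java, Python and so on reach Hive through drivers such as JDBC, which requires a running hiveserver2. If beeline can drive hiveserver2, Java code will be able to access Hive as well.

If beeline reports errors against hiveserver2 or cannot execute SQL, fix those problems first and only then move on to writing code.

Preparation

Create a new Maven Java application project and add the Hive JDBC dependency. The examples are written as JUnit tests, so add the JUnit dependency too: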

<!--junit-->
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.12</version>
        </dependency>
        <!--hive jdbc-->
        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-jdbc</artifactId>
            <version>2.1.1</version>
        </dependency>

Writing the test class

The complete class is as follows:

package com.huatec.hive;

import org.junit.After;
import org.junit.Before;
import org.junit.Test;

import java.sql.*;
/**
 * Created by zhusheng on 2018/1/2.
 */
public class HiveJDBC {
    private static String driverName = "org.apache.hive.jdbc.HiveDriver";
    private static String url = "jdbc:hive2://huatec01:10000/hive_jdbc_test";
    private static String user = "root";
    private static String password = "root";

    private static Connection conn = null;
    private static Statement stmt = null;
    private static ResultSet rs = null;

    // Load the driver and open the connection
    @Before
    public void init() throws Exception {
        Class.forName(driverName);
        conn = DriverManager.getConnection(url,user,password);
        stmt = conn.createStatement();
    }

    // Create the database
    @Test
    public void createDatabase() throws Exception {
        String sql = "create database hive_jdbc_test";
        System.out.println("Running: " + sql);
        stmt.execute(sql);
    }

    // List all databases
    @Test
    public void showDatabases() throws Exception {
        String sql = "show databases";
        System.out.println("Running: " + sql);
        rs = stmt.executeQuery(sql);
        while (rs.next()) {
            System.out.println(rs.getString(1));
        }
    }

    // Create a table
    @Test
    public void createTable() throws Exception {
        String sql = "create table emp(\n" +
                "empno int,\n" +
                "ename string,\n" +
                "job string,\n" +
                "mgr int,\n" +
                "hiredate string,\n" +
                "sal double,\n" +
                "comm double,\n" +
                "deptno int\n" +
                ")\n" +
                "row format delimited fields terminated by '\\t'";
        System.out.println("Running: " + sql);
        stmt.execute(sql);
    }

    // List all tables
    @Test
    public void showTables() throws Exception {
        String sql = "show tables";
        System.out.println("Running: " + sql);
        rs = stmt.executeQuery(sql);
        while (rs.next()) {
            System.out.println(rs.getString(1));
        }
    }

    // Describe the table structure
    @Test
    public void descTable() throws Exception {
        String sql = "desc emp";
        System.out.println("Running: " + sql);
        rs = stmt.executeQuery(sql);
        while (rs.next()) {
            System.out.println(rs.getString(1) + "\t" + rs.getString(2));
        }
    }

    // Load data from a local file
    @Test
    public void loadData() throws Exception {
        String filePath = "/home/hadoop/data/emp.txt";
        String sql = "load data local inpath '" + filePath + "' overwrite into table emp";
        System.out.println("Running: " + sql);
        stmt.execute(sql);
    }

    // Query the data
    @Test
    public void selectData() throws Exception {
        String sql = "select * from emp";
        System.out.println("Running: " + sql);
        rs = stmt.executeQuery(sql);
        System.out.println("Employee No." + "\t" + "Name" + "\t" + "Job");
        while (rs.next()) {
            System.out.println(rs.getString("empno") + "\t\t" + rs.getString("ename") + "\t\t" + rs.getString("job"));
        }
    }

    // Aggregate query (runs a MapReduce job)
    @Test
    public void countData() throws Exception {
        String sql = "select count(1) from emp";
        System.out.println("Running: " + sql);
        rs = stmt.executeQuery(sql);
        while (rs.next()) {
            System.out.println(rs.getInt(1) );
        }
    }

    // Drop the database
    @Test
    public void dropDatabase() throws Exception {
        String sql = "drop database if exists hive_jdbc_test";
        System.out.println("Running: " + sql);
        stmt.execute(sql);
    }

    // Drop the table
    @Test
    public void dropTable() throws Exception {
        String sql = "drop table if exists emp";
        System.out.println("Running: " + sql);
        stmt.execute(sql);
    }

    // Release resources
    @After
    public void destroy() throws Exception {
        if ( rs != null) {
            rs.close();
        }
        if (stmt != null) {
            stmt.close();
        }
        if (conn != null) {
            conn.close();
        }
    }
}

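Note that a fresh Hive installation has only one database, default, as the earlier beeline session also showed. To work against the default database, use this connection URL:

private static String url = "jdbc:hive2://huatec01:10000/default";

The class includes a test method that creates a database, and the other SQL operations run against that database, so I changed the connection URL to point at the new database. Since init() opens the connection before every test, createDatabase itself presumably has to be run once with the default URL, before the new database exists.

private static String url = "jdbc:hive2://huatec01:10000/hive_jdbc_test";

There are quite a few test methods, and all of them passed in my local tests. The createTable test is one example. [Screenshot of the test run omitted.]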
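For a quick check outside JUnit, the same connection can be exercised from a plain main method. This is a minimal sketch rather than the original author's code: the class name HiveQuickCheck is hypothetical, the host and credentials are the ones used above, and it assumes the hive-jdbc driver on the classpath registers itself with DriverManager (if it does not, keep the Class.forName call from the test class).

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: lists databases over JDBC with automatic cleanup.
final class HiveQuickCheck {
    private HiveQuickCheck() {}

    // Runs "show databases" and returns the names. try-with-resources closes
    // the ResultSet, Statement and Connection even if the query fails.
    static List<String> listDatabases(String url, String user, String password)
            throws SQLException {
        List<String> names = new ArrayList<>();
        try (Connection conn = DriverManager.getConnection(url, user, password);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("show databases")) {
            while (rs.next()) {
                names.add(rs.getString(1));
            }
        }
        return names;
    }

    public static void main(String[] args) {
        String url = "jdbc:hive2://huatec01:10000/default";
        try {
            listDatabases(url, "root", "root").forEach(System.out::println);
        } catch (SQLException e) {
            // Raised when no suitable driver is found or the server is unreachable.
            System.err.println("Could not query Hive: " + e.getMessage());
        }
    }
}
```

Compared with the static fields and the @After method in the test class, try-with-resources keeps each resource scoped to the call that uses it, so nothing is left open when a statement throws.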

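Author: Jusen. Link: https://www.jianshu.com/p/4ef28607fc04. Source: Jianshu. Copyright belongs to the author; for reproduction in any form, please contact the author for permission and credit the source.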
