Operating Hive via the Java API

To access Hive from Java, you need to connect to Hive through HiveServer2. HiveServer2 is an upgrade of the original Hive server: it is more powerful and adds access control, and it ships with a new command-line tool, Beeline. To use Beeline, first start HiveServer2 and then connect to it with Beeline.

This article is a simple example of calling the Hadoop-based Hive data warehouse through the Java API; an introduction to Hive itself is not repeated here. Hive provides three user interfaces: the CLI, JDBC/ODBC, and a WebGUI.

  1. CLI, i.e. the shell command line
  2. JDBC/ODBC, Hive's Java interface, used in much the same way as JDBC with a traditional database
  3. WebGUI, which accesses Hive through a browser
    This article mainly covers the second interface and the basic usage of Beeline; a minimal sketch of the JDBC pattern is shown right after this list.
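As a quick preview, the core JDBC pattern that section 2 builds on looks roughly like the sketch below. This is a minimal sketch rather than the full example: the host, port, database, and credentials are placeholders that you must adapt to your own HiveServer2 instance.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveQuickStart {
    public static void main(String[] args) throws Exception {
        // Register the Hive JDBC driver (provided by the hive-jdbc artifact)
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        // Placeholder URL/user/password: point these at your own HiveServer2
        try (Connection conn = DriverManager.getConnection(
                "jdbc:hive2://localhost:10000/default", "hadoop", "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("show databases")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}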

1. Using Beeline

1. Start HiveServer2

$ hiveserver2

2. Start Beeline and connect to Hive

$ beeline -u jdbc:hive2://192.168.150.1:10000 -n hadoop -p


Explanation of the parameters:
-u: the connection URL; either an IP address or a hostname can be used, and the default port is 10000
-n: the username to connect as (note: this is not a Hive login user, but the login user of the server that Hive runs on)
-p: the password, which can be left blank

The port number (default: 10000) can be changed with the following command:

hiveserver2 --hiveconf hive.server2.thrift.port=10001
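
If you change the port this way, remember to use the new port in the Beeline/JDBC connection URL as well, for example jdbc:hive2://192.168.150.1:10001.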

If you are not sure how to use Beeline, you can view its help with the following command:

$ beeline --help

The full help output looks like this:

[hadoop@hadoop ~]$ beeline --help
Usage: java org.apache.hive.cli.beeline.BeeLine 
   -u <database url>               the JDBC URL to connect to
   -r                              reconnect to last saved connect url (in conjunction with !save)
   -n <username>                   the username to connect as
   -p <password>                   the password to connect as
   -d <driver class>               the driver class to use
   -i <init file>                  script file for initialization
   -e <query>                      query that should be executed
   -f <exec file>                  script file that should be executed
   -w (or) --password-file <password file>  the password file to read password from
   --hiveconf property=value       Use value for given property
   --hivevar name=value            hive variable name and value
                                   This is Hive specific settings in which variables
                                   can be set at session level and referenced in Hive
                                   commands or queries.
   --property-file=<property-file> the file to read connection properties (url, driver, user, password) from
   --color=[true/false]            control whether color is used for display
   --showHeader=[true/false]       show column names in query results
   --headerInterval=ROWS;          the interval between which heades are displayed
   --fastConnect=[true/false]      skip building table/column list for tab-completion
   --autoCommit=[true/false]       enable/disable automatic transaction commit
   --verbose=[true/false]          show verbose error messages and debug info
   --showWarnings=[true/false]     display connection warnings
   --showDbInPrompt=[true/false]   display the current database name in the prompt
   --showNestedErrs=[true/false]   display nested errors
   --numberFormat=[pattern]        format numbers using DecimalFormat pattern
   --force=[true/false]            continue running script even after errors
   --maxWidth=MAXWIDTH             the maximum width of the terminal
   --maxColumnWidth=MAXCOLWIDTH    the maximum width to use when displaying columns
   --silent=[true/false]           be more silent
   --autosave=[true/false]         automatically save preferences
   --outputformat=[table/vertical/csv2/tsv2/dsv/csv/tsv]  format mode for result display
                                   Note that csv, and tsv are deprecated - use csv2, tsv2 instead
   --incremental=[true/false]      Defaults to false. When set to false, the entire result set
                                   is fetched and buffered before being displayed, yielding optimal
                                   display column sizing. When set to true, result rows are displayed
                                   immediately as they are fetched, yielding lower latency and
                                   memory usage at the price of extra display column padding.
                                   Setting --incremental=true is recommended if you encounter an OutOfMemory
                                   on the client side (due to the fetched result set size being large).
                                   Only applicable if --outputformat=table.
   --incrementalBufferRows=NUMROWS the number of rows to buffer when printing rows on stdout,
                                   defaults to 1000; only applicable if --incremental=true
                                   and --outputformat=table
   --truncateTable=[true/false]    truncate table column when it exceeds length
   --delimiterForDSV=DELIMITER     specify the delimiter for delimiter-separated values output format (default: |)
   --isolation=LEVEL               set the transaction isolation level
   --nullemptystring=[true/false]  set to true to get historic behavior of printing null as empty string
   --maxHistoryRows=MAXHISTORYROWS The maximum number of rows to store beeline history.
   --help                          display this message
   Example:
    1. Connect using simple authentication to HiveServer2 on localhost:10000
    $ beeline -u jdbc:hive2://localhost:10000 username password

    2. Connect using simple authentication to HiveServer2 on hs.local:10000 using -n for username and -p for password
    $ beeline -n username -p password -u jdbc:hive2://hs2.local:10012

    3. Connect using Kerberos authentication with hive/localhost@mydomain.com as HiveServer2 principal
    $ beeline -u "jdbc:hive2://hs2.local:10013/default;principal=hive/localhost@mydomain.com"

    4. Connect using SSL connection to HiveServer2 on localhost at 10000
    $ beeline "jdbc:hive2://localhost:10000/default;ssl=true;sslTrustStore=/usr/local/truststore;trustStorePassword=mytruststorepassword"

    5. Connect using LDAP authentication
    $ beeline -u jdbc:hive2://hs2.local:10013/default <ldap-username> <ldap-password>

If Beeline reports the following error when connecting:

hadoop is not allowed to impersonate hadoop (state=08S01,code=0)

Cause: HiveServer2 enforces access control, and Hadoop must be configured to allow the connecting user to be impersonated.
Solution: add the following to Hadoop's core-site.xml (the "hadoop" segment in the property names is the user that HiveServer2 runs as; replace it with your own user if it differs), restart Hadoop, and then connect with Beeline again.

Official reference: https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Superusers.html

<property>
    <name>hadoop.proxyuser.hadoop.hosts</name>
    <value>*</value>
</property>
<property>
    <name>hadoop.proxyuser.hadoop.groups</name>
    <value>*</value>
</property>

Once connected, you can run statements just as you would after starting the hive CLI. To exit the connection, use the !q or !quit command.

2. An Example of Operating Hive with the Java API

Create a Maven project with the IDEA IDE and configure pom.xml as follows:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.bigdata.hadoop</groupId>
    <artifactId>hive</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-jdbc</artifactId>
            <version>2.3.0</version>
        </dependency>

        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.9</version>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.5.1</version>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                </configuration>
            </plugin>
        </plugins>
    </build>
</project>

Create the test class HiveJDBC with the following code.
Official reference: https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients

package com.bigdata.hadoop.hive;

import org.junit.After;
import org.junit.Before;
import org.junit.Test;

import java.sql.*;

/**
 * JDBC operations on Hive (note: HiveServer2 must be running before accessing Hive via JDBC)
 */
public class HiveJDBC {

    private static String driverName = "org.apache.hive.jdbc.HiveDriver";
    private static String url = "jdbc:hive2://hdpcomprs:10000/db_comprs";
    private static String user = "hadoop";
    private static String password = "";

    private static Connection conn = null;
    private static Statement stmt = null;
    private static ResultSet rs = null;

    // Load the driver and create a connection
    @Before
    public void init() throws Exception {
        Class.forName(driverName);
        conn = DriverManager.getConnection(url,user,password);
        stmt = conn.createStatement();
    }

    // Create a database
    @Test
    public void createDatabase() throws Exception {
        String sql = "create database hive_jdbc_test";
        System.out.println("Running: " + sql);
        stmt.execute(sql);
    }

    // List all databases
    @Test
    public void showDatabases() throws Exception {
        String sql = "show databases";
        System.out.println("Running: " + sql);
        rs = stmt.executeQuery(sql);
        while (rs.next()) {
            System.out.println(rs.getString(1));
        }
    }

    // Create a table
    @Test
    public void createTable() throws Exception {
        String sql = "create table emp(\n" +
                        "empno int,\n" +
                        "ename string,\n" +
                        "job string,\n" +
                        "mgr int,\n" +
                        "hiredate string,\n" +
                        "sal double,\n" +
                        "comm double,\n" +
                        "deptno int\n" +
                        ")\n" +
                     "row format delimited fields terminated by '\\t'";
        System.out.println("Running: " + sql);
        stmt.execute(sql);
    }

    // List all tables
    @Test
    public void showTables() throws Exception {
        String sql = "show tables";
        System.out.println("Running: " + sql);
        rs = stmt.executeQuery(sql);
        while (rs.next()) {
            System.out.println(rs.getString(1));
        }
    }

    // Show the table structure
    @Test
    public void descTable() throws Exception {
        String sql = "desc emp";
        System.out.println("Running: " + sql);
        rs = stmt.executeQuery(sql);
        while (rs.next()) {
            System.out.println(rs.getString(1) + "\t" + rs.getString(2));
        }
    }

    // Load data into the emp table
    @Test
    public void loadData() throws Exception {
        String filePath = "/home/hadoop/data/emp.txt";
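        // Note: when the statement runs through HiveServer2, 'local inpath' is resolved on the
        // machine where HiveServer2 runs, not on the JDBC client, so emp.txt must exist there.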
        String sql = "load data local inpath '" + filePath + "' overwrite into table emp";
        System.out.println("Running: " + sql);
        stmt.execute(sql);
    }

    // Query data
    @Test
    public void selectData() throws Exception {
        String sql = "select * from emp";
        System.out.println("Running: " + sql);
        rs = stmt.executeQuery(sql);
        System.out.println("Employee No" + "\t" + "Employee Name" + "\t" + "Job");
        while (rs.next()) {
            System.out.println(rs.getString("empno") + "\t\t" + rs.getString("ename") + "\t\t" + rs.getString("job"));
        }
    }

    // Aggregate query (this runs a MapReduce job)
    @Test
    public void countData() throws Exception {
        String sql = "select count(1) from emp";
        System.out.println("Running: " + sql);
        rs = stmt.executeQuery(sql);
        while (rs.next()) {
            System.out.println(rs.getInt(1) );
        }
    }

    // Drop a database
    @Test
    public void dropDatabase() throws Exception {
        String sql = "drop database if exists hive_jdbc_test";
        System.out.println("Running: " + sql);
        stmt.execute(sql);
    }

    // Drop a table
    @Test
    public void dropTable() throws Exception {
        String sql = "drop table if exists emp";
        System.out.println("Running: " + sql);
        stmt.execute(sql);
    }

    // Release resources
    @After
    public void destroy() throws Exception {
        if ( rs != null) {
            rs.close();
        }
        if (stmt != null) {
            stmt.close();
        }
        if (conn != null) {
            conn.close();
        }
    }
}
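
The test methods can be run individually from IDEA, or with Maven Surefire, for example mvn test -Dtest=HiveJDBC#showDatabases. In either case HiveServer2 must already be running, and the url, user, and password fields above must match your environment.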

Recommended reading:
[Hive] HiveServer2 configuration explained:
http://m.blog.csdn.net/SunnyYoona/article/details/75322224
