Hadoop Data Warehouse Tool: Hive

1. Installing MySQL

  a. Download MySQL 8.0 (mysql-8.0.16-winx64.zip) from the official site and unzip it: https://dev.mysql.com/downloads/mysql/

  b. In the MySQL root directory, create a my.ini file and a data folder. The contents of my.ini are as follows:

[mysqld]
# Listen on port 3306
port=3306
# MySQL installation directory
basedir=D:\Tools\mysql-8.0.16-winx64
# Data directory for the MySQL databases
datadir=D:\Tools\mysql-8.0.16-winx64\data
# Maximum number of connections allowed
max_connections=200
# Maximum number of failed connection attempts; guards against hosts trying to attack the database
max_connect_errors=10
# Default server-side character set: UTF8
character-set-server=utf8
# Default storage engine used when creating new tables
default-storage-engine=INNODB
[mysql]
# Default character set for the mysql client
default-character-set=utf8
[client]
# Default port the client uses when connecting to the server
port=3306
default-character-set=utf8

 

  c. Add a system environment variable MYSQL_HOME = D:\Tools\mysql-8.0.16-winx64, and append %MYSQL_HOME%\bin to the Path variable

  d. Open a cmd window as administrator and change to MySQL's bin directory

    ① Run the initialization command: mysqld --initialize --user=mysql --console, and note down the temporary password

    ② Install the Windows service: mysqld --install

    ③ Start the service: net start mysql

    ④ Log in to change the password: mysql -u root -p (enter the temporary password from ①)

    ⑤ Run the password-change statement: ALTER USER 'root'@'localhost' IDENTIFIED BY '123456';
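  To double-check that the server is reachable, a quick JDBC probe can help. This is only a sketch (the class name MysqlSmokeTest is made up for illustration); it assumes mysql-connector-java 8.x is on the classpath and the password set in step ⑤:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class MysqlSmokeTest {
    public static void main(String[] args) throws Exception {
        // Connector/J 8.x registers its driver automatically
        String url = "jdbc:mysql://127.0.0.1:3306/?serverTimezone=UTC";
        try (Connection conn = DriverManager.getConnection(url, "root", "123456");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT VERSION()")) {
            while (rs.next()) {
                System.out.println("MySQL version: " + rs.getString(1));
            }
        }
    }
}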


2. Installing Hive

  a. Download Hive (apache-hive-3.1.1-bin.tar.gz) from the official mirror and unpack it: http://mirror.bit.edu.cn/apache/hive/

  b. Add a system environment variable HIVE_HOME = D:\Tools\apache-hive-3.1.1-bin, and append %HIVE_HOME%\bin to the Path variable

  c. Create hive-site.xml in Hive's conf directory:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!--
    Licensed to the Apache Software Foundation (ASF) under one
    or more contributor license agreements.  See the NOTICE file
    distributed with this work for additional information
    regarding copyright ownership.  The ASF licenses this file
    to you under the Apache License, Version 2.0 (the
    "License"); you may not use this file except in compliance
    with the License.  You may obtain a copy of the License at

        http://www.apache.org/licenses/LICENSE-2.0

    Unless required by applicable law or agreed to in writing,
    software distributed under the License is distributed on an
    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    KIND, either express or implied.  See the License for the
    specific language governing permissions and limitations
    under the License.
-->

<configuration>

 <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
    <description>location of default database for the warehouse</description>
  </property>
  <property>
    <name>hive.exec.scratchdir</name>
    <value>/tmp/hive</value>
    <description>HDFS root scratch dir for Hive jobs which gets created with write all (733) permission. For each connecting user, an HDFS scratch dir: ${hive.exec.scratchdir}/&lt;username&gt; is created, with ${hive.scratch.dir.permission}.</description>
  </property>
  <property>
    <name>hive.exec.local.scratchdir</name>
    <value>D:/Tools/apache-hive-3.1.1-bin/scratch_dir</value>
    <description>Local scratch space for Hive jobs</description>
  </property>
  <property>
    <name>hive.downloaded.resources.dir</name>    
    <value>D:/Tools/apache-hive-3.1.1-bin/resources_dir/${hive.session.id}_resources</value>    
    <description>Temporary local directory for added resources in the remote file system.</description>
  </property>
  <property>
    <name>hive.querylog.location</name>
    <value>D:/Tools/apache-hive-3.1.1-bin/querylog_dir</value>
    <description>Location of Hive run time structured log file</description>
  </property>
  <property>
    <name>hive.server2.logging.operation.log.location</name>
    <value>D:/Tools/apache-hive-3.1.1-bin/operation_dir</value>
    <description>Top level directory where operation logs are stored if logging functionality is enabled</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://127.0.0.1:3306/hive?serverTimezone=UTC&amp;createDatabaseIfNotExist=true</value>
    <description>
      JDBC connect string for a JDBC metastore.
      To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in the connection URL.
      For example, jdbc:postgresql://myhost/db?ssl=true for postgres database.
    </description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.cj.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
    <description>Username to use against metastore database</description>
  </property>
   <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>123456</value>
    <description>password to use against metastore database</description>
  </property>
  <property>
    <name>hive.metastore.schema.verification</name>
    <value>false</value>
    <description>
      Enforce metastore schema version consistency.
      True: Verify that version information stored in is compatible with one from Hive jars.  Also disable automatic
            schema migration attempt. Users are required to manually migrate schema after Hive upgrade which ensures
            proper metastore schema migration. (Default)
      False: Warn if the version information stored in metastore doesn't match with one from in Hive jars.
    </description>
  </property>
  
  <!-- HiveServer2 Thrift port for external access; the default is 10000 -->
  <property>
    <name>hive.server2.thrift.port</name>
    <value>10010</value>
  </property>
  <!-- Use local MapReduce to work around errors when jobs run on Hadoop's MR -->
  <property>
    <name>hive.exec.mode.local.auto</name>
    <value>true</value>
    <description>Let Hive determine whether to run in local mode automatically</description>
  </property>

</configuration>

  Note: the hive.exec.mode.local.auto setting works around the error "return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask".
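  If you prefer not to bake this flag into hive-site.xml, it can usually be toggled per session instead, for example over JDBC once HiveServer2 is running. A sketch (the class name LocalModeSession is made up; the URL and credentials follow section 3 below):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class LocalModeSession {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");  // register the Hive JDBC driver
        try (Connection conn = DriverManager.getConnection(
                "jdbc:hive2://localhost:10010/default", "root", "123456");
             Statement stmt = conn.createStatement()) {
            // Enable local-mode execution for this session only
            stmt.execute("set hive.exec.mode.local.auto=true");
            // ... statements issued on this connection now prefer local mode ...
        }
    }
}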

 

  d. In Hive's root directory, create four folders: scratch_dir, resources_dir, querylog_dir, and operation_dir

  e. Add Windows support: download apache-hive-1.2.2-src.tar.gz from the official site (the 3.x binary distribution no longer ships the Windows .cmd scripts), unpack it, and copy the .cmd files from its bin directory and subdirectories into the corresponding bin directory and subdirectories of your Hive installation

  f. Download the mysql-connector-java-8.0.16.jar driver and place it in Hive's lib directory

  g. Open a new cmd window, change to Hive's bin directory, and initialize the metastore schema: hive --service schematool -dbType mysql -initSchema

  h. Start the metastore in a cmd window: hive --service metastore

  i. Start HiveServer2 in a cmd window: hive --service hiveserver2

  j. Typing hive in a cmd window then opens the Hive console

  k. Problems and fixes:

    ① Problem 1: running DDL fails with a permissions error: (org.apache.hadoop.security.authorize.AuthorizationException): User: A is not allowed to impersonate root, where A is the Windows administrator user name

    ① Fix: add the following to Hadoop's hdfs-site.xml:

    <property>
        <name>dfs.permissions</name>
        <value>false</value>
    </property>

       and add the following to Hadoop's core-site.xml (zwj here is the proxy user name; replace it with your own account):

    <property>
        <name>hadoop.proxyuser.zwj.hosts</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.zwj.groups</name>
        <value>*</value>
    </property>

 

    ② Problem 2: running queries throws a class-conflict exception involving com.google.common.collect.ImmutableSortedMap

    ② Fix: use the guava-19.0.jar from Hive's lib directory to replace the guava-11.0.2.jar found under subdirectories of Hadoop's share\hadoop directory (two copies need replacing); make backups first
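    To verify which guava actually wins on a given classpath, printing the class's code source is a handy generic Java diagnostic (a sketch; the class name GuavaCheck is made up):

import com.google.common.collect.ImmutableSortedMap;

public class GuavaCheck {
    public static void main(String[] args) {
        // Prints the jar that ImmutableSortedMap was loaded from, confirming
        // whether the guava-19.0.jar replacement took effect
        System.out.println(ImmutableSortedMap.class
                .getProtectionDomain().getCodeSource().getLocation());
    }
}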

 

  l. For later use: first go to Hadoop's sbin directory and run the startup command: start-all.cmd

        then run: hive --service metastore

        and finally run: hive --service hiveserver2  (the sketch below can confirm that HiveServer2 came up)
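  Once both services are up, a minimal hive-jdbc probe can confirm that HiveServer2 is accepting connections. A sketch (the class name HiveSmokeTest is made up; it assumes hive-jdbc on the classpath and the Thrift port 10010 configured above):

import java.sql.Connection;
import java.sql.DriverManager;

public class HiveSmokeTest {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");  // register the Hive JDBC driver
        try (Connection conn = DriverManager.getConnection(
                "jdbc:hive2://localhost:10010/default", "root", "123456")) {
            System.out.println("Connected to HiveServer2: " + !conn.isClosed());
        }
    }
}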


3. Using hive-jdbc to Work with Hive

  a. Add the pom dependency (note: 2.3.0 is older than the 3.1.1 server installed above; older 2.x clients generally still talk to a newer HiveServer2, but using the matching hive-jdbc version is the safer choice)

    <!-- Hive -->
    <dependency>
      <groupId>org.apache.hive</groupId>
      <artifactId>hive-jdbc</artifactId>
      <version>2.3.0</version>
    </dependency>

 

  b. Usage

import java.net.URI;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HiveDao {

    private static String driverName = "org.apache.hive.jdbc.HiveDriver";
    private static String url = "jdbc:hive2://localhost:10010/default";
    private static String user = "root";
    private static String password = "123456";

    private static Connection conn = null;
    private static Statement stmt = null;
    private static ResultSet rs = null;

    // Load the driver and open the connection
    public static void init() throws Exception {
        Class.forName(driverName);
        conn = DriverManager.getConnection(url,user,password);
        stmt = conn.createStatement();
    }

    // Create a database
    public static void createDatabase() throws Exception {
        String sql = "create database hive_jdbc_test";
        System.out.println("Running: " + sql);
        stmt.execute(sql);
    }

    // List all databases
    public static void showDatabases() throws Exception {
        String sql = "show databases";
        System.out.println("Running: " + sql);
        rs = stmt.executeQuery(sql);
        while (rs.next()) {
            System.out.println(rs.getString(1));
        }
    }

    // Create a table
    public static void createTable() throws Exception {
        String sql = "create table cmdty(" +
                "cmdtyCode int," +
                "cmdtyName string," +
                "firstPrice double" +
                ")" +
                "row format delimited fields terminated by '\\t'";
        System.out.println("Running: " + sql);
        stmt.execute(sql);
    }

    // List all tables
    public static void showTables() throws Exception {
        String sql = "show tables";
        System.out.println("Running: " + sql);
        rs = stmt.executeQuery(sql);
        while (rs.next()) {
            System.out.println(rs.getString(1));
        }
    }

    // Show the table structure
    public static void descTable() throws Exception {
        String sql = "desc cmdty";
        System.out.println("Running: " + sql);
        rs = stmt.executeQuery(sql);
        while (rs.next()) {
            System.out.println(rs.getString(1) + "\t" + rs.getString(2));
        }
    }

    // Load data
    public static void loadData() throws Exception {
        // First create the target folder in HDFS
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://localhost:9527");  // the port configured in core-site.xml
        // Get a handle on HDFS and set its user (because of Windows permission
        // issues, "zwj" must be replaced with the administrator account)
        FileSystem fs = FileSystem.get(new URI("hdfs://localhost:9527"), conf, "zwj");
        fs.mkdirs(new Path("/user/hive/warehouse/cmdty"));
        fs.close();

        // Then load the data
        String filePath = "E:/tmp/cmdty.txt";
        String sql = "load data local inpath '" + filePath + "' overwrite into table cmdty";
        System.out.println("Running: " + sql);
        stmt.execute(sql);
    }

    // Query data
    public static void selectData() throws Exception {
        String sql = "select * from cmdty";
        System.out.println("Running: " + sql);
        rs = stmt.executeQuery(sql);
        System.out.println("cmdtyCode" + "\t" + "cmdtyName" + "\t" + "firstPrice");
        while (rs.next()) {
            System.out.println(rs.getInt("cmdtyCode") + "\t\t" + rs.getString("cmdtyName") + "\t\t" + rs.getDouble("firstPrice"));
        }
    }

    // Aggregate query (runs a MapReduce job)
    public static void countData() throws Exception {
        String sql = "select count(1) from cmdty";
        System.out.println("Running: " + sql);
        rs = stmt.executeQuery(sql);
        while (rs.next()) {
            System.out.println(rs.getInt(1) );
        }
    }

    // Drop the database
    public static void dropDatabase() throws Exception {
        String sql = "drop database if exists hive_jdbc_test";
        System.out.println("Running: " + sql);
        stmt.execute(sql);
    }

    // Drop the table
    public static void dropTable() throws Exception {
        String sql = "drop table if exists cmdty";
        System.out.println("Running: " + sql);
        stmt.execute(sql);
    }

    // Release resources
    public static void destroy() throws Exception {
        if ( rs != null) {
            rs.close();
        }
        if (stmt != null) {
            stmt.close();
        }
        if (conn != null) {
            conn.close();
        }
    }


    public static void main(String[] args) throws Exception {
        init();

//        createDatabase();
//        showDatabases();
//        createTable();
//        showTables();
//        descTable();
//        loadData();
//        selectData();
        countData();
//        dropDatabase();
//        dropTable();

        destroy();
    }
}

  Note: before loading data into HDFS and into tables, the corresponding folders must already exist in HDFS, otherwise the load will fail; a standalone pre-check is sketched below.
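  A sketch of such a pre-check (the class name WarehouseDirs and the helper ensureDir are made up; the NameNode address and user mirror loadData() above):

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WarehouseDirs {
    // Make sure a table's warehouse directory exists in HDFS before LOAD DATA
    public static void ensureDir(String dir) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://localhost:9527");
        try (FileSystem fs = FileSystem.get(new URI("hdfs://localhost:9527"), conf, "zwj")) {
            Path path = new Path(dir);
            if (!fs.exists(path)) {
                fs.mkdirs(path);  // creates parent directories as needed
            }
        }
    }

    public static void main(String[] args) throws Exception {
        ensureDir("/user/hive/warehouse/cmdty");
    }
}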


4. Outstanding Issues

  a. When starting hiveserver2, a java.lang.ClassNotFoundException: org.apache.tez.dag.api.TezConfiguration exception is logged because Tez is not integrated; for now this does not affect working with Hive through hive-jdbc.


References: https://www.cnblogs.com/tangyb/p/8971658.html

     https://www.cnblogs.com/maria-ld/p/10171780.html

     https://www.cnblogs.com/takemybreathaway/articles/9750175.html
