Hive Learning Path (2): Installing Hive

 


Downloading Hive

Download address: http://mirrors.hust.edu.cn/apache/

Choose a suitable Hive version to download. The stable-2 folder shows that the stable 2.x release is 2.3.3.

Installing Hive

1. I use MySQL as Hive's metastore database, so install MySQL first.

MySQL installation procedure: http://www.cnblogs.com/qingyunzong/p/8294876.html

2. Upload the Hive installation package

3. Extract the installation package

[hadoop@hadoop3 ~]$ tar -zxvf apache-hive-2.3.3-bin.tar.gz -C apps/
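The tar command above assumes that the apps/ directory already exists, because tar -C fails when its target is missing. A minimal sketch of a guarded extraction follows; the function name extract_into is hypothetical and only for illustration.

```shell
# Hypothetical helper: extract a gzip-compressed tarball into a target
# directory, creating the directory first because tar -C fails if it
# does not exist.
extract_into() {
    archive=$1
    target=$2
    [ -f "$archive" ] || { echo "archive not found: $archive" >&2; return 1; }
    mkdir -p "$target" && tar -zxf "$archive" -C "$target"
}

# Usage, assuming the tarball sits in the current directory:
# extract_into apache-hive-2.3.3-bin.tar.gz "$HOME/apps"
```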

4. Modify the configuration files

The configuration files are in the directory apache-hive-2.3.3-bin/conf

[hadoop@hadoop3 apps]$ cd apache-hive-2.3.3-bin/
[hadoop@hadoop3 apache-hive-2.3.3-bin]$ ls
bin  binary-package-licenses  conf  examples  hcatalog  jdbc  lib  LICENSE  NOTICE  RELEASE_NOTES.txt  scripts
[hadoop@hadoop3 apache-hive-2.3.3-bin]$ cd conf/
[hadoop@hadoop3 conf]$ ls
beeline-log4j2.properties.template    ivysettings.xml
hive-default.xml.template             llap-cli-log4j2.properties.template
hive-env.sh.template                  llap-daemon-log4j2.properties.template
hive-exec-log4j2.properties.template  parquet-logging.properties
hive-log4j2.properties.template
[hadoop@hadoop3 conf]$ pwd
/home/hadoop/apps/apache-hive-2.3.3-bin/conf
[hadoop@hadoop3 conf]$ 

Create hive-site.xml and add the following content

[hadoop@hadoop3 conf]$ touch hive-site.xml
[hadoop@hadoop3 conf]$ vi hive-site.xml 
<configuration>
        <property>
                <name>javax.jdo.option.ConnectionURL</name>
                <value>jdbc:mysql://hadoop1:3306/hivedb?createDatabaseIfNotExist=true</value>
                <description>JDBC connect string for a JDBC metastore</description>
                <!-- If MySQL and Hive run on the same server node, change hadoop1 to localhost -->
        </property>
        <property>
                <name>javax.jdo.option.ConnectionDriverName</name>
                <value>com.mysql.jdbc.Driver</value>
                <description>Driver class name for a JDBC metastore</description>
        </property>
        <property>
                <name>javax.jdo.option.ConnectionUserName</name>
                <value>root</value>
                <description>username to use against metastore database</description>
        </property>
        <property>
                <name>javax.jdo.option.ConnectionPassword</name>
                <value>root</value>
                <description>password to use against metastore database</description>
        </property>
</configuration>
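If the MySQL host differs between environments, the same four properties can be generated from a script instead of being typed into vi. The following is only a sketch under assumptions: the function name write_hive_site and its argument order are hypothetical, and the values are written without XML escaping, so it suits only simple values such as those used in this post.

```shell
# Hypothetical helper: write a minimal hive-site.xml for a MySQL metastore.
# Arguments: output file, MySQL host, database name, user, password.
# Values are inserted verbatim (no XML escaping).
write_hive_site() {
    out=$1
    host=$2
    db=$3
    user=$4
    pass=$5
    cat > "$out" <<EOF
<configuration>
        <property>
                <name>javax.jdo.option.ConnectionURL</name>
                <value>jdbc:mysql://${host}:3306/${db}?createDatabaseIfNotExist=true</value>
        </property>
        <property>
                <name>javax.jdo.option.ConnectionDriverName</name>
                <value>com.mysql.jdbc.Driver</value>
        </property>
        <property>
                <name>javax.jdo.option.ConnectionUserName</name>
                <value>${user}</value>
        </property>
        <property>
                <name>javax.jdo.option.ConnectionPassword</name>
                <value>${pass}</value>
        </property>
</configuration>
EOF
}

# Usage with the values from this post:
# write_hive_site /home/hadoop/apps/apache-hive-2.3.3-bin/conf/hive-site.xml hadoop1 hivedb root root
```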

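The following configuration is optional. It specifies the HDFS directory where the Hive data warehouse stores its data; if used, it goes inside the <configuration> element of hive-site.xml.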

        <property>
                <name>hive.metastore.warehouse.dir</name>
                <value>/hive/warehouse</value>
                <description>Hive default warehouse directory; change it if necessary</description>
        </property>    

5. Be sure to add the MySQL driver package (mysql-connector-java-5.1.40-bin.jar). Place this jar in the lib directory under the Hive root path.
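Copying the driver can be sketched as a small guarded function. The name install_driver is hypothetical, and the location of the downloaded jar is an assumption, since the post does not say where it was uploaded.

```shell
# Hypothetical helper: copy a JDBC driver jar into the lib directory
# under the Hive root path, failing early if either path is wrong.
install_driver() {
    jar=$1
    hive_home=$2
    [ -f "$jar" ] || { echo "driver jar not found: $jar" >&2; return 1; }
    [ -d "$hive_home/lib" ] || { echo "no lib directory under: $hive_home" >&2; return 1; }
    cp "$jar" "$hive_home/lib/"
}

# Usage, assuming the jar was uploaded to the home directory:
# install_driver ~/mysql-connector-java-5.1.40-bin.jar /home/hadoop/apps/apache-hive-2.3.3-bin
```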

 

6. Installation is complete; configure the environment variables

[hadoop@hadoop3 lib]$ vi ~/.bashrc 
#Hive
export HIVE_HOME=/home/hadoop/apps/apache-hive-2.3.3-bin
export PATH=$PATH:$HIVE_HOME/bin

Make the modified configuration file take effect immediately

[hadoop@hadoop3 lib]$ source ~/.bashrc 
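After sourcing ~/.bashrc, it can help to confirm that the Hive bin directory really is on PATH before moving on. A minimal sketch follows; the function name dir_on_path is hypothetical.

```shell
# Hypothetical helper: succeed if directory $1 is an entry of the
# colon-separated list $2 (defaults to the current PATH).
dir_on_path() {
    case ":${2:-$PATH}:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage, assuming HIVE_HOME was exported as shown above:
# dir_on_path "$HIVE_HOME/bin" && echo "Hive bin directory is on PATH"
```

Wrapping the list in extra colons lets the pattern match whole entries only, so /opt/hive does not count as a match for /opt/hive/bin.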

7. Verify the Hive installation

[hadoop@hadoop3 ~]$ hive --help
Usage ./hive <parameters> --service serviceName <service parameters>
Service List: beeline cleardanglingscratchdir cli hbaseimport hbaseschematool help hiveburninclient hiveserver2 hplsql jar lineage llapdump llap llapstatus metastore metatool orcfiledump rcfilecat schemaTool version 
Parameters parsed:
  --auxpath : Auxiliary jars 
  --config : Hive configuration directory
  --service : Starts specific service/component. cli is default
Parameters used:
  HADOOP_HOME or HADOOP_PREFIX : Hadoop install directory
  HIVE_OPT : Hive options
For help on a particular service:
  ./hive --service serviceName --help
Debug help:  ./hive --debug --help
[hadoop@hadoop3 ~]$ 

8. Initialize the metastore database

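  Note: with Hive versions earlier than 2.x, skipping initialization is fine. Hive initializes the metastore automatically on first start, although it may not create all of the metastore tables right away and instead creates them gradually during use. Even so, it is best to run the initialization. With Hive 2.x, you must initialize the metastore database manually, using the following command: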

[hadoop@hadoop3 ~]$ schematool -dbType mysql -initSchema
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/apps/apache-hive-2.3.3-bin/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/apps/hadoop-2.7.5/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Metastore connection URL:     jdbc:mysql://hadoop1:3306/hivedb?createDatabaseIfNotExist=true
Metastore Connection Driver :     com.mysql.jdbc.Driver
Metastore connection User:     root
Starting metastore schema initialization to 2.3.0
Initialization script hive-schema-2.3.0.mysql.sql
Initialization script completed
schemaTool completed
[hadoop@hadoop3 ~]$ 
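In a setup script that may run more than once, the initialization can be guarded with schematool's -info option, which reports the schema version recorded in the metastore. The sketch below assumes that -info exits with a non-zero status when no schema is present yet; that behavior, and the function name init_metastore_if_needed, are assumptions and not something shown in the session above.

```shell
# Hypothetical helper: initialize the metastore schema only when
# "schematool -info" fails. Assumes -info returns non-zero on a
# database that has not been initialized yet.
init_metastore_if_needed() {
    db_type=${1:-mysql}
    if schematool -dbType "$db_type" -info >/dev/null 2>&1; then
        echo "metastore schema already present"
    else
        schematool -dbType "$db_type" -initSchema
    fi
}

# Usage:
# init_metastore_if_needed mysql
```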

9. Start the Hive client

hive --service cli has the same effect as hive

[hadoop@hadoop3 ~]$ hive --service cli
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/apps/apache-hive-2.3.3-bin/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/apps/hadoop-2.7.5/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Logging initialized using configuration in jar:file:/home/hadoop/apps/apache-hive-2.3.3-bin/lib/hive-common-2.3.3.jar!/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive> 


Basic usage

Given a file student.txt, store it in Hive. The data format of student.txt is as follows:

95002,劉晨,女,19,IS
95017,王風娟,女,18,IS
95018,王一,女,19,IS
95013,馮偉,男,21,CS
95014,王小麗,女,19,CS
95019,邢小麗,女,19,IS
95020,趙錢,男,21,IS
95003,王敏,女,22,MA
95004,張立,男,19,IS
95012,孫花,女,20,CS
95010,孔小濤,男,19,CS
95005,劉剛,男,18,MA
95006,孫慶,男,23,CS
95007,易思玲,女,19,MA
95008,李娜,女,18,CS
95021,週二,男,17,MA
95022,鄭明,男,20,MA
95001,李勇,男,20,CS
95011,包小柏,男,18,MA
95009,夢圓圓,女,18,MA
95015,王君,男,18,MA
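Each line of student.txt should hold exactly five comma-separated fields. A row with a different field count can still be loaded, but it will not line up with a five-column table, so a quick check before loading may save some confusion. A minimal sketch with awk follows; the function name check_fields is hypothetical.

```shell
# Hypothetical helper: print the numbers of the lines in file $1 whose
# comma-separated field count differs from $2, and return non-zero if
# any such line exists.
check_fields() {
    awk -F',' -v want="$2" '
        NF != want { print NR; bad = 1 }
        END { exit bad + 0 }
    ' "$1"
}

# Usage, with the file path used later in this post:
# check_fields /home/hadoop/student.txt 5 && echo "all rows have 5 fields"
```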

 

1. Create a database named myhive

hive> create database myhive;
OK
Time taken: 7.847 seconds
hive> 

2. Switch to the new database myhive

hive> use myhive;
OK
Time taken: 0.047 seconds
hive> 

3. Show the database currently in use

hive> select current_database();
OK
myhive
Time taken: 0.728 seconds, Fetched: 1 row(s)
hive> 

4. Create a student table in the myhive database

hive> create table student(id int, name string, sex string, age int, department string) row format delimited fields terminated by ",";
OK
Time taken: 0.718 seconds
hive> 

5. Load data into the table

hive> load data local inpath "/home/hadoop/student.txt" into table student;
Loading data to table myhive.student
OK
Time taken: 1.854 seconds
hive> 

6. Query the data

hive> select * from student;
OK
95002    劉晨    女    19    IS
95017    王風娟    女    18    IS
95018    王一    女    19    IS
95013    馮偉    男    21    CS
95014    王小麗    女    19    CS
95019    邢小麗    女    19    IS
95020    趙錢    男    21    IS
95003    王敏    女    22    MA
95004    張立    男    19    IS
95012    孫花    女    20    CS
95010    孔小濤    男    19    CS
95005    劉剛    男    18    MA
95006    孫慶    男    23    CS
95007    易思玲    女    19    MA
95008    李娜    女    18    CS
95021    週二    男    17    MA
95022    鄭明    男    20    MA
95001    李勇    男    20    CS
95011    包小柏    男    18    MA
95009    夢圓圓    女    18    MA
95015    王君    男    18    MA
Time taken: 2.455 seconds, Fetched: 21 row(s)
hive> 
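The same query can also be run without entering the interactive prompt, which is handy in scripts. The sketch below wraps the Hive CLI options --database (select the database), -S (silent mode) and -e (run a query string); the function name run_hql is hypothetical, and the group-by query in the usage comment is only an illustration that does not appear in the session above.

```shell
# Hypothetical helper: run one HiveQL statement non-interactively
# against database $1 and print only the result rows.
run_hql() {
    hive --database "$1" -S -e "$2"
}

# Usage, assuming the myhive database and student table created above:
# run_hql myhive "select * from student;"
# run_hql myhive "select department, count(*) from student group by department;"
```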

7. View the table structure

hive> desc student;
OK
id                      int                                         
name                    string                                      
sex                     string                                      
age                     int                                         
department              string                                      
Time taken: 0.102 seconds, Fetched: 5 row(s)
hive> 

 

hive> desc extended student;
OK
id                      int                                         
name                    string                                      
sex                     string                                      
age                     int                                         
department              string                                      
          
Detailed Table Information    Table(tableName:student, dbName:myhive, owner:hadoop, createTime:1522750487, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:id, type:int, comment:null), FieldSchema(name:name, type:string, comment:null), FieldSchema(name:sex, type:string, comment:null), FieldSchema(name:age, type:int, comment:null), FieldSchema(name:department, type:string, comment:null)], location:hdfs://myha01/user/hive/warehouse/myhive.db/student, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=,, field.delim=,}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[], parameters:{transient_lastDdlTime=1522750695, totalSize=523, numRows=0, rawDataSize=0, numFiles=1}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE, rewriteEnabled:false)    
Time taken: 0.127 seconds, Fetched: 7 row(s)
hive> 

 

hive> desc formatted student;
OK
# col_name                data_type               comment             
          
id                      int                                         
name                    string                                      
sex                     string                                      
age                     int                                         
department              string                                      
          
# Detailed Table Information          
Database:               myhive                   
Owner:                  hadoop                   
CreateTime:             Tue Apr 03 18:14:47 CST 2018     
LastAccessTime:         UNKNOWN                  
Retention:              0                        
Location:               hdfs://myha01/user/hive/warehouse/myhive.db/student     
Table Type:             MANAGED_TABLE            
Table Parameters:          
    numFiles                1                   
    numRows                 0                   
    rawDataSize             0                   
    totalSize               523                 
    transient_lastDdlTime    1522750695          
          
# Storage Information          
SerDe Library:          org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe     
InputFormat:            org.apache.hadoop.mapred.TextInputFormat     
OutputFormat:           org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat     
Compressed:             No                       
Num Buckets:            -1                       
Bucket Columns:         []                       
Sort Columns:           []                       
Storage Desc Params:          
    field.delim             ,                   
    serialization.format    ,                   
Time taken: 0.13 seconds, Fetched: 34 row(s)
hive> 