Environment
Virtual machine: VMware 10
Linux: CentOS-6.5-x86_64
Client: Xshell 4
FTP: Xftp 4
JDK 8
hadoop-3.1.1
apache-hive-3.1.1
1. Introduction
Hive is a data warehouse tool built on Hadoop. It maps structured data files to tables and provides SQL-like query capability.
Hive uses HQL as its query interface, HDFS for storage, and MapReduce for computation. In essence, Hive translates HQL into MapReduce, which lets non-Java programmers run MapReduce operations over data in HDFS.
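As a concrete illustration, the sketch below (table name, columns, and HDFS path are hypothetical) maps a comma-delimited file already sitting in HDFS onto a table and queries it with SQL-like syntax; the aggregate query is compiled into a MapReduce job:

-- map an existing HDFS directory of CSV-like files onto a table
CREATE EXTERNAL TABLE page_views (
  user_id INT,
  url     STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/page_views';   -- hypothetical HDFS directory

-- Hive rewrites this aggregate as a MapReduce job over the files
SELECT url, count(*) FROM page_views GROUP BY url;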
Hive is a data warehouse. Differences from a relational database:
① A relational database serves online applications, while Hive is for offline big-data analytics;
② A relational database is queried with SQL, Hive with HQL;
③ A relational database stores data on the local file system, Hive stores it in HDFS;
④ Hive executes queries as MapReduce jobs, while MySQL runs them in its own executor;
⑤ Hive has no indexes;
⑥ Hive has high latency;
⑦ Hive is highly scalable;
⑧ Hive handles very large data volumes;
2. Architecture
The Hive architecture:
(1) There are three main user interfaces: the Hive command line (CLI), which is the most common; the Hive client (e.g., via the Java API), which is served over RPC; and the Hive Web Interface (HWI), accessed through a browser.
(2) At runtime Hive stores its metadata in a relational database such as MySQL or Derby. The metadata includes table names, columns, partitions and their properties, table properties (whether a table is external, etc.), and the directory where each table's data lives.
(3) The interpreter, compiler, and optimizer carry an HQL statement through lexical analysis, syntax analysis, compilation, optimization, and query-plan generation. The generated plan is stored in HDFS and later executed by MapReduce.
(4) Hive data is stored in HDFS, and most queries and computations are performed by MapReduce (queries that only project columns, such as select * from tbl, do not launch a MapReduce job).
Operators
The compiler translates a Hive SQL statement into operators;
an operator is Hive's smallest unit of processing;
each operator represents either an HDFS operation or a MapReduce job;
the ANTLR lexer/parser toolkit is used to parse the HQL.
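To make the operator tree visible, HQL's EXPLAIN statement prints the compiled plan; a quick sketch (run against the test01 table created later in this article):

hive> EXPLAIN SELECT age, count(*) FROM test01 GROUP BY age;
-- typical output (abridged): a map-side TableScan -> Select -> Group By chain,
-- a Reduce Output Operator shuffling on age, and a reduce-side Group By,
-- i.e. exactly the operators described above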
3. Setup
The modes below are distinguished by how the metastore's relational database is accessed and managed:
1) Local mode: connects to an in-memory Derby database and is generally used for unit tests. Because the metadata is held in memory it is hard to maintain, so this mode is not recommended.
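For reference, a minimal embedded-Derby hive-site.xml would carry the stock Derby connection values; a sketch under that assumption:

<!-- embedded Derby metastore (sketch; the usual template defaults) -->
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:derby:;databaseName=metastore_db;create=true</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>org.apache.derby.jdbc.EmbeddedDriver</value>
</property>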
2) Single-user mode: connects over the network to a single database; this is the most commonly used mode.
Step 1: install MySQL
See: "Setting up a Linux Java web environment, part 2: installing MySQL"
Step 2: extract apache-hive-3.1.1-bin.tar.gz and set environment variables
[root@PCS102 src]# tar -xf apache-hive-3.1.1-bin.tar.gz -C /usr/local
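The environment variables themselves are not shown here; a typical sketch for /etc/profile, assuming the install path above:

# append to /etc/profile, then run: source /etc/profile
export HIVE_HOME=/usr/local/apache-hive-3.1.1-bin
export PATH=$PATH:$HIVE_HOME/bin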
Step 3: edit the configuration
[root@PCS102 conf]# cd /usr/local/apache-hive-3.1.1-bin/conf && ll
total 332
-rw-r--r--. 1 root root   1596 Apr  4  2018 beeline-log4j2.properties.template
-rw-r--r--. 1 root root 299970 Oct 24 08:19 hive-default.xml.template
-rw-r--r--. 1 root root   2365 Apr  4  2018 hive-env.sh.template
-rw-r--r--. 1 root root   2274 Apr  4  2018 hive-exec-log4j2.properties.template
-rw-r--r--. 1 root root   3086 Oct 24 07:49 hive-log4j2.properties.template
-rw-r--r--. 1 root root   2060 Apr  4  2018 ivysettings.xml
-rw-r--r--. 1 root root   3558 Oct 24 07:49 llap-cli-log4j2.properties.template
-rw-r--r--. 1 root root   7163 Oct 24 07:49 llap-daemon-log4j2.properties.template
-rw-r--r--. 1 root root   2662 Apr  4  2018 parquet-logging.properties
# copy the template configuration file
[root@PCS102 conf]# cp hive-default.xml.template hive-site.xml
# edit hive-site.xml
<!-- directory in HDFS where Hive stores table data -->
<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/root/hive_remote/warehouse</value>
</property>
<!-- whether Hive uses a local (embedded) metastore -->
<property>
  <name>hive.metastore.local</name>
  <value>false</value>
</property>
<!-- JDBC URL of the MySQL metastore database -->
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://PCS101/hive_remote?createDatabaseIfNotExist=true</value>
</property>
<!-- JDBC driver class for MySQL -->
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<!-- MySQL user name -->
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>root</value>
</property>
<!-- MySQL password -->
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>123456</value>
</property>
(In vi, :.,$-1d deletes from the current line to the second-to-last line, which is handy for clearing the template's default properties.)
Step 4: copy the MySQL driver jar (mysql-connector-java-5.1.32-bin.jar) into
/usr/local/apache-hive-3.1.1-bin/lib
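A sketch of the copy, assuming the driver jar was downloaded to /usr/local/src:

[root@PCS102 src]# cp mysql-connector-java-5.1.32-bin.jar /usr/local/apache-hive-3.1.1-bin/lib/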
Step 5: initialize the database. The Hive metastore schema must be initialized before the first start:
[root@PCS102 bin]# /usr/local/apache-hive-3.1.1-bin/bin/schematool -dbType mysql -initSchema
Without initialization, queries fail with an error such as:
hive> create table test01(id int,age int);
FAILED: HiveException java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
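After running -initSchema, the result can be double-checked with schematool's info mode (a verification sketch, not part of the original session):

# prints the connection URL, driver, user, and current schema version
[root@PCS102 bin]# /usr/local/apache-hive-3.1.1-bin/bin/schematool -dbType mysql -info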
Step 6: start Hive; by default this opens the command line:
[root@PCS102 bin]# /usr/local/apache-hive-3.1.1-bin/bin/hive
which: no hbase in (/usr/local/jdk1.8.0_65/bin:/home/cluster/subversion-1.10.3/bin:/home/cluster/apache-storm-0.9.2/bin:/usr/local/hadoop-3.1.1/bin:/usr/local/hadoop-3.1.1/sbin:/usr/local/apache-hive-3.1.1-bin/bin:/usr/local/jdk1.7.0_80/bin:/home/cluster/subversion-1.10.3/bin:/home/cluster/apache-storm-0.9.2/bin:/usr/local/hadoop-3.1.1/bin:/usr/local/hadoop-3.1.1/sbin:/usr/local/jdk1.7.0_80/bin:/home/cluster/subversion-1.10.3/bin:/home/cluster/apache-storm-0.9.2/bin:/usr/local/hadoop-3.1.1/bin:/usr/local/hadoop-3.1.1/sbin:/usr/local/jdk1.7.0_80/bin:/home/cluster/subversion-1.10.3/bin:/home/cluster/apache-storm-0.9.2/bin:/usr/local/sbin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/openssh/bin:/root/bin)
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/apache-hive-3.1.1-bin/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop-3.1.1/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Hive Session ID = e08b6258-e2eb-40af-ba98-87abcb2d1728
Logging initialized using configuration in jar:file:/usr/local/apache-hive-3.1.1-bin/lib/hive-common-3.1.1.jar!/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive> create table test01(id int,age int);
OK
Time taken: 1.168 seconds
hive> desc test01;
OK
id                      int
age                     int
Time taken: 0.181 seconds, Fetched: 2 row(s)
hive> insert into test01 values(1,23);
Query ID = root_20190125164516_aa852f47-a9b0-4c59-9043-efb557965a5b
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1548397153910_0001, Tracking URL = http://PCS102:8088/proxy/application_1548397153910_0001/
Kill Command = /usr/local/hadoop-3.1.1/bin/mapred job -kill job_1548397153910_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2019-01-25 16:45:26,923 Stage-1 map = 0%, reduce = 0%
2019-01-25 16:45:32,107 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 4.96 sec
2019-01-25 16:45:38,271 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 8.01 sec
MapReduce Total cumulative CPU time: 8 seconds 10 msec
Ended Job = job_1548397153910_0001
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory hdfs://PCS102:9820/root/hive_remote/warehouse/test01/.hive-staging_hive_2019-01-25_16-45-16_011_1396999443961154869-1/-ext-10000
Loading data to table default.test01
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 8.01 sec   HDFS Read: 14187 HDFS Write: 236 SUCCESS
Total MapReduce CPU Time Spent: 8 seconds 10 msec
OK
Time taken: 23.714 seconds
hive>
Check HDFS:
[root@PCS102 bin]# hdfs dfs -cat /root/hive_remote/warehouse/test01/*
123
(The row 1,23 is stored with Hive's default ^A field delimiter, which is invisible here, so it prints as 123.)
You can also inspect the insert's MapReduce job in the YARN web UI, and check the metadata in MySQL, where the new table appears in TBLS and its columns in COLUMNS_V2.
Step 7: exit
hive> exit;
or
hive> quit;
3) Remote server mode / multi-user mode: for non-Java clients that need to access the metastore database. A MetaStoreServer is started on the server side, and clients use the Thrift protocol to reach the metastore database through it.
The server side must start the metastore service:
[root@PCS102 conf]# nohup hive --service metastore &
[root@PCS102 conf]# jps
24657 RunJar
29075 Jps
18534 NameNode
20743 NodeManager
18712 DataNode
23609 JobHistoryServer
28842 RunJar
20523 ResourceManager
19020 SecondaryNameNode
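The RunJar processes above include the metastore. To confirm it is serving Thrift on the default port 9083, a quick check (sketch):

# expect a LISTEN entry owned by a java/RunJar process
[root@PCS102 conf]# netstat -tlnp | grep 9083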
Case 1: server and client on the same node
PCS101: MySQL server
PCS102: Hive server and client
hive-site.xml on PCS102:
<configuration>
  <!-- directory in HDFS where Hive stores table data -->
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/root/hive_remote/warehouse</value>
  </property>
  <!-- whether Hive uses a local (embedded) metastore -->
  <property>
    <name>hive.metastore.local</name>
    <value>false</value>
  </property>
  <!-- JDBC URL of the MySQL metastore database -->
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://PCS101/hive_remote?createDatabaseIfNotExist=true</value>
  </property>
  <!-- JDBC driver class for MySQL -->
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <!-- MySQL user name -->
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
  </property>
  <!-- MySQL password -->
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>123456</value>
  </property>
  <!-- metastore Thrift URI used by clients -->
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://PCS102:9083</value>
  </property>
</configuration>
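With this file in place, the flow on PCS102 is: start the metastore, then start the CLI, which reads hive.metastore.uris and connects over Thrift (a usage sketch):

[root@PCS102 ~]# nohup hive --service metastore &
[root@PCS102 ~]# hive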
Case 2: server and client on different nodes (both client and server depend on Hadoop)
PCS101: MySQL server
PCS102: Hive server
PCS103: Hive client
hive-site.xml on the Hive server PCS102:
<configuration>
  <!-- directory in HDFS where Hive stores table data -->
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/root/hive_remote/warehouse</value>
  </property>
  <!-- JDBC URL of the MySQL metastore database -->
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://PCS101/hive_remote?createDatabaseIfNotExist=true</value>
  </property>
  <!-- JDBC driver class for MySQL -->
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <!-- MySQL user name -->
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
  </property>
  <!-- MySQL password -->
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>123456</value>
  </property>
</configuration>
hive-site.xml on the Hive client PCS103:
<configuration>
  <!-- directory in HDFS where Hive stores table data -->
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/root/hive_remote/warehouse</value>
  </property>
  <!-- whether Hive uses a local (embedded) metastore -->
  <property>
    <name>hive.metastore.local</name>
    <value>false</value>
  </property>
  <!-- metastore Thrift URI used by this client -->
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://PCS102:9083</value>
  </property>
</configuration>
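An end-to-end check for this layout (a sketch; the test02 table name is hypothetical): start the metastore on PCS102, open the CLI on PCS103, create a table, and confirm it lands in the MySQL metastore on PCS101:

[root@PCS102 ~]# nohup hive --service metastore &
[root@PCS103 ~]# hive
hive> create table test02(id int);
-- on PCS101, the new table should appear in the metastore schema:
-- mysql> select TBL_NAME from hive_remote.TBLS;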
Problem: jline-0.9.94.jar under /usr/local/hadoop-2.6.5/share/hadoop/yarn/lib is too old and causes the error below; copying Hive's newer jline jar into that Hadoop directory fixes it.
[root@node101 bin]# schematool -dbType mysql -initSchema
19/07/02 16:21:32 WARN conf.HiveConf: HiveConf of name hive.metastore.local does not exist
Metastore connection URL:        jdbc:mysql://node102/hive_remote?createDatabaseIfNotExist=true
Metastore Connection Driver :    com.mysql.jdbc.Driver
Metastore connection User:       root
Starting metastore schema initialization to 1.2.0
Initialization script hive-schema-1.2.0.mysql.sql
[ERROR] Terminal initialization failed; falling back to unsupported
java.lang.IncompatibleClassChangeError: Found class jline.Terminal, but interface was expected
        at jline.TerminalFactory.create(TerminalFactory.java:101)
        at jline.TerminalFactory.get(TerminalFactory.java:158)
        at org.apache.hive.beeline.BeeLineOpts.<init>(BeeLineOpts.java:74)
        at org.apache.hive.beeline.BeeLine.<init>(BeeLine.java:117)
        at org.apache.hive.beeline.HiveSchemaTool.runBeeLine(HiveSchemaTool.java:346)
        at org.apache.hive.beeline.HiveSchemaTool.runBeeLine(HiveSchemaTool.java:326)
        at org.apache.hive.beeline.HiveSchemaTool.doInit(HiveSchemaTool.java:266)
        at org.apache.hive.beeline.HiveSchemaTool.doInit(HiveSchemaTool.java:243)
        at org.apache.hive.beeline.HiveSchemaTool.main(HiveSchemaTool.java:473)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Exception in thread "main" java.lang.IncompatibleClassChangeError: Found class jline.Terminal, but interface was expected
        at org.apache.hive.beeline.BeeLineOpts.<init>(BeeLineOpts.java:102)
        at org.apache.hive.beeline.BeeLine.<init>(BeeLine.java:117)
        at org.apache.hive.beeline.HiveSchemaTool.runBeeLine(HiveSchemaTool.java:346)
        at org.apache.hive.beeline.HiveSchemaTool.runBeeLine(HiveSchemaTool.java:326)
        at org.apache.hive.beeline.HiveSchemaTool.doInit(HiveSchemaTool.java:266)
        at org.apache.hive.beeline.HiveSchemaTool.doInit(HiveSchemaTool.java:243)
        at org.apache.hive.beeline.HiveSchemaTool.main(HiveSchemaTool.java:473)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
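A sketch of the fix described above; the jline version numbers are typical for this Hive/Hadoop pairing and are an assumption:

# move the stale jline out of Hadoop's yarn lib and copy Hive's newer one in
[root@node101 ~]# mv /usr/local/hadoop-2.6.5/share/hadoop/yarn/lib/jline-0.9.94.jar /tmp/
[root@node101 ~]# cp $HIVE_HOME/lib/jline-2.12.jar /usr/local/hadoop-2.6.5/share/hadoop/yarn/lib/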