Big Data Tutorial (11.7): Setting up the data warehouse tool Hive 1.2.2 on Hadoop 2.9.1

   The previous article covered the setup of Hive 2.3.4, but that version no longer supports MapReduce jobs reliably. In this post the blogger walks through the full setup of Hive 1.2.2. Note up front: this section builds directly on the Hadoop environment from the previous one!

    1. Download apache-hive-1.2.2-bin.tar.gz

    2. Upload the Hive package to the namenode server

    3. Extract the Hive package

tar -zxvf  apache-hive-1.2.2-bin.tar.gz  -C /home/hadoop/apps/

    4. Update the Hive entries in /etc/profile

#export HIVE_HOME=/home/hadoop/apps/apache-hive-2.3.4-bin
export HIVE_HOME=/home/hadoop/apps/apache-hive-1.2.2-bin
export PATH=${HIVE_HOME}/bin:$PATH

#after saving, run source /etc/profile for the change to take effect
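
A quick check that the switch took effect (assuming the profile has been re-sourced in the current shell):

which hive       #should point into apache-hive-1.2.2-bin/bin
hive --version   #should report Hive 1.2.2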

    5. Edit the Hive configuration file

cd /home/hadoop/apps/apache-hive-1.2.2-bin/conf/
cp hive-env.sh.template hive-env.sh

vi hive-env.sh
#add the following three lines and save
export HADOOP_HOME=/home/hadoop/apps/hadoop-2.9.1
export HIVE_CONF_DIR=/home/hadoop/apps/apache-hive-1.2.2-bin/conf
export HIVE_AUX_JARS_PATH=/home/hadoop/apps/apache-hive-1.2.2-bin/lib

    6. Fix the log4j logging configuration

cp hive-log4j.properties.template hive-log4j.properties

Change EventCounter to org.apache.hadoop.log.metrics.EventCounter:
#log4j.appender.EventCounter=org.apache.hadoop.hive.shims.HiveEventCounter
log4j.appender.EventCounter=org.apache.hadoop.log.metrics.EventCounter

    7. Configure MySQL as the Hive metastore database

vi hive-site.xml
#write the following into hive-site.xml
<configuration>
        <property>
                <name>javax.jdo.option.ConnectionURL</name>
                <value>jdbc:mysql://192.168.29.131:3306/hivedb?createDatabaseIfNotExist=true</value>
        </property>
        <property>
                <name>javax.jdo.option.ConnectionDriverName</name>
                <value>com.mysql.jdbc.Driver</value>
        </property>
        <property>
                <name>javax.jdo.option.ConnectionUserName</name>
                <value>root</value>
        </property>
        <property>
                <name>javax.jdo.option.ConnectionPassword</name>
                <value>123456</value>
        </property>
</configuration>
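
The configuration above assumes the MySQL instance at 192.168.29.131 already accepts connections for root from the Hive host. If remote access has not been granted yet, a minimal sketch using the credentials from hive-site.xml (MySQL 5.x syntax; MySQL 8 would need a separate CREATE USER first):

#grant remote access to the metastore database (assumption: MySQL 5.x, database name hivedb)
mysql -u root -p -e "GRANT ALL PRIVILEGES ON hivedb.* TO 'root'@'%' IDENTIFIED BY '123456'; FLUSH PRIVILEGES;"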

    8. Copy the JDBC driver jar into Hive's lib directory

cp ~/mysql-connector-java-5.1.28.jar  $HIVE_HOME/lib/
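
A quick sanity check that the driver jar landed where Hive will look for it:

ls $HIVE_HOME/lib | grep mysql-connector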

    9. Delete the leftover files that Hive 2.3.4 left in HDFS (this step is mandatory)

hdfs dfs -rm -r /tmp/hive
hdfs dfs -rm -r /user/hive

    10. Initialize Hive

[hadoop@centos-aaron-h1 bin]$ schematool -initSchema -dbType mysql
Metastore connection URL:        jdbc:mysql://192.168.29.131:3306/hivedb?createDatabaseIfNotExist=true
Metastore Connection Driver :    com.mysql.jdbc.Driver
Metastore connection User:       root
Starting metastore schema initialization to 1.2.0
Initialization script hive-schema-1.2.0.mysql.sql
Error: Duplicate key name 'PCS_STATS_IDX' (state=42000,code=1061)
org.apache.hadoop.hive.metastore.HiveMetaException: Schema initialization FAILED! Metastore state would be inconsistent !!
*** schemaTool failed ***
[hadoop@centos-aaron-h1 bin]$ schematool -initSchema -dbType mysql
Metastore connection URL:        jdbc:mysql://192.168.29.131:3306/hivedb?createDatabaseIfNotExist=true
Metastore Connection Driver :    com.mysql.jdbc.Driver
Metastore connection User:       root
Starting metastore schema initialization to 1.2.0
Initialization script hive-schema-1.2.0.mysql.sql
Initialization script completed
schemaTool completed
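
The first run failed with Duplicate key name 'PCS_STATS_IDX', which usually means hivedb still contains schema objects left behind by the earlier Hive 2.3.4 metastore. One way to get a clean slate before re-running schematool, assuming nothing else uses hivedb, is a sketch like:

#drop and recreate the metastore database, then initialize again
mysql -u root -p -e "DROP DATABASE IF EXISTS hivedb; CREATE DATABASE hivedb;"
schematool -initSchema -dbType mysql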

    11. Start Hive, then create the database and table, and upload data

#run this only after the database and table below have been created
hdfs dfs -put bbb_hive.txt /user/hive/warehouse/wcc_log.db/t_web_log01
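
For reference, bbb_hive.txt is a plain comma-delimited file matching the table's "fields terminated by ','" definition; judging from the query results below, its contents look like:

1,張三
2,李四
3,王二
4,麻子
5,隔壁老王
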
[hadoop@centos-aaron-h1 bin]$ hive

Logging initialized using configuration in file:/home/hadoop/apps/apache-hive-1.2.2-bin/conf/hive-log4j.properties
hive> show databases;
OK
default
Time taken: 0.679 seconds, Fetched: 1 row(s)
hive>  create database wcc_log;
OK
Time taken: 0.104 seconds
hive> use wcc_log;
OK
Time taken: 0.03 seconds
hive> create table t_web_log01(id int,name string)
    > row format delimited
    > fields terminated by ',';
OK
Time taken: 0.159 seconds
hive> select * from t_web_log01;
OK
1       張三
2       李四
3       王二
4       麻子
5       隔壁老王
Time taken: 0.274 seconds, Fetched: 5 row(s)
hive> select count(*) from t_web_log01;
Query ID = hadoop_20190121080409_dfb157d9-0a79-4784-9ea4-111d0ad4cc92
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1548024929599_0003, Tracking URL = http://centos-aaron-h1:8088/proxy/application_1548024929599_0003/
Kill Command = /home/hadoop/apps/hadoop-2.9.1/bin/hadoop job  -kill job_1548024929599_0003
Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 0
2019-01-21 08:04:25,271 Stage-1 map = 0%,  reduce = 0%
Ended Job = job_1548024929599_0003 with errors
Error during job, obtaining debugging information...
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched: 
Stage-Stage-1:  HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
hive> select count(id) from t_web_log01;
Query ID = hadoop_20190121080455_b3eb8d25-2d10-46c6-b4f3-bfcdab904b92
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1548024929599_0004, Tracking URL = http://centos-aaron-h1:8088/proxy/application_1548024929599_0004/
Kill Command = /home/hadoop/apps/hadoop-2.9.1/bin/hadoop job  -kill job_1548024929599_0004
Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 0
2019-01-21 08:05:09,771 Stage-1 map = 0%,  reduce = 0%
Ended Job = job_1548024929599_0004 with errors
Error during job, obtaining debugging information...
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched: 
Stage-Stage-1:  HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec

The job fails with: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask

    12. Fixing the error above

              Check the YARN logs: http://centos-aaron-h1:8088/proxy/application_1548024929599_0004/
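
Besides the web UI, the aggregated container logs can also be pulled from the command line (assuming YARN log aggregation is enabled):

yarn logs -applicationId application_1548024929599_0004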

Root cause: after Hive submits the job to YARN remotely, some environment/classpath settings can be lost on the cluster nodes.

Fix: add the following to mapred-site.xml, distribute the file to every node in the Hadoop cluster, and restart the cluster (a distribution sketch follows the snippet below).

<property>
        <name>mapreduce.application.classpath</name>
        <value>/home/hadoop/apps/hadoop-2.9.1/share/hadoop/mapreduce/*,/home/hadoop/apps/hadoop-2.9.1/share/hadoop/mapreduce/lib/*</value>
</property>
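
A minimal distribution sketch, assuming the other cluster nodes are named centos-aaron-h2 and centos-aaron-h3 (hypothetical hostnames) and passwordless ssh is set up:

#copy the updated config to every other node, then restart YARN
for host in centos-aaron-h2 centos-aaron-h3; do
    scp /home/hadoop/apps/hadoop-2.9.1/etc/hadoop/mapred-site.xml ${host}:/home/hadoop/apps/hadoop-2.9.1/etc/hadoop/
done
stop-yarn.sh && start-yarn.sh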

    13. Re-run the Hive query: select count(id) from t_web_log01;

[hadoop@centos-aaron-h1 bin]$ hive

Logging initialized using configuration in file:/home/hadoop/apps/apache-hive-1.2.2-bin/conf/hive-log4j.properties
hive> use wcc_log
    > ;
OK
Time taken: 0.487 seconds
hive> show tables;
OK
t_web_log01
Time taken: 0.219 seconds, Fetched: 1 row(s)
hive> select count(id) from t_web_log01;
Query ID = hadoop_20190121082042_c5392e1c-8db8-4329-bcdf-b0c332fcfe4f
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1548029911300_0001, Tracking URL = http://centos-aaron-h1:8088/proxy/application_1548029911300_0001/
Kill Command = /home/hadoop/apps/hadoop-2.9.1/bin/hadoop job  -kill job_1548029911300_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2019-01-21 08:21:05,410 Stage-1 map = 0%,  reduce = 0%
2019-01-21 08:21:14,072 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.38 sec
2019-01-21 08:21:21,290 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 3.32 sec
MapReduce Total cumulative CPU time: 3 seconds 320 msec
Ended Job = job_1548029911300_0001
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 3.32 sec   HDFS Read: 6642 HDFS Write: 2 SUCCESS
Total MapReduce CPU Time Spent: 3 seconds 320 msec
OK
5
Time taken: 40.218 seconds, Fetched: 1 row(s)
hive> [hadoop@centos-aaron-h1 bin]$

    As shown above, the query now succeeds and returns the expected result: 5 rows.
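
A quick way to cross-check that figure against the raw file in HDFS (the put in step 11 kept the original file name):

hdfs dfs -cat /user/hive/warehouse/wcc_log.db/t_web_log01/bbb_hive.txt | wc -l   #should print 5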

    Final words: that's all for this post. If you found the blogger's article helpful, please give it a like; if you're interested in the blogger's other big data articles or in the blogger himself, please follow the blog, and feel free to reach out and exchange ideas anytime.
