Hive Installation and Usage

Date: 2019-11-26

Tags: hive, installation, usage. Category: Hadoop

1、Installation

1. Extract the Hive package

tar -zxvf apache-hive-2.1.1-bin.tar.gz
mv apache-hive-2.1.1-bin hive-2.1.1

2. Configure the Hive environment variables

export HIVE_HOME=/opt/soft/hive-2.1.1
export PATH=${PATH}:${HIVE_HOME}/bin
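
The two exports above only last for the current shell session. A minimal sketch of making them permanent follows. It assumes you keep them in a profile file such as ~/.bashrc or /etc/profile; the helper name add_hive_env is made up for this example and is not part of Hive.

```shell
# Append the Hive environment variables to a profile file, at most once.
add_hive_env() {
  # $1: profile file to append to, $2: Hive installation directory
  # Skip when the file already defines HIVE_HOME, so reruns add no duplicates.
  if grep -q '^export HIVE_HOME=' "$1" 2>/dev/null; then
    return 0
  fi
  {
    printf 'export HIVE_HOME=%s\n' "$2"
    # Written literally so that PATH is expanded when the profile is loaded.
    printf '%s\n' 'export PATH=${PATH}:${HIVE_HOME}/bin'
  } >> "$1"
}
```

For example, run add_hive_env ~/.bashrc /opt/soft/hive-2.1.1 and then reload the file with source ~/.bashrc.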

3. Edit the configuration files

cd ${HIVE_HOME}/conf
cp hive-default.xml.template hive-site.xml
cp hive-log4j2.properties.template hive-log4j2.properties

Edit hive-site.xml

<!-- Path settings; place these at the top of hive-site.xml -->
<property>
    <name>system:java.io.tmpdir</name>
    <value>/tmp/hive/java</value>
  </property>
<property>
    <name>system:user.name</name>
    <value>${user.name}</value>
  </property>

<!-- MySQL settings -->
<property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localwork:3306/hive_metadata?createDatabaseIfNotExist=true</value>
    <description>
      JDBC connect string for a JDBC metastore.
      To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in the connection URL. 
      For example, jdbc:postgresql://myhost/db?ssl=true for postgres database.
    </description>
  </property>
<property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>
<property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
    <description>Username to use against metastore database</description>
  </property>
<property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>root</value>
    <description>password to use against metastore database</description>
  </property>
<!-- Location on HDFS where table data is stored -->
 <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/hive/warehouse</value>
    <description>location of default database for the warehouse</description>
  </property>
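
After editing a file this large it is easy to lose track of which value is in effect. Here is a rough sketch for reading a single property back out of hive-site.xml. It assumes each <name> and its <value> sit on separate lines, as in the snippet above. The helper name get_hive_property is made up for this example, and this is a line-based check, not a real XML parser.

```shell
# Print the value that follows a given <name> element in a Hive config file.
get_hive_property() {
  # $1: property name, $2: path to the XML file
  awk -v name="$1" '
    index($0, "<name>" name "</name>") { found = 1; next }
    found && match($0, /<value>.*<\/value>/) {
      # Strip the surrounding <value> (7 chars) and </value> (8 chars) tags.
      print substr($0, RSTART + 7, RLENGTH - 15)
      exit
    }
  ' "$2"
}

# Example call, executed only when the file exists at the expected place.
if [ -f "${HIVE_HOME:-}/conf/hive-site.xml" ]; then
  get_hive_property hive.metastore.warehouse.dir "${HIVE_HOME}/conf/hive-site.xml"
fi
```

Because hive-site.xml was copied from the full template, a property may appear more than once; this sketch prints only the first occurrence.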

Edit the log4j properties file

<!-- log4j -->
Optional; adjust the logging settings as needed.

4. Create the required directories in HDFS

hadoop fs -mkdir -p /hive/warehouse
hadoop fs -mkdir -p /tmp
hadoop fs -chmod -R 755 /hive/warehouse
hadoop fs -chmod -R 777 /tmp

5. Create the database in MySQL

CREATE DATABASE `hive_metadata` /*!40100 COLLATE 'utf8_general_ci' */;

Upload the MySQL JDBC driver to Hive's lib directory

Initialize the Hive metastore schema in MySQL

./bin/schematool -initSchema -dbType mysql

6. Start Hive to test

./bin/hive

3、Hive commands

1. Hive interactive mode

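quit, exit: exit the interactive shell
reset: reset the configuration to its default values
set <key>=<value>: set the value of a particular configuration variable (a misspelled variable name does not raise an error)
set: print the Hive configuration variables overridden by the user
set -v: print all Hadoop and Hive configuration variables
add FILE[S] <filepath> <filepath>*, add JAR[S] <filepath> <filepath>*, add ARCHIVE[S] <filepath> <filepath>*: add one or more files, jars, or archives to the distributed cache
list FILE[S], list JAR[S], list ARCHIVE[S]: list the resources already added to the distributed cache
list FILE[S] <filepath>*, list JAR[S] <filepath>*, list ARCHIVE[S] <filepath>*: check whether the given resources have been added to the distributed cache
delete FILE[S] <filepath>*, delete JAR[S] <filepath>*, delete ARCHIVE[S] <filepath>*: remove the given resources from the distributed cache
! <command>: run a shell command from the Hive shell
dfs <dfs command>: run a dfs command from the Hive shell
<query string>: execute a Hive query and print the results to standard output
source FILE <filepath>: execute a Hive script file inside the CLI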
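
To try a few of these commands without typing them one by one, they can be collected into a script file and passed to the CLI. The sketch below writes such a file and runs it with hive -f when the hive launcher is available. The file name is an arbitrary choice, and hive.cli.print.header is just one convenient variable to demonstrate set.

```shell
# Write a small Hive script that exercises some of the CLI commands above.
# The file name is a hypothetical choice for this example.
HQL_FILE="${TMPDIR:-/tmp}/hive_cli_demo.hql"
cat > "$HQL_FILE" <<'EOF'
-- show column headers in query output
set hive.cli.print.header=true;
-- print the current value of a single variable
set hive.cli.print.header;
-- run an HDFS command without leaving the Hive shell
dfs -ls /hive/warehouse;
-- run an operating-system command
!date;
EOF

# Run the script non-interactively when the hive launcher is on PATH.
if command -v hive >/dev/null 2>&1; then
  hive -f "$HQL_FILE"
fi
```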

4、Data import and export

Sqoop manual (in Chinese): http://blog.csdn.net/myrainblues/article/details/43673129

1. Hive operations:

Create a new table in Hive

create table t_test(id bigint,name string, age int) row format delimited fields terminated by '\t';

Load data from the local file system

load data local inpath '/opt/hadoop-test/test.txt' overwrite into table t_test;
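
Since t_test was declared with fields terminated by '\t', the input file needs one record per line with tab-separated columns in the order id, name, age. Here is a small sketch that generates such a file and checks its shape. The three sample rows are invented, and the file is written to a temporary path, not to /opt/hadoop-test/test.txt.

```shell
# Generate a tab-separated sample file matching t_test(id, name, age).
# The rows are made-up sample data; the output path is an assumption.
DATA_FILE="${TMPDIR:-/tmp}/test.txt"

# printf reuses the format for each group of three arguments, one row each.
printf '%s\t%s\t%s\n' \
  1 alice 30 \
  2 bob 25 \
  3 carol 41 > "$DATA_FILE"

# Sanity check: every line must have exactly three tab-separated fields.
if awk -F '\t' 'NF != 3 { bad = 1 } END { exit bad }' "$DATA_FILE"; then
  echo "every row has 3 columns"
fi
```

Copy the file to the path used in the load statement, or change the path in the statement, before loading it.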

Load data from HDFS

load data inpath '/data/test.txt' overwrite into table t_test;

Create a new Hive table from another table's data

create table t_test2 as select * from t_test;

Insert another table's data into an existing Hive table

insert overwrite table t_test2 select * from t_test;

Copy only the table structure, without the data

create table t_test3 like t_test;

Export from Hive to the local file system

insert overwrite local directory '/tmp/t_crm/t_test' select * from t_test2;

Partitions (the partition column must not also appear in the column list of the table definition)

create table t_user(id bigint,name string, age int) partitioned by (gender int) row format delimited fields terminated by '\t';
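
Because gender is a partition column and not a regular column, its value is given when the data is loaded and is not read from the file. The sketch below writes a short HiveQL script showing this for the t_user table. The input path /opt/hadoop-test/user_gender1.txt is hypothetical, and that file would hold only id, name, and age.

```shell
# Write HiveQL that loads one file into the gender=1 partition of t_user.
# Both the script name and the input path inside it are hypothetical.
HQL_FILE="${TMPDIR:-/tmp}/load_t_user_partition.hql"
cat > "$HQL_FILE" <<'EOF'
load data local inpath '/opt/hadoop-test/user_gender1.txt'
  overwrite into table t_user partition (gender=1);
show partitions t_user;
select id, name, age, gender from t_user where gender = 1;
EOF

# Show what was generated; to execute it, run: hive -f "$HQL_FILE"
cat "$HQL_FILE"
```

In queries the partition column behaves like any other column, which is why the select can filter on gender.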

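Queries, aggregations, and so on work much as they do in MySQL; refer to MySQL for the syntax.

2. Import MySQL data into Hive with Sqoop

(1) Import the result of a MySQL query

Note: if the query result contains duplicate column names, either (1) create the table in Hive first and then import, or (2) give the columns in the query result aliases.

Create an external table in Hive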

create external table qf_sale_follow (id string,name string,short_name string,sale_id string,sale_name string,sale_city string,follow_type string,follow_status string,schedule_rate string,abandon_reason string,create_time string,create_user_id string,create_user_name string,detail string) partitioned by (logdate string) row format delimited fields terminated by '\t' location '/qf_dev/qf_sale_follow';

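Import the data into Hive (when importing with --query, the query must contain where $CONDITIONS and --target-dir [an HDFS path] must be specified; if the query returns duplicate column names, either alias the columns in the query or create the table first):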

sqoop import --connect jdbc:mysql://localwork:3306/qf_dev --username root --password root --query 'SELECT b.id, b.name, b.short_name, b.sale_id, b.sale_name, b.sale_city, a.follow_type, a.follow_status, a.schedule_rate, a.abandon_reason, a.create_time, a.create_user_id, a.create_user_name, a.detail FROM t_customer_follow_record a JOIN t_customer b ON a.customer_id=b.id where $CONDITIONS ORDER BY a.create_time ASC' --hive-import --hive-overwrite --hive-table qf_sale_follow -m 1  --target-dir /qf_dev/qf_sale_follow2/ --fields-terminated-by '\t';
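
The command above wraps the query in single quotes for a reason: $CONDITIONS has to reach Sqoop literally, and the shell would otherwise try to expand it. The sketch below shows the difference using plain shell variables only. The shortened query text is illustrative, not the query from this article.

```shell
# Inside single quotes the shell leaves $CONDITIONS untouched.
single='SELECT id FROM t_customer WHERE $CONDITIONS'

# Inside double quotes the dollar sign must be escaped to get the same text.
escaped="SELECT id FROM t_customer WHERE \$CONDITIONS"

# Without the escape the shell substitutes the variable's value. It is set to
# an empty string here to stand in for the usual case of it being unset.
CONDITIONS=''
broken="SELECT id FROM t_customer WHERE $CONDITIONS"

printf '%s\n' "$single" "$escaped" "$broken"
```

The first two lines printed are identical and still contain $CONDITIONS. The third ends right after WHERE, which is the form Sqoop would reject.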

Import into Hive (specifying a table):

sqoop import --connect jdbc:mysql://localwork:3306/qf_dev --username root --password root --table t_workflow_report  --hive-import --hive-overwrite --hive-database qf_dev --hive-table t_workflow_report -m 1  --fields-terminated-by '\t';

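Note: --target-dir is a temporary staging directory and must never be the same as the location of the Hive table. In the first step Sqoop imports the MySQL data into this HDFS directory; once that succeeds it loads the HDFS data into Hive, and after a successful load it deletes the temporary files.

Null handling: --null-string '\\N' --null-non-string '\\N'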
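
Putting the two notes together, a wrapper script can refuse to run when the staging directory equals the table location and always pass the null-handling options. This is only a sketch. The helper names and the staging path /tmp/sqoop/t_workflow_report are made up, and the table location is assumed to follow the warehouse layout used elsewhere in this article.

```shell
# Hypothetical paths: the table location assumes the warehouse layout used in
# this article; the staging directory is an arbitrary, different HDFS path.
HIVE_LOCATION="/hive/warehouse/qf_dev.db/t_workflow_report"
TARGET_DIR="/tmp/sqoop/t_workflow_report"

# Succeeds only when the two paths differ (one trailing slash is ignored).
paths_differ() {
  [ "${1%/}" != "${2%/}" ]
}

# The table import from above, plus --target-dir and the null-handling flags.
run_import() {
  sqoop import \
    --connect jdbc:mysql://localwork:3306/qf_dev \
    --username root --password root \
    --table t_workflow_report \
    --hive-import --hive-overwrite \
    --hive-database qf_dev --hive-table t_workflow_report \
    -m 1 \
    --target-dir "$TARGET_DIR" \
    --fields-terminated-by '\t' \
    --null-string '\\N' --null-non-string '\\N'
}

if ! paths_differ "$TARGET_DIR" "$HIVE_LOCATION"; then
  echo "target-dir must not be the Hive table location" >&2
elif command -v sqoop >/dev/null 2>&1; then
  run_import
else
  echo "sqoop not found on PATH; nothing was run"
fi
```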

3. Export Hive data to MySQL

sqoop export --connect jdbc:mysql://localwork:3306/test  --username root --password root --table t_workflow_report --export-dir /hive/warehouse/qf_dev.db/t_workflow_report2 --input-null-string '\\N' --input-null-non-string '\\N' --input-fields-terminated-by '\t';

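5、Problems encountered

1. Importing MySQL data into Hive with Sqoop

(1) Hive import fails: IOException running import job: java.io.IOException: Hive exited with status 1

Solution: http://blog.csdn.net/wind520/article/details/39128399

(2) The jar file generated for the database table cannot be found

Solution: edit the Hadoop configuration file mapred-site.xml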
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>

（3）org.apache.hadoop.yarn.exceptions.InvalidAuxServiceException: The auxService: mapreduce_shuffle do

Solution: edit the Hadoop configuration file yarn-site.xml

<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>

http://blog.csdn.net/baiyangfu_love/article/details/13504849

（4）root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rwxr-xr-x

Solution: make the HDFS tmp directory writable for all users: hadoop fs -chmod -R 777 /tmp
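
The permission string in the message, rwxr-xr-x, is 755 in octal: only the owner may write. As far as I understand Hive's check, the scratch directory must also be writable by group and others, so setting 755 again cannot clear the error. Here is a small sketch for translating the string in such a message into octal. The helper name perm_to_octal is made up, and it ignores special bits such as the sticky bit.

```shell
# Convert a nine-character permission string such as rwxr-xr-x to octal.
perm_to_octal() {
  perms="$1"
  result=""
  # Characters 1-3 are the owner bits, 4-6 the group bits, 7-9 the others.
  for start in 1 4 7; do
    triplet=$(printf '%s' "$perms" | cut -c "${start}-$((start + 2))")
    digit=0
    case "$triplet" in r??) digit=$((digit + 4)) ;; esac
    case "$triplet" in ?w?) digit=$((digit + 2)) ;; esac
    case "$triplet" in ??x) digit=$((digit + 1)) ;; esac
    result="${result}${digit}"
  done
  printf '%s\n' "$result"
}

perm_to_octal rwxr-xr-x   # prints 755
perm_to_octal rwxrwxrwx   # prints 777
```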

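(5) The data imported into Hive contains only the id field

Solution: create the table in Hive first so that the directory structure is in place, then run the import.

(6) Connecting to Hive from Java fails: root is not allowed to impersonate root

Edit the Hadoop configuration file core-site.xml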

<property>
<name>hadoop.proxyuser.root.groups</name>
<value>*</value>
<description>Allow the superuser root to impersonate members of any group</description>
</property>
<property>
<name>hadoop.proxyuser.root.hosts</name>
<value>*</value>
<description>The superuser root can connect from any host to impersonate a user</description>
</property>

HiveServer2: http://blog.csdn.net/gamer_gyt/article/details/52062460