Installing Hadoop and Hive, and Using Hive

Date: 2019-11-10


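Installation procedure and configuration files for Hadoop and Hive (see the attachment).
Notes:

The Hadoop NameNode is not configured for HA.
Hive is still plain Hive on MR; Hive on Tez and Hive on Spark are not used, and LLAP, HCatalog, and WebHCat are not configured.

Once installation is complete, the following are examples of using Hive: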


Load a file from the local file system into a table

LOAD DATA LOCAL INPATH '/tmp/student.csv' OVERWRITE INTO TABLE student_csv;

Load data from a file on HDFS into a table

LOAD DATA INPATH '/tmp/student.csv' OVERWRITE INTO TABLE student_csv;

1. Create a CSV file.

student.csv

4,Rose,M,78,77,76
5,Mike,F,99,98,98
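The file from step 1 can also be produced from the shell. This is a minimal sketch, assuming the two rows follow the six-column layout (sid, sname, gender, language, math, english) of the table created in step 3 and that writing to the current directory is acceptable:

```shell
# Write the two sample rows, one record per line, six comma-separated fields each.
cat > student.csv <<'EOF'
4,Rose,M,78,77,76
5,Mike,F,99,98,98
EOF

# Sanity check: print the number of fields on each line.
# Every row should report the same count as the table has columns.
awk -F',' '{ print NF }' student.csv
```

A row with a different field count would not line up with the table columns when Hive reads it, so the check is worth running before loading.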

2. Put it on HDFS. (This step is optional; Hive can also load from the local file system.)

# hdfs dfs -mkdir -p /input
# hdfs dfs -put student.csv /input

3. Create a table in Hive.

create table student_csv(sid int, sname string, gender string, language int, math int, english int) row format delimited fields terminated by ',' stored as textfile;

4. Load the HDFS file into Hive.

load data inpath '/input/student.csv' into table student_csv;

5. Verify.

hive> select * from student_csv;
OK
4 Rose M 78 77 76
5 Mike F 99 98 98
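Because step 2 is optional, the same result can be reached without staging the file on HDFS. The sketch below writes the statements to a script file and leaves the actual run to `hive -f`. The script name load_student_local.hql is made up for illustration, and the CSV is assumed to sit at /tmp/student.csv, as in the LOAD DATA LOCAL example earlier.

```shell
# Generate a HiveQL script for the local-file variant (the script name is hypothetical).
cat > load_student_local.hql <<'EOF'
-- Same columns as in step 3; IF NOT EXISTS makes the script safe to re-run.
CREATE TABLE IF NOT EXISTS student_csv (
  sid INT, sname STRING, gender STRING,
  language INT, math INT, english INT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- LOCAL reads from the local file system rather than HDFS.
-- OVERWRITE replaces any rows already in the table.
LOAD DATA LOCAL INPATH '/tmp/student.csv' OVERWRITE INTO TABLE student_csv;

SELECT * FROM student_csv;
EOF

# To execute it on a machine with a working Hive installation:
#   hive -f load_student_local.hql
```

Keeping the statements in a script makes the load repeatable, which is convenient when the table is rebuilt during testing.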

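IV. Importing data into SEQUENCEFILE

SequenceFile is a binary file format provided by the Hadoop API. It is easy to use, splittable, and compressible.

SequenceFile supports three compression types: NONE, RECORD, and BLOCK. RECORD gives a low compression ratio, so BLOCK compression is generally recommended.

Example: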

create table test2(str STRING) STORED AS SEQUENCEFILE;

Time taken: 5.526 seconds

hive> SET hive.exec.compress.output=true;

hive> SET io.seqfile.compression.type=BLOCK;

hive> INSERT OVERWRITE TABLE test2 SELECT * FROM test1;

Convert a table in TEXTFILE format into a table in ORC format:

hive> INSERT OVERWRITE TABLE student_csv_orc SELECT * FROM student_csv;

Output printed when the command runs:

Query ID = hadoop_20180722122259_3a968951-7388-4f67-ba90-8ad47ffaa7d7

Total jobs = 1

Launching Job 1 out of 1

Number of reduce tasks determined at compile time: 1

In order to change the average load for a reducer (in bytes):

set hive.exec.reducers.bytes.per.reducer=<number>

In order to limit the maximum number of reducers:

set hive.exec.reducers.max=<number>

In order to set a constant number of reducers:

set mapreduce.job.reduces=<number>

Starting Job = job_1532216763790_0001, Tracking URL = http://serv10.bigdata.com:8088/p ... 1532216763790_0001/

Kill Command = /opt/hadoop/bin/mapred job -kill job_1532216763790_0001

Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
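The INSERT above requires student_csv_orc to exist already, and its definition is not shown here. Below is a minimal sketch of what it might look like, assuming it mirrors the columns of student_csv. The script name create_student_orc.hql is hypothetical.

```shell
# Generate a HiveQL script that creates the ORC table and fills it (names are assumptions).
cat > create_student_orc.hql <<'EOF'
-- Assumed definition: same columns as student_csv, stored as ORC.
CREATE TABLE IF NOT EXISTS student_csv_orc (
  sid INT, sname STRING, gender STRING,
  language INT, math INT, english INT
)
STORED AS ORC;

-- Hive rewrites the text rows as ORC while executing this statement.
INSERT OVERWRITE TABLE student_csv_orc SELECT * FROM student_csv;
EOF

# To execute it on a machine with a working Hive installation:
#   hive -f create_student_orc.hql
```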

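Note:

Only TEXTFILE tables can have text data loaded into them directly. This is because LOAD DATA only copies or moves files into the table's location and does not convert them. Loading local data with LOAD, and pointing an external table directly at a path of data files, therefore both work only with TEXTFILE tables. Going one step further, the compressed files Hive supports by default (the compression formats Hadoop supports by default) can also be read directly only through TEXTFILE tables; the other formats cannot do this. The data can be loaded into a TEXTFILE table and then inserted into other tables.

In other words, SequenceFile and RCFile tables cannot load data directly: the data must first be imported into a TEXTFILE table and then moved into the SequenceFile or RCFile table with INSERT ... SELECT ... FROM.

The underlying files of SequenceFile and RCFile tables cannot be viewed directly; use SELECT in Hive to inspect them. RCFile files can be viewed with hive --service rcfilecat /xxxxxxxxxxxxxxxxxxxxxxxxxxx/000000_0, but the output is formatted differently and is messy.

For the compressed file formats Hive supports by default, see http://blog.csdn.net/longshenlmj/article/details/50550580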
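The staging pattern described in this note can be sketched as follows. The table names staging_txt and target_rc, the input path /tmp/input.txt, and the script name are placeholders rather than names from the original setup. Only the TEXTFILE table receives the LOAD; the RCFILE table is filled by INSERT ... SELECT.

```shell
# Generate a HiveQL script for the load-then-convert pattern (all names are hypothetical).
cat > stage_then_convert.hql <<'EOF'
-- 1. Staging table in TEXTFILE format: the only table that gets a direct LOAD.
CREATE TABLE IF NOT EXISTS staging_txt (str STRING) STORED AS TEXTFILE;
LOAD DATA LOCAL INPATH '/tmp/input.txt' OVERWRITE INTO TABLE staging_txt;

-- 2. Target table in a binary format (RCFILE here; SEQUENCEFILE follows the same pattern).
CREATE TABLE IF NOT EXISTS target_rc (str STRING) STORED AS RCFILE;

-- 3. Hive rewrites the rows into the target format during the insert.
INSERT OVERWRITE TABLE target_rc SELECT * FROM staging_txt;
EOF

# To execute it on a machine with a working Hive installation:
#   hive -f stage_then_convert.hql
```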

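ORC format

ORC is an upgraded version of RCFile with greatly improved performance. Data can be stored compressed, with a compression ratio close to that of LZO, and compared with text files it can save up to 70% of the space. Read performance is also very high, which allows efficient queries. For details, see https://cwiki.apache.org/conflue ... /LanguageManual+ORC

The table creation statement also changes the value used for NULL in the ORC table from the default \N to ''.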
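As a rough illustration of the ORC options mentioned here, the sketch below sets a compression codec through the `orc.compress` table property and sets `serialization.null.format` through SERDEPROPERTIES. This is not the author's original statement: the table name student_orc_demo, the choice of SNAPPY, and the script name orc_options.hql are all assumptions, and whether the NULL setting has a visible effect on an ORC table should be checked on the Hive version in use.

```shell
# Generate a HiveQL script showing ORC-related options (all names are hypothetical).
cat > orc_options.hql <<'EOF'
-- ORC table with an explicit compression codec (SNAPPY chosen only as an example).
CREATE TABLE IF NOT EXISTS student_orc_demo (
  sid INT, sname STRING, gender STRING,
  language INT, math INT, english INT
)
STORED AS ORC
TBLPROPERTIES ('orc.compress' = 'SNAPPY');

-- Assumed way to switch the NULL representation from \N to an empty string.
ALTER TABLE student_orc_demo SET SERDEPROPERTIES ('serialization.null.format' = '');
EOF

# To execute it on a machine with a working Hive installation:
#   hive -f orc_options.hql
```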
