Hadoop and Hive installation process and configuration files (attached).
Note:
The Hadoop NameNode is not set up for HA.
Hive is still plain Hive on MR; Hive on Tez and Hive on Spark are not used, and LLAP, HCatalog, and WebHCat are not configured.
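For reference, which engine Hive runs on is selected by the hive.execution.engine property, so "Hive on MR" here just means the default value. A minimal sketch of switching it, only valid once Tez or Spark are actually installed and configured:
-- current setup: the default MapReduce engine
SET hive.execution.engine=mr;
-- with Tez or Spark installed and configured, switching is one setting:
-- SET hive.execution.engine=tez;
-- SET hive.execution.engine=spark;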
After installation, here are some Hive usage examples:
Load a file from the local file system:
LOAD DATA LOCAL INPATH '/tmp/student.csv' OVERWRITE INTO TABLE student_csv;
Load data into a table from a file already in HDFS (note: a non-LOCAL load moves the file into the table's directory rather than copying it):
LOAD DATA INPATH '/tmp/student.csv' OVERWRITE INTO TABLE student_csv;
1. Create the CSV file.
student.csv
4,Rose,M,78,77,76
5,Mike,F,99,98,98
2. Put it into HDFS. (This step is optional; Hive can also load from the local file system.)
# hdfs dfs -put student.csv /input
3. Create the table in Hive.
create table student_csv(sid int, sname string, gender string, language int, math int, english int) row format delimited fields terminated by ',' stored as textfile;
4. Load the HDFS file into Hive.
load data inpath '/input/student.csv' into table student_csv;
5. Verify.
hive> select * from student_csv;
OK
4	Rose	M	78	77	76
5	Mike	F	99	98	98
4. Loading data into SEQUENCEFILE
SequenceFile is a binary file format provided by the Hadoop API. It is easy to use, splittable, and compressible.
SequenceFile supports three compression options: NONE, RECORD, and BLOCK. RECORD compression gives a low compression ratio, so BLOCK compression is generally recommended.
Example:
hive> create table test2(str STRING) STORED AS SEQUENCEFILE;
OK
Time taken: 5.526 seconds
hive> SET hive.exec.compress.output=true;
hive> SET io.seqfile.compression.type=BLOCK;
hive> INSERT OVERWRITE TABLE test2 SELECT * FROM test1;
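To check that the insert really produced block-compressed SequenceFiles, the table's files can be listed from the Hive CLI. A sketch, assuming the default warehouse location /user/hive/warehouse (the file name 000000_0 is the typical first output file, not guaranteed):
hive> dfs -ls /user/hive/warehouse/test2;
-- SequenceFiles begin with the magic bytes 'SEQ'; dumping a file shows them at the start:
hive> dfs -cat /user/hive/warehouse/test2/000000_0;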
Convert a TEXTFILE table into an ORC table:
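The target ORC table must exist before the INSERT; its creation is not shown in the original notes. A minimal sketch, assuming student_csv_orc mirrors the columns of student_csv:
-- hypothetical DDL for the target table used in the statement below
create table student_csv_orc(sid int, sname string, gender string, language int, math int, english int) stored as orc;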
hive> INSERT OVERWRITE TABLE student_csv_orc SELECT * FROM student_csv;
Console output from running the command:
Query ID = hadoop_20180722122259_3a968951-7388-4f67-ba90-8ad47ffaa7d7
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Kill Command = /opt/hadoop/bin/mapred job -kill job_1532216763790_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
Note:
Only TEXTFILE tables can load data files directly. Local LOAD DATA, and external tables that read data straight from a path, work only with TEXTFILE tables. Going further, the compressed file formats that Hive supports out of the box (Hadoop's default compression formats) can also only be read directly by TEXTFILE tables; other storage formats cannot, and have to be populated by inserting from a TEXTFILE table. In other words, SequenceFile and RCFile tables cannot load data files directly: the data must first be loaded into a TEXTFILE table and then copied into the SequenceFile or RCFile table with INSERT ... SELECT ... FROM. The source files of SequenceFile and RCFile tables cannot be viewed directly; query them with SELECT in Hive. RCFile source files can be dumped with hive --service rcfilecat /xxxxxxxxxxxxxxxxxxxxxxxxxxx/000000_0, but the output format is different and very messy. For the compressed file formats Hive supports by default, see:
http://blog.csdn.net/longshenlmj/article/details/50550580
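To make the note above concrete, here is a minimal sketch of the required two-step route into a SequenceFile table. The table name student_csv_seq is hypothetical, introduced only for this illustration; student_csv is the TEXTFILE staging table from the earlier example:
-- student_csv_seq is a hypothetical SequenceFile table matching student_csv's columns
create table student_csv_seq(sid int, sname string, gender string, language int, math int, english int) stored as sequencefile;
-- a direct LOAD DATA into student_csv_seq is not supported; stage through the TEXTFILE table:
load data inpath '/input/student.csv' into table student_csv;
insert overwrite table student_csv_seq select * from student_csv;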
ORC format