Importing Data into Hive

Date: 2019-12-08

Tags: hive, data import. Category: Hadoop

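Common ways to import data into Hive
This article covers four:
(1) loading data from the local file system into a Hive table;
(2) loading data from HDFS into a Hive table;
(3) querying data from another table and inserting it into a Hive table;
(4) creating a table and populating it in the same statement with rows queried from another table.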

1. Loading data from the local file system into a Hive table

First, create the table in Hive:

hive> create table wyp
> (id int, name string,
> age int, tel string)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '\t'
> STORED AS TEXTFILE;
OK
Time taken: 2.832 seconds

The local file system has a file /home/wyp/wyp.txt with the following contents:

[wyp@master ~]$ cat wyp.txt
1 wyp 25 13188888888888
2 test 30 13888888888888
3 zs 34 899314121

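The columns in wyp.txt are separated by \t (tab), which matches the field delimiter declared for the table. The following statement loads the file into the wyp table. The relative path 'wyp.txt' is resolved against the user's current working directory, here /home/wyp: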

hive> load data local inpath 'wyp.txt' into table wyp;
Copying data from file:/home/wyp/wyp.txt
Copying file: file:/home/wyp/wyp.txt
Loading data to table default.wyp
Table default.wyp stats:
[num_partitions: 0, num_files: 1, num_rows: 0, total_size: 67]
OK
Time taken: 5.967 seconds

You can verify the result by listing the data directory of the wyp table:

hive> dfs -ls /user/hive/warehouse/wyp ;
Found 1 items
-rw-r--r-- 3 wyp supergroup 67 2014-02-19 18:23 /hive/warehouse/wyp/wyp.txt
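
As a small sketch of how repeated loads behave: LOAD DATA with plain INTO adds the file to whatever the table already holds, while the OVERWRITE keyword replaces the table's existing contents. The absolute path below is the same file used above; running both statements in order is only an illustration, not a step from the original walkthrough.

```sql
-- Appends: files already in the table's data directory are kept.
LOAD DATA LOCAL INPATH '/home/wyp/wyp.txt' INTO TABLE wyp;

-- Replaces: the table's existing contents are removed before the file is loaded.
LOAD DATA LOCAL INPATH '/home/wyp/wyp.txt' OVERWRITE INTO TABLE wyp;
```

After the second statement the table should contain only the rows from wyp.txt, regardless of how many times the file was loaded before.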

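Note: at the time of writing, Hive did not support statements of the form INSERT INTO ... VALUES. Support for this syntax was added in Hive 0.14.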

2. Loading data from HDFS into a Hive table

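When data is loaded from the local file system into a Hive table, it is in fact first copied temporarily to a directory on HDFS (typically the uploading user's HDFS home directory, for example /home/wyp/) and then moved (note: moved, not copied) from that temporary directory into the data directory of the target Hive table. Given that, Hive naturally also supports moving data directly from a directory on HDFS into a table's data directory. Suppose the file /home/wyp/add.txt exists on HDFS: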

[wyp@master /home/q/hadoop-2.2.0]$ bin/hadoop fs -cat /home/wyp/add.txt
5 wyp1 23 131212121212
6 wyp2 24 134535353535
7 wyp3 25 132453535353
8 wyp4 26 154243434355

This file lives in the /home/wyp directory on HDFS (unlike the file in section 1, which was on the local file system). The following command loads its contents into the Hive table:

hive> load data inpath '/home/wyp/add.txt' into table wyp;
Loading data to table default.wyp
Table default.wyp stats:
[num_partitions: 0, num_files: 2, num_rows: 0, total_size: 215]
OK
Time taken: 0.47 seconds
hive> select * from wyp;
OK
5 wyp1 23 131212121212
6 wyp2 24 134535353535
7 wyp3 25 132453535353
8 wyp4 26 154243434355
1 wyp 25 13188888888888
2 test 30 13888888888888
3 zs 34 899314121
Time taken: 0.096 seconds, Fetched: 7 row(s)

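The output shows that the data was indeed loaded into the wyp table. Note that the statement load data inpath '/home/wyp/add.txt' into table wyp; does not contain the keyword local; that is the difference from section 1.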
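
Because a load from HDFS moves the file rather than copying it, the source path should no longer hold the file afterwards. The following is a minimal sketch of how one might check this, written as a script for hive -f; it reuses the paths from this article and assumes the table lives under the default warehouse directory /user/hive/warehouse/wyp.

```sql
-- Before the load: add.txt is expected to be listed in the source directory.
dfs -ls /home/wyp;

LOAD DATA INPATH '/home/wyp/add.txt' INTO TABLE wyp;

-- After the load: add.txt should be gone from the source directory ...
dfs -ls /home/wyp;

-- ... and should appear in the table's data directory instead.
dfs -ls /user/hive/warehouse/wyp;
```

If the original file must stay in place, copy it to a staging path first and load the copy.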

3. Querying data from another table and inserting it into a Hive table

Suppose Hive has a table named test, created as follows:

hive> create table test(
> id int, name string
> ,tel string)
> partitioned by
> (age int)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '\t'
> STORED AS TEXTFILE;
OK
Time taken: 0.261 seconds

This is largely the same as the CREATE TABLE statement for wyp, except that test uses age as its partition column. A brief explanation of partitions:

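Partitions: in Hive, each partition of a table corresponds to a subdirectory under the table's directory, and all data for a partition is stored in that subdirectory. For example, if the wyp table had two partition columns, dt and city, the partition dt=20131218, city=BJ would map to the directory /user/hive/warehouse/wyp/dt=20131218/city=BJ, and all data belonging to that partition would be stored there.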
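
To make the directory mapping concrete, here is a hedged sketch using a hypothetical table wyp_part (not part of the original article) with dt and city as partition columns. It assumes the default warehouse location /user/hive/warehouse and is written as a script for hive -f.

```sql
-- Hypothetical table for illustration only; partition columns are declared
-- in PARTITIONED BY, not in the regular column list.
CREATE TABLE wyp_part (id INT, name STRING)
PARTITIONED BY (dt STRING, city STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;

-- Register one partition; Hive creates the matching directory for it.
ALTER TABLE wyp_part ADD PARTITION (dt='20131218', city='BJ');

-- List the partitions Hive knows about for this table.
SHOW PARTITIONS wyp_part;

-- The nested layout is one directory level per partition column.
dfs -ls /user/hive/warehouse/wyp_part/dt=20131218;
```

The directory nesting follows the order of the columns in PARTITIONED BY, so dt comes above city here.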

The following statement inserts the result of a query on wyp into the test table: