Using Complex Data Types in Hive

Date: 2019-12-13


1. Using the Array data type

1.1 Create a table that uses array as the column type

hive (hive_demo1)> create table stu_test(name array<string>, phone array<string>)
                 > row format delimited fields terminated by '\t'
                 > collection items terminated by ',';
OK

1.2 Create stu_info.txt under /opt/datas/test and load its contents into the Hive table stu_test

[liupeng@tonyliu test]$ pwd
/opt/datas/test
[liupeng@tonyliu test]$ ls
person.txt  stu_info.txt
[liupeng@tonyliu test]$ more stu_info.txt             # create the data file, then view it
小明,小王,小張	15975319964,18665851264,13278659963
tony, tom,jack	18677549911,15923458765,18665851989
[liupeng@tonyliu test]$

hive (hive_demo1)> load data local inpath '/opt/datas/test/stu_info.txt' into table stu_test;    -- load the file into stu_test
Copying data from file:/opt/datas/test/stu_info.txt
Copying file: file:/opt/datas/test/stu_info.txt
Loading data to table hive_demo1.stu_test
Table hive_demo1.stu_test stats: [numFiles=1, numRows=0, totalSize=108, rawDataSize=0]
OK
Time taken: 0.439 seconds

1.3 Query the stu_test table

hive (hive_demo1)> select * from stu_test;              -- show all rows of stu_test
OK
stu_test.name	stu_test.phone
["小明","小王","小張"]	["15975319964","18665851264","13278659963"]
["tony"," tom","jack"]	["18677549911","15923458765","18665851989"]
Time taken: 0.057 seconds, Fetched: 2 row(s)
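
To make the roles of the two delimiters concrete, here is a minimal Python sketch (not Hive code) that mimics how the row format of stu_test splits one line of stu_info.txt: a tab separates the two columns, and a comma separates the items inside each array. The function name parse_array_row is hypothetical, and Hive does the real parsing internally, so treat this only as an illustration.

```python
def parse_array_row(line, field_sep="\t", item_sep=","):
    """Split one text line into columns, then split each column into array items."""
    columns = line.rstrip("\n").split(field_sep)
    return [column.split(item_sep) for column in columns]


# Second line of stu_info.txt, copied from the listing above.
name, phone = parse_array_row("tony, tom,jack\t18677549911,15923458765,18665851989")
print(name)   # the second item keeps its leading space
print(phone)
```

Nothing is trimmed during the split, which matches the query output above, where the second name is shown as " tom" with a leading space.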

1.4 Select specific elements of the array columns in stu_test

hive (hive_demo1)> select name[0],phone[0] from stu_test;    -- first element of the name and phone arrays (indexes start at 0)
OK
_c0	_c1
小明	15975319964
tony	18677549911
Time taken: 0.117 seconds, Fetched: 2 row(s)
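
The indexing rule used by name[0] can be sketched in Python as follows. This is an illustration rather than Hive code: element_at is a hypothetical helper, and returning None for an out-of-range index is an assumption that models Hive giving NULL instead of raising an error.

```python
def element_at(items, index):
    """Return items[index] with 0-based indexing, or None (standing in for NULL)
    when the array is missing or the index is out of range (assumed behaviour)."""
    if items is None or index < 0 or index >= len(items):
        return None
    return items[index]


name = ["小明", "小王", "小張"]
print(element_at(name, 0))  # first element
print(element_at(name, 3))  # past the end of a 3-element array
```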

1.5 Get the length of an array column

hive (hive_demo1)> select name,size(phone) from stu_test;        -- size() returns the number of elements in the array
OK
name	_c1
["小明","小王","小張"]	3
["tony"," tom","jack"]	3
Time taken: 0.071 seconds, Fetched: 2 row(s)

hive (hive_demo1)> select size(name),size(phone) from stu_test;
OK
_c0	_c1
3	3
3	3
Time taken: 0.08 seconds, Fetched: 2 row(s)

1.6 Select array elements from the rows whose array contains a given value

hive (hive_demo1)> select name[1],phone[1] from stu_test where array_contains(name,'小王'); -- take the second element of each array, but only from rows whose name array contains '小王'
OK
_c0	_c1
小王	18665851264
Time taken: 0.079 seconds, Fetched: 1 row(s)
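
The filter-then-project logic of that query can be sketched in Python as below. It is only an illustration: the rows are copied from the table contents shown earlier, and select_second_where_contains is a hypothetical name. The sketch assumes array_contains compares whole elements exactly, so a value that differs by a leading space does not match.

```python
rows = [
    (["小明", "小王", "小張"], ["15975319964", "18665851264", "13278659963"]),
    (["tony", " tom", "jack"], ["18677549911", "15923458765", "18665851989"]),
]


def select_second_where_contains(rows, wanted):
    """Keep rows whose name array contains `wanted` exactly,
    then return (name[1], phone[1]) for each kept row."""
    return [(name[1], phone[1]) for name, phone in rows if wanted in name]


print(select_second_where_contains(rows, "小王"))
print(select_second_where_contains(rows, "tom"))  # stored value is " tom", so no match
```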

1.7 View the table schema

hive (hive_demo1)> desc stu_test;
OK
col_name	data_type	comment
name                	array<string>       	                    
phone               	array<string>       	                    
Time taken: 0.095 seconds, Fetched: 2 row(s)

2. Using the Map data type

2.1 Create a table that uses the map data type

Create the per_test table. The map entries in the data file are separated by ';', but the Hive CLI treats ';' as the statement terminator, so writing the semicolon literally in the DDL causes an error. The workaround is to give the delimiter as its ASCII code in octal, '\073' (see http://blog.csdn.net/lxpbs8851/article/details/11525501). Other characters that cause the same problem can be handled the same way.

hive (hive_demo1)> create table per_test(name string, info map<string,string>)
                 > row format delimited fields terminated by '\t'
                 > collection items terminated by '\073'
                 > map keys terminated by ':';
OK
Time taken: 0.09 seconds

2.2 Create person.txt under /opt/datas/test

[liupeng@tonyliu test]$ pwd
/opt/datas/test
[liupeng@tonyliu test]$ ls
person.txt  stu_info.txt
[liupeng@tonyliu test]$ more person.txt 
小明	年齡:18;身高:1米8;地址:北京
小紅	年齡:30;身高:1米72;地址:上海
小李	年齡:27;身高:1米90;地址:深圳
[liupeng@tonyliu test]$

2.3 Load person.txt into the Hive table per_test

hive (hive_demo1)> load data local inpath '/opt/datas/test/person.txt' into table per_test;
Copying data from file:/opt/datas/test/person.txt
Copying file: file:/opt/datas/test/person.txt
Loading data to table hive_demo1.per_test
Table hive_demo1.per_test stats: [numFiles=1, numRows=0, totalSize=134, rawDataSize=0]
OK
Time taken: 0.269 seconds

2.4 Query all rows of per_test

hive (hive_demo1)> select * from per_test;
OK
per_test.name	per_test.info
小明	{"年齡":"18","身高":"1米8","地址":"北京"}
小紅	{"年齡":"30","身高":"1米72","地址":"上海"}
小李	{"年齡":"27","身高":"1米90","地址":"深圳"}
Time taken: 0.049 seconds, Fetched: 3 row(s)
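
The three delimiters declared for per_test each act at a different level. The Python sketch below (an illustration, not Hive code; parse_map_row is a hypothetical name) mimics the split: a tab between the name and info columns, ';' between map entries, and ':' between each key and its value.

```python
def parse_map_row(line, field_sep="\t", item_sep=";", key_sep=":"):
    """Split a line into (name, info), where info is a dict built from
    'key:value' entries separated by item_sep."""
    name, raw_info = line.rstrip("\n").split(field_sep, 1)
    info = {}
    for entry in raw_info.split(item_sep):
        key, value = entry.split(key_sep, 1)
        info[key] = value
    return name, info


# First line of person.txt, copied from the listing above.
name, info = parse_map_row("小明\t年齡:18;身高:1米8;地址:北京")
print(name)
print(info)
```

Both keys and values stay strings here because the column is declared as map<string,string>.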

2.5 Query individual columns of per_test

Select a single column (name):

hive (hive_demo1)> select name from per_test;
OK
name
小明
小紅
小李
Time taken: 0.062 seconds, Fetched: 3 row(s)


Select the info column. Because info holds several key-value pairs (separated by ';' in the data file), selecting info directly returns the whole map.

hive (hive_demo1)> select info from per_test;
OK
info
{"年齡":"18","身高":"1米8","地址":"北京"}
{"年齡":"30","身高":"1米72","地址":"上海"}
{"年齡":"27","身高":"1米90","地址":"深圳"}
Time taken: 0.039 seconds, Fetched: 3 row(s)

You can also output the value of a single map entry by giving its key, for example info['年齡'] (年齡 means age):

hive (hive_demo1)> select name, info['年齡'] from per_test;
OK
name	_c1
小明	18
小紅	30
小李	27
Time taken: 0.049 seconds, Fetched: 3 row(s)

In the same way, several entries of the same map can be output at once by giving each key:

hive (hive_demo1)> select name, info['年齡'], info['身高'], info['地址'] from per_test;
OK
name	_c1	_c2	_c3
小明	18	1米8	北京
小紅	30	1米72	上海
小李	27	1米90	深圳
Time taken: 0.051 seconds, Fetched: 3 row(s)
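
Looking up map entries by key behaves much like a dictionary lookup. The Python sketch below is an illustration only: the rows are copied from the table contents above, select_keys is a hypothetical helper, and None stands in for the NULL that Hive returns when a key is absent from the map.

```python
rows = [
    ("小明", {"年齡": "18", "身高": "1米8", "地址": "北京"}),
    ("小紅", {"年齡": "30", "身高": "1米72", "地址": "上海"}),
    ("小李", {"年齡": "27", "身高": "1米90", "地址": "深圳"}),
]


def select_keys(rows, *keys):
    """Return (name, value for each requested key); a missing key gives None."""
    return [(name, *(info.get(key) for key in keys)) for name, info in rows]


for record in select_keys(rows, "年齡", "身高", "地址"):
    print(record)

# "體重" (weight) is not a key in the data, so the lookup yields None.
print(select_keys(rows, "體重")[0])
```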

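3. Using the Struct data type

3.1 Create a table that uses the struct data type

In the statement below, info is the column name. Inside struct<key:string,value:int>, key and value are the names of the two sub-fields, of type string and int respectively.

hive (hive_demo1)> create table struct_info(
                 > id int, info struct<key:string,value:int>)
                 > row format delimited fields terminated by '.'
                 > collection items terminated by ':';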
OK
Time taken: 0.125 seconds

3.2 Create stu_struct.txt and load its data into the Hive table struct_info

[liupeng@tonyliu test]$ pwd
/opt/datas/test
[liupeng@tonyliu test]$ ls
person_map.txt  stu_list.txt  stu_struct.txt
[liupeng@tonyliu test]$ more stu_struct.txt 
1.小明:90
2.小紅:100
3.小方:70
4.小白:50
5.小蘭:60
6.小花:85
[liupeng@tonyliu test]$

hive (hive_demo1)> load data local inpath '/opt/datas/test/stu_struct.txt' into table struct_info;
Copying data from file:/opt/datas/test/stu_struct.txt
Copying file: file:/opt/datas/test/stu_struct.txt
Loading data to table hive_demo1.struct_info
Table hive_demo1.struct_info stats: [numFiles=1, numRows=0, totalSize=73, rawDataSize=0]
OK
Time taken: 0.256 seconds

3.3 Query struct_info (whole table, single columns, and sub-fields)

(1) Show the whole table

hive (hive_demo1)> select * from struct_info;
OK
struct_info.id	struct_info.info
1	{"key":"小明","value":90}
2	{"key":"小紅","value":100}
3	{"key":"小方","value":70}
4	{"key":"小白","value":50}
5	{"key":"小蘭","value":60}
6	{"key":"小花","value":85}
Time taken: 0.059 seconds, Fetched: 6 row(s)
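
For comparison with the array and map cases, the Python sketch below (an illustration, not Hive code; parse_struct_row is a hypothetical name) mimics how one line of stu_struct.txt maps onto the struct_info columns: '.' separates id from info, ':' separates the struct's two sub-fields, and each part is converted to its declared type.

```python
def parse_struct_row(line, field_sep=".", item_sep=":"):
    """Split a line into (id, info), where info has the fixed
    sub-fields key (string) and value (int)."""
    raw_id, raw_info = line.rstrip("\n").split(field_sep, 1)
    key, value = raw_info.split(item_sep, 1)
    return int(raw_id), {"key": key, "value": int(value)}


# Second line of stu_struct.txt, copied from the listing above.
row_id, info = parse_struct_row("2.小紅:100")
print(row_id, info)
```

Unlike a map, whose keys come from the data, a struct has a fixed set of named sub-fields, each with its own type; that is why value appears as a number rather than a quoted string in the query output.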

(2) Show individual columns

hive (hive_demo1)> select id from struct_info;   -- show the id column
OK
id
1
2
3
4
5
6
Time taken: 0.065 seconds, Fetched: 6 row(s)
hive (hive_demo1)> select info from struct_info;    -- show the info column
OK
info
{"key":"小明","value":90}
{"key":"小紅","value":100}
{"key":"小方","value":70}
{"key":"小白","value":50}
{"key":"小蘭","value":60}
{"key":"小花","value":85}
Time taken: 0.056 seconds, Fetched: 6 row(s)

(3) Show the key and value sub-fields

hive (hive_demo1)> select info.key from struct_info;     -- show the key sub-field
OK
key
小明
小紅
小方
小白
小蘭
小花
Time taken: 0.063 seconds, Fetched: 6 row(s)

hive (hive_demo1)> select info.value from struct_info;   -- show the value sub-field
OK
value
90
100
70
50
60
85
Time taken: 0.056 seconds, Fetched: 6 row(s)

(4) Filter rows with a where clause

hive (hive_demo1)> select id,info from struct_info where id=1;    -- the where clause keeps only the row with id 1
OK
id	info
1	{"key":"小明","value":90}
Time taken: 0.112 seconds, Fetched: 1 row(s)

(5) Filter rows by the value of the key sub-field

hive (hive_demo1)> select info from struct_info where info.key='小明';
OK
info
{"key":"小明","value":90}
Time taken: 0.042 seconds, Fetched: 1 row(s)
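
Filtering on a struct sub-field works like filtering on any other column. The Python sketch below is an illustration only: the rows are copied from the table contents above and where is a hypothetical helper. It applies the same condition on info.key, plus a second condition on info.value that does not appear in the original queries and is included only to show a numeric comparison.

```python
rows = [
    (1, {"key": "小明", "value": 90}),
    (2, {"key": "小紅", "value": 100}),
    (3, {"key": "小方", "value": 70}),
    (4, {"key": "小白", "value": 50}),
    (5, {"key": "小蘭", "value": 60}),
    (6, {"key": "小花", "value": 85}),
]


def where(rows, predicate):
    """Return the info struct of every row for which predicate(info) is true."""
    return [info for _row_id, info in rows if predicate(info)]


# Same condition as the query above: info.key = '小明'
print(where(rows, lambda info: info["key"] == "小明"))

# Hypothetical extra condition on the int sub-field: info.value >= 85
print([info["key"] for info in where(rows, lambda info: info["value"] >= 85)])
```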
