Using Hive explode [expanding one column into multiple rows]

This article discusses how to use Hive's explode and verifies the results with a simple example.

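Hive supports the array and map types, but for a long time I could not find a good way to compute statistics over the values inside an array or map. Pig has the flatten keyword for this kind of expansion. After going through a lot of Hive material, I found explode. The example below verifies what Hive's explode does.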

hive> create table if not exists explode_array
> (
>     userId string,
>     userName string,
>     tags   array<string>
> ) 
> ROW FORMAT DELIMITED 
> FIELDS TERMINATED BY '\t'
> COLLECTION ITEMS TERMINATED BY ',';
OK
Time taken: 0.942 seconds

Sample data:

00001	zhzhenqin	80,90
00002	hello	java, 女	
00003	world	java,python,90

Query:

hive> select * from explode_array limit 10;                                                    
OK
00001	zhzhenqin	["80","90"]
00002	hello	["java"," 女"]
00003	world	["java","python","90"]
Time taken: 0.041 seconds, Fetched: 3 row(s)
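
The way the delimiters in the table definition turn a text line into columns can be modelled in a few lines of Python. This is only a sketch of the splitting rules, not Hive's actual SerDe; the function name `parse_array_row` is made up for illustration. It also shows why the second row keeps the space in `" 女"`: the collection delimiter is a bare comma, and nothing trims whitespace.

```python
def parse_array_row(line, field_sep="\t", item_sep=","):
    """Split one text line into (userId, userName, tags)."""
    # FIELDS TERMINATED BY '\t': the first three fields are the columns.
    user_id, user_name, raw_tags = line.split(field_sep)[:3]
    # COLLECTION ITEMS TERMINATED BY ',': whitespace is NOT trimmed.
    return user_id, user_name, raw_tags.split(item_sep)


lines = [
    "00001\tzhzhenqin\t80,90",
    "00002\thello\tjava, 女",
    "00003\tworld\tjava,python,90",
]
parsed = [parse_array_row(line) for line in lines]
for row in parsed:
    print(row)
```

If the leading space is unwanted, it is easier to clean the source data than to work around it in every query.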

Query using explode:

hive> select userId,userName,tagId from explode_array lateral view explode(tags) tags as tagId;
Total jobs = 1
... (part of the log omitted)
Total MapReduce CPU Time Spent: 0 msec
OK
00001	zhzhenqin	80
00001	zhzhenqin	90
00002	hello	java
00002	hello	 女
00003	world	java
00003	world	python
00003	world	90
Time taken: 20.104 seconds, Fetched: 7 row(s)
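
As a rough mental model, lateral view explode pairs each source row with every element of its array. The Python sketch below reproduces the seven rows above under that assumption; `lateral_view_explode` is a hypothetical helper, not a Hive API.

```python
def lateral_view_explode(rows):
    """Yield (userId, userName, tagId) once per element of each row's tags."""
    for user_id, user_name, tags in rows:
        # A row whose array is empty contributes no output rows,
        # which matches LATERAL VIEW without the OUTER keyword.
        for tag_id in tags:
            yield user_id, user_name, tag_id


table = [
    ("00001", "zhzhenqin", ["80", "90"]),
    ("00002", "hello", ["java", " 女"]),
    ("00003", "world", ["java", "python", "90"]),
]
exploded = list(lateral_view_explode(table))
for row in exploded:
    print("\t".join(row))
```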

Nested query with aggregation:

hive> select user_tag.tagId, count(*) as count from (select userId,userName,tagId from explode_array lateral view explode(tags) tags as tagId) as user_tag group by user_tag.tagId order by count DESC;
Total jobs = 2
... (log omitted)
Total MapReduce CPU Time Spent: 0 msec
OK
java	2
90	2
python	1
80	1
 女	1
Time taken: 39.994 seconds, Fetched: 5 row(s)
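
The nested query amounts to flattening every tag and counting occurrences. A minimal Python sketch of the same computation, using the sample data from above, looks like this. The order of tags with equal counts is not defined by the query, so ties may come out in a different order than in the Hive output.

```python
from collections import Counter

table = [
    ("00001", "zhzhenqin", ["80", "90"]),
    ("00002", "hello", ["java", " 女"]),
    ("00003", "world", ["java", "python", "90"]),
]

# Inner query: one tagId per array element.
tag_ids = [tag for _, _, tags in table for tag in tags]
# Outer query: GROUP BY tagId with count(*).
counts = Counter(tag_ids)
# ORDER BY count DESC; ties keep their first-seen order here.
ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
for tag_id, count in ranked:
    print(f"{tag_id}\t{count}")
```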

The example above uses the array type. Our own user and tag data is stored in Hive as a map, where the key is the tagid and the value is the weight. explode supports the map type as well.

When applied to an array, explode outputs one column; when applied to a map, it outputs two columns, one for the key and one for the value.
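
The difference in output shape can be sketched in Python. This is an illustrative model only, with a Python list standing in for a Hive array and a dict standing in for a Hive map; the function `explode` here is a local stand-in, not Hive's implementation.

```python
def explode(value):
    """Yield one output row (as a tuple) per element of an array or map."""
    if isinstance(value, dict):
        # Map: two output columns, key and value.
        for key, val in value.items():
            yield key, val
    else:
        # Array: one output column.
        for item in value:
            yield (item,)


array_rows = list(explode(["java", "python", "90"]))
map_rows = list(explode({"java": 1, "python": 3, "90": 1}))
print(array_rows)
print(map_rows)
```

This is why the lateral view alias list needs one name for an array (`as tagId`) and two names for a map (`as tagId, weight`).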

The following demonstrates explode on the map type:

Table definition and the loaded data:
create table if not exists explode_map
(
    userId string,
    userName string,
    tags   map<string, int>
) 
ROW FORMAT DELIMITED 
FIELDS TERMINATED BY '\t'
COLLECTION ITEMS TERMINATED BY ','
MAP KEYS  TERMINATED BY ':';

Data:
00001	zhzhenqin	80:1,90:2
00002	hello	java:10,女:2
00003	world	java:1,python:3,90:1

Verify with a query:

hive> select * from explode_map;
OK
00001	zhzhenqin	{"80":1,"90":2}
00002	hello	{"java":10,"女":2}
00003	world	{"java":1,"python":3,"90":1}
Time taken: 0.04 seconds, Fetched: 3 row(s)
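
For the map table there are three delimiter levels: tab between fields, comma between map entries, and colon between key and value. A small Python sketch of that parsing, assuming well-formed input (the name `parse_map_row` is hypothetical), gives the same structure as the query output above.

```python
def parse_map_row(line, field_sep="\t", item_sep=",", key_sep=":"):
    """Split one text line into (userId, userName, tags) with tags as a dict."""
    # FIELDS TERMINATED BY '\t'
    user_id, user_name, raw_tags = line.split(field_sep)[:3]
    tags = {}
    # COLLECTION ITEMS TERMINATED BY ','
    for entry in raw_tags.split(item_sep):
        # MAP KEYS TERMINATED BY ':'
        key, value = entry.split(key_sep, 1)
        # The column is map<string, int>, so the value becomes an int.
        tags[key] = int(value)
    return user_id, user_name, tags


lines = [
    "00001\tzhzhenqin\t80:1,90:2",
    "00002\thello\tjava:10,女:2",
    "00003\tworld\tjava:1,python:3,90:1",
]
parsed_maps = [parse_map_row(line) for line in lines]
for row in parsed_maps:
    print(row)
```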

Query using explode:

hive> select userId,userName,tagId,weight from explode_map lateral view explode(tags) tags as tagId, weight;
00001	zhzhenqin	80	1
00001	zhzhenqin	90	2
00002	hello	java	10
00002	hello	女	2
00003	world	java	1
00003	world	python	3
00003	world	90	1

Nested query with aggregation:

hive> select user_tag.tagId, count(*) as count from (select userId,userName,tagId,weight from explode_map lateral view explode(tags) tags as tagId, weight) as user_tag group by user_tag.tagId order by count DESC;
java	2
90	2
女	1
python	1
80	1