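This article looks at how to use Hive's explode function and verifies the results with a simple example.
Hive supports the array and map types, but for a long time I could not find a good way to aggregate over the values inside an array or a map. Pig has the flatten keyword for expanding the elements of a collection into separate rows. After going through quite a bit of Hive material, I found explode. The example below verifies what explode does.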
hive> create table if not exists explode_array
    > (
    >   userId string,
    >   userName string,
    >   tags array<string>
    > )
    > ROW FORMAT DELIMITED
    > FIELDS TERMINATED BY '\t'
    > COLLECTION ITEMS TERMINATED BY ',';
OK
Time taken: 0.942 seconds
Sample data (fields are tab-separated, array items are comma-separated):
00001    zhzhenqin    80,90
00002    hello        java, 女
00003    world        java,python,90
Query:
hive> select * from explode_array limit 10;
OK
00001    zhzhenqin    ["80","90"]
00002    hello        ["java"," 女"]
00003    world        ["java","python","90"]
Time taken: 0.041 seconds, Fetched: 3 row(s)
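The leading space in " 女" in the result above comes straight from the data file: the delimiters declared on the table are applied literally and nothing is trimmed. A rough Python sketch of that splitting follows. It only mimics the behaviour seen in the output and is not Hive's actual parsing code; `parse_row` is a hypothetical helper name.

```python
def parse_row(line, field_sep="\t", item_sep=","):
    """Split one text row using the delimiters declared for explode_array.

    Hypothetical helper for illustration only; it is not Hive's parser.
    """
    user_id, user_name, raw_tags = line.split(field_sep)
    # Items are split on the delimiter only; surrounding whitespace is kept.
    tags = raw_tags.split(item_sep)
    return user_id, user_name, tags


row = parse_row("00002\thello\tjava, 女")
print(row)
```

If the stray space is unwanted, the simplest fix is to clean the input file before loading it.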
Query using explode:
hive> select userId,userName,tagId from explode_array lateral view explode(tags) tags as tagId;
Total jobs = 1
... (part of the log omitted)
Total MapReduce CPU Time Spent: 0 msec
OK
00001    zhzhenqin    80
00001    zhzhenqin    90
00002    hello        java
00002    hello         女
00003    world        java
00003    world        python
00003    world        90
Time taken: 20.104 seconds, Fetched: 7 row(s)
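To make the row expansion concrete, here is a minimal Python sketch of what the lateral view with explode does to the three sample rows: each array element becomes its own output row, and the other columns are repeated alongside it. The function name `lateral_view_explode` is made up for this illustration, and the sketch only models the result, not how Hive executes the query.

```python
# The three sample rows, as (userId, userName, tags).
rows = [
    ("00001", "zhzhenqin", ["80", "90"]),
    ("00002", "hello", ["java", " 女"]),
    ("00003", "world", ["java", "python", "90"]),
]


def lateral_view_explode(rows):
    """Yield one output row per array element, repeating the other columns."""
    for user_id, user_name, tags in rows:
        for tag_id in tags:
            yield user_id, user_name, tag_id


exploded = list(lateral_view_explode(rows))
for r in exploded:
    print(*r, sep="\t")
```

In this sketch a row whose tag list is empty contributes no output rows at all, because the inner loop never runs.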
Nested query with aggregation:
hive> select user_tag.tagId, count(*) as count
    > from (select userId,userName,tagId from explode_array lateral view explode(tags) tags as tagId) as user_tag
    > group by user_tag.tagId
    > order by count DESC;
Total jobs = 2
... (log omitted)
Total MapReduce CPU Time Spent: 0 msec
OK
java      2
90        2
python    1
80        1
 女       1
Time taken: 39.994 seconds, Fetched: 5 row(s)
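The aggregation step can be checked by hand with a few lines of Python. The sketch below starts from the seven tag values that the explode query returned for the sample data, counts them, and sorts by count in descending order. The variable names are chosen for this illustration only.

```python
from collections import Counter

# The seven tagId values produced by exploding the sample rows.
exploded_tags = ["80", "90", "java", " 女", "java", "python", "90"]

# Equivalent of: group by tagId, count(*)
tag_counts = Counter(exploded_tags)

# Equivalent of: order by count DESC
ranked = sorted(tag_counts.items(), key=lambda kv: kv[1], reverse=True)
for tag_id, count in ranked:
    print(tag_id, count, sep="\t")
```

The query orders by count only, so the relative order of tags with the same count (for example java and 90) is not determined by the query and may differ between this sketch and a Hive run.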
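The example above uses the array type. In our case, users and their tags are stored in Hive as a map, where the key is the tag id and the value is the weight. explode supports the map type as well.
When applied to an array, explode outputs a single column; when applied to a map, it outputs two columns, one for the key and one for the value.
The following demonstrates explode on a map: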
Table definition and the loaded data:
create table if not exists explode_map
(
  userId string,
  userName string,
  tags map<string, int>
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
COLLECTION ITEMS TERMINATED BY ','
MAP KEYS TERMINATED BY ':';
Data:
00001    zhzhenqin    80:1,90:2
00002    hello        java:10,女:2
00003    world        java:1,python:3,90:1
Verify with a query:
hive> select * from explode_map;
OK
00001    zhzhenqin    {"80":1,"90":2}
00002    hello        {"java":10,"女":2}
00003    world        {"java":1,"python":3,"90":1}
Time taken: 0.04 seconds, Fetched: 3 row(s)
Query using explode:
hive> select userId,userName,tagId,weight from explode_map lateral view explode(tags) tags as tagId, weight;
00001    zhzhenqin    80        1
00001    zhzhenqin    90        2
00002    hello        java      10
00002    hello        女        2
00003    world        java      1
00003    world        python    3
00003    world        90        1
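The map case differs from the array case only in that each entry yields two values, the key and the value, which is why the query names two columns (tagId and weight). A small Python sketch of the same expansion, using a dict to stand in for the Hive map; `lateral_view_explode_map` is a hypothetical name used only here:

```python
# The three sample rows, as (userId, userName, tags), with tags as tagId -> weight.
rows = [
    ("00001", "zhzhenqin", {"80": 1, "90": 2}),
    ("00002", "hello", {"java": 10, "女": 2}),
    ("00003", "world", {"java": 1, "python": 3, "90": 1}),
]


def lateral_view_explode_map(rows):
    """Yield one output row per map entry: the key and value become two columns."""
    for user_id, user_name, tags in rows:
        for tag_id, weight in tags.items():
            yield user_id, user_name, tag_id, weight


exploded = list(lateral_view_explode_map(rows))
for r in exploded:
    print(*r, sep="\t")
```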
Nested query with aggregation:
hive> select user_tag.tagId, count(*) as count
    > from (select userId,userName,tagId,weight from explode_map lateral view explode(tags) tags as tagId, weight) as user_tag
    > group by user_tag.tagId
    > order by count DESC;
java      2
90        2
女        1
python    1
80        1