Lateral View is the conjunction Hive provides for UDTFs. It solves the problem that a UDTF cannot be combined with additional columns in the SELECT list.
1. Why do we need Lateral View?
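Suppose we split a column of a Hive table and want to turn the result into a 1-to-N form, i.e. turn one row into multiple rows.
However, Hive does not allow any other expressions in the SELECT list alongside a UDTF.
In the example below, the IDs of the users who logged in to a game are stored in a single field, user_ids, and we want to apply a UDTF to each row so that it outputs multiple rows: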
select game_id, explode(split(user_ids,'\\[\\[\\[')) as user_id from login_game_log where dt='2014-05-15';
FAILED: Error in semantic analysis: UDTF's are not supported outside the SELECT clause, nor nested in expressions
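Hive reports a semantic analysis error: no other select expressions are allowed alongside the UDTF. Frustrating...
So what if we do need this? This is where Lateral View comes in.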
2. Lateral View explained
2.1 A single Lateral View
Lateral view is used in conjunction with user-defined table generating functions such as explode(). As mentioned in Built-in Table-Generating Functions, a UDTF generates zero or more output rows for each input row. A lateral view first applies the UDTF to each row of base table and then joins resulting output rows to the input rows to form a virtual table having the supplied table alias.
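In other words:
A lateral view is meant to be used together with UDTFs such as explode(). It places the rows generated by the UDTF into a virtual table, and this virtual table is then joined with the input row it came from (here, each game_id). That is how columns outside the UDTF can appear in the SELECT list.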
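As a rough mental model, a lateral view behaves like a nested loop: call the UDTF on each base row, then attach every generated value to a copy of that row. The Python sketch below is only a simulation of those semantics, not Hive's implementation. The helper names (lateral_view, explode) and the sample rows are hypothetical.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


def explode(values: List[object]) -> List[object]:
    """Mimics explode() on an array: one generated row per element."""
    return list(values)


def lateral_view(base: List[Row],
                 udtf: Callable[[Row], List[object]],
                 column_alias: str) -> List[Row]:
    """Apply the UDTF to every base row, then join each generated value
    back onto the input row that produced it (the 'virtual table')."""
    result: List[Row] = []
    for row in base:
        for value in udtf(row):            # zero or more generated rows
            joined = dict(row)             # keep the base-table columns
            joined[column_alias] = value   # append the UDTF output column
            result.append(joined)
    return result


# Hypothetical sample data: one list-valued user_ids field per game.
base = [
    {"game_id": "g1", "user_ids": ["u1", "u2"]},
    {"game_id": "g2", "user_ids": ["u3"]},
]
rows = lateral_view(base, lambda r: explode(r["user_ids"]), "user_id")
print([(r["game_id"], r["user_id"]) for r in rows])
# [('g1', 'u1'), ('g1', 'u2'), ('g2', 'u3')]
```

Each output row carries all base-table columns plus the generated column. A base row whose UDTF output is empty produces no rows at all.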
lateralView: LATERAL VIEW udtf(expression) tableAlias AS columnAlias (',' columnAlias)*
fromClause: FROM baseTable (lateralView)*
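As the grammar shows, LATERAL VIEW is used in two places:
1. in front of the UDTF call (LATERAL VIEW udtf(...));
2. after FROM baseTable.
An example:
1. First create a file with two tab-separated columns, game_id and user_ids, then create a table and load the file into it: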
hive> create table test_lateral_view_shengli(game_id string,userl_ids string) row format delimited fields terminated by '\t' stored as textfile;
OK
Time taken: 2.451 seconds
hive> load data local inpath '/home/hadoop/test_lateral' into table test_lateral_view_shengli;
Copying data from file:/home/hadoop/test_lateral
Copying file: file:/home/hadoop/test_lateral
Loading data to table dw.test_lateral_view_shengli
OK
Time taken: 6.716 seconds
hive> select * from test_lateral_view_shengli;
OK
game101	15358083654[[[ab33787873[[[zjy18052480603[[[shlg1881826[[[lxqab110
game66	winning1ren[[[13810537508
game101	hu330602003[[[hu330602004[[[hu330602005[[[15967506560
2. Then query the table with a lateral view:
hive> select game_id, user_id
    > from test_lateral_view_shengli
    > lateral view explode(split(userl_ids,'\\[\\[\\[')) snTable as user_id
    > ;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201403301416_445839, Tracking URL = http://10.1.9.10:50030/jobdetails.jsp?jobid=job_201403301416_445839
Kill Command = /app/home/hadoop/src/hadoop-0.20.2-cdh3u5/bin/../bin/hadoop job -Dmapred.job.tracker=10.1.9.10:9001 -kill job_201403301416_445839
2014-05-16 17:39:19,108 Stage-1 map = 0%, reduce = 0%
2014-05-16 17:39:28,157 Stage-1 map = 100%, reduce = 0%
2014-05-16 17:39:38,830 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201403301416_445839
OK
game101	hu330602003
game101	hu330602004
game101	hu330602005
game101	15967506560
game101	15358083654
game101	ab33787873
game101	zjy18052480603
game101	shlg1881826
game101	lxqab110
game66	winning1ren
game66	13810537508
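The Python sketch below reproduces the result of the query above on the three sample rows. It is a simulation only, and the function name lateral_view_explode is hypothetical. Hive's split() takes a regular expression, and `[` is a regex metacharacter, so it must be escaped. Inside a Hive string literal the backslash is itself escaped, which means '\\[\\[\\[' reaches the regex engine as \[\[\[ (three literal `[` characters). Python's re module accepts the same pattern. Hive does not guarantee row order, so compare the output to the Hive result as a set of rows, not as a sequence.

```python
import re
from typing import List, Tuple

# The three rows returned by `select * from test_lateral_view_shengli` above.
TABLE = [
    ("game101", "15358083654[[[ab33787873[[[zjy18052480603[[[shlg1881826[[[lxqab110"),
    ("game66", "winning1ren[[[13810537508"),
    ("game101", "hu330602003[[[hu330602004[[[hu330602005[[[15967506560"),
]

# Hive's '\\[\\[\\[' literal reaches the regex engine as \[\[\[ ,
# i.e. three literal '[' characters; Python's re uses the same escaping.
DELIMITER = re.compile(r"\[\[\[")


def lateral_view_explode(table: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    # Simulates:
    #   select game_id, user_id from test_lateral_view_shengli
    #   lateral view explode(split(userl_ids, ...)) snTable as user_id
    out = []
    for game_id, userl_ids in table:
        for user_id in DELIMITER.split(userl_ids):  # explode(split(...))
            out.append((game_id, user_id))          # join back to game_id
    return out


result = lateral_view_explode(TABLE)
for game_id, user_id in result:
    print(game_id, user_id)
```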
2.2 Multiple Lateral Views
A FROM clause can be followed by multiple Lateral Views.
A FROM clause can have multiple LATERAL VIEW clauses. Subsequent LATERAL VIEWS can reference columns from any of the tables appearing to the left of the LATERAL VIEW.
Given the data:
Array<int> col1 | Array<string> col2 |
---|---|
[1, 2] | ["a", "b", "c"] |
[3, 4] | ["d", "e", "f"] |
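Desired output:
We want to explode the first and second columns at the same time, which yields something like a Cartesian product. Note that values are only paired within the same base row: 1 and 2 pair with "a", "b", "c", while 3 and 4 pair with "d", "e", "f".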
int myCol1 | string myCol2 |
---|---|
1 | "a" |
1 | "b" |
1 | "c" |
2 | "a" |
2 | "b" |
2 | "c" |
3 | "d" |
3 | "e" |
3 | "f" |
4 | "d" |
4 | "e" |
4 | "f" |
We can write:
SELECT myCol1, myCol2 FROM baseTable
LATERAL VIEW explode(col1) myTable1 AS myCol1
LATERAL VIEW explode(col2) myTable2 AS myCol2;
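Chaining two lateral views amounts to two nested loops. The second view runs once for every row produced by the first, so each base row contributes len(col1) × len(col2) output rows. The Python sketch below simulates this on the example data. The function name two_lateral_views is hypothetical.

```python
from typing import Dict, List, Tuple

# baseTable from the example above: two array columns per row.
BASE_TABLE: List[Dict[str, list]] = [
    {"col1": [1, 2], "col2": ["a", "b", "c"]},
    {"col1": [3, 4], "col2": ["d", "e", "f"]},
]


def two_lateral_views(table: List[Dict[str, list]]) -> List[Tuple[int, str]]:
    out = []
    for row in table:
        # LATERAL VIEW explode(col1) myTable1 AS myCol1
        for my_col1 in row["col1"]:
            # LATERAL VIEW explode(col2) myTable2 AS myCol2 runs once for
            # every row produced by the first view, so only values from
            # the same base row are paired: a per-row Cartesian product.
            for my_col2 in row["col2"]:
                out.append((my_col1, my_col2))
    return out


for my_col1, my_col2 in two_lateral_views(BASE_TABLE):
    print(my_col1, my_col2)
```

This yields 2 × 3 + 2 × 3 = 12 rows, matching the target table above.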
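There is one more case: what if the array passed to the UDTF is empty?
By default, when the UDTF produces no rows, the input row is dropped from the output. Hive 0.12 adds the OUTER keyword to change this.
With OUTER, the lateral view behaves like a LEFT OUTER JOIN: the row is still output with its selected columns, and the UDTF's output columns are NULL.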
hive> select * FROM test_lateral_view_shengli LATERAL VIEW explode(array()) C AS a ;
This returns no rows at all.
With the OUTER keyword:
SELECT * FROM src LATERAL VIEW OUTER explode(array()) C AS a limit 10;
238	val_238	NULL
86	val_86	NULL
311	val_311	NULL
27	val_27	NULL
165	val_165	NULL
409	val_409	NULL
255	val_255	NULL
278	val_278	NULL
98	val_98	NULL
...
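The difference between a plain and an OUTER lateral view comes down to what happens when the UDTF output is empty. The Python sketch below simulates it, with None standing in for NULL. The function names and the two sample rows (taken from the output above) are illustrative only.

```python
from typing import Callable, List, Tuple

Row = Tuple[object, ...]


def lateral_view(rows: List[Row],
                 udtf: Callable[[Row], List[object]],
                 outer: bool = False) -> List[Row]:
    out: List[Row] = []
    for row in rows:
        generated = udtf(row)
        if generated:
            # Normal case: one output row per generated value.
            out.extend(row + (value,) for value in generated)
        elif outer:
            # LATERAL VIEW OUTER: keep the row, UDTF column is NULL (None).
            out.append(row + (None,))
        # Without OUTER, a row whose UDTF output is empty is dropped.
    return out


def empty_udtf(row: Row) -> List[object]:
    return []  # like explode(array()): generates no rows


src = [(238, "val_238"), (86, "val_86")]  # first two rows shown above

print(lateral_view(src, empty_udtf))              # []
print(lateral_view(src, empty_udtf, outer=True))  # [(238, 'val_238', None), (86, 'val_86', None)]
```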
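To summarize:
Lateral View usually appears together with a UDTF and solves the problem that a UDTF cannot be combined with other columns in the SELECT list.
Multiple Lateral Views produce a Cartesian-product-like result (per base row).
The OUTER keyword outputs NULL for a UDTF result that would otherwise be empty, so the input row is not lost.
Original article; please credit the source when reposting: http://blog.csdn.net/oopsoom/article/details/26001307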