Parsing ordinary JSON in Hive is easy: get_json_object is all you need.
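But if the field holds a JSON array, for example
[{"bssid":"6C:59:40:21:05:C4","ssid":"MERCURY_05C4"},{"bssid":"AC:9C:E4:04:EE:52","appid":"10003","ssid":"and-Business"}],
calling get_json_object on it directly returns NULL. For anyone who can't write a UDF, parsing a JSON array then becomes tricky. Fortunately, Hive has a built-in explode function that makes it possible. Let's first look at how explode is used.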
The explode(array) function
select explode(array('A','B','C')) as col;

select tf.* from (select 0 from dual) t lateral view explode(array('A','B','C')) tf as col;

Either query returns:
col
C
B
A
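How it works: explode takes an array and turns it from one value into multiple rows (a column-to-rows transformation). If the array has 3 elements, it returns 3 rows, each holding one element, as shown above.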
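To make the row-multiplying behavior concrete, here is a minimal Python emulation, not Hive code. The helpers `explode` and `lateral_view_explode` and the row layout (a list of dicts) are illustrative assumptions; they only mimic how `LATERAL VIEW explode(...)` pairs each outer row with every element of its array.

```python
from typing import Any, Dict, Iterable, Iterator, List


def explode(values: Iterable[Any]) -> Iterator[Any]:
    """Mimic explode(array): yield one output value per array element, in array order."""
    for value in values:
        yield value


def lateral_view_explode(rows: List[Dict[str, Any]], array_col: str, alias: str) -> List[Dict[str, Any]]:
    """Mimic `FROM t LATERAL VIEW explode(t.array_col) tf AS alias`.

    Each input row is copied once per element of its own array, with the element
    added under `alias`. A row whose array is empty contributes no output rows.
    """
    out: List[Dict[str, Any]] = []
    for row in rows:
        for value in explode(row[array_col]):
            joined = dict(row)   # keep the outer row's columns
            joined[alias] = value  # add the exploded element as a new column
            out.append(joined)
    return out


# One input row whose array has 3 elements -> 3 output rows.
rows = [{"id": 0, "arr": ["A", "B", "C"]}]
result = lateral_view_explode(rows, "arr", "col")
for r in result:
    print(r["id"], r["col"])
```

This sketch emits the elements in array order and copies every outer column onto each generated row, which is what lets later clauses in the query refer to both the original columns and the exploded value.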
Now back to the JSON array
[{"bssid":"6C:59:40:21:05:C4","ssid":"MERCURY_05C4"},
{"bssid":"AC:9C:E4:04:EE:52","appid":"10003","ssid":"and-Business"}].
How do we extract bssid from it? The idea is to use explode to turn the original data into 2 rows
({"bssid":"6C:59:40:21:05:C4","ssid":"MERCURY_05C4"} and
{"bssid":"AC:9C:E4:04:EE:52","appid":"10003","ssid":"and-Business"}),
and then parse each row with get_json_object. The code is as follows:

select ss.col
from (
  select split(
           regexp_replace(
             regexp_extract('[{"bssid":"6C:59:40:21:05:C4","ssid":"MERCURY_05C4"},{"bssid":"AC:9C:E4:04:EE:52","appid":"10003","ssid":"and-Business"}]',
                            '^\\[(.+)\\]$', 1),
             '\\}\\,\\{', '\\}\\|\\|\\{'),
           '\\|\\|') as str
  from dual
) pp
lateral view explode(pp.str) ss as col;

Result:
col
{"bssid":"AC:9C:E4:04:EE:52","appid":"10003","ssid":"and-Business"}
{"bssid":"6C:59:40:21:05:C4","ssid":"MERCURY_05C4"}

This query stops once each array item is its own row; to pull out bssid, apply get_json_object(ss.col, '$.bssid') to the col column.
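The whole idea (split the array string into items, then read `bssid` from each item) can be sketched in Python as follows. This is an emulation under stated assumptions, not Hive's implementation: `split_json_array` and `get_json_value` are hypothetical helpers, the first uses plain string replace/split in place of the regex calls, and the second only handles a flat `'$.key'` lookup on a JSON object.

```python
import json
import re
from typing import Any, List, Optional

RAW = ('[{"bssid":"6C:59:40:21:05:C4","ssid":"MERCURY_05C4"},'
       '{"bssid":"AC:9C:E4:04:EE:52","appid":"10003","ssid":"and-Business"}]')


def split_json_array(raw: str) -> List[str]:
    """Strip the outer [ ], turn '},{' into '}||{', then split on '||'.

    Returns [''] if the brackets are not found (this sketch's choice).
    """
    m = re.search(r"^\[(.+)\]$", raw)
    inner = m.group(1) if m else ""
    return inner.replace("},{", "}||{").split("||")


def get_json_value(item: str, key: str) -> Optional[Any]:
    """Rough stand-in for get_json_object(item, '$.key') on a flat JSON object."""
    obj = json.loads(item)
    return obj.get(key) if isinstance(obj, dict) else None


for item in split_json_array(RAW):
    print(get_json_value(item, "bssid"))
```

Because the split relies on the literal text `},{`, any whitespace between the items (for example `}, {`) would stop the separator from matching, so the input string has to be written exactly as it is stored.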
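Explanation: because the source data is a string (not a real array type), explode cannot be applied to it directly. The string is first turned into an array in four steps: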
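1. regexp_extract('xxx','^\\[(.+)\\]$',1) strips the leading and trailing square brackets from the JSON array. Note that each bracket must be escaped with two backslashes, as in \\[: inside a Hive string literal the first backslash escapes the second, so the regex engine receives \[.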
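2. regexp_replace('xxx','\\}\\,\\{', '\\}\\|\\|\\{') replaces the comma that separates array items (the comma in },{) with two vertical bars, ||. You can choose any delimiter you like, as long as it never appears inside the array items. Note: in ODPS this has to be written as regexp_replace('xxx','\\}\\,\\{', '}||{').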
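3. split turns the result into an array, using the delimiter defined in step 2. Because split takes a regular expression, || is written as '\\|\\|'.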
4. lateral view explode turns the array returned in step 3 into rows.
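The four steps can be traced one intermediate value at a time with a small Python sketch. It uses Python's `re` module as a stand-in for Hive's Java-regex functions, so treat it as an illustration of what each step produces rather than a reproduction of Hive; the variable names and the `DELIM` constant are assumptions for the example.

```python
import re

raw = ('[{"bssid":"6C:59:40:21:05:C4","ssid":"MERCURY_05C4"},'
       '{"bssid":"AC:9C:E4:04:EE:52","appid":"10003","ssid":"and-Business"}]')
DELIM = "||"  # any delimiter works as long as it never occurs inside an item

# Step 1: regexp_extract(raw, '^\\[(.+)\\]$', 1) drops the outer brackets.
# The HiveQL literal '\\[' reaches the regex engine as \[, which the raw string r"\[" also gives.
match = re.search(r"^\[(.+)\]$", raw)
step1 = match.group(1) if match else ""

# Guard for the caveat in step 2: the delimiter must not already occur in the data.
assert DELIM not in step1, "choose a different delimiter"

# Step 2: regexp_replace(step1, '\\}\\,\\{', '}||{') puts the delimiter between items.
step2 = re.sub(r"\}\,\{", "}" + DELIM + "{", step1)

# Step 3: split(step2, '\\|\\|') splits on the delimiter; re.escape turns || into \|\|.
step3 = re.split(re.escape(DELIM), step2)

# Step 4: lateral view explode(step3) emits one row per array element.
for row in step3:
    print(row)
```

Printing `step1` and `step2` along the way is a quick way to check that the separator pattern really matched before relying on the split.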
In addition, Hive's json_tuple is more convenient than get_json_object for parsing JSON:
select ss.col, rr.appid, rr.ssid, rr.bssid
from (
  select split(
           regexp_replace(
             regexp_extract('[{"bssid":"6C:59:40:21:05:C4","ssid":"MERCURY_05C4"},{"bssid":"AC:9C:E4:04:EE:52","appid":"10003","ssid":"and-Business"}]',
                            '^\\[(.+)\\]$', 1),
             '\\}\\,\\{', '\\}\\|\\|\\{'),
           '\\|\\|') as str
  from dual
) pp
lateral view explode(pp.str) ss as col
lateral view json_tuple(ss.col, 'appid', 'ssid', 'bssid') rr as appid, ssid, bssid;
Result:
col                                                                  appid  ssid          bssid
{"bssid":"AC:9C:E4:04:EE:52","appid":"10003","ssid":"and-Business"}  10003  and-Business  AC:9C:E4:04:EE:52
{"bssid":"6C:59:40:21:05:C4","ssid":"MERCURY_05C4"}                  \N     MERCURY_05C4  6C:59:40:21:05:C4
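json_tuple can extract several fields in a single call, whereas get_json_object extracts only one field per call. A key missing from an item, such as appid in the second row, comes back as NULL (displayed as \N).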
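The difference can be illustrated with a Python stand-in. `json_tuple` below is a hypothetical helper that only mimics the behavior shown in the result table: it parses the item once, reads every requested key from that single parse, and returns `None` where Hive returns NULL. Values are converted to strings because json_tuple's outputs are strings; nested objects are not handled in this sketch.

```python
import json
from typing import List, Optional, Tuple


def json_tuple(item: str, *keys: str) -> Tuple[Optional[str], ...]:
    """Rough stand-in for json_tuple(item, k1, k2, ...).

    The JSON text is parsed once; every requested key is read from that parse.
    Absent keys come back as None (Hive returns NULL).
    """
    obj = json.loads(item)
    values: List[Optional[str]] = []
    for key in keys:
        value = obj.get(key)
        values.append(None if value is None else str(value))
    return tuple(values)


items = [
    '{"bssid":"AC:9C:E4:04:EE:52","appid":"10003","ssid":"and-Business"}',
    '{"bssid":"6C:59:40:21:05:C4","ssid":"MERCURY_05C4"}',
]
for item in items:
    appid, ssid, bssid = json_tuple(item, "appid", "ssid", "bssid")
    print(appid, ssid, bssid)
```

Getting the same three fields through get_json_object would take three separate calls on the same string, one per path.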