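Note:
It is important that every line be one complete JSON object. A JSON object cannot span multiple lines; in other words, the SerDe does not work on multi-line JSON. This follows from how Hadoop processes files: a file must be splittable, and Hadoop splits text files at line endings.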
// this will work
{ "key" : 10 }

// this will not work
{
  "key" : 10
}
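Before loading a file, it can help to check that every line really is one complete JSON object. The following is a minimal Python sketch of such a check, not part of the SerDe; the helper name `find_bad_lines` is hypothetical, and Python's `json` module may not accept exactly the same inputs as the SerDe's own parser, so treat it as an approximation.

```python
import json


def find_bad_lines(lines):
    """Return (line_number, reason) pairs for lines that are not one complete JSON object."""
    problems = []
    for number, line in enumerate(lines, start=1):
        text = line.strip()
        if not text:
            # Assumption of this sketch: blank lines are skipped rather than reported.
            continue
        try:
            value = json.loads(text)
        except json.JSONDecodeError as err:
            problems.append((number, f"invalid JSON: {err.msg}"))
            continue
        if not isinstance(value, dict):
            problems.append((number, "not a JSON object"))
    return problems


sample = [
    '{ "key" : 10 }',  # one complete object on a single line
    '{',               # the same object spread over three lines:
    '  "key" : 10',    # each piece is parsed on its own,
    '}',               # so each piece is rejected
]

for number, reason in find_bad_lines(sample):
    print(f"line {number}: {reason}")
```

To check a real file, pass an open file handle instead of the list, for example `find_bad_lines(open(path, encoding="utf-8"))`, where `path` is whatever file you intend to load.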
add jar json-serde-1.3.7-jar-with-dependencies.jar;
{"country":"Switzerland","languages":["German","French","Italian"]}
{"country":"China","languages":["chinese"]}
CREATE TABLE tmp_json_array (
  country string,
  languages array<string>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
STORED AS TEXTFILE;

-- Load data into the table
LOAD DATA LOCAL INPATH '/home/xiaosi/a.txt' OVERWRITE INTO TABLE tmp_json_array;
hive> select languages[0] from tmp_json_array;
OK
German
chinese
Time taken: 0.096 seconds, Fetched: 2 row(s)
{"country":"Switzerland","languages":["German","French","Italian"],"religions":{"catholic":[6,7]}}
{"country":"China","languages":["chinese"],"religions":{"catholic":[10,20],"protestant":[40,50]}}
CREATE TABLE tmp_json_nested (
  country string,
  languages array<string>,
  religions map<string,array<int>>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
STORED AS TEXTFILE;

-- Load the data
LOAD DATA LOCAL INPATH '/home/xiaosi/a.txt' OVERWRITE INTO TABLE tmp_json_nested;
hive> select * from tmp_json_nested;
OK
Switzerland	["German","French","Italian"]	{"catholic":[6,7]}
China	["chinese"]	{"catholic":[10,20],"protestant":[40,50]}
Time taken: 0.113 seconds, Fetched: 2 row(s)

hive> select languages[0] from tmp_json_nested;
OK
German
chinese
Time taken: 0.122 seconds, Fetched: 2 row(s)

hive> select religions['catholic'][0] from tmp_json_nested;
OK
6
10
Time taken: 0.111 seconds, Fetched: 2 row(s)
The default behavior for malformed data is to throw an exception. Take, for example, this malformed JSON (the ':' after languages is missing):
{"country":"Italy","languages"["Italian"],"religions":{"protestant":[40,50]}}
hive> LOAD DATA LOCAL INPATH '/home/xiaosi/a.txt' OVERWRITE INTO TABLE tmp_json_nested ;
Loading data to table default.tmp_json_nested
OK
Time taken: 0.23 seconds

hive> select * from tmp_json_nested;
OK
Failed with exception java.io.IOException:org.apache.hadoop.hive.serde2.SerDeException: Row is not a valid JSON Object - JSONException: Expected a ':' after a key at 31 [character 32 line 1]
Time taken: 0.096 seconds
ALTER TABLE json_table SET SERDEPROPERTIES ( "ignore.malformed.json" = "true");
hive> ALTER TABLE tmp_json_nested SET SERDEPROPERTIES ( "ignore.malformed.json" = "true");
OK
Time taken: 0.122 seconds

hive> select * from tmp_json_nested;
OK
Switzerland	["German","French","Italian"]	{"catholic":[6,7]}
China	["chinese"]	{"catholic":[10,20],"protestant":[40,50]}
NULL	NULL	NULL
Time taken: 0.103 seconds, Fetched: 3 row(s)
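The query no longer fails, but the bad record becomes NULL NULL NULL. The property only covers rows that are not valid JSON, however: a well-formed row whose values do not match the declared column types, such as the following, still fails: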
{"country":"Italy","languages":"Italian","religions":{"catholic":"90"}}
hive> ALTER TABLE tmp_json_nested SET SERDEPROPERTIES ( "ignore.malformed.json" = "true");
OK
Time taken: 0.081 seconds

hive> select * from tmp_json_nested;
OK
Failed with exception java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: java.lang.String cannot be cast to org.openx.data.jsonserde.json.JSONArray
Time taken: 0.097 seconds
A common problem is a field that is sometimes a scalar and sometimes an array, for example:
{ field: "hello", .. }
{ field: [ "hello", "world" ], ...
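In this case, if the column is declared as array<string> and the SerDe finds a scalar, it returns a single-element array, effectively promoting the scalar to an array. The scalar must still be of the correct type.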
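The promotion rule can be pictured with a small Python analogy. This is only a sketch of the behavior described above, not the SerDe's actual Java code; the function name `as_string_array` is hypothetical, and raising `TypeError` for a scalar of the wrong type is this sketch's reading of "must be of the correct type".

```python
import json


def as_string_array(value):
    """Mimic reading a JSON value into a column declared as array<string>."""
    if value is None:
        return None            # missing attribute stays NULL (assumption of this sketch)
    if isinstance(value, list):
        return value           # already an array: pass it through unchanged
    if isinstance(value, str):
        return [value]         # scalar of the right type: promote to a one-element array
    # Scalar of the wrong type: this sketch rejects it.
    raise TypeError(f"expected a string or an array, got {type(value).__name__}")


rows = [
    '{"field": "hello"}',
    '{"field": ["hello", "world"]}',
]
for row in rows:
    print(as_string_array(json.loads(row)["field"]))
```

Both rows come back as lists, so a query such as `field[0]` would work on either shape.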
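Sometimes JSON data contains an attribute whose name is a reserved word in Hive. For example, you may have a JSON attribute named "timestamp", which is a reserved word in Hive, so Hive fails when you issue the CREATE TABLE. This SerDe can use SerDe properties to map a Hive column to an attribute with a different name.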
{"country":"Switzerland","exec_date":"2017-03-14 23:12:21"}
{"country":"China","exec_date":"2017-03-16 03:22:18"}
CREATE TABLE tmp_json_mapping (
  country string,
  dt string
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ("mapping.dt"="exec_date")
STORED AS TEXTFILE;
hive> select * from tmp_json_mapping;
OK
Switzerland	2017-03-14 23:12:21
China	2017-03-16 03:22:18
Time taken: 0.081 seconds, Fetched: 2 row(s)
The "mapping.dt" property means that the dt column reads the value of the JSON attribute exec_date.
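To make the lookup concrete, here is a rough Python sketch of how a "mapping.<column>" property redirects a column to a differently named attribute. It only imitates the idea and is not how the SerDe is implemented; `read_row` is a hypothetical helper, and returning `None` for an absent attribute is an assumption of the sketch.

```python
import json


def read_row(json_line, columns, serde_properties):
    """Build a row by looking up each column's JSON attribute, honoring mapping.* properties."""
    record = json.loads(json_line)
    row = []
    for column in columns:
        # Use the mapped attribute name if one is configured, else the column name itself.
        attribute = serde_properties.get("mapping." + column, column)
        row.append(record.get(attribute))  # None when the attribute is absent (assumption)
    return row


properties = {"mapping.dt": "exec_date"}
line = '{"country":"Switzerland","exec_date":"2017-03-14 23:12:21"}'

print(read_row(line, ["country", "dt"], properties))
# ['Switzerland', '2017-03-14 23:12:21']
```

Without the mapping, the `dt` column would look for an attribute literally named `dt` and find nothing in this record.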