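You may have large volumes of JSON data generated by applications, and you may need to tidy it up: remove fields you don't want, keep only the fields you do want, or simply query it.
If so, Alibaba Cloud Data Lake Analytics (DLA) may well be the most convenient cloud service currently available for the job. In just three steps you can process massive amounts of JSON data, or build more complex ETL pipelines.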
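First, deliver the JSON data to OSS by whatever means you have available.
For log pipelines in the cloud, there is typically also a ready-made delivery path from JSON to OSS; see the JSON section of the "Getting Started Guide to Cloud-Native Log Data Analysis" (雲原生日誌數據分析上手指南).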
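The "Getting Started Guide to Cloud-Native Log Data Analysis" referenced above already covers how to create partitioned tables over massive JSON data. This example uses a non-partitioned table instead. Assume that every line of each data file holds one JSON document, and that the files are stored under the following OSS path:
oss://your_bucket/json_data/...
Then create the table in DLA: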
CREATE EXTERNAL TABLE simple_json (
  data STRING
)
STORED AS TEXTFILE
LOCATION 'oss://your_bucket/json_data/';
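Once the table exists, a quick query can confirm that DLA reads the files as expected. This is only a minimal sketch: it assumes the simple_json table was created exactly as above and that each line of the files is one JSON document. The LIMIT value is arbitrary.

```sql
-- Sanity check: each returned row should hold one raw JSON document
-- in the `data` column of the simple_json table defined above.
SELECT data
FROM simple_json
LIMIT 10;
```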
json_remove
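Removes the data at the specified JSON path(s) from a JSON document. You can pass a single JSON path or several at once. Note: fuzzy JSON path matching such as ".." is not supported yet; support is planned for the near future.
json_remove(json_string, json_path_string) -> json_string
json_remove(json_string, array[json_path_string]) -> json_string
Example: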
select json_remove(
  '{ "glossary": { "title": "example glossary", "GlossDiv": { "title": "S", "GlossList": { "GlossEntry": { "ID": "SGML", "SortAs": "SGML", "GlossTerm": "Standard Generalized Markup Language", "Acronym": "SGML", "Abbrev": "ISO 8879:1986", "GlossDef": { "para": "A meta-markup language, used to create markup languages such as DocBook.", "GlossSeeAlso": ["GML", "XML"] }, "GlossSee": "markup" } } } } }',
  '$.glossary.GlossDiv') a;
-> {"glossary":{"title":"example glossary"}}

select json_remove(
  '{ "glossary": { "title": "example glossary", "GlossDiv": { "title": "S", "GlossList": { "GlossEntry": { "ID": "SGML", "SortAs": "SGML", "GlossTerm": "Standard Generalized Markup Language", "Acronym": "SGML", "Abbrev": "ISO 8879:1986", "GlossDef": { "para": "A meta-markup language, used to create markup languages such as DocBook.", "GlossSeeAlso": ["GML", "XML"] }, "GlossSee": "markup" } } } } }',
  array['$.glossary.title', '$.glossary.GlossDiv.title']) a;
-> {"glossary":{"GlossDiv":{"GlossList":{"GlossEntry":{"GlossTerm":"Standard Generalized Markup Language","GlossSee":"markup","SortAs":"SGML","GlossDef":{"para":"A meta-markup language, used to create markup languages such as DocBook.","GlossSeeAlso":["GML","XML"]},"ID":"SGML","Acronym":"SGML","Abbrev":"ISO 8879:1986"}}}}}
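The examples above operate on a string literal. Applied to real data, the same call would take the table column as its first argument. The following is a hedged sketch, assuming the simple_json table created earlier and documents shaped like the glossary sample; the alias `cleaned` is just an illustrative name.

```sql
-- Strip the GlossDiv subtree from every JSON document in the table.
-- Assumes the simple_json table defined earlier; the path matches the sample document.
SELECT json_remove(data, '$.glossary.GlossDiv') AS cleaned
FROM simple_json
LIMIT 10;
```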
json_reserve
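Keeps only the data at the specified JSON path(s) and removes everything else from the JSON document. You can pass a single JSON path or several at once. Note: fuzzy JSON path matching such as ".." is not supported yet; support is planned for the near future.
json_reserve(json_string, json_path_string) -> json_string
json_reserve(json_string, array[json_path_string]) -> json_string
Example: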
select json_reserve(
  '{ "glossary": { "title": "example glossary", "GlossDiv": { "title": "S", "GlossList": { "GlossEntry": { "ID": "SGML", "SortAs": "SGML", "GlossTerm": "Standard Generalized Markup Language", "Acronym": "SGML", "Abbrev": "ISO 8879:1986", "GlossDef": { "para": "A meta-markup language, used to create markup languages such as DocBook.", "GlossSeeAlso": ["GML", "XML"] }, "GlossSee": "markup" } } } } }',
  array['$.glossary.title']) a;
-> {"glossary":{"title":"example glossary"}}

select json_reserve(
  '{ "glossary": { "title": "example glossary", "GlossDiv": { "title": "S", "GlossList": { "GlossEntry": { "ID": "SGML", "SortAs": "SGML", "GlossTerm": "Standard Generalized Markup Language", "Acronym": "SGML", "Abbrev": "ISO 8879:1986", "GlossDef": { "para": "A meta-markup language, used to create markup languages such as DocBook.", "GlossSeeAlso": ["GML", "XML"] }, "GlossSee": "markup" } } } } }',
  array['$.glossary.title', '$.glossary.GlossDiv.title', '$.glossary.GlossDiv.GlossList.GlossEntry.ID']) a;
-> {"glossary":{"title":"example glossary","GlossDiv":{"GlossList":{"GlossEntry":{"ID":"SGML"}},"title":"S"}}}
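As with json_remove, the function can presumably be pointed at the table column instead of a literal. A minimal sketch, again assuming the simple_json table from earlier and documents shaped like the glossary sample; the alias `slim` is hypothetical.

```sql
-- Keep only the glossary title and the entry ID from every document.
-- Assumes the simple_json table defined earlier; both paths come from the sample document.
SELECT json_reserve(
         data,
         array['$.glossary.title',
               '$.glossary.GlossDiv.GlossList.GlossEntry.ID']
       ) AS slim
FROM simple_json
LIMIT 10;
```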
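You can also use the cloud data processing power of Data Lake Analytics to merge and analyze data from multiple sources, and then write the results back to other databases and storage systems.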
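One way such a write-back could look is sketched below, with OSS itself as the target. Everything here beyond simple_json and json_remove is an assumption: the table name simple_json_clean and the path oss://your_bucket/json_data_clean/ are hypothetical, and the sketch assumes that INSERT ... SELECT into an OSS-backed table is available in your DLA setup.

```sql
-- Hypothetical target table for the cleaned documents (name and path are illustrative).
CREATE EXTERNAL TABLE simple_json_clean (
  data STRING
)
STORED AS TEXTFILE
LOCATION 'oss://your_bucket/json_data_clean/';

-- Write every document back without its GlossDiv subtree.
-- Assumes INSERT ... SELECT is supported for this kind of table.
INSERT INTO simple_json_clean
SELECT json_remove(data, '$.glossary.GlossDiv')
FROM simple_json;
```

Keeping the cleaned output under a separate prefix prevents the source table from picking up its own results on the next read.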
This article is original content from the Yunqi Community and may not be reproduced without permission.