在數據庫SQL處理中,經常有行轉列(Pivot)和列轉行(Unpivot)的數據處理需求。本文以示例說明在Data Lake Analytics中,如何使用SQL的一些技巧,達到行轉列(Pivot)和列轉行(Unpivot)的目的。另外,DLA支持函數式表達式的處理邏輯、豐富的JSON數據處理函數和UNNEST的SQL語法,結合這些功能,可以實現很是豐富、強大的SQL數據處理語義和能力,本文也以JSON數據列展開爲示例,說明在DLA中使用這種SQL的技巧。數據庫
test_pivot表內容:json
+------+----------+---------+--------+ | id | username | subject | source | +------+----------+---------+--------+ | 1 | 張三 | 語文 | 60 | | 2 | 李四 | 數學 | 70 | | 3 | 王五 | 英語 | 80 | | 4 | 王五 | 數學 | 75 | | 5 | 王五 | 語文 | 57 | | 6 | 李四 | 語文 | 80 | | 7 | 張三 | 英語 | 100 | +------+----------+---------+--------+
SQL語句:數據結構
SELECT username, max(CASE WHEN subject = '語文' THEN source END) AS `語文`, max(CASE WHEN subject = '數學' THEN source END) AS `數學`, max(CASE WHEN subject = '英語' THEN source END) AS `英語` FROM test_pivot GROUP BY username ORDER BY username;
結果:app
+----------+--------+--------+--------+ | username | 語文 | 數學 | 英語 | +----------+--------+--------+--------+ | 張三 | 60 | NULL | 100 | | 李四 | 80 | 70 | NULL | | 王五 | 57 | 75 | 80 | +----------+--------+--------+--------+
該方法思路上分爲兩個步驟:
第一步,經過map_agg函數把兩個列的多行的值,映射爲map;
第二步,經過map的輸出,達到多列輸出的目的。函數
第一步SQL:spa
SELECT username, map_agg(subject, source) kv FROM test_pivot GROUP BY username ORDER BY username;
第一步輸出:code
+----------+-----------------------------------+ | username | kv | +----------+-----------------------------------+ | 張三 | {語文=60, 英語=100} | | 李四 | {數學=70, 語文=80} | | 王五 | {數學=75, 語文=57, 英語=80} | +----------+-----------------------------------+
能夠看到map_agg的輸出效果。orm
最終,該方法的SQL:element
SELECT username, if(element_at(kv, '語文') = null, null, kv['語文']) AS `語文`, if(element_at(kv, '數學') = null, null, kv['數學']) AS `數學`, if(element_at(kv, '英語') = null, null, kv['英語']) AS `英語` FROM ( SELECT username, map_agg(subject, source) kv FROM test_pivot GROUP BY username ) t ORDER BY username;
結果:字符串
+----------+--------+--------+--------+ | username | 語文 | 數學 | 英語 | +----------+--------+--------+--------+ | 張三 | 60 | NULL | 100 | | 李四 | 80 | 70 | NULL | | 王五 | 57 | 75 | 80 | +----------+--------+--------+--------+
test_unpivot表內容:
+----------+--------+--------+--------+ | username | 語文 | 數學 | 英語 | +----------+--------+--------+--------+ | 張三 | 60 | NULL | 100 | | 李四 | 80 | 70 | NULL | | 王五 | 57 | 75 | 80 | +----------+--------+--------+--------+
SQL語句:
SELECT username, subject, source FROM ( SELECT username, '語文' AS subject, `語文` AS source FROM test_unpivot WHERE `語文` is not null UNION SELECT username, '數學' AS subject, `數學` AS source FROM test_unpivot WHERE `數學` is not null UNION SELECT username, '英語' AS subject, `英語` AS source FROM test_unpivot WHERE `英語` is not null ) ORDER BY username;
結果:
+----------+---------+--------+ | username | subject | source | +----------+---------+--------+ | 張三 | 語文 | 60 | | 張三 | 英語 | 100 | | 李四 | 語文 | 80 | | 李四 | 數學 | 70 | | 王五 | 英語 | 80 | | 王五 | 語文 | 57 | | 王五 | 數學 | 75 | +----------+---------+--------+
SQL語句:
SELECT t1.username, t2.subject, t2.source FROM test_unpivot t1 CROSS JOIN UNNEST ( array['語文', '數學', '英語'], array[`語文`, `數學`, `英語`] ) t2 (subject, source) WHERE t2.source is not null
結果:
+----------+---------+--------+ | username | subject | source | +----------+---------+--------+ | 張三 | 語文 | 60 | | 張三 | 英語 | 100 | | 李四 | 語文 | 80 | | 李四 | 數學 | 70 | | 王五 | 語文 | 57 | | 王五 | 數學 | 75 | | 王五 | 英語 | 80 | +----------+---------+--------+
JSON數據的表達能力很是靈活,所以在數據庫和SQL中,經常須要處理JSON數據,經常碰到稍複雜的需求,就是將JSON數據中的某些屬性字段,進行展開轉換,轉成行、列的關係型表達。
下面以多個示例說明。
SQL示例:
SELECT t.m, t.n FROM ( SELECT MAP(ARRAY['foo', 'bar'], ARRAY[1, 2]) as map_data ) CROSS JOIN unnest(map_data) AS t(m, n);
結果:
+------+------+ | m | n | +------+------+ | foo | 1 | | bar | 2 | +------+------+
SQL示例:
SELECT json_extract(t.a, '$.a') AS a, json_extract(t.a, '$.b') AS b FROM ( SELECT cast(json_extract('{"x":[{"a":1,"b":2},{"a":3,"b":4}]}', '$.x') AS array<JSON>) AS package_array ) CROSS JOIN UNNEST(package_array) AS t(a);
結果:
+------+------+ | a | b | +------+------+ | 1 | 2 | | 3 | 4 | +------+------+
SQL示例:
SELECT t.m AS _col1, t.n AS _col2 FROM ( SELECT cast(json_extract('{"x":[{"a":1,"b":2},{"a":3,"b":4}]}', '$.x') AS array<JSON>) AS array_1, cast(json_extract('{"x":[{"a":5,"b":6}, {"a":7,"b":8}, {"a":9,"b":10}, {"a":11,"b":12}]}', '$.x') AS array<JSON>) AS array_2 ) CROSS JOIN UNNEST(array_1, array_2) AS t(m, n);
結果:
+---------------+-----------------+ | _col1 | _col2 | +---------------+-----------------+ | {"a":1,"b":2} | {"a":5,"b":6} | | {"a":3,"b":4} | {"a":7,"b":8} | | NULL | {"a":9,"b":10} | | NULL | {"a":11,"b":12} | +---------------+-----------------+
SQL示例:
SELECT json_extract(t.m, '$.a') AS _col1, json_extract(t.m, '$.b') AS _col2, json_extract(t.n, '$.a') AS _col3, json_extract(t.n, '$.b') AS _col4 FROM ( SELECT cast(json_extract('{"x":[{"a":1,"b":2},{"a":3,"b":4}]}', '$.x') AS array<JSON>) AS array_1, cast(json_extract('{"x":[{"a":5,"b":6}, {"a":7,"b":8}, {"a":9,"b":10}, {"a":11,"b":12}]}', '$.x') AS array<JSON>) AS array_2 ) CROSS JOIN UNNEST(array_1, array_2) AS t(m, n);
結果:
+-------+-------+-------+-------+ | _col1 | _col2 | _col3 | _col4 | +-------+-------+-------+-------+ | 1 | 2 | 5 | 6 | | 3 | 4 | 7 | 8 | | NULL | NULL | 9 | 10 | | NULL | NULL | 11 | 12 | +-------+-------+-------+-------+
SQL示例:
SELECT count(*) AS cnt, package_name FROM ( SELECT t.a AS package_name FROM ( SELECT transform(packages_map_array, x -> Element_at(x, 'packageName')) AS package_array FROM ( SELECT cast(Json_extract(data_json, '$.packages') AS array<map<VARCHAR, VARCHAR>>) AS packages_map_array FROM ( SELECT json_parse(data) AS data_json FROM ( SELECT '{ "packages": [ { "appName": "鐵路12306", "packageName": "com.MobileTicket", "versionName": "4.1.9", "versionCode": "194" }, { "appName": "QQ飛車", "packageName": "com.tencent.tmgp.speedmobile", "versionName": "1.11.0.13274", "versionCode": "1110013274" }, { "appName": "掌閱", "packageName": "com.chaozh.iReaderFree", "versionName": "7.11.0", "versionCode": "71101" } ] }' AS data ) ) ) ) AS x (package_array) CROSS JOIN UNNEST(package_array) AS t (a) ) GROUP BY package_name ORDER BY cnt DESC;
結果:
+------+------------------------------+ | cnt | package_name | +------+------------------------------+ | 1 | com.MobileTicket | | 1 | com.tencent.tmgp.speedmobile | | 1 | com.chaozh.iReaderFree | +------+------------------------------+
原文連接 本文爲雲棲社區原創內容,未經容許不得轉載。