DLA SQL技巧:行、列轉換和JSON數據列展開

1. 簡介

在數據庫SQL處理中,經常有行轉列(Pivot)和列轉行(Unpivot)的數據處理需求。本文以示例說明在Data Lake Analytics中,如何使用SQL的一些技巧,達到行轉列(Pivot)和列轉行(Unpivot)的目的。另外,DLA支持函數式表達式的處理邏輯、豐富的JSON數據處理函數和UNNEST的SQL語法,結合這些功能,可以實現很是豐富、強大的SQL數據處理語義和能力,本文也以JSON數據列展開爲示例,說明在DLA中使用這種SQL的技巧。數據庫

2. 行轉列(Pivot)

2.1 樣例數據

test_pivot表內容:json

+------+----------+---------+--------+
| id   | username | subject | source |
+------+----------+---------+--------+
| 1    | 張三     | 語文    | 60     |
| 2    | 李四     | 數學    | 70     |
| 3    | 王五     | 英語    | 80     |
| 4    | 王五     | 數學    | 75     |
| 5    | 王五     | 語文    | 57     |
| 6    | 李四     | 語文    | 80     |
| 7    | 張三     | 英語    | 100    |
+------+----------+---------+--------+

2.2 方法一:經過CASE WHEN語句

SQL語句:數據結構

SELECT 
   username,
   max(CASE WHEN subject = '語文' THEN source END) AS `語文`,
   max(CASE WHEN subject = '數學' THEN source END) AS `數學`,
   max(CASE WHEN subject = '英語' THEN source END) AS `英語`
FROM test_pivot
GROUP BY username
ORDER BY username;

結果:app

+----------+--------+--------+--------+
| username | 語文   | 數學   | 英語   |
+----------+--------+--------+--------+
| 張三     | 60     | NULL   | 100    |
| 李四     | 80     | 70     | NULL   |
| 王五     | 57     | 75     | 80     |
+----------+--------+--------+--------+

2.3 方法二:經過map_agg函數

該方法思路上分爲兩個步驟:
第一步,經過map_agg函數把兩個列的多行的值,映射爲map;
第二步,經過map的輸出,達到多列輸出的目的。函數

第一步SQL:spa

SELECT username, map_agg(subject, source) kv
FROM test_pivot
GROUP BY username
ORDER BY username;

第一步輸出:code

+----------+-----------------------------------+
| username | kv                                |
+----------+-----------------------------------+
| 張三     | {語文=60, 英語=100}               |
| 李四     | {數學=70, 語文=80}                |
| 王五     | {數學=75, 語文=57, 英語=80}       |
+----------+-----------------------------------+

能夠看到map_agg的輸出效果。orm

最終,該方法的SQL:element

SELECT
  username,
  if(element_at(kv, '語文') = null, null, kv['語文']) AS `語文`,
  if(element_at(kv, '數學') = null, null, kv['數學']) AS `數學`,
  if(element_at(kv, '英語') = null, null, kv['英語']) AS `英語`
FROM (
  SELECT username, map_agg(subject, source) kv
  FROM test_pivot
  GROUP BY username
) t
ORDER BY username;

結果:字符串

+----------+--------+--------+--------+
| username | 語文   | 數學   | 英語   |
+----------+--------+--------+--------+
| 張三     | 60     | NULL   | 100    |
| 李四     | 80     | 70     | NULL   |
| 王五     | 57     | 75     | 80     |
+----------+--------+--------+--------+

3. 列轉行(Unpivot)

3.1 樣例數據

test_unpivot表內容:

+----------+--------+--------+--------+
| username | 語文   | 數學   | 英語   |
+----------+--------+--------+--------+
| 張三     | 60     | NULL   | 100    |
| 李四     | 80     | 70     | NULL   |
| 王五     | 57     | 75     | 80     |
+----------+--------+--------+--------+

3.2 方法一:經過UNION語句

SQL語句:

SELECT username, subject, source
FROM (
  SELECT username, '語文' AS subject, `語文` AS source FROM test_unpivot WHERE `語文` is not null
  UNION
  SELECT username, '數學' AS subject, `數學` AS source FROM test_unpivot WHERE `數學` is not null
  UNION
  SELECT username, '英語' AS subject, `英語` AS source FROM test_unpivot WHERE `英語` is not null
)
ORDER BY username;

結果:

+----------+---------+--------+
| username | subject | source |
+----------+---------+--------+
| 張三     | 語文    | 60     |
| 張三     | 英語    | 100    |
| 李四     | 語文    | 80     |
| 李四     | 數學    | 70     |
| 王五     | 英語    | 80     |
| 王五     | 語文    | 57     |
| 王五     | 數學    | 75     |
+----------+---------+--------+

3.3 方法二:經過CROSS JOIN UNNEST語句

SQL語句:

SELECT t1.username, t2.subject, t2.source
FROM test_unpivot t1
CROSS JOIN UNNEST (
  array['語文', '數學', '英語'],
  array[`語文`, `數學`, `英語`]
) t2 (subject, source)
WHERE t2.source is not null

結果:

+----------+---------+--------+
| username | subject | source |
+----------+---------+--------+
| 張三     | 語文    | 60     |
| 張三     | 英語    | 100    |
| 李四     | 語文    | 80     |
| 李四     | 數學    | 70     |
| 王五     | 語文    | 57     |
| 王五     | 數學    | 75     |
| 王五     | 英語    | 80     |
+----------+---------+--------+

4. JSON數據列展開

JSON數據的表達能力很是靈活,所以在數據庫和SQL中,經常須要處理JSON數據,經常碰到稍複雜的需求,就是將JSON數據中的某些屬性字段,進行展開轉換,轉成行、列的關係型表達。

4.1 基本思路和步驟

  • 使用JSON函數,對JSON字符串進行解析和數據提取;
  • 提取、轉換爲ARRAY或者MAP的數據結構,若有須要,能夠使用Lambda函數式表達式進行轉換處理;
  • 利用UNNEST語法進行列展開。

下面以多個示例說明。

4.2 用UNNEST對MAP進行關係型展開

SQL示例:

SELECT t.m, t.n
FROM (
  SELECT MAP(ARRAY['foo', 'bar'], ARRAY[1, 2]) as map_data
)
CROSS JOIN unnest(map_data) AS t(m, n);

結果:

+------+------+
| m    | n    |
+------+------+
| foo  |    1 |
| bar  |    2 |
+------+------+

4.3 用UNNEST對JSON數據進行關係型展開

SQL示例:

SELECT json_extract(t.a, '$.a') AS a, 
       json_extract(t.a, '$.b') AS b
FROM (
    SELECT cast(json_extract('{"x":[{"a":1,"b":2},{"a":3,"b":4}]}', '$.x') 
           AS array<JSON>) AS package_array
)
CROSS JOIN UNNEST(package_array) AS t(a);

結果:

+------+------+
| a    | b    |
+------+------+
| 1    | 2    |
| 3    | 4    |
+------+------+

SQL示例:

SELECT t.m AS _col1, t.n AS _col2
FROM (
    SELECT cast(json_extract('{"x":[{"a":1,"b":2},{"a":3,"b":4}]}', '$.x') 
           AS array<JSON>) AS array_1, 
           cast(json_extract('{"x":[{"a":5,"b":6}, {"a":7,"b":8}, {"a":9,"b":10}, {"a":11,"b":12}]}', '$.x') 
           AS array<JSON>) AS array_2
)
CROSS JOIN UNNEST(array_1, array_2) AS t(m, n);

結果:

+---------------+-----------------+
| _col1         | _col2           |
+---------------+-----------------+
| {"a":1,"b":2} | {"a":5,"b":6}   |
| {"a":3,"b":4} | {"a":7,"b":8}   |
| NULL          | {"a":9,"b":10}  |
| NULL          | {"a":11,"b":12} |
+---------------+-----------------+

SQL示例:

SELECT json_extract(t.m, '$.a') AS _col1, 
       json_extract(t.m, '$.b') AS _col2, 
       json_extract(t.n, '$.a') AS _col3, 
       json_extract(t.n, '$.b') AS _col4 
FROM (
    SELECT cast(json_extract('{"x":[{"a":1,"b":2},{"a":3,"b":4}]}', '$.x') 
           AS array<JSON>) AS array_1, 
           cast(json_extract('{"x":[{"a":5,"b":6}, {"a":7,"b":8}, {"a":9,"b":10}, {"a":11,"b":12}]}', '$.x') 
           AS array<JSON>) AS array_2
)
CROSS JOIN UNNEST(array_1, array_2) AS t(m, n);

結果:

+-------+-------+-------+-------+
| _col1 | _col2 | _col3 | _col4 |
+-------+-------+-------+-------+
| 1     | 2     | 5     | 6     |
| 3     | 4     | 7     | 8     |
| NULL  | NULL  | 9     | 10    |
| NULL  | NULL  | 11    | 12    |
+-------+-------+-------+-------+

4.4 結合Lambda表達式,用UNNEST對JSON數據進行關係型展開

SQL示例:

SELECT count(*) AS cnt, 
       package_name 
FROM ( 
    SELECT t.a AS package_name 
    FROM ( 
        SELECT transform(packages_map_array, x -> Element_at(x, 'packageName')) 
               AS package_array 
        FROM (
            SELECT cast(Json_extract(data_json, '$.packages') 
                   AS array<map<VARCHAR, VARCHAR>>) AS packages_map_array
            FROM (
                SELECT json_parse(data) AS data_json
                FROM ( 
                    SELECT '{
                              "packages": [
                                {
                                  "appName": "鐵路12306",
                                  "packageName": "com.MobileTicket",
                                  "versionName": "4.1.9",
                                  "versionCode": "194"
                                },
                                {
                                  "appName": "QQ飛車",
                                  "packageName": "com.tencent.tmgp.speedmobile",
                                  "versionName": "1.11.0.13274",
                                  "versionCode": "1110013274"
                                },
                                {
                                  "appName": "掌閱",
                                  "packageName": "com.chaozh.iReaderFree",
                                  "versionName": "7.11.0",
                                  "versionCode": "71101"
                                }
                             ]
                           }'
                    AS data 
                )
            )
        ) 
    ) AS x (package_array)
    CROSS JOIN UNNEST(package_array) AS t (a)
)
GROUP BY package_name 
ORDER BY cnt DESC;

結果:

+------+------------------------------+
| cnt  | package_name                 |
+------+------------------------------+
|    1 | com.MobileTicket             |
|    1 | com.tencent.tmgp.speedmobile |
|    1 | com.chaozh.iReaderFree       |
+------+------------------------------+


原文連接 本文爲雲棲社區原創內容,未經容許不得轉載。

相關文章
相關標籤/搜索