Embedding SPL in Java for Easy Data Grouping

Date: 2019-11-16

Problem description

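Implementing a SQL-style GROUP BY aggregation in Java code is fairly tedious. You usually declare a data structure (a Java entity class) first, then loop over a Java collection, and finally add each element to a sub-collection according to the grouping condition. Java 8 lambdas (streams) make the code much more concise, but grouping is usually followed by aggregation, and the aggregation (sum(), count(*), topN(), and so on) still has to be written separately. These are only the most common grouping and aggregation operations. With unconventional grouping such as alignment grouping, enumeration grouping, and multi-level grouping, combined with other aggregate functions (FIRST, LAST, ...), the code becomes very long and hard to reuse. It would be better to have a middleware dedicated to this kind of computation, one that describes the algorithm in a SQL-like script which Java calls directly to get a result set back. The Java edition of esProc (集算器) with SPL scripts provides exactly this mechanism. The examples below show how to use it.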
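For comparison, a plain Java 8 version of the simplest case (count per name) might look like the sketch below. The DutyRecord class and its field names are assumptions made for illustration; they do not come from the original article.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical entity class; the field names are assumptions.
final class DutyRecord {
    final String name;
    final String date; // ISO date string, e.g. "2019-11-16"

    DutyRecord(String name, String date) {
        this.name = name;
        this.date = date;
    }
}

final class StreamGroupBy {
    // Roughly: SELECT name, COUNT(*) FROM duty GROUP BY name ORDER BY name
    static Map<String, Long> countByName(List<DutyRecord> records) {
        return records.stream().collect(Collectors.groupingBy(
            (DutyRecord r) -> r.name, // grouping key
            TreeMap::new,             // sorted map, so output order is stable
            Collectors.counting()));  // aggregation supplied as a downstream collector
    }

    public static void main(String[] args) {
        List<DutyRecord> records = List.of(
            new DutyRecord("Ashley", "2019-11-01"),
            new DutyRecord("Bob", "2019-11-02"),
            new DutyRecord("Ashley", "2019-11-03"));
        System.out.println(countByName(records)); // prints {Ashley=2, Bob=1}
    }
}
```

Even in this small case the entity class, the key extractor, and the aggregation are all spelled out by hand, which is the overhead the paragraph above describes.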

SPL implementation

Regular grouping

The file duty.xlsx stores each person's overtime records:

Count the number of duty days for each person:

Save the script as CountName.dfx (it will be used when embedding in Java).

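TopN per group

Get the overtime records of the first three days for each person in each month:

Save the script as RecMonTop3.dfx (it will be used when embedding in Java).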
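To show what the TopN-per-group task involves when written by hand, here is a minimal Java sketch. It is not the SPL script from the article; the Duty type, the "yyyy-MM|name" key format, and the method name are all hypothetical.

```java
import java.time.LocalDate;
import java.time.YearMonth;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

final class TopNPerGroup {
    // Hypothetical row type: one overtime day for one person.
    static final class Duty {
        final String name;
        final LocalDate date;

        Duty(String name, LocalDate date) {
            this.name = name;
            this.date = date;
        }
    }

    // For each "yyyy-MM|name" key, keeps at most the n earliest records.
    static Map<String, List<Duty>> firstNPerMonthAndName(List<Duty> rows, int n) {
        // Step 1: group by month and person.
        Map<String, List<Duty>> groups = rows.stream().collect(Collectors.groupingBy(
            (Duty d) -> YearMonth.from(d.date) + "|" + d.name,
            TreeMap::new,
            Collectors.toList()));
        // Step 2: sort each group by date and keep the first n rows.
        groups.replaceAll((key, list) -> list.stream()
            .sorted(Comparator.comparing((Duty d) -> d.date))
            .limit(n)
            .collect(Collectors.toList()));
        return groups;
    }
}
```

Calling firstNPerMonthAndName(rows, 3) yields one list per month and person with at most three entries each, ordered by date.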

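Calling from Java

Embedding SPL in a Java application is straightforward: the script is loaded through JDBC using the stored-procedure call syntax. Using CountName.dfx, the file saved in the regular grouping example, a sample call looks like this: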

...
 Connection con = null;
 // Load the esProc JDBC driver
 Class.forName("com.esproc.jdbc.InternalDriver");
 con = DriverManager.getConnection("jdbc:esproc:local://");
 // Call the stored procedure; CountName is the name of the dfx file
 st = (com.esproc.jdbc.InternalCStatement) con.prepareCall("call CountName()");
 // Execute the stored procedure
 st.execute();
 // Get the result set
 ResultSet rs = st.getResultSet();
...

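Switching to RecMonTop3.dfx works the same way: just call RecMonTop3(). A script can also return two result sets at once. The Java fragment here only gives a rough idea of how to embed SPL. The detailed steps are also simple and are not repeated here; see the article "How to call an SPL script in Java" (Java 如何調用 SPL 腳本). SPL also provides an ODBC driver, and embedding it in languages that support ODBC follows a similar process.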
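A more complete version of the call might look like the following sketch. The driver class name and the JDBC URL are taken from the fragment above. The DfxCaller class, its method names, and the restriction on script names are assumptions added for illustration. It uses the standard java.sql.CallableStatement interface rather than the driver-specific cast, and it needs the esProc JDBC jars and configuration on the classpath to actually run.

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class DfxCaller {
    // Builds the call text for a parameterless script, e.g. "call CountName()".
    // Restricting names to identifiers is a precaution of this sketch only.
    static String callText(String dfxName) {
        if (!dfxName.matches("[A-Za-z_][A-Za-z0-9_]*")) {
            throw new IllegalArgumentException("unexpected script name: " + dfxName);
        }
        return "call " + dfxName + "()";
    }

    // Runs the script and copies its first result set into memory.
    static List<Map<String, Object>> run(String dfxName) throws Exception {
        Class.forName("com.esproc.jdbc.InternalDriver");
        try (Connection con = DriverManager.getConnection("jdbc:esproc:local://");
             CallableStatement st = con.prepareCall(callText(dfxName))) {
            st.execute();
            List<Map<String, Object>> rows = new ArrayList<>();
            try (ResultSet rs = st.getResultSet()) {
                if (rs == null) {
                    return rows; // the script produced no result set
                }
                ResultSetMetaData meta = rs.getMetaData();
                while (rs.next()) {
                    Map<String, Object> row = new LinkedHashMap<>();
                    for (int i = 1; i <= meta.getColumnCount(); i++) {
                        row.put(meta.getColumnLabel(i), rs.getObject(i));
                    }
                    rows.add(row);
                }
            }
            return rows;
        }
    }
}
```

With this helper, DfxCaller.run("CountName") and DfxCaller.run("RecMonTop3") differ only in the script name, and try-with-resources closes the connection and statement in both cases.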

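Extended excerpts

There was no summary of this topic before. Data grouping actually comes in many more varieties: alignment grouping, enumeration grouping, multi-level grouping, and so on. The official SPL forum at 乾學院 has summaries and examples for all of them; two are excerpted here.

SPL alignment grouping

Example 1: List, in the given order, the number of countries that use Chinese, English, and French as an official language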

MySQL8:
with t(name,ord) as (select 'Chinese',1
union all select 'English',2
union all select 'French',3)
select t.name, count(countrycode) cnt
from t left join world.countrylanguage s on t.name=s.language
where s.isofficial='T'
group by name,ord
order by ord; 

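Note: the character set of the table and the character set of the database session must be the same.

(1) show variables like 'character_set_connection' shows the character set of the current session

(2) show create table world.countrylanguage shows the character set of the table

(3) set character_set_connection=[character set] changes the character set of the current session

esProc SPL:

A1: Connect to the database

A2: Query all official-language records

A3: The languages to be listed

A4: Align all records by Language to the corresponding positions in A3

A5: Build a table sequence of each language and the number of countries that use it as an official language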
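The idea behind alignment grouping can be sketched in Java as follows. This is not the SPL script itself; the AlignGroup class and the {countryCode, language} row layout are assumptions for illustration. The groups are created up front in the requested order, and each record is dropped into the group whose key it matches.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class AlignGroup {
    // Counts rows per key and reports the keys in the order given by `order`.
    // Keys with no rows still appear with count 0; rows with other keys are ignored.
    static Map<String, Integer> countAligned(List<String> order, List<String[]> rows) {
        Map<String, Integer> counts = new LinkedHashMap<>(); // keeps insertion order
        for (String key : order) {
            counts.put(key, 0);
        }
        for (String[] row : rows) { // row = {countryCode, language}
            counts.computeIfPresent(row[1], (key, count) -> count + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String[]> rows = List.of(
            new String[] {"CHN", "Chinese"},
            new String[] {"GBR", "English"},
            new String[] {"USA", "English"},
            new String[] {"ESP", "Spanish"});
        // prints {Chinese=1, English=2, French=0}
        System.out.println(countAligned(List.of("Chinese", "English", "French"), rows));
    }
}
```

Because the result order comes from the key list rather than from the data, no separate ordering column (like ord in the SQL version) is needed.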

Example 2: List, in the given order, the number of countries that use Chinese, English, French, and other languages as an official language

MySQL8:
with t(name,ord) as (select 'Chinese',1 union all select 'English',2
union all select 'French',3 union all select 'Other', 4),
s(name, cnt) as (
select language, count(countrycode) cnt
from world.countrylanguage s
where s.isofficial='T' and language in ('Chinese','English','French')
group by language
union all
select 'Other', count(distinct countrycode) cnt
from world.countrylanguage s
where isofficial='T' and language not in ('Chinese','English','French')
)
select t.name, s.cnt
from t left join s using (name)
order by t.ord; 

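esProc SPL:

A4: Align all records by Language to the corresponding positions in A3.to(3), and append one extra group for the records that cannot be aligned

A5: The 4th group counts the number of distinct CountryCode values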
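The variant with an appended group can be sketched the same way. As before, this is an illustrative Java version under assumed names (AlignGroupWithOther, the {countryCode, language} row layout), not the SPL script. Unmatched records go to one extra group, and that group is counted by distinct country code, mirroring count(distinct countrycode) in the SQL.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class AlignGroupWithOther {
    // Like plain alignment grouping, but rows whose key is not in `order`
    // are collected in one extra group reported last under `otherLabel`.
    static Map<String, Integer> countAligned(List<String> order, String otherLabel,
                                             List<String[]> rows) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String key : order) {
            counts.put(key, 0);
        }
        Set<String> otherCountries = new HashSet<>(); // distinct country codes
        for (String[] row : rows) { // row = {countryCode, language}
            if (counts.containsKey(row[1])) {
                counts.merge(row[1], 1, Integer::sum);
            } else {
                otherCountries.add(row[0]);
            }
        }
        counts.put(otherLabel, otherCountries.size()); // appended after the aligned groups
        return counts;
    }
}
```

A country with two unlisted official languages contributes two rows to the extra group but is counted only once, which is why the set is needed.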

SPL enumeration grouping

Example 1: List, in order, the number of cities of each size class

MySQL8:
with t as (select * from world.city where CountryCode='CHN'),
segment(class,start,end) as (select 'tiny', 0, 200000
union all select 'small',  200000, 1000000
union all select 'medium', 1000000, 2000000
union all select 'big', 2000000, 100000000
)
select class, count(1) cnt
from segment s join t on t.population>=s.start and t.population<s.end
group by class, start
order by start; 

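esProc SPL:

A3: ${…} is macro replacement: the result of the expression inside the braces is evaluated as a new expression, giving the sequence ["?<200000","?<1000000","?<2000000","?<100000000"]

A5: For each record in A2, find the first condition in A3 that holds and append the record to the corresponding group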
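The "first condition that holds" rule is what lets the conditions be written as simple upper bounds. A minimal Java sketch of the rule follows; the EnumGroup class and its method names are hypothetical, and the thresholds mirror the sequence described for A3.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.IntPredicate;

final class EnumGroup {
    // The size classes from the example, as ordered upper-bound conditions.
    static Map<String, IntPredicate> citySizes() {
        Map<String, IntPredicate> classes = new LinkedHashMap<>();
        classes.put("tiny", p -> p < 200000);
        classes.put("small", p -> p < 1000000);
        classes.put("medium", p -> p < 2000000);
        classes.put("big", p -> p < 100000000);
        return classes;
    }

    // Places each population in the first class whose condition holds.
    static Map<String, Integer> countFirstMatch(Map<String, IntPredicate> classes,
                                                int[] populations) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String name : classes.keySet()) {
            counts.put(name, 0);
        }
        for (int p : populations) {
            for (Map.Entry<String, IntPredicate> e : classes.entrySet()) {
                if (e.getValue().test(p)) {
                    counts.merge(e.getKey(), 1, Integer::sum);
                    break; // first match wins; later, wider conditions are skipped
                }
            }
        }
        return counts;
    }
}
```

A population of 150000 satisfies all four conditions, but it is counted only under "tiny" because the search stops at the first match.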

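Example 2: List the number of large cities in East China, the number of large cities in other regions, and the number of non-large cities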

MySQL8:
with t as (select * from world.city where CountryCode='CHN')
select 'East&Big' class, count(*) cnt
from t
where population>=2000000
and district in ('Shanghai','Jiangsu','Shandong','Zhejiang','Anhui','Jiangxi')
union all
select 'Other&Big', count(*)
from t
where population>=2000000
and district not in ('Shanghai','Jiangsu','Shandong','Zhejiang','Anhui','Jiangxi')
union all
select 'Not Big', count(*)
from t
where population<2000000; 

esProc SPL:

A5: enum@n puts the records that satisfy none of the conditions in A4 into an appended last group
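The behaviour described for enum@n can be approximated in Java as below. This sketch only illustrates the rule stated above (first match, plus a trailing group for leftovers); the EnumGroupWithRest class is hypothetical and makes no claim about how esProc implements it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

final class EnumGroupWithRest {
    // Returns conditions.size() + 1 groups; the last one holds the rows
    // that satisfy none of the conditions.
    static <T> List<List<T>> group(List<Predicate<T>> conditions, List<T> rows) {
        List<List<T>> groups = new ArrayList<>();
        for (int i = 0; i <= conditions.size(); i++) {
            groups.add(new ArrayList<>());
        }
        for (T row : rows) {
            int target = conditions.size(); // default: the appended last group
            for (int i = 0; i < conditions.size(); i++) {
                if (conditions.get(i).test(row)) {
                    target = i; // first matching condition wins
                    break;
                }
            }
            groups.get(target).add(row);
        }
        return groups;
    }
}
```

For the example, two conditions ("large and in East China", then "large") are enough: every city that matches neither lands in the third group, which corresponds to 'Not Big'.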

Example 3: List the number of large cities in all regions, the number of large cities in East China, and the number of non-large cities

MySQL8:
with t as (select * from world.city where CountryCode='CHN')
select 'Big' class, count(*) cnt
from t
where population>=2000000
union all
select 'East&Big' class, count(*) cnt
from t
where population>=2000000
and district in ('Shanghai','Jiangsu','Shandong','Zhejiang','Anhui','Jiangxi')
union all
select 'Not Big' class, count(*) cnt
from t
where population<2000000; 

esProc SPL:

A6: If a record in A2 satisfies several conditions in A4, enum@r appends it to every corresponding group
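The overlapping behaviour described for enum@r differs from the first-match rule only in that the search does not stop after a hit. A minimal Java sketch, with the hypothetical class name EnumGroupRepeat:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

final class EnumGroupRepeat {
    // Returns one group per condition; a row is added to every group
    // whose condition it satisfies, so groups may overlap.
    static <T> List<List<T>> group(List<Predicate<T>> conditions, List<T> rows) {
        List<List<T>> groups = new ArrayList<>();
        for (int i = 0; i < conditions.size(); i++) {
            groups.add(new ArrayList<>());
        }
        for (T row : rows) {
            for (int i = 0; i < conditions.size(); i++) {
                if (conditions.get(i).test(row)) {
                    groups.get(i).add(row); // no break: keep checking later conditions
                }
            }
        }
        return groups;
    }
}
```

In the example, a large city in East China satisfies both "large" and "large and in East China", so it is counted in both groups, just as the two SQL branches each count it.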

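Summary of advantages

Write SQL when there is a database, write SPL when there is not
Summarizing data directly in a Java program is laborious: the code is long and cannot be reused, and in many cases the data is not in a database at all. With SPL, such data can be processed as conveniently as using SQL from Java.
Worry-free regular use: a starter edition with a lifetime license at no cost
If the data to analyze is one-off or temporary, Raqsoft (潤乾) esProc offers a free trial license every month, which can be renewed repeatedly at no charge. When esProc is integrated with a Java application and deployed on a server for long-term use, however, replacing the trial license periodically is troublesome. Raqsoft provides a starter edition with a lifetime license that removes this concern. For how to get it, see "How to use Raqsoft esProc for free" (如何無償使用潤乾集算器？).
Technical documentation and community support
The official esProc documentation already contains many ready-made examples, and solutions to common problems can be found there. Users of the starter edition can use the regular SPL features and can also ask questions on 乾學院, where the vendor provides free technical support to starter-edition users through the community.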