Development discussion QQ group: 941879291
SQLflow is developed in Python and uses Spark as the underlying distributed computing engine; through a single unified configuration file it supports batch processing, stream computing, and REST service development.
Home page:
<div align="center">
<img src="https://upload-images.jianshu...; alt="SQLflow Logo" width="500px"></img>
</div>
Results page:
<div align="center">
<img src="https://upload-images.jianshu...; alt="SQLflow Logo" width="500px"></img>
</div>
SQLflow is built on Python and lets you operate a distributed cluster by writing SQL: data processing, machine learning and deep learning model training, model deployment, distributed crawlers, data visualization, and more.
python3.6
git clone https://github.com/lqkweb/sql...
pip install -r requirements.txt
python manage.py
Home page: http://127.0.0.1:5000
Script page: http://127.0.0.1:5000/script
Single-SQL page: http://127.0.0.1:5000/sql
[Note: 1. Download Apache Spark and set the SPARK_HOME path in manage.py. 2. Place data.csv in the sqlflow/data directory.]
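For reference, resolving the SPARK_HOME setting mentioned above might look like the minimal sketch below. This is an assumption for illustration, not SQLflow's actual manage.py code; the `resolve_spark_home` helper and the default path are made up:

```python
import os

def resolve_spark_home(default="/opt/spark"):
    """Return the Spark installation directory.

    Prefers the SPARK_HOME environment variable and falls back to a
    default path. Fails fast if the directory does not exist, so a
    misconfiguration surfaces at startup rather than at job time.
    """
    spark_home = os.environ.get("SPARK_HOME", default)
    if not os.path.isdir(spark_home):
        raise FileNotFoundError(
            f"SPARK_HOME points to a missing directory: {spark_home}"
        )
    return spark_home
```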
On the script execution page (http://127.0.0.1:5000/script), enter `select * from A limit 3;` or `select * from A limit 3 as B;` to create temporary table A or B.
To create temporary table A:
select * from A limit 3;
To create temporary table B:
select * from A limit 3 as B;
Then open the single-SQL page (http://127.0.0.1:5000/sql); you can now query tables A and B with any Spark SQL syntax:
desc A
select * from A limit 2
select * from B limit 2
[注] "as B" 至關於建立了一個 B 臨時表。
That is the whole demo: operating a Spark cluster with nothing but SQL.
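The demo flow above (load data as table A, materialize a limited query as table B, then query both) can be illustrated end to end without a Spark cluster. The sketch below uses Python's built-in sqlite3 purely as a stand-in for Spark SQL to show the temp-table idea; the sample rows are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Stand-in for loading data.csv as table A.
conn.execute("CREATE TABLE A (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO A VALUES (?, ?)",
    [(1, "a"), (2, "b"), (3, "c"), (4, "d")],
)

# Equivalent of "select * from A limit 3 as B":
# materialize the query result as temporary table B.
conn.execute("CREATE TEMP TABLE B AS SELECT * FROM A LIMIT 3")

rows_a = conn.execute("SELECT * FROM A LIMIT 2").fetchall()
rows_b = conn.execute("SELECT * FROM B LIMIT 2").fetchall()
```

In Spark the same effect would come from registering a DataFrame as a temp view, but the table-name semantics the demo relies on are the same.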
[Ref] Spark SQL docs: https://spark.apache.org/docs...