此文已由做者嶽猛受權網易雲社區發佈。
html
歡迎訪問網易雲社區,瞭解更多網易技術產品運營經驗。sql
實時計算將來會成爲一個趨勢,基本上全部的離線計算任務都能經過實時計算來完成,對於實時計算來算,除了性能,延遲性和吞吐量這些硬指標要求之外,我以爲易用性上面應該是將來的一個發展方向,畢竟如今的實時計算入storm,flink,sparkstreaming等都是經過API來進行的,這些使用起來都不太方便,後續更大的一個側重方向應該是SQL ON STREAMING,對storm瞭解不是不少,可是有些公司已經針對storm進行了sql封裝,下面只想談下兩個比較流行的開源流計算引擎對SQL的封裝粒度。apache
SQL on Streaming Tables安全
code examplesapp
val env = StreamExecutionEnvironment.getExecutionEnvironment val tEnv = TableEnvironment.getTableEnvironment(env)// read a DataStream from an external sourceval ds: DataStream[(Long, String, Integer)] = env.addSource(...)// register the DataStream under the name "Orders"tableEnv.registerDataStream("Orders", ds, 'user, 'product, 'amount)// run a SQL query on the Table and retrieve the result as a new Table val result = tableEnv.sql( "SELECT product, amount FROM Orders WHERE product LIKE '%Rubber%'")
1.2版本 只支持SELECT, FROM, WHERE, and UNION,不支持聚合,join操做,感受離真正的使用仍是有一段距離要走。
code examplesoop
import org.apache.spark.sql.functions._import org.apache.spark.sql.SparkSession val input = spark.readStream.text("file:///home/hadoop/data/") val words = input.as[String].flatMap(_.split(" ")) val wordCounts = words.groupBy("value").count() val query = wordCounts.writeStream.outputMode("complete").format("console").start query.awaitTermination
output mode只實現了兩種,且有限制性能
Append mode (default) This is the default mode, where only the new rows added to the result table since the last trigger will be outputted to the sink. This is only applicable to queries that do not have any aggregations (e.g. queries with only select, where, map, flatMap, filter,join, etc.). Complete mode The whole result table will be outputted to the sink.This is only applicable to queries that have aggregations
不支持update模式spa
連接:https://www.jianshu.com/p/9a9f8675bb3ecode
更多網易技術、產品、運營經驗分享請點擊。
相關文章:
【推薦】 關於評審--從思想到落地