Things to watch out for when writing to ClickHouse from Spark via JDBC

Recently, while writing data from Spark to ClickHouse over JDBC, I ran into a few pitfalls. Sharing them here for the benefit of others.

A WARN

WARN JdbcUtils: Requested isolation level 1, but transactions are unsupported

This happens because ClickHouse does not support transactions. The fix is to add the isolationLevel option with the value NONE to the JDBC options. From the isolationLevel documentation:

The transaction isolation level, which applies to current connection. It can be one of NONE, READ_COMMITTED, READ_UNCOMMITTED, REPEATABLE_READ, or SERIALIZABLE, corresponding to standard transaction isolation levels defined by JDBC's Connection object, with default of READ_UNCOMMITTED. This option applies only to writing. Please refer the documentation in java.sql.Connection.
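
The "1" in the warning comes straight from those constants: the isolationLevel strings map to the integer constants on java.sql.Connection, and Spark's default of READ_UNCOMMITTED corresponds to constant 1. A minimal sketch to confirm the mapping:

import java.sql.Connection

// Spark's default isolationLevel, READ_UNCOMMITTED, maps to
// TRANSACTION_READ_UNCOMMITTED == 1, hence "Requested isolation level 1".
// "NONE" maps to TRANSACTION_NONE == 0; with that value Spark treats the
// connection as non-transactional and skips commit/rollback on writes.
println(Connection.TRANSACTION_NONE)             // prints 0
println(Connection.TRANSACTION_READ_UNCOMMITTED) // prints 1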

An error

merges are processing significantly slower than inserts

This error is triggered by multiple Spark partitions writing concurrently. The fix is to add the numPartitions option with the value 1 to the JDBC options to cap the write concurrency. From the numPartitions documentation:

The maximum number of partitions that can be used for parallelism in table reading and writing. This also determines the maximum number of concurrent JDBC connections. If the number of partitions to write exceeds this limit, we decrease it to this limit by calling coalesce(numPartitions) before writing.
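
In other words, numPartitions = 1 makes Spark call coalesce(1) on the DataFrame before writing, so only one JDBC connection inserts at a time and ClickHouse's background merges can keep up. A sketch of the explicit equivalent, reusing the dbUrl and dbProp from the full example below and a hypothetical DataFrame df:

// Hypothetical DataFrame `df`; coalescing to one partition has the same
// effect on write concurrency as the numPartitions = "1" option.
df.coalesce(1)
  .write
  .mode(SaveMode.Append)
  .jdbc(dbUrl, "table", dbProp)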

Complete Scala code

import org.apache.spark.sql.SaveMode

spark.createDataFrame(data)
      .write
      .mode(SaveMode.Append)
      .option("batchsize", "50000")     // rows per JDBC batch insert
      .option("isolationLevel", "NONE") // ClickHouse has no transactions
      .option("numPartitions", "1")     // cap write concurrency at 1
      .jdbc(dbUrl,
        "table",
        dbProp)
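
The snippet assumes dbUrl and dbProp are defined elsewhere. A minimal sketch of what they might look like, assuming the classic ru.yandex.clickhouse JDBC driver (newer clickhouse-jdbc releases ship com.clickhouse.jdbc.ClickHouseDriver instead); host, port, database, and credentials below are placeholders:

import java.util.Properties

// Placeholder connection details; adjust host, port, database, credentials.
val dbUrl = "jdbc:clickhouse://localhost:8123/default"
val dbProp = new Properties()
dbProp.setProperty("driver", "ru.yandex.clickhouse.ClickHouseDriver")
dbProp.setProperty("user", "default")
dbProp.setProperty("password", "")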

For more Spark JDBC options, see the official Spark documentation. For more hands-on notes on architecture, PHP, and Go pitfalls, follow my WeChat official account.
