While writing to ClickHouse from Spark over JDBC recently, I ran into a few pitfalls. I'm sharing them here in the hope they save others some time.
A WARN
WARN JdbcUtils: Requested isolation level 1, but transactions are unsupported
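This happens because ClickHouse does not support transactions. The fix is to add the JDBC option isolationLevel with the value NONE. The Spark documentation describes isolationLevel as follows: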
The transaction isolation level, which applies to current connection. It can be one of NONE, READ_COMMITTED, READ_UNCOMMITTED, REPEATABLE_READ, or SERIALIZABLE, corresponding to standard transaction isolation levels defined by JDBC's Connection object, with default of READ_UNCOMMITTED. This option applies only to writing. Please refer the documentation in java.sql.Connection.
merges are processing significantly slower than inserts
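This error is caused by multiple Spark partitions writing concurrently: every insert creates a new data part, and the concurrent inserts create parts faster than ClickHouse can merge them in the background. The fix is to add the JDBC option numPartitions with the value 1 to limit the write concurrency. The Spark documentation describes numPartitions as follows: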
The maximum number of partitions that can be used for parallelism in table reading and writing. This also determines the maximum number of concurrent JDBC connections. If the number of partitions to write exceeds this limit, we decrease it to this limit by calling coalesce(numPartitions) before writing.
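The two options discussed so far can be kept together in a small helper so that every write job uses the same settings. This is only a sketch: the object name `ClickHouseWriteOptions` and the parameter `maxConnections` are made up for illustration, and only the option keys and values come from the text above.

```scala
// Sketch: gather the ClickHouse-friendly JDBC write options in one place.
// ClickHouseWriteOptions and maxConnections are hypothetical names.
object ClickHouseWriteOptions {
  def build(maxConnections: Int = 1): Map[String, String] = {
    require(maxConnections >= 1, "maxConnections must be at least 1")
    Map(
      // ClickHouse has no transactions, so do not request an isolation level
      "isolationLevel" -> "NONE",
      // upper bound on write partitions, and therefore on concurrent JDBC connections
      "numPartitions" -> maxConnections.toString
    )
  }
}
```

The resulting map can be handed to `DataFrameWriter.options(...)` before calling `.jdbc(...)`. Raising `maxConnections` above 1 trades write throughput against the risk of hitting the merge error again, so it is probably worth increasing only step by step.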
Complete Scala code
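import org.apache.spark.sql.SaveMode

spark.createDataFrame(data)
  .write
  .mode(SaveMode.Append)
  .option("batchsize", "50000")
  .option("isolationLevel", "NONE") // no transactions: ClickHouse does not support them
  .option("numPartitions", "1")     // limit write concurrency
  .jdbc(dbUrl, "table", dbProp)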
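The code above does not show how dbUrl and dbProp are defined. One possible way to build them is sketched below. Everything here is an assumption rather than the original setup: the helper name `ClickHouseConnection`, the host, port, database and credentials are placeholders, and the driver class name depends on which ClickHouse JDBC driver version is on the classpath.

```scala
import java.util.Properties

// Sketch: build the JDBC URL and connection properties passed to .jdbc(...).
// ClickHouseConnection is a hypothetical helper; all values are placeholders.
object ClickHouseConnection {
  // ClickHouse JDBC URLs have the form jdbc:clickhouse://host:port/database
  def jdbcUrl(host: String, port: Int, database: String): String =
    s"jdbc:clickhouse://$host:$port/$database"

  def props(user: String, password: String, driver: String): Properties = {
    val p = new Properties()
    p.setProperty("user", user)
    p.setProperty("password", password)
    // Spark's "driver" option: the JDBC driver class used for the connection
    p.setProperty("driver", driver)
    p
  }
}

// Assumed values: 8123 is ClickHouse's default HTTP port.
val dbUrl = ClickHouseConnection.jdbcUrl("localhost", 8123, "default")
// Newer drivers ship com.clickhouse.jdbc.ClickHouseDriver,
// older ones ru.yandex.clickhouse.ClickHouseDriver.
val dbProp = ClickHouseConnection.props("default", "", "com.clickhouse.jdbc.ClickHouseDriver")
```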
For more Spark JDBC options, see the official Spark documentation.