spark 2.1.1
After connecting to Spark Thrift Server with beeline, executing `use database` sometimes hangs. On the server side, `use database` maps to `setCurrentDatabase`.
Investigation showed that, at the moment of the hang, Spark Thrift Server was executing an insert operation:
org.apache.spark.sql.hive.execution.InsertIntoHiveTable
```scala
protected override def doExecute(): RDD[InternalRow] = {
  sqlContext.sparkContext.parallelize(sideEffectResult.asInstanceOf[Seq[InternalRow]], 1)
}

...

@transient private val externalCatalog = sqlContext.sharedState.externalCatalog

protected[sql] lazy val sideEffectResult: Seq[InternalRow] = {
  ...
  externalCatalog.loadDynamicPartitions(
  ...
  externalCatalog.getPartitionOption(
  ...
  externalCatalog.loadPartition(
  ...
  externalCatalog.loadTable(
```
As shown above, an insert may call `loadDynamicPartitions`, `getPartitionOption`, `loadPartition`, `loadTable`, and similar catalog methods.
org.apache.spark.sql.hive.client.HiveClientImpl
```scala
def loadTable(
    loadPath: String, // TODO URI
    tableName: String,
    replace: Boolean,
    holdDDLTime: Boolean): Unit = withHiveState {
  ...

def loadPartition(
    loadPath: String,
    dbName: String,
    tableName: String,
    partSpec: java.util.LinkedHashMap[String, String],
    replace: Boolean,
    holdDDLTime: Boolean,
    inheritTableSpecs: Boolean): Unit = withHiveState {
  ...

override def setCurrentDatabase(databaseName: String): Unit = withHiveState {
```
All of the corresponding methods in HiveClientImpl run inside `withHiveState`, and `withHiveState` is synchronized on a shared lock. So the metastore calls made by an insert (such as `loadPartition`) and a `use database` operation are forced to execute serially; when the insert is slow, it blocks every other operation on that lock.
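The blocking behavior can be demonstrated with a minimal, self-contained sketch. This is not the actual Spark source: `WithHiveStateSketch`, `clientLock`, and the method bodies are stand-ins assumed for illustration, mimicking how `withHiveState` wraps every call in one shared `synchronized` block.

```scala
// Minimal sketch (hypothetical, not Spark's real code): every metastore call
// goes through withHiveState, which acquires a single shared lock, so a slow
// loadPartition serializes with setCurrentDatabase.
object WithHiveStateSketch {
  private val clientLock = new Object // stands in for the shared lock in HiveClientImpl

  def withHiveState[A](f: => A): A = clientLock.synchronized { f }

  def loadPartition(): Unit = withHiveState {
    Thread.sleep(200) // simulate a slow insert's data-load phase
  }

  def setCurrentDatabase(db: String): Unit = withHiveState {
    () // the real method talks to the metastore; here it only needs the lock
  }

  def main(args: Array[String]): Unit = {
    // Thread 1: the slow insert holding the lock.
    val insert = new Thread(new Runnable {
      def run(): Unit = loadPartition()
    })
    insert.start()
    Thread.sleep(50) // let loadPartition grab the lock first

    // Thread 2 (this thread): `use database` must wait for the lock.
    val start = System.nanoTime()
    setCurrentDatabase("default") // blocks until loadPartition finishes
    val waitedMs = (System.nanoTime() - start) / 1000000
    println(s"use database waited ${waitedMs}ms")
    insert.join()
  }
}
```

Running it shows `setCurrentDatabase` waiting on the order of 150ms, i.e. until the simulated load releases the lock, which is exactly the hang observed from beeline.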
For details on how Spark Thrift Server is implemented, see http://www.javashuo.com/article/p-oebtjwii-de.html