[Original] Uncle's Experience Sharing (12): How to programmatically kill SQL submitted to Spark Thrift

Spark 2.1.1

 

A SQL statement that Hive is currently executing is easy to stop, because the application id on YARN can be read from the console output, and the job can then be killed:

WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = hadoop_20181218163113_65da7e1f-b4b8-4cb8-86cc-236c37aea682
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1544593827645_9409, Tracking URL = http://rm1:8088/proxy/application_1544593827645_9409/
Kill Command = /export/App/hadoop-2.6.1/bin/hadoop job -kill job_1544593827645_9409
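
If this is to be automated for Hive, a wrapper only needs to grep the job id out of that console output and kill the matching YARN application. Below is a minimal sketch; the regex and the use of the yarn CLI are my own illustration, not part of the original workflow.

import scala.sys.process._

object KillHiveQuery {
  // matches the "Starting Job = job_..." line printed by the Hive console
  private val StartingJob = """Starting Job = (job_\d+_\d+)""".r

  def killFromConsole(consoleOutput: String): Unit = {
    StartingJob.findFirstMatchIn(consoleOutput).foreach { m =>
      // a MapReduce job id maps to a YARN application id by swapping the prefix
      val appId = m.group(1).replaceFirst("^job_", "application_")
      Seq("yarn", "application", "-kill", appId).!
    }
  }
}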

However, once the same SQL has been submitted to Spark Thrift, killing it is not so easy: you have to go to the Spark Thrift web UI, manually find that SQL, and then kill the corresponding job:

1. Find the SQL

2. Kill the corresponding job

  

Note that the Spark Thrift page also shows all current sessions,

and, for a given session, the status of all jobs it has executed.

If we record the current session id every time we connect to Spark Thrift, we can then use that session id to find the jobs the session is running. Looking at the code, only one line needs to be added:

 

org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation

  private def execute(): Unit = {
    statementId = UUID.randomUUID().toString
    logInfo(s"Running query '$statement' with $statementId")

    // modify here: write the current session id into the operation log so clients see it in the query log
    this.operationLog.writeOperationLog("session id : " + this.getParentSession.getSessionState.getSessionId)

    setState(OperationState.RUNNING)

 

After making the change and rebuilding the package, connect to Spark Thrift with beeline and run a SQL statement; the effect is as follows:

0: jdbc:hive2://spark_thrift:11111> select * from test_table;
session id : 0bc63382-a54a-41f8-8c2e-0323f4ebbde6
+---------+--+
| Result |
+---------+--+
+---------+--+
No rows selected (0.277 seconds)
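
What beeline prints here is just the HiveServer2 operation log, so a program can capture the same line through the Hive JDBC driver instead of scraping console output. The following is a minimal sketch, assuming the org.apache.hive.jdbc driver is on the classpath; the connection string and query reuse the example values above, while the credentials and the polling loop are my own illustration.

import java.sql.DriverManager
import scala.collection.JavaConverters._
import org.apache.hive.jdbc.HiveStatement

object CaptureSessionId {
  def main(args: Array[String]): Unit = {
    Class.forName("org.apache.hive.jdbc.HiveDriver")
    val conn = DriverManager.getConnection("jdbc:hive2://spark_thrift:11111", "hadoop", "")
    val stmt = conn.createStatement().asInstanceOf[HiveStatement]

    // run the query in another thread so the operation log can be polled while it executes
    val worker = new Thread(new Runnable {
      override def run(): Unit = stmt.execute("select * from test_table")
    })
    worker.start()

    var sessionId: Option[String] = None
    while (worker.isAlive || stmt.hasMoreLogs) {
      // getQueryLog returns the new operation log lines, including the "session id : ..." line added above
      stmt.getQueryLog().asScala.foreach { line =>
        if (line.contains("session id : ")) sessionId = Some(line.split("session id : ").last.trim)
      }
      Thread.sleep(200)
    }
    worker.join()
    println(s"session id = ${sessionId.getOrElse("not found")}")

    stmt.close()
    conn.close()
  }
}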

Once the job id has been found via the session id, the job can be killed through a URL:

curl http://rm1/proxy/application_1544593827645_0134/jobs/job/kill/?id=3
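
The same request can of course be sent from code as well; here is a minimal sketch using java.net.HttpURLConnection, where the application id and job id are the example values from above.

import java.net.{HttpURLConnection, URL}

object KillSparkJob {
  // send the same GET request as the curl command above
  def killJob(rmProxy: String, appId: String, jobId: Int): Int = {
    val url = new URL(s"http://$rmProxy/proxy/$appId/jobs/job/kill/?id=$jobId")
    val conn = url.openConnection().asInstanceOf[HttpURLConnection]
    conn.setRequestMethod("GET")
    val code = conn.getResponseCode
    conn.disconnect()
    code
  }

  def main(args: Array[String]): Unit = {
    println(killJob("rm1", "application_1544593827645_0134", 3))
  }
}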
