hive-jdbc獲取查詢日誌慢的問題發現與解決

時間 2019-12-11

標籤 hive jdbc 獲取查詢日誌問題發現解決欄目 Hadoop 简体版

原文原文鏈接

一、問題描述：

公司的數據平臺的HiveJdbc查詢一直有一個問題，就是日誌獲取太慢了，有時候差很少和結果一塊兒出來的，這就很是影響用戶的體驗，半天都沒任何輸出。另外一個是Beeline客戶端不一致，beeline客戶端每次都能很快的獲取日誌。
這裏首先咱們普及一個經驗就是第一批日誌獲取的快慢，很是影響用戶體驗。若是第一批日誌來的快，用戶能夠確認任務已經開始跑了，MR的JobId也會返回。日誌的示例以下：java

INFO  : Compiling command(queryId=app_20180412185224_ebd3d373-31bb-430b-9daf-44f01049a9d4): select count(*) from ods.team
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:_c0, type:bigint, comment:null)], properties:null)
INFO  : Completed compiling command(queryId=app_20180412185224_ebd3d373-31bb-430b-9daf-44f01049a9d4); Time taken: 0.057 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing command(queryId=app_20180412185224_ebd3d373-31bb-430b-9daf-44f01049a9d4): select count(*) from ods.team
INFO  : Query ID = app_20180412185224_ebd3d373-31bb-430b-9daf-44f01049a9d4
INFO  : Total jobs = 1
INFO  : Launching Job 1 out of 1
INFO  : Starting task [Stage-1:MAPRED] in serial mode
INFO  : Number of reduce tasks determined at compile time: 1
INFO  : In order to change the average load for a reducer (in bytes):
INFO  :   set hive.exec.reducers.bytes.per.reducer=<number>
INFO  : In order to limit the maximum number of reducers:
INFO  :   set hive.exec.reducers.max=<number>
INFO  : In order to set a constant number of reducers:
INFO  :   set mapreduce.job.reduces=<number>
INFO  : number of splits:3
INFO  : Submitting tokens for job: job_1523499276700_0219

hive-jdbc查詢後端的流程（hive-jdbc版本2.1.0）：

一、啓動一個線程T1，T1用於阻塞獲取結果，T1還用於啓動日誌獲取線程T2
二、T2啓動獲取Hive的查詢日誌
詳細的代碼請看下面這個DEMO， T1就是main線程獲取hive日誌的結果，T2就是LogTask獲取hive的查詢日誌sql

public class HiveJdbcQueryLog {
  public static void main(String[] args) throws Exception {
      Class.forName("org.apache.hive.jdbc.HiveDriver");
      Connection connection = DriverManager.getConnection("jdbc:hive2://hive-server0:10000", "app", "");
      HiveStatement stmt = (HiveStatement) connection.createStatement();

      String sql = "select count(*) from table";

      try {
          Thread logThread = new Thread(new LogTask(stmt));
          logThread.setDaemon(true);
          logThread.start();
          boolean hasResult = stmt.execute(sql);
          logThread.interrupt();
          if (hasResult) {
              ResultSet resultSet = stmt.getResultSet();
              ResultSetMetaData metaData = resultSet.getMetaData();
              int colCount = metaData.getColumnCount();
              List<String> res = Lists.newArrayList();
              while (resultSet.next()) {
                  List<String> row = Lists.newArrayList();
                  for (int i = 1; i <= colCount; i++) {
                      row.add(resultSet.getString(i));
                  }
                  res.addAll(row);
              }
              System.out.println(JSON.toJSONString(res));
          } else {
              stmt.getUpdateCount();
          }
      } catch (Exception ex) {
          ex.printStackTrace();
      }finally {
          stmt.close();
          connection.close();
      }
  }

  public static class LogTask implements Runnable {
      private HiveStatement stmt;
      public LogTask(HiveStatement stmt) {
          this.stmt = stmt;
      }
      @Override
      public void run() {
          try {
              while (stmt.hasMoreLogs()) {
                  try {
                      for (String line : stmt.getQueryLog(true, 1000)) {
                          System.out.println(line);
                      }
                      Thread.sleep(200);
                  } catch (SQLException e) {
                      e.printStackTrace();
                  }
              }
          } catch (Exception e) {
              e.printStackTrace();
          }
      }
  }
}

幾個比較奇怪的現象：

一、在數據平臺執行查詢任務，有時候獲取hive日誌也能很快，可是大部分的時候都很慢
二、當我把DEMO裏面的獲取日誌的線程，也就是上述的T2線程（LogTask），裏面的Sleep去掉時，日誌能比較快的返回。可是HiveServer2上的日誌會被打爆，由於不Sleep會獲取幾十萬次，HiveServer2容易宕機。apache

2個懷疑的點：

1）有多是HiveServer2堵塞，負載高或者日誌文件很大，read IO須要很長時間
2）網絡問題，HiveJdbc和HiveServer2之間是使用Thrift通訊的，有可能網絡緣由，致使很慢。後端

二、排查過程

排查過程能夠說是很是曲折，剛開始中間試過打日誌，用Greys看耗時等等，可是一直沒什麼結果，直到最近一次排查纔有突破！！安全

1）首先比較比較快速的是在DP服務器上用Greys查看了HiveStatement.getQueryLog方法耗時，確認第一批日誌獲取有快有慢的現象，有的時候是5S左右就有返回，有的時候須要15-20S左右纔有返回。結果以下：服務器

// 一次很慢的日誌獲取的，結果出來了，日誌纔出來
`---+Tracing for : thread_name="Thread-78" thread_id=0x84;is_daemon=true;priority=5;
    `---+[17331,17331ms]org.apache.hive.jdbc.HiveStatement:getQueryLog()
        +---[0,0ms]org.apache.hive.jdbc.HiveStatement:checkConnection(@858)
        +---[0,0ms]java.util.ArrayList:<init>(@864)
        +---[0,0ms]org.apache.hive.jdbc.HiveStatement:getFetchOrientation(@868)
        +---[0,0ms]org.apache.hive.service.rpc.thrift.TFetchResultsReq:<init>(@868)
        +---[0,0ms]org.apache.hive.service.rpc.thrift.TFetchResultsReq:setFetchType(@870)
        +---[17330,17330ms]org.apache.hive.service.rpc.thrift.TCLIService$Iface:FetchResults(@871)
        +---[17330,0ms]org.apache.hive.service.rpc.thrift.TFetchResultsResp:getStatus(@872)
        +---[17330,0ms]org.apache.hive.jdbc.Utils:verifySuccessWithInfo(@872)
        +---[17330,0ms]org.apache.hive.service.rpc.thrift.TFetchResultsResp:getResults(@895)
        +---[17330,0ms]org.apache.hive.jdbc.HiveConnection:getProtocol(@895)
        +---[17330,0ms]org.apache.hive.service.cli.RowSetFactory:create(@895)
        +---[17330,0ms]org.apache.hive.service.cli.RowSet:iterator(@896)


// 一次正常的日誌獲取，大概點擊執行以後5s有了數據（5s大概是一次RPC耗時）
`---+Tracing for : thread_name="Thread-71" thread_id=0x79;is_daemon=true;priority=5;
    `---+[4557,4557ms]org.apache.hive.jdbc.HiveStatement:getQueryLog()
        +---[0,0ms]org.apache.hive.jdbc.HiveStatement:checkConnection(@858)
        +---[0,0ms]java.util.ArrayList:<init>(@864)
        +---[0,0ms]org.apache.hive.jdbc.HiveStatement:getFetchOrientation(@868)
        +---[0,0ms]org.apache.hive.service.rpc.thrift.TFetchResultsReq:<init>(@868)
        +---[0,0ms]org.apache.hive.service.rpc.thrift.TFetchResultsReq:setFetchType(@870)
        +---[4554,4554ms]org.apache.hive.service.rpc.thrift.TCLIService$Iface:FetchResults(@871)
        +---[4554,0ms]org.apache.hive.service.rpc.thrift.TFetchResultsResp:getStatus(@872)
        +---[4554,0ms]org.apache.hive.jdbc.Utils:verifySuccessWithInfo(@872)
        +---[4554,0ms]org.apache.hive.service.rpc.thrift.TFetchResultsResp:getResults(@895)
        +---[4554,0ms]org.apache.hive.jdbc.HiveConnection:getProtocol(@895)
        +---[4554,0ms]org.apache.hive.service.cli.RowSetFactory:create(@895)
        +---[4555,0ms]org.apache.hive.service.cli.RowSet:iterator(@896)

在調用hive-jdbc的client端，用Greys查詢看的耗時都在TCLIService$Iface:FetchResults方法裏面。看到這個信息的時候，確定第一反應是RPC獲取結果耗時在HiveServer端，要麼是在網絡上。這就是上面我基本懷疑的2個點。很惋惜，這2個懷疑的點最後都被證實是錯的！！！！網絡

2）而後在HiveServer端和Client端，分別開起了tcpdump監聽網絡事件，而後在T2線程（Hive日誌獲取的線程）上也打上了queryLog，開始獲取log的時間，第一次收到queryLog的時間等等。開始debug！！！
不得不說tcpdump日誌須要比對時間很是麻煩，下面我就直接貼出來我整理的事件時間軸：
多線程

上圖能夠看出來，時間上HiveServer端收到FetchResults事件已是19s以後了，更奇怪的是dp端經過tcpdump能夠看到發出fetchResult事件也在19S以後了。因此基本排除了HiveServer的耗時操做的可能性，還有網絡超時的可能性。
基本就能夠肯定了，就是Client端的hive-jdbc的問題，那麼到底DP端這段時間幹了什麼事情，還沒rpc請求就要耗時19S麼？？？？雷政富tm才12s啊！！app

三、翻看hive-jdbc源碼

HiveStatement和HiveConnectiontcp

// HiveStatement
public List<String> getQueryLog(boolean incremental, int fetchSize)
        throws SQLException, ClosedOrCancelledStatementException {
    List<String> logs = new ArrayList<String>();
    TFetchResultsResp tFetchResultsResp = null;
    try {
        if (stmtHandle != null) {
            TFetchResultsReq tFetchResultsReq = new TFetchResultsReq(stmtHandle,
                    getFetchOrientation(incremental), fetchSize);
            tFetchResultsReq.setFetchType((short)1);
            tFetchResultsResp = client.FetchResults(tFetchResultsReq);
            Utils.verifySuccessWithInfo(tFetchResultsResp.getStatus());
        } else {
      ...........
    } catch (SQLException e) {
        throw e;
    } catch (Exception e) {
        throw new SQLException("Error when getting query log: " + e, e);
    }
  ..........
    return logs;
}

client.FetchResults方法完整的是 TCLIService.Iface.FetchResults方法，是一個Thrift接口方法。不過debug過程發現client 實際是對TCLIService.Iface的一個JDK動態代理。這樣作是爲了能對client對象作同步管理，代碼以下:

// HiveConnection
public static TCLIService.Iface newSynchronizedClient(
      TCLIService.Iface client) {
    return (TCLIService.Iface) Proxy.newProxyInstance(
        HiveConnection.class.getClassLoader(),
      new Class [] { TCLIService.Iface.class },
      new SynchronizedHandler(client));
}
private static class SynchronizedHandler implements InvocationHandler {
  private final TCLIService.Iface client;

  SynchronizedHandler(TCLIService.Iface client) {
    this.client = client;
  }

  @Override
  public Object invoke(Object proxy, Method method, Object [] args)
      throws Throwable {
    try {
      synchronized (client) {
        return method.invoke(client, args);
      }
    } catch (InvocationTargetException e) {
      .......
    } catch (Exception e) {
      throw new TException("Error in calling method " + method.getName(), e);
    }
  }
}

看到這裏突然明白，頗有多是這19S都在這裏等鎖了！！！

由於在結果獲取的線程T1裏面也須要使用client進行RPC調用，獲取狀態client.GetOperationStatus(statusReq)，因此這裏就存在鎖競爭問題啊，並且hive結果獲取的線程T1調用waitForOperationToComplete方法時，獲取狀態client.GetOperationStatus並不Sleep （T2線程獲取日誌是Sleep的），因此T1(Main)線程極可能會始終佔用client的鎖，致使T2(LogTask)線程獲取不到鎖。只有當T1(Main)線程獲取到告終果才釋放鎖，T2(LogTask)線程才能獲取到鎖，而後獲取日誌，就形成告終果出來了才能看到日誌的現象

void waitForOperationToComplete() throws SQLException {
    TGetOperationStatusReq statusReq = new TGetOperationStatusReq(stmtHandle);
    TGetOperationStatusResp statusResp;
    while (!isOperationComplete) {
        try {
            statusResp = client.GetOperationStatus(statusReq);
            Utils.verifySuccessWithInfo(statusResp.getStatus());
            if (statusResp.isSetOperationState()) {
                switch (statusResp.getOperationState()) {
                case CLOSED_STATE:
                case FINISHED_STATE:
                    isOperationComplete = true;
                    isLogBeingGenerated = false;
                    break;
                    .......
            }
        } catch (SQLException e) {
            .......
        } catch (Exception e) {
            .......
        }
    }
}

四、總結

這確定是一個bug，找了一下Hive的jira果真已經有人提出了用ReentrantLock的公平鎖來作同步，Jira地址。ReentrantLock的公平鎖保證了T1(Main)線程用完client以後，T2(LogTask)線程是等的時間最長的，T2(LogTask)線程能得到鎖！從而保證T2(LogTask)線程獲取日誌不會在有結果以後才獲取到！！
ReentrantLock的公平鎖是一個解決方案，可是若是有些場景下查詢在同一個連接上執行的SQL都是順序的，可不能夠不要加鎖了呢？
答案是不能夠！！！
由於thrift就是垃圾，client是線程不安全的，看這個Patch就知道了，任何想多線程操做client的，都要用同步鎖或者同步代碼塊

相關標籤/搜索

每日一句

每一个你不满意的现在，都有一个你没有努力的曾经。