Hive 0.13: mapjoin hashtable-not-found bug

A production job failed with the following error:

Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.FileNotFoundException: /home/vipshop/hard_disk/1/yarn/local/usercache/hdfs/appcache/application_1420458339569_0548/container_1420458339569_0548_01_000005/Stage-5.tar.gz/MapJoin-mapfile12--.hashtable (No such file or directory)
        at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:195)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:428)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:160)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:155)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.FileNotFoundException: /home/vipshop/hard_disk/1/yarn/local/usercache/hdfs/appcache/application_1420458339569_0548/container_1420458339569_0548_01_000005/Stage-5.tar.gz/MapJoin-mapfile12--.hashtable (No such file or directory)
        at org.apache.hadoop.hive.ql.exec.mr.HashTableLoader.load(HashTableLoader.java:104)
        at org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:152)
        at org.apache.hadoop.hive.ql.exec.MapJoinOperator.cleanUpInputFileChangedOp(MapJoinOperator.java:178)
        at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1029)
        at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1033)
        at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1033)
        at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1033)
        at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:505)
        at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:177)
        ... 8 more
Caused by: java.io.FileNotFoundException: /home/vipshop/hard_disk/1/yarn/local/usercache/hdfs/appcache/application_1420458339569_0548/container_1420458339569_0548_01_000005/Stage-5.tar.gz/MapJoin-mapfile12--.hashtable (No such file or directory)
        at java.io.FileInputStream.open(Native Method)
        at java.io.FileInputStream.<init>(FileInputStream.java:146)
        at java.io.FileInputStream.<init>(FileInputStream.java:101)
        at org.apache.hadoop.hive.ql.exec.mr.HashTableLoader.load(HashTableLoader.java:95)
        ... 16 more

This is actually a mapjoin bug. During a mapjoin, Hive builds a hashtable from the small table and puts it in the DistributedCache; downstream map tasks then download it from the DistributedCache for local use. Here the job contained two mapjoins, but HashTableSinkOperator only generated the hashtable for the first one, so HashTableLoader failed when it tried to load the missing hashtable file.
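The build/ship/load cycle described above can be sketched in plain Java. This is a simplified illustration, not Hive's actual classes: the local task dumps the small table's hashtable to a file (which Hive then distributes via the DistributedCache), and each map task deserializes it before probing with big-table rows. The file name mimics Hive's `MapJoin-*.hashtable` convention; the data and helper names are hypothetical.

```java
import java.io.*;
import java.util.HashMap;
import java.util.Map;

// Simplified sketch of the mapjoin small-table flow (not Hive's real code).
public class MapJoinSketch {

    // Local task side: serialize the small table's hashtable to a local file.
    static File dumpHashTable(Map<String, String> smallTable, File dir) throws IOException {
        File f = new File(dir, "MapJoin-mapfile0--.hashtable");
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(f))) {
            out.writeObject(new HashMap<>(smallTable));
        }
        return f;
    }

    // Map task side: deserialize the hashtable before the join starts.
    // If the file was never written (the bug: the second mapjoin's hashtable
    // is missing), this is where FileNotFoundException surfaces.
    @SuppressWarnings("unchecked")
    static Map<String, String> loadHashTable(File f) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(f))) {
            return (Map<String, String>) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        File dir = new File(System.getProperty("java.io.tmpdir"));
        Map<String, String> small = new HashMap<>();
        small.put("k1", "v1"); // hypothetical small-table row
        File f = dumpHashTable(small, dir);
        Map<String, String> loaded = loadHashTable(f);
        System.out.println(loaded.get("k1")); // prints v1
        f.delete();
    }
}
```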
Trigger conditions for the bug:
1. Two or more mapjoins in the job
2. One of the small tables is empty

Bug id:
https://issues.apache.org/jira/browse/HIVE-6913
The bug is fixed in Hive 0.14.
Fix:

./ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapredLocalTask.java
Operator<? extends OperatorDesc> forwardOp = work.getAliasToWork().get(alias);
if (fetchOp.isEmptyTable()) {
  // generate an empty hashtable for the empty table, so downstream
  // HashTableLoader still finds a file to load
  this.generateDummyHashTable(alias, bigTableBucket);
  forwardOp.close(false);
  continue;
}
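In effect, the patch makes the local task write a placeholder hashtable even when the small table produced no rows. A minimal sketch of that idea, using plain Java serialization rather than Hive's real `generateDummyHashTable` implementation (the file name and signature here are illustrative):

```java
import java.io.*;
import java.util.HashMap;

// Simplified model of the HIVE-6913 fix: for an empty small table we still
// write a hashtable file containing an empty map, so the downstream loader
// no longer fails with FileNotFoundException. Not Hive's real file format.
public class DummyHashTable {

    static File generateDummyHashTable(File dir, String alias) throws IOException {
        File f = new File(dir, "MapJoin-" + alias + "--.hashtable");
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(f))) {
            // empty table -> empty hashtable; the join simply matches nothing
            out.writeObject(new HashMap<String, String>());
        }
        return f;
    }

    public static void main(String[] args) throws Exception {
        File f = generateDummyHashTable(new File(System.getProperty("java.io.tmpdir")), "emptyalias");
        System.out.println(f.exists()); // prints true
        f.delete();
    }
}
```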

I will cover the full mapjoin flow and its trigger conditions in a later post.
