The error message is as follows:
Diagnostics report from attempt_1479210500211_159364_m_000003_0: Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
    at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:172)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
    at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:52)
    at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:163)
    ... 8 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: 19
    at org.apache.hadoop.hive.ql.exec.vector.VectorExtractRow.setBatch(VectorExtractRow.java:706)
    at org.apache.hadoop.hive.ql.exec.vector.VectorExtractRowDynBatch.setBatchOnEntry(VectorExtractRowDynBatch.java:34)
    at org.apache.hadoop.hive.ql.exec.vector.VectorReduceSinkOperator.process(VectorReduceSinkOperator.java:89)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:838)
    at org.apache.hadoop.hive.ql.exec.vector.VectorFilterOperator.process(VectorFilterOperator.java:117)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:838)
    at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:97)
    at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:164)
    at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:45)
    ... 9 more
The SQL statement:
select device_id,
row_number() over(partition by device_id order by action_timestamp) cn
from edw_log.dwd_esf_edw_service_log_di
where dt = '20160519'
The SQL execution plan:
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: dwd_esf_edw_service_log_di
            filterExpr: (dt = '20160519') (type: boolean)
            Statistics: Num rows: 5578978 Data size: 7566973806 Basic stats: COMPLETE Column stats: NONE
            Reduce Output Operator
              key expressions: device_id (type: string), action_timestamp (type: string)
              sort order: ++
              Map-reduce partition columns: device_id (type: string)
              Statistics: Num rows: 5578978 Data size: 7566973806 Basic stats: COMPLETE Column stats: NONE
      Execution mode: vectorized
      Reduce Operator Tree:
        Select Operator
          expressions: KEY.reducesinkkey0 (type: string), KEY.reducesinkkey1 (type: string)
          outputColumnNames: _col0, _col8
          Statistics: Num rows: 5578978 Data size: 7566973806 Basic stats: COMPLETE Column stats: NONE
          PTF Operator
            Function definitions:
                Input definition
                  input alias: ptf_0
                  output shape: _col0: string, _col8: string
                  type: WINDOWING
                Windowing table definition
                  input alias: ptf_1
                  name: windowingtablefunction
                  order by: _col8
                  partition by: _col0
                  raw input shape:
                  window functions:
                      window function definition
                        alias: row_number_window_0
                        name: row_number
                        window function: GenericUDAFRowNumberEvaluator
                        window frame: PRECEDING(MAX)~FOLLOWING(MAX)
                        isPivotResult: true
            Statistics: Num rows: 5578978 Data size: 7566973806 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: _col0 (type: string), row_number_window_0 (type: int)
              outputColumnNames: _col0, _col1
              Statistics: Num rows: 5578978 Data size: 7566973806 Basic stats: COMPLETE Column stats: NONE
              File Output Operator
                compressed: false
                Statistics: Num rows: 5578978 Data size: 7566973806 Basic stats: COMPLETE Column stats: NONE
                table:
                    input format: org.apache.hadoop.mapred.TextInputFormat
                    output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                    serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink
Analyzing the execution plan shows that the map stage runs in vectorized query mode, as shown below:
Execution mode: vectorized
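Everything else in the plan looks normal, so the problem is probably here. Many frames in the error stack trace also belong to the vectorized execution classes.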
Disable vectorized query execution by adjusting the parameter:
set hive.vectorized.execution.enabled=false;
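Checking the execution plan again shows that it has changed (the Execution mode: vectorized line is gone), and the SQL now runs normally: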
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: dwd_esf_edw_service_log_di
            filterExpr: (dt = '20160519') (type: boolean)
            Statistics: Num rows: 5578978 Data size: 7566973806 Basic stats: COMPLETE Column stats: NONE
            Reduce Output Operator
              key expressions: device_id (type: string), action_timestamp (type: string)
              sort order: ++
              Map-reduce partition columns: device_id (type: string)
              Statistics: Num rows: 5578978 Data size: 7566973806 Basic stats: COMPLETE Column stats: NONE
      Reduce Operator Tree:
        Select Operator
          expressions: KEY.reducesinkkey0 (type: string), KEY.reducesinkkey1 (type: string)
          outputColumnNames: _col0, _col8
          Statistics: Num rows: 5578978 Data size: 7566973806 Basic stats: COMPLETE Column stats: NONE
          PTF Operator
            Function definitions:
                Input definition
                  input alias: ptf_0
                  output shape: _col0: string, _col8: string
                  type: WINDOWING
                Windowing table definition
                  input alias: ptf_1
                  name: windowingtablefunction
                  order by: _col8
                  partition by: _col0
                  raw input shape:
                  window functions:
                      window function definition
                        alias: row_number_window_0
                        name: row_number
                        window function: GenericUDAFRowNumberEvaluator
                        window frame: PRECEDING(MAX)~FOLLOWING(MAX)
                        isPivotResult: true
            Statistics: Num rows: 5578978 Data size: 7566973806 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: _col0 (type: string), row_number_window_0 (type: int)
              outputColumnNames: _col0, _col1
              Statistics: Num rows: 5578978 Data size: 7566973806 Basic stats: COMPLETE Column stats: NONE
              File Output Operator
                compressed: false
                Statistics: Num rows: 5578978 Data size: 7566973806 Basic stats: COMPLETE Column stats: NONE
                table:
                    input format: org.apache.hadoop.mapred.TextInputFormat
                    output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                    serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink
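The toggle-and-compare steps above could be run as a single Hive session, roughly as sketched below. This is only an illustration: it reuses the query and the parameter from this post, and the final re-enable step assumes the setting was originally true (which the vectorized plan suggests, but which should be confirmed from the first command's output).

```sql
-- Print the current value so it can be restored afterwards.
set hive.vectorized.execution.enabled;

-- Disable vectorized execution for the current session only.
set hive.vectorized.execution.enabled=false;

-- Confirm that "Execution mode: vectorized" no longer appears in the plan.
explain
select device_id,
       row_number() over (partition by device_id order by action_timestamp) cn
from edw_log.dwd_esf_edw_service_log_di
where dt = '20160519';

-- After running the failing query, restore the setting
-- (assumption: the original value was true).
set hive.vectorized.execution.enabled=true;
```

Setting the parameter inside the session keeps the workaround local to the affected query, so other queries on the cluster can keep using vectorized execution.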
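The suspected root cause is that a null value leads to an out-of-bounds array access during vectorized processing. This still needs to be verified.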
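One possible first check on the null hypothesis is to count null values in the two columns the window function uses. The query below is only a sketch, not a confirmed diagnosis: it assumes that nulls in device_id or action_timestamp are the ones that matter, and it is meant to be run with vectorization disabled so that it does not hit the same error.

```sql
-- Run with vectorization off to avoid the original failure.
set hive.vectorized.execution.enabled=false;

-- Count nulls in the partition-by and order-by columns of the failing query.
select count(*) as total_rows,
       sum(case when device_id is null then 1 else 0 end) as null_device_id,
       sum(case when action_timestamp is null then 1 else 0 end) as null_action_timestamp
from edw_log.dwd_esf_edw_service_log_di
where dt = '20160519';
```

If both null counts are zero, the null explanation becomes less likely and other causes are worth investigating. A non-zero count is consistent with the hypothesis but does not prove it.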
Reference:
https://cwiki.apache.org/confluence/display/Hive/Vectorized+Query+Execution