Hive throws an LZO "Premature EOF from inputStream" error

Today a colleague on the DW team emailed asking for help with a problem they could not resolve on their own. The troubleshooting process follows:

1. The HQL:

insert overwrite table mds_prod_silent_atten_user partition (dt=20141110)
select uid, host, atten_time
from (
    select uid, host, atten_time
    from (
        select case when t2.uid is null then t1.uid else t2.uid end uid,
               case when t2.uid is null and t2.host is null then t1.host else t2.host end host,
               case when t2.atten_time is null or t1.atten_time > t2.atten_time
                    then t1.atten_time else t2.atten_time end atten_time
        from (
            select uid, findid(extend,'uids') host, dt atten_time,
                   sum(case when (mode = '1' or mode = '3') then 1 else -1 end) num
            from ods_bhv_tblog
            where behavior = '14000076' and dt = '20141115'
              and (mode = '1' or mode = '3' or mode = '2') and status = '1'
            group by uid, findid(extend,'uids'), dt
        ) t1
        full outer join (
            select uid, attened_uid host, atten_time
            from mds_prod_silent_atten_user
            where dt = '20141114'
        ) t2
        on t1.uid = t2.uid and t1.host = t2.host
        where t1.uid is null or t1.num > 0
    ) t3
    union all
    select t5.uid, t5.host, t5.atten_time
    from (
        select uid, host, atten_time
        from (
            select uid, findid(extend,'uids') host, dt atten_time,
                   sum(case when (mode = '1' or mode = '3') then 1 else -1 end) num
            from ods_bhv_tblog
            where behavior = '14000076' and dt = '20141115'
              and (mode = '1' or mode = '3' or mode = '2') and status = '1'
            group by uid, findid(extend,'uids'), dt
        ) t4
        where num = 0
    ) t5
    join (
        select uid, attened_uid host, atten_time
        from mds_prod_silent_atten_user
        where dt = '20141114'
    ) t6
    on t6.uid = t5.uid and t6.host = t5.host
) t7


The full failing HQL is shown above. It looks intimidating, but the logic is actually quite simple: it only involves associating two tables, mds_prod_silent_atten_user and ods_bhv_tblog.

2. The error log:

Error: java.io.IOException: java.lang.reflect.InvocationTargetException
	at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
	at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
	at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:302)
	at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.<init>(HadoopShimsSecure.java:249)
	at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getRecordReader(HadoopShimsSecure.java:363)
	at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:591)
	at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:168)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:409)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1550)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
	at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:288)
	... 11 more
Caused by: java.io.EOFException: Premature EOF from inputStream
	at com.hadoop.compression.lzo.LzopInputStream.readFully(LzopInputStream.java:75)
	at com.hadoop.compression.lzo.LzopInputStream.readHeader(LzopInputStream.java:114)
	at com.hadoop.compression.lzo.LzopInputStream.<init>(LzopInputStream.java:54)
	at com.hadoop.compression.lzo.LzopCodec.createInputStream(LzopCodec.java:83)
	at org.apache.hadoop.hive.ql.io.RCFile$ValueBuffer.<init>(RCFile.java:667)
	at org.apache.hadoop.hive.ql.io.RCFile$Reader.<init>(RCFile.java:1431)
	at org.apache.hadoop.hive.ql.io.RCFile$Reader.<init>(RCFile.java:1342)
	at org.apache.hadoop.hive.ql.io.rcfile.merge.RCFileBlockMergeRecordReader.<init>(RCFileBlockMergeRecordReader.java:46)
	at org.apache.hadoop.hive.ql.io.rcfile.merge.RCFileBlockMergeInputFormat.getRecordReader(RCFileBlockMergeInputFormat.java:38)
	at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:65)
	... 16 more

The log shows that the "Premature EOF from inputStream" error is raised from the LZO input stream while reading compressed data, and that it occurs in stage-3.

3. The execution plan for stage-3 is as follows:

Stage: Stage-3
    Map Reduce
      Map Operator Tree:
          TableScan
            Union
              Statistics: Num rows: 365 Data size: 146323 Basic stats: COMPLETE Column stats: NONE
              Select Operator
                expressions: _col0 (type: string), _col1 (type: string), _col2 (type: string)
                outputColumnNames: _col0, _col1, _col2
                Statistics: Num rows: 365 Data size: 146323 Basic stats: COMPLETE Column stats: NONE
                File Output Operator
                  compressed: false
                  Statistics: Num rows: 365 Data size: 146323 Basic stats: COMPLETE Column stats: NONE
                  table:
                      input format: org.apache.hadoop.hive.ql.io.RCFileInputFormat
                      output format: org.apache.hadoop.hive.ql.io.RCFileOutputFormat
                      serde: org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe
                      name: default.mds_prod_silent_atten_user
          TableScan
            Union
              Statistics: Num rows: 365 Data size: 146323 Basic stats: COMPLETE Column stats: NONE
              Select Operator
                expressions: _col0 (type: string), _col1 (type: string), _col2 (type: string)
                outputColumnNames: _col0, _col1, _col2
                Statistics: Num rows: 365 Data size: 146323 Basic stats: COMPLETE Column stats: NONE
                File Output Operator
                  compressed: false
                  Statistics: Num rows: 365 Data size: 146323 Basic stats: COMPLETE Column stats: NONE
                  table:
                      input format: org.apache.hadoop.hive.ql.io.RCFileInputFormat
                      output format: org.apache.hadoop.hive.ql.io.RCFileOutputFormat
                      serde: org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe
                      name: default.mds_prod_silent_atten_user

Stage-3 has only a map phase, no reduce, and the map does nothing more than a simple union; nothing about it looks out of the ordinary.
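For reference, a stage plan like the one above is printed by Hive's EXPLAIN statement. A sketch (the SELECT below is a simplified stand-in; in practice the full INSERT ... SELECT from section 1 follows the explain keyword):

-- Print the stage plan instead of running the statement.
explain
insert overwrite table mds_prod_silent_atten_user partition (dt=20141110)
select uid, attened_uid host, atten_time
from mds_prod_silent_atten_user
where dt = '20141114';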

4. Tracking down the problem

Googling the "lzo Premature EOF from inputStream" error message showed that, sure enough, others have run into a similar problem; link:

http://www.cnblogs.com/aprilrain/archive/2013/03/06/2946326.html

The cause of the problem:

If the output format is TextOutputFormat, use LzopCodec; the matching input format for reading that output back is LzoTextInputFormat.

If the output format is SequenceFileOutputFormat, use LzoCodec; the matching input format is SequenceFileInputFormat.

If you write a SequenceFile with LzopCodec, then brace yourself for a "java.io.EOFException: Premature EOF from inputStream" the moment you read that output back with SequenceFileInputFormat.
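Expressed as Hive session settings, the pairings look like this (a sketch only; the codec class names are the hadoop-lzo classes that appear in the stack trace, and the property name is the Hadoop 2.x one found later in the job configuration):

-- Plain-text output meant to be read back via LzoTextInputFormat:
-- the file-level lzop format is the right choice.
set mapreduce.output.fileoutputformat.compress.codec=com.hadoop.compression.lzo.LzopCodec;

-- SequenceFile (and, as shown below, RCFile) output read back through its
-- own input format: use the raw block codec instead.
set mapreduce.output.fileoutputformat.compress.codec=com.hadoop.compression.lzo.LzoCodec;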


The description at that link closely matches our situation. Our table's output format is RCFileOutputFormat, not plain text, so the compression codec must not be LzopCodec; it should be LzoCodec, and the error message bears this out: the failure happens while reading an RCFile that the previous job wrote with LzopCodec compression.

With the fix identified, the next step was to find the corresponding parameter, which should be the one controlling the compression codec of the reduce output, and swap its LZO codec to LzoCodec. Checking the failing job's configuration:



Sure enough, the mapreduce.output.fileoutputformat.compress.codec option had been set to LzopCodec. Changing the value of this option is all it takes: set it to org.apache.hadoop.io.compress.DefaultCodec (on this cluster, LzoCodec is then used by default).
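Equivalently, the codec can be pinned explicitly at the session level before rerunning the job. A minimal sketch, assuming the hadoop-lzo LzoCodec seen in the stack trace is on the cluster's classpath:

-- Compress job output with the block-level LzoCodec, which RCFile reads
-- back safely (unlike the file-level LzopCodec that triggered the error).
set hive.exec.compress.output=true;
set mapreduce.output.fileoutputformat.compress.codec=com.hadoop.compression.lzo.LzoCodec;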
