Recently a cluster restart by the ops team caused Flume to shut down abnormally. Here is a summary of the cause and the ways to handle it:
Symptom: queries against a Hive external table fail; the table's data is written by Flume.
Error log:
2017-11-01 00:15:37,077 WARN [main] org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hive (auth:SIMPLE) cause:java.io.IOException: java.lang.reflect.InvocationTargetException
2017-11-01 00:15:37,079 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.io.IOException: java.lang.reflect.InvocationTargetException
	at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
	at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
	at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:265)
	at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.<init>(HadoopShimsSecure.java:212)
	at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getRecordReader(HadoopShimsSecure.java:332)
	at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:721)
	at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:169)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:432)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1796)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:251)
	... 11 more
Caused by: java.io.IOException: Cannot obtain block length for LocatedBlock{BP-1461471655-10.2.35.25-1489124540398:blk_1176728682_102999133; getBlockSize()=2414080; corrupt=false; offset=0; locs=[DatanodeInfoWithStorage[10.2.35.52:50010,DS-d3f75610-c89b-4617-8c97-4460066476ad,DISK], DatanodeInfoWithStorage[10.2.35.53:50010,DS-ad8c89d8-5f5d-4491-9aef-602bce3c244a,DISK], DatanodeInfoWithStorage[10.2.35.29:50010,DS-e77934f2-799d-4c65-8715-59378e689e93,DISK], DatanodeInfoWithStorage[10.2.35.54:50010,DS-75fdf7a7-7177-4b0a-8939-720b718623e5,DISK]]}
	at org.apache.hadoop.hdfs.DFSInputStream.readBlockLength(DFSInputStream.java:427)
	at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:335)
	at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:271)
	at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:263)
	at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1585)
	at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:326)
	at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:322)
	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
	at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:322)
	at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:783)
	at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:109)
	at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
	at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:67)
	... 16 more
Root cause: the Flume agent was not stopped before the Cloudera Manager (CM) restart, so the HDFS files it was writing were never closed properly.
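Before picking a fix, you can confirm which files are still stuck in the open-for-write state. A minimal check, assuming the Flume sink writes under /user/flume/logs (a hypothetical path, not from the log above):

# List files under the Flume output directory that HDFS still considers open for write
hadoop fsck /user/flume/logs -openforwrite | egrep "OPENFORWRITE|MISSING"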
Fixes:
Method 1: forcibly delete the files left in the abnormal (open-for-write) state
hadoop fsck <path-of-the-file> -openforwrite | egrep -v '^\.+$' | egrep "MISSING|OPENFORWRITE" | grep -o "/[^ ]*" | sed -e "s/:$//" | xargs -i hadoop fs -rmr {};
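Since everything matched is piped straight into hadoop fs -rmr, it is worth dry-running the left half of the pipeline first to see exactly which paths would be deleted. A sketch, using the same placeholder path:

# Dry run: print the paths that would be removed, without deleting anything
hadoop fsck <path-of-the-file> -openforwrite | egrep -v '^\.+$' | egrep "MISSING|OPENFORWRITE" | grep -o "/[^ ]*" | sed -e "s/:$//"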
Method 2: recover the lease
hdfs debug recoverLease -path <path-of-the-file> -retries <retry-times>
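A usage sketch with hypothetical values (the file path and retry count are placeholders, not taken from the log above):

# Ask the NameNode to recover the lease and close the under-construction file
hdfs debug recoverLease -path /user/flume/logs/events.1509465600000.tmp -retries 3

Lease recovery closes the file and fixes its block-length metadata without deleting any data, so it is generally the safer choice when the contents of the file still need to be kept.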