HBase has provided a Replication feature since version 0.90. While testing this feature recently, I found that a RegionServer can crash when a large volume of data (more than 1,000,000 rows) is written. The exception log is as follows:
2014-10-29 10:40:44,225 WARN org.apache.hadoop.hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block blk_-2223802775658985697_1410
java.io.EOFException
    at java.io.DataInputStream.readFully(DataInputStream.java:180)
    at java.io.DataInputStream.readLong(DataInputStream.java:399)
    at org.apache.hadoop.hdfs.protocol.DataTransferProtocol$PipelineAck.readFields(DataTransferProtocol.java:124)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2967)
2014-10-29 10:40:44,225 INFO org.apache.hadoop.hdfs.DFSClient: Could not obtain block blk_-2223802775658985697_1410 from any node: java.io.IOException: No live nodes contain current block. Will get new block locations from namenode and retry...
2014-10-29 10:40:44,228 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_-2223802775658985697_1410 bad datanode[0] 192.168.11.55:40010
2014-10-29 10:40:44,232 WARN org.apache.hadoop.hdfs.DFSClient: Error while syncing
java.io.IOException: All datanodes 192.168.11.55:40010 are bad. Aborting...
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:3096)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2100(DFSClient.java:2589)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2793)
2014-10-29 10:40:44,235 FATAL org.apache.hadoop.hbase.regionserver.wal.HLog: Could not sync. Requesting close of hlog
java.io.IOException: Reflection
    at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:310)
    at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1366)
    at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1476)
    at org.apache.hadoop.hbase.regionserver.HRegion.syncOrDefer(HRegion.java:5970)
    at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutation(HRegion.java:2490)
    at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2190)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3888)
    at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:323)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1434)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.GeneratedMethodAccessor35.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:308)
    ... 11 more
Caused by: java.io.IOException: DFSOutputStream is closed
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.sync(DFSClient.java:3669)
    at org.apache.hadoop.fs.FSDataOutputStream.sync(FSDataOutputStream.java:97)
    at org.apache.hadoop.io.SequenceFile$Writer.syncFs(SequenceFile.java:995)
    ... 15 more
In fact, this problem is not caused by the Replication feature itself; it is caused by client-side timeouts during data-intensive writes.
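For reference, the kind of dense write workload that triggers these timeouts can be reproduced with a minimal sketch against the 0.9x-era HBase client API (the table name, column family, and buffer size below are illustrative assumptions, not the actual test code):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class BulkWriteTest {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HTable table = new HTable(conf, "test_table");  // hypothetical table name
            // Buffer puts client-side and send them in large batches so the
            // RegionServer sees a sustained, dense write load.
            table.setAutoFlush(false);
            table.setWriteBufferSize(8 * 1024 * 1024);      // 8 MB write buffer (assumed)
            for (long i = 0; i < 1000000L; i++) {           // >1,000,000 rows, as in the test
                Put put = new Put(Bytes.toBytes(String.format("row-%010d", i)));
                put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("value-" + i));
                table.put(put);
            }
            table.flushCommits();                           // flush any remaining buffered puts
            table.close();
        }
    }

With autoflush disabled, the RegionServer receives long runs of batched mutations, and each batch is synced to the HLog (see doMiniBatchMutation and syncOrDefer in the stack trace above); if a sync outlives the DFS socket timeout, the write pipeline is torn down exactly as the log shows.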
============================ The following content is sourced from the web ===================================
Under normal circumstances, the process by which the DFSClient writes block data is:
1. On the DFSClient side, set dfs.socket.timeout to a larger value, such as 300000 (300 seconds), on both the HDFS side and the HBase side (note: the values set in the two places must be equal); see the config sketch below.
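A minimal sketch of that setting, assuming the stock Hadoop/HBase configuration files (the change typically requires restarting the DataNodes and RegionServers to take effect):

    <!-- in both hdfs-site.xml (HDFS side) and hbase-site.xml (HBase side);
         the value must be identical in the two files -->
    <property>
      <name>dfs.socket.timeout</name>
      <value>300000</value> <!-- 300 seconds -->
    </property>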