Background: with a new CDH cluster freshly set up, I wanted to benchmark its HDFS performance, so I used the test jar that ships with Hadoop, hadoop-test-2.6.0-mr1-cdh5.6.1.jar, planning to generate 10 TB of data for the test:

hadoop jar hadoop-test-2.6.0-mr1-cdh5.6.1.jar TestDFSIO -write -nrFiles 10 -fileSize 10000000 -resFile /tmp/TestDFSIO_results.log
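For context, a full TestDFSIO run usually pairs the write pass with a read pass and a cleanup step. A minimal sketch of that cycle, using the same jar and flags as above (-fileSize is given in MB for this test jar, and the benchmark files land under /benchmarks/TestDFSIO in HDFS by default):

    # Write phase (the command used here)
    hadoop jar hadoop-test-2.6.0-mr1-cdh5.6.1.jar TestDFSIO -write -nrFiles 10 -fileSize 10000000 -resFile /tmp/TestDFSIO_results.log

    # Read phase: re-read the same files and measure read throughput
    hadoop jar hadoop-test-2.6.0-mr1-cdh5.6.1.jar TestDFSIO -read -nrFiles 10 -fileSize 10000000 -resFile /tmp/TestDFSIO_results.log

    # Cleanup: delete the benchmark data from HDFS
    hadoop jar hadoop-test-2.6.0-mr1-cdh5.6.1.jar TestDFSIO -clean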
Due to time constraints, when a little over 2 TB of data had been generated, the cluster machines were simply powered off. On the next startup, two DataNodes failed to come up, reporting the following error:
ERROR DataNode
laydca10:1004:DataXceiver error processing WRITE_BLOCK operation src: /192.168.1.150:33090 dst: /192.168.1.151:1004
java.io.IOException: Premature EOF from inputStream
at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:203)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:501)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:901)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:808)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:169)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:106)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:246)
at java.lang.Thread.run(Thread.java:748)
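To see which hosts were hitting this, the error can be pulled straight out of the DataNode logs. A minimal sketch, assuming CDH's default log directory /var/log/hadoop-hdfs (adjust the path to your deployment):

    # Run on each DataNode host suspected of failing; lists the log files that contain the error
    grep -l "Premature EOF from inputStream" /var/log/hadoop-hdfs/*.log*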
Further inspection showed that the two DataNodes that would not start still had several hundred GB of data on disk that had never been deleted. I removed the DataNode role from both machines, deleted those several hundred GB from their data directories, and re-added the DataNode roles, but they still failed to start. Checking the logs again, I found several hundred MB of data under the YARN directories on 192.168.1.150 and 192.168.1.151, and suspected that although the data itself had been deleted, the task registration information still held in YARN was causing the error. So I removed the NodeManager roles from 150 and 151, wiped the YARN data on both hosts, re-added the two NodeManager roles, and restarted YARN and HDFS. After that, all HDFS nodes started up normally.
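For reference, the on-disk cleanup amounted to the following. A minimal sketch assuming CDH's default directories, /dfs/dn for dfs.datanode.data.dir and /yarn/nm for yarn.nodemanager.local-dirs (substitute your configured paths, and do the role removal and re-add through Cloudera Manager around each wipe):

    # On 192.168.1.150 and 192.168.1.151, after removing the DataNode role:
    rm -rf /dfs/dn/*      # the leftover block data (several hundred GB in this case)

    # After removing the NodeManager role on the same hosts:
    rm -rf /yarn/nm/*     # stale NodeManager local state (several hundred MB in this case)

    # Re-add the DataNode and NodeManager roles, then restart YARN and HDFS.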