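During the Spring Festival holiday, the monitoring program sent a string of data-anomaly alerts. I connected to the jump host to check the state of each service and found that the DataNodes on the second and third slave nodes had gone offline. The DataNode and NameNode runtime logs revealed the cause. I am recording this nerve-racking troubleshooting session here for reference.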
The NameNode master node reported an out-of-memory error at runtime. An excerpt from its log follows:
java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.lang.Long.valueOf(Long.java:577)
    at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$StorageBlockReportProto.<init>(DatanodeProtocolProtos.java:17327)
    at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$StorageBlockReportProto.<init>(DatanodeProtocolProtos.java:17250)
    at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$StorageBlockReportProto$1.parsePartialFrom(DatanodeProtocolProtos.java:17381)
    at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$StorageBlockReportProto$1.parsePartialFrom(DatanodeProtocolProtos.java:17376)
    at com.google.protobuf.CodedInputStream.readMessage(CodedInputStream.java:309)
The DataNodes, in turn, reported a socket timeout exception while connecting to the NameNode master node:
INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Got finalize command for block pool BP-029006-xxx
WARN org.apache.hadoop.hdfs.server.datanode.DataNode: IOException in offerService
java.net.SocketTimeoutException: Call From xxx/xxx to xxx:xxx failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/xxx:xxx remote=xxx/xxx]; For more details see: http://wiki.apache.org/hadoop/SocketTimeout
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:751)
    at org.apache.hadoop.ipc.Client.call(Client.java:1480)
    at org.apache.hadoop.ipc.Client.call(Client.java:1407)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
    at com.sun.proxy.$Proxy13.sendHeartbeat(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.sendHeartbeat(DatanodeProtocolClientSideTranslatorPB.java:153)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:553)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:653)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:823)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/xxx:xxx remote=xxx/xxx]
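When several nodes are involved, it can help to count how often each of the two signatures above appears in a given log before reading it in full. The following is a minimal sketch, not part of the original procedure: the function name `scan_hdfs_log` is made up for illustration, and the log file path is whatever your installation uses.

```shell
# Hypothetical helper: count the lines in a log file that contain the two
# error signatures seen in this incident.
# Usage: scan_hdfs_log <logfile>
scan_hdfs_log() {
  local log="$1"
  local oom timeouts
  # -F matches the text literally (the dots are not regex wildcards);
  # -c prints the number of matching lines. grep exits non-zero when
  # nothing matches, so "|| true" keeps the function from aborting.
  oom=$(grep -cF 'java.lang.OutOfMemoryError' "$log" || true)
  timeouts=$(grep -cF 'java.net.SocketTimeoutException' "$log" || true)
  echo "OutOfMemoryError lines: ${oom}"
  echo "SocketTimeoutException lines: ${timeouts}"
}
```

Running it against the NameNode log and then each DataNode log gives a quick picture of which side is failing first. Here the NameNode's OutOfMemoryError is the root cause, and the DataNode heartbeat timeouts are a consequence of it.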
The fix is to adjust the memory configuration of the service components in the Hadoop cluster by updating the hadoop-env.sh file.
The hadoop-env.sh file is located at:
$HADOOP_HOME/etc/hadoop/hadoop-env.sh
The memory that Hadoop allocates uniformly to every daemon (namenode, secondaryNamenode, jobtracker, datanode, tasktracker) is set in hadoop-env.sh through the HADOOP_HEAPSIZE parameter, which defaults to 1000 MB.
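In most cases this single uniform value is not a good fit. For the NameNode, for example, 1000 MB of memory can hold block references for only a few million files. To size the NameNode's memory separately, set it through HADOOP_NAMENODE_OPTS. Likewise, HADOOP_SECONDARYNAMENODE_OPTS sets the SecondaryNameNode's memory so that it stays consistent with the NameNode. The HADOOP_DATANODE_OPTS, HADOOP_BALANCER_OPTS, and HADOOP_JOBTRACKER_OPTS variables are also available.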
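To turn that observation into a rough number, the sketch below estimates a NameNode heap size from a block count. It assumes a commonly cited, conservative rule of thumb of about 1000 MB of heap per million blocks. That ratio is an assumption brought in for illustration, not a figure from this incident, and `estimate_nn_heap_mb` is a hypothetical helper name.

```shell
# Hypothetical helper: estimate NameNode heap (in MB) from a block count.
# ASSUMPTION: roughly 1000 MB of heap per million blocks, a conservative
# rule of thumb; real usage depends on file counts, block sizes and version.
# Usage: estimate_nn_heap_mb <number_of_blocks>
estimate_nn_heap_mb() {
  local blocks="$1"
  local mb_per_million=1000
  # Integer division, rounded up to the next whole MB.
  echo $(( (blocks * mb_per_million + 999999) / 1000000 ))
}
```

For instance, `estimate_nn_heap_mb 2000000` prints `2000`, which is in the same range as the 2048 MB chosen for this cluster. Treat the result as a starting point to validate against observed GC behavior, not as a guarantee.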
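To address the problem above, we need to raise the memory of both the NameNode and the SecondaryNameNode. Add -Xmx2048m to HADOOP_NAMENODE_OPTS to set the NameNode's maximum heap to 2048 MB, and add the same -Xmx2048m to HADOOP_SECONDARYNAMENODE_OPTS for the SecondaryNameNode. The 2048 MB figure is only a reference point: tune it to the actual data volume (more data calls for a larger heap), and keep the server's physical memory in mind.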
# Command specific options appended to HADOOP_OPTS when specified
export HADOOP_NAMENODE_OPTS="-Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,RFAS} -Dhdfs.audit.logger=${HDFS_AUDIT_LOGGER:-INFO,NullAppender} -Xmx2048m $HADOOP_NAMENODE_OPTS"
export HADOOP_DATANODE_OPTS="-Dhadoop.security.logger=ERROR,RFAS -Xmx2048m $HADOOP_DATANODE_OPTS"
export HADOOP_SECONDARYNAMENODE_OPTS="-Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,RFAS} -Dhdfs.audit.logger=${HDFS_AUDIT_LOGGER:-INFO,NullAppender} -Xmx2048m $HADOOP_SECONDARYNAMENODE_OPTS"
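One detail worth checking after the change: in the settings above, -Xmx2048m sits before the pre-existing value of each variable, so if that value already carries its own -Xmx, the older flag ends up last on the command line. HotSpot JVMs generally honor the last occurrence of a repeated option, but treat that as an assumption to verify on your JVM. The sketch below, with the hypothetical helper name `last_xmx`, prints the last -Xmx token found in an options string.

```shell
# Hypothetical helper: print the last -Xmx token in a whitespace-separated
# options string, or an empty line if there is none.
# Note: $1 is deliberately unquoted in the loop so it splits into tokens;
# this assumes the options contain no glob characters such as '*'.
# Usage: last_xmx "<options string>"
last_xmx() {
  local last="" tok
  for tok in $1; do
    case "$tok" in
      -Xmx*) last="$tok" ;;
    esac
  done
  echo "$last"
}
```

After restarting the daemons, passing the NameNode's full command line (as shown by `ps`, for example) to `last_xmx` should print `-Xmx2048m`. If it prints something else, an earlier setting is still overriding the new value.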