Handling NameNode Out-of-Memory and DataNode Request Timeout Exceptions

Problem Background

  During the Spring Festival holiday I kept receiving data-anomaly alerts from our monitoring program. I hurriedly connected to the jump server to check the state of each service and found that the DataNodes on the second and third worker nodes had both gone offline. Going through the DataNode and NameNode logs revealed the root cause. I am recording this rather nerve-racking troubleshooting process here for reference.

Problem Description

  The NameNode (master) reported an out-of-memory error at runtime; an excerpt of its log follows:

  
java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.lang.Long.valueOf(Long.java:577)
    at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$StorageBlockReportProto.<init>(DatanodeProtocolProtos.java:17327)
    at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$StorageBlockReportProto.<init>(DatanodeProtocolProtos.java:17250)
    at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$StorageBlockReportProto$1.parsePartialFrom(DatanodeProtocolProtos.java:17381)
    at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$StorageBlockReportProto$1.parsePartialFrom(DatanodeProtocolProtos.java:17376)
    at com.google.protobuf.CodedInputStream.readMessage(CodedInputStream.java:309)

  Meanwhile, the DataNodes reported socket timeout exceptions when connecting to the NameNode:

  
INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Got finalize command for block pool BP-029006-xxx
WARN org.apache.hadoop.hdfs.server.datanode.DataNode: IOException in offerService
java.net.SocketTimeoutException: Call From xxx/xxx to xxx:xxx failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/xxx:xxx remote=xxx/xxx]; For more details see:  http://wiki.apache.org/hadoop/SocketTimeout
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:751)
    at org.apache.hadoop.ipc.Client.call(Client.java:1480)
    at org.apache.hadoop.ipc.Client.call(Client.java:1407)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
    at com.sun.proxy.$Proxy13.sendHeartbeat(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.sendHeartbeat(DatanodeProtocolClientSideTranslatorPB.java:153)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:553)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:653)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:823)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/xxx:xxx remote=xxx/xxx]
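  The two errors share one root cause: a NameNode stuck in back-to-back full GC cycles cannot answer DataNode heartbeat RPCs within the 60-second timeout, so the DataNodes drop off. Before touching any configuration, this can be confirmed with jstat from the JDK; the PID lookup below is illustrative:

# Find the NameNode PID (exclude the SecondaryNameNode)
NN_PID=$(jps | awk '/NameNode/ && !/Secondary/ {print $1}')

# Sample GC statistics once per second; FGC/FGCT climbing rapidly while
# the old generation (O) stays near 100% confirms GC thrashing
jstat -gcutil "$NN_PID" 1000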

Solution

Modify the memory configuration of each service component in the Hadoop cluster by updating the hadoop-env.sh file.

  The hadoop-env.sh file is located at:

$HADOOP_HOME/etc/hadoop/hadoop-env.sh

  Hadoop allocates a uniform amount of memory to each of its daemons (NameNode, SecondaryNameNode, JobTracker, DataNode, TaskTracker). It is set in hadoop-env.sh via the HADOOP_HEAPSIZE parameter, whose default is 1000 MB.
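  For example, the uniform heap size could be raised like this in hadoop-env.sh (the value shown is illustrative; HADOOP_HEAPSIZE is interpreted in MB):

# Maximum heap size, in MB, applied to every Hadoop daemon
# (illustrative value; tune to your workload and available RAM)
export HADOOP_HEAPSIZE=2000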

  In most cases this single shared value is not appropriate. For the NameNode, for example, 1000 MB of memory is only enough to hold block references for a few million files. To size the NameNode's memory individually, use HADOOP_NAMENODE_OPTS; likewise, HADOOP_SECONDARYNAMENODE_OPTS sets the SecondaryNameNode's memory, which should be kept consistent with the NameNode's. HADOOP_DATANODE_OPTS, HADOOP_BALANCER_OPTS, and HADOOP_JOBTRACKER_OPTS are also available.

  For the problem above, we need to raise the memory of the NameNode and the SecondaryNameNode. Modify HADOOP_NAMENODE_OPTS to add -Xmx2048m (2048 MB, for reference), and likewise add -Xmx2048m to HADOOP_SECONDARYNAMENODE_OPTS. Tune the value to your actual data volume: the more data, the higher it can reasonably go, while staying within the server's physical memory. The resulting hadoop-env.sh entries are shown below.

  
# Command specific options appended to HADOOP_OPTS when specified
export HADOOP_NAMENODE_OPTS="-Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,RFAS} -Dhdfs.audit.logger=${HDFS_AUDIT_LOGGER:-INFO,NullAppender} -Xmx2048m $HADOOP_NAMENODE_OPTS"
export HADOOP_DATANODE_OPTS="-Dhadoop.security.logger=ERROR,RFAS -Xmx2048m $HADOOP_DATANODE_OPTS"

export HADOOP_SECONDARYNAMENODE_OPTS="-Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,RFAS} -Dhdfs.audit.logger=${HDFS_AUDIT_LOGGER:-INFO,NullAppender} -Xmx2048m $HADOOP_SECONDARYNAMENODE_OPTS"
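  After updating hadoop-env.sh on every node, restart HDFS so the new heap settings take effect, and verify that the new -Xmx value was picked up. A minimal sketch, assuming the standard Hadoop 2.x sbin scripts:

# Restart HDFS (run on the NameNode host; paths may differ in your deployment)
$HADOOP_HOME/sbin/stop-dfs.sh
$HADOOP_HOME/sbin/start-dfs.sh

# Confirm the NameNode (and SecondaryNameNode) JVMs are running with the new heap limit
ps -ef | grep '[N]ameNode' | grep -o '\-Xmx[0-9]*[mg]'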