On Hadoop 2.9.0, after configuring HA for both the NameNode and YARN, running the bundled wordcount example on one of the NameNode hosts failed intermittently (sometimes it succeeded, sometimes it failed). The error output is as follows:
18/08/16 17:02:42 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
18/08/16 17:02:42 INFO input.FileInputFormat: Total input files to process : 1
18/08/16 17:02:42 INFO mapreduce.JobSubmitter: number of splits:1
18/08/16 17:02:42 INFO Configuration.deprecation: yarn.resourcemanager.zk-address is deprecated. Instead, use hadoop.zk.address
18/08/16 17:02:42 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
18/08/16 17:02:42 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1534406793739_0005
18/08/16 17:02:42 INFO impl.YarnClientImpl: Submitted application application_1534406793739_0005
18/08/16 17:02:43 INFO mapreduce.Job: The url to track the job: http://HLJRslog2:8088/proxy/application_1534406793739_0005/
18/08/16 17:02:43 INFO mapreduce.Job: Running job: job_1534406793739_0005
18/08/16 17:02:54 INFO mapreduce.Job: Job job_1534406793739_0005 running in uber mode : false
18/08/16 17:02:54 INFO mapreduce.Job: map 0% reduce 0%
18/08/16 17:02:54 INFO mapreduce.Job: Job job_1534406793739_0005 failed with state FAILED due to: Application application_1534406793739_0005 failed 2 times due to AM Container for appattempt_1534406793739_0005_000002 exited with exitCode: 1
Failing this attempt.Diagnostics: [2018-08-16 17:02:48.561]Exception from container-launch.
Container id: container_e27_1534406793739_0005_02_000001
Exit code: 1
[2018-08-16 17:02:48.562]
[2018-08-16 17:02:48.574]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
log4j:WARN No appenders could be found for logger (org.apache.hadoop.mapreduce.v2.app.MRAppMaster).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
[2018-08-16 17:02:48.575]
[2018-08-16 17:02:48.575]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
log4j:WARN No appenders could be found for logger (org.apache.hadoop.mapreduce.v2.app.MRAppMaster).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
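The job in question was the stock wordcount example shipped with Hadoop; a run roughly like the following reproduces the behaviour (the examples jar path and the HDFS input/output directories here are illustrative assumptions):

# Run the bundled wordcount example; jar path and HDFS paths are assumed for illustration
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.0.jar \
  wordcount /user/hadoop/input /user/hadoop/output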
Analysis and resolution:
Most of the solutions found online for similar errors boil down to adding the appropriate classpath. I tried them all and none of them worked, which indicates the problem above is not caused by the classpath; when the error occurred I also checked the classpath and all the expected values were present. The classpath steps are still posted here for reference.
1. # yarn classpath    (Note: prints the current classpath values)
/data1/hadoop/hadoop/etc/hadoop:/data1/hadoop/hadoop/etc/hadoop:/data1/hadoop/hadoop/etc/hadoop:/data1/hadoop/hadoop/share/hadoop/common/lib/*:/data1/hadoop/hadoop/share/hadoop/common/*:/data1/hadoop/hadoop/share/hadoop/hdfs:/data1/hadoop/hadoop/share/hadoop/hdfs/lib/*:/data1/hadoop/hadoop/share/hadoop/hdfs/*:/data1/hadoop/hadoop/share/hadoop/yarn:/data1/hadoop/hadoop/share/hadoop/yarn/lib/*:/data1/hadoop/hadoop/share/hadoop/yarn/*:/data1/hadoop/hadoop/share/hadoop/mapreduce/lib/*:/data1/hadoop/hadoop/share/hadoop/mapreduce/*:/data1/hadoop/hadoop/contrib/capacity-scheduler/*.jar:/data1/hadoop/hadoop/share/hadoop/yarn/*:/data1/hadoop/hadoop/share/hadoop/yarn/lib/*
If the classpath above comes back empty, it can be populated with the following steps.
2. Edit mapred-site.xml
Add:
<property>
    <name>mapreduce.application.classpath</name>
    <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*,$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
</property>
3. Edit yarn-site.xml
Add:
<property>
    <name>yarn.application.classpath</name>
    <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*,$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
</property>
4. Edit the environment variables
# vim ~/.bashrc
Append the following environment variables at the end of the file:
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
5. source ~/.bashrc
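Once ~/.bashrc has been sourced, a quick sanity check confirms that the variables resolved and that the MapReduce jars show up on the classpath (the grep pattern is just an illustrative choice):

# Confirm the variable resolved after sourcing ~/.bashrc
echo $HADOOP_MAPRED_HOME
# Print the classpath one entry per line and look for the mapreduce jars
yarn classpath | tr ':' '\n' | grep mapreduce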
Fixing the actual error:
As the log shows, the container running the ApplicationMaster exited without ever requesting resources from the ResourceManager for the job, so the suspicion fell on communication between the AM and the RM. One RM is the standby and the other is active; inside YARN, when the MR job goes to the active RM to request resources for its tasks there is of course no problem, but when it goes to the standby RM this failure appears.
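To see which ResourceManager is currently active, yarn rmadmin can query each one by its id (rm1/rm2 here are assumed to match the ids used in the configuration below):

# Ask each RM for its HA state; one should answer "active", the other "standby"
yarn rmadmin -getServiceState rm1
yarn rmadmin -getServiceState rm2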
Edit yarn-site.xml (vim yarn-site.xml):
<property>
    <!-- Address through which clients submit applications to the RM -->
    <name>yarn.resourcemanager.address.rm1</name>
    <value>master:8032</value>
</property>
<property>
    <!-- Scheduler address the ResourceManager exposes to ApplicationMasters; the AM requests and releases resources through this address -->
    <name>yarn.resourcemanager.scheduler.address.rm1</name>
    <value>master:8030</value>
</property>
<property>
    <!-- RM HTTP address, for viewing cluster information -->
    <name>yarn.resourcemanager.webapp.address.rm1</name>
    <value>master:8088</value>
</property>
<property>
    <!-- Address NodeManagers use to exchange information with the RM -->
    <name>yarn.resourcemanager.resource-tracker.address.rm1</name>
    <value>master:8031</value>
</property>
<property>
    <!-- Address administrators use to send management commands to the RM -->
    <name>yarn.resourcemanager.admin.address.rm1</name>
    <value>master:8033</value>
</property>
<property>
    <name>yarn.resourcemanager.ha.admin.address.rm1</name>
    <value>master:23142</value>
</property>
<property>
    <name>yarn.resourcemanager.address.rm2</name>
    <value>slave1:8032</value>
</property>
<property>
    <name>yarn.resourcemanager.scheduler.address.rm2</name>
    <value>slave1:8030</value>
</property>
<property>
    <name>yarn.resourcemanager.webapp.address.rm2</name>
    <value>slave1:8088</value>
</property>
<property>
    <name>yarn.resourcemanager.resource-tracker.address.rm2</name>
    <value>slave1:8031</value>
</property>
<property>
    <name>yarn.resourcemanager.admin.address.rm2</name>
    <value>slave1:8033</value>
</property>
<property>
    <name>yarn.resourcemanager.ha.admin.address.rm2</name>
    <value>slave1:23142</value>
</property>
Note: the yarn.resourcemanager.scheduler.address.* entries are the RPC address and port the AM uses to request resources from the RM, and that is exactly where this failure occurs.
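For completeness, an HA setup like this also relies on the base failover properties that tie the .rm1/.rm2 suffixes together. A minimal sketch follows; the cluster id and ZooKeeper hosts are assumptions and must match the actual cluster:

<property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
</property>
<property>
    <!-- Logical ids of the two ResourceManagers; must match the .rm1/.rm2 suffixes above -->
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
</property>
<property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>master</value>
</property>
<property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>slave1</value>
</property>
<property>
    <!-- Cluster id used by the RMs for leader election (value assumed) -->
    <name>yarn.resourcemanager.cluster-id</name>
    <value>yarn-ha-cluster</value>
</property>
<property>
    <!-- ZooKeeper quorum; hadoop.zk.address replaces the deprecated yarn.resourcemanager.zk-address seen in the log (hosts assumed) -->
    <name>hadoop.zk.address</name>
    <value>master:2181,slave1:2181,slave2:2181</value>
</property>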
The rm2 entries above are what I added to yarn-site.xml on the rm1 machine (i.e. master); conversely, if you were adding them on slave1 you would add the lines ending in .rm1 instead. Put plainly, yarn-site.xml should contain the communication hosts and ports of every ResourceManager machine. Then copy the file to the other machines and restart YARN. After that, wordcount and other jobs no longer failed. The root cause was the communication between MR (the AM) and the RM, so when writing yarn-site.xml it is best to configure the communication ports of both the active and the standby RM in that file to prevent this error.
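Copying the file out and restarting YARN can be done roughly as follows (the installation path comes from the classpath output above; the slave hostnames are assumptions):

# Copy the updated yarn-site.xml to the other nodes (hostnames assumed: slave1, slave2)
scp /data1/hadoop/hadoop/etc/hadoop/yarn-site.xml slave1:/data1/hadoop/hadoop/etc/hadoop/
scp /data1/hadoop/hadoop/etc/hadoop/yarn-site.xml slave2:/data1/hadoop/hadoop/etc/hadoop/
# Restart YARN with the bundled scripts, then re-run wordcount to verify the fix
$HADOOP_HOME/sbin/stop-yarn.sh
$HADOOP_HOME/sbin/start-yarn.sh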