Hadoop (1) --- Errors running Hadoop's built-in wordcount

    On a Hadoop 2.9.0 cluster with HA configured for both the NameNode and YARN, running the built-in wordcount program on one of the NameNode machines fails intermittently (sometimes it succeeds, sometimes it fails). The error output is as follows:

18/08/16 17:02:42 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
18/08/16 17:02:42 INFO input.FileInputFormat: Total input files to process : 1
18/08/16 17:02:42 INFO mapreduce.JobSubmitter: number of splits:1
18/08/16 17:02:42 INFO Configuration.deprecation: yarn.resourcemanager.zk-address is deprecated. Instead, use hadoop.zk.address
18/08/16 17:02:42 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
18/08/16 17:02:42 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1534406793739_0005
18/08/16 17:02:42 INFO impl.YarnClientImpl: Submitted application application_1534406793739_0005
18/08/16 17:02:43 INFO mapreduce.Job: The url to track the job: http://HLJRslog2:8088/proxy/application_1534406793739_0005/
18/08/16 17:02:43 INFO mapreduce.Job: Running job: job_1534406793739_0005
18/08/16 17:02:54 INFO mapreduce.Job: Job job_1534406793739_0005 running in uber mode : false
18/08/16 17:02:54 INFO mapreduce.Job: map 0% reduce 0%
18/08/16 17:02:54 INFO mapreduce.Job: Job job_1534406793739_0005 failed with state FAILED due to: Application application_1534406793739_0005 failed 2 times due to AM Container for appattempt_1534406793739_0005_000002 exited with exitCode: 1
Failing this attempt.Diagnostics: [2018-08-16 17:02:48.561]Exception from container-launch.
Container id: container_e27_1534406793739_0005_02_000001
Exit code: 1
[2018-08-16 17:02:48.562]
[2018-08-16 17:02:48.574]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
log4j:WARN No appenders could be found for logger (org.apache.hadoop.mapreduce.v2.app.MRAppMaster).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.

[2018-08-16 17:02:48.575]
[2018-08-16 17:02:48.575]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
log4j:WARN No appenders could be found for logger (org.apache.hadoop.mapreduce.v2.app.MRAppMaster).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.

Analysis and solution:

Most solutions found online for similar problems boil down to adding the appropriate classpath entries. I tried them all and none worked, which shows the problem above is not caused by the classpath; when the error occurred I also checked the classpath, and every entry had a value. Still, for reference, here is how to add the classpath.

1. # yarn classpath    (Note: prints the current classpath values)

/data1/hadoop/hadoop/etc/hadoop:/data1/hadoop/hadoop/etc/hadoop:/data1/hadoop/hadoop/etc/hadoop:/data1/hadoop/hadoop/share/hadoop/common/lib/*:/data1/hadoop/hadoop/share/hadoop/common/*:/data1/hadoop/hadoop/share/hadoop/hdfs:/data1/hadoop/hadoop/share/hadoop/hdfs/lib/*:/data1/hadoop/hadoop/share/hadoop/hdfs/*:/data1/hadoop/hadoop/share/hadoop/yarn:/data1/hadoop/hadoop/share/hadoop/yarn/lib/*:/data1/hadoop/hadoop/share/hadoop/yarn/*:/data1/hadoop/hadoop/share/hadoop/mapreduce/lib/*:/data1/hadoop/hadoop/share/hadoop/mapreduce/*:/data1/hadoop/hadoop/contrib/capacity-scheduler/*.jar:/data1/hadoop/hadoop/share/hadoop/yarn/*:/data1/hadoop/hadoop/share/hadoop/yarn/lib/*
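Before changing any configuration, it can help to guard on this check programmatically. A minimal sketch (the hard-coded sample value stands in for live `yarn classpath` output; on a cluster node replace the assignment with `cp_out=$(yarn classpath)`):

```shell
# Sample value taken from the output above; on a real node use: cp_out=$(yarn classpath)
cp_out="/data1/hadoop/hadoop/etc/hadoop:/data1/hadoop/hadoop/share/hadoop/common/lib/*"

# If the classpath is empty, the three steps below (2-4) are needed.
if [ -z "$cp_out" ]; then
  echo "classpath empty - add mapreduce.application.classpath (steps 2-4)"
else
  echo "classpath ok"
fi
```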

If the classpath above is empty, you can add it through the following three steps.

2. Edit mapred-site.xml

Add:

<property> 
    <name>mapreduce.application.classpath</name>
    <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*,$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
</property>

 

3. Edit yarn-site.xml

Add:

 

<property>
    <name>yarn.application.classpath</name>
    <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*,$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
</property>

 

4. Edit the environment variables

# vim ~/.bashrc

Append the following environment variables at the end of the file:

export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native

 5. source ~/.bashrc
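After sourcing, it is worth confirming that the variables actually resolved. A small sketch (the HADOOP_HOME path is the sample install root from this cluster; substitute your own):

```shell
# Sample install root from this article; replace with your own.
export HADOOP_HOME=/data1/hadoop/hadoop
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop

# Print each variable, flagging any that stayed empty.
for v in HADOOP_HOME HADOOP_MAPRED_HOME HADOOP_CONF_DIR; do
  eval "val=\$$v"
  if [ -n "$val" ]; then echo "$v=$val"; else echo "$v is UNSET"; fi
done
```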

 

Fixing the error:

The log shows that the container running the ApplicationMaster exited without ever going to the RM to obtain resources for the job, which pointed to a communication problem between the AM and the RM. One RM is active and the other is standby; inside YARN, when the MR job asks the active RM for resources everything of course works, but when it goes to the standby RM this failure appears.

Edit yarn-site.xml (vim yarn-site.xml):

<property>
    <!-- Clients submit application requests to the RM through this address -->
    <name>yarn.resourcemanager.address.rm1</name>
    <value>master:8032</value>
</property>
<property>
    <!-- The address the ResourceManager exposes to ApplicationMasters; the AM requests and releases resources through it -->
    <name>yarn.resourcemanager.scheduler.address.rm1</name>
    <value>master:8030</value>
</property>
<property>
    <!-- RM web UI address, for viewing cluster information -->
    <name>yarn.resourcemanager.webapp.address.rm1</name>
    <value>master:8088</value>
</property>
<property>
    <!-- NodeManagers exchange information with the RM through this address -->
    <name>yarn.resourcemanager.resource-tracker.address.rm1</name>
    <value>master:8031</value>
</property>
<property>
    <!-- Administrators send management commands to the RM through this address -->
    <name>yarn.resourcemanager.admin.address.rm1</name>
    <value>master:8033</value>
</property>
<property>
    <name>yarn.resourcemanager.ha.admin.address.rm1</name>
    <value>master:23142</value>
</property>
<!--
<property>
    <name>yarn.resourcemanager.address.rm2</name>
    <value>slave1:8032</value>
</property>
<property>
    <name>yarn.resourcemanager.scheduler.address.rm2</name>
    <value>slave1:8030</value>
</property>
<property>
    <name>yarn.resourcemanager.webapp.address.rm2</name>
    <value>slave1:8088</value>
</property>
<property>
    <name>yarn.resourcemanager.resource-tracker.address.rm2</name>
    <value>slave1:8031</value>
</property>
<property>
    <name>yarn.resourcemanager.admin.address.rm2</name>
    <value>slave1:8033</value>
</property>
<property>
    <name>yarn.resourcemanager.ha.admin.address.rm2</name>
    <value>slave1:23142</value>
</property>
-->
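For completeness: these per-RM address properties only take effect together with the basic YARN HA settings. A minimal sketch of those settings (the hostnames master/slave1 match this cluster; the cluster id yarn-cluster is an assumed example, adjust it to yours):

```xml
<property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
</property>
<property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>yarn-cluster</value>
</property>
<property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
</property>
<property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>master</value>
</property>
<property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>slave1</value>
</property>
```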

Note: the scheduler address above (yarn.resourcemanager.scheduler.address.rm1, port 8030) is the RPC port through which the AM requests resources from the RM; this is exactly where the failure occurred.

 

       The block above is what I added to the yarn-site.xml on the rm1 machine (i.e. master); conversely, if you are editing on slave1, you would add the corresponding entries ending in .rm1. Put plainly, yarn-site.xml must list the communication hosts and ports of every ResourceManager machine. Then copy the file to the other machines and restart YARN. After that, wordcount and other jobs ran without errors. The root cause was the communication between MR and the RM, so when configuring yarn-site.xml it is best to put the communication ports of both the active and the standby RM into the file to prevent this error.
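That conclusion, that both RMs must be listed, can be checked mechanically. A sketch that greps the file for both RM ids (it writes a sample yarn-site.xml locally; on a real node point CONF at $HADOOP_CONF_DIR/yarn-site.xml instead):

```shell
# Sample config file for illustration; on a cluster use $HADOOP_CONF_DIR/yarn-site.xml.
CONF=./yarn-site-sample.xml
cat > "$CONF" <<'EOF'
<configuration>
  <property><name>yarn.resourcemanager.scheduler.address.rm1</name><value>master:8030</value></property>
  <property><name>yarn.resourcemanager.scheduler.address.rm2</name><value>slave1:8030</value></property>
</configuration>
EOF

# Both RM ids must have a scheduler address, or AMs attached to the other RM will fail.
for id in rm1 rm2; do
  if grep -q "yarn.resourcemanager.scheduler.address.$id" "$CONF"; then
    echo "$id scheduler address present"
  else
    echo "$id scheduler address MISSING"
  fi
done
```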
