Hadoop HA cluster setup: starting DataNodes, checking startup status, running HDFS commands, and starting YARN
mapred-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
For detailed configuration, refer to the official documentation.
Modifications to yarn-site.xml involve the properties in the table below:

Property name | Value | Scope and notes
yarn.nodemanager.hostname | 0.0.0.0 | Optional in HA mode, but because other settings may reference it, keeping the value 0.0.0.0 is recommended; if nothing references it, it can be omitted.
yarn.nodemanager.aux-services | mapreduce_shuffle |
The following are HA-related settings, including automatic failover (they may be configured only on the ResourceManager nodes):
yarn.resourcemanager.ha.enabled | true | Enables HA
yarn.resourcemanager.cluster-id | yarn-cluster | May differ from the HDFS cluster name
yarn.resourcemanager.ha.rm-ids | rm1,rm2 | Note that the NodeManagers must be configured the same way as the ResourceManagers
yarn.resourcemanager.hostname.rm1 | hadoop1 |
yarn.resourcemanager.hostname.rm2 | hadoop2 |
yarn.resourcemanager.webapp.address.rm1 | hadoop1:8088 | Open http://hadoop1:8088 in a browser to see the YARN information
yarn.resourcemanager.webapp.address.rm2 | hadoop2:8088 | Open http://hadoop2:8088 in a browser to see the YARN information
yarn.resourcemanager.zk-address | hadoop11:2181,hadoop12:2182,hadoop13:2181 |
yarn.resourcemanager.ha.automatic-failover.enabled | true | Optional, because it already defaults to true when yarn.resourcemanager.ha.enabled is true
The following are NodeManager settings:
yarn.nodemanager.vmem-pmem-ratio | | Maximum virtual memory allowed per 1 MB of physical memory used; default 2.1. If spark-sql fails with "Yarn application has already exited with state FINISHED", check the NodeManager logs to see whether this value is too small.
yarn.nodemanager.resource.cpu-vcores | | Total number of virtual CPU cores available to the NodeManager; default 8
yarn.nodemanager.resource.memory-mb | | Total physical memory (MB) on this node that YARN may use; default 8192
yarn.nodemanager.pmem-check-enabled | | Whether to start a thread that checks the physical memory used by each task and kills tasks that exceed their allocation; default true
yarn.nodemanager.vmem-check-enabled | | Whether to start a thread that checks the virtual memory used by each task and kills tasks that exceed their allocation; default true
The following are ResourceManager settings:
yarn.scheduler.minimum-allocation-mb | | Minimum memory a single container can request
yarn.scheduler.maximum-allocation-mb | | Maximum memory a single container can request
A reference yarn-site.xml from an actual deployment:
<?xml version="1.0"?>
<configuration>
    <!-- Enable HA -->
    <property>
        <name>yarn.resourcemanager.ha.enabled</name>
        <value>true</value>
    </property>

    <!-- RM cluster id -->
    <property>
        <name>yarn.resourcemanager.cluster-id</name>
        <value>yarn-cluster</value>
    </property>

    <!-- Logical names of the RMs -->
    <property>
        <name>yarn.resourcemanager.ha.rm-ids</name>
        <value>rm1,rm2</value>
    </property>

    <!-- Addresses of each RM -->
    <property>
        <name>yarn.resourcemanager.hostname.rm1</name>
        <value>hadoop1</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname.rm2</name>
        <value>hadoop2</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address.rm1</name>
        <value>hadoop1:8088</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address.rm2</name>
        <value>hadoop2:8088</value>
    </property>

    <!-- ZooKeeper quorum address -->
    <property>
        <name>yarn.resourcemanager.zk-address</name>
        <value>hadoop11:2181,hadoop12:2182,hadoop13:2181</value>
    </property>

    <!-- Auxiliary services provided by the NodeManager -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>
If yarn.nodemanager.hostname is set to a specific IP address, every NodeManager ends up with a different configuration. For the full list of options, see:
http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-common/yarn-default.xml
For YARN HA configuration, see:
https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceManagerHA.html
After finishing the configuration on hadoop1, distribute it to the other nodes:
scp -r /home/toto/software/hadoop-2.8.0/etc/hadoop/* root@hadoop2:/home/tuzq/software/hadoop-2.8.0/etc/hadoop
scp -r /home/toto/software/hadoop-2.8.0/etc/hadoop/* root@hadoop3:/home/tuzq/software/hadoop-2.8.0/etc/hadoop
scp -r /home/toto/software/hadoop-2.8.0/etc/hadoop/* root@hadoop4:/home/tuzq/software/hadoop-2.8.0/etc/hadoop
scp -r /home/toto/software/hadoop-2.8.0/etc/hadoop/* root@hadoop5:/home/tuzq/software/hadoop-2.8.0/etc/hadoop
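Distribution can also be scripted; a minimal sketch, assuming the same installation path on every node and passwordless SSH as root (adjust the source path to where the edited files actually live):
for host in hadoop2 hadoop3 hadoop4 hadoop5; do
    # push the whole configuration directory to each node
    scp -r /home/tuzq/software/hadoop-2.8.0/etc/hadoop/* root@${host}:/home/tuzq/software/hadoop-2.8.0/etc/hadoop
done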
The startup order is: ZooKeeper -> JournalNode -> format the NameNode -> initialize the JournalNodes
-> create the namespace in ZooKeeper (zkfc) -> NameNode -> DataNode -> ResourceManager -> NodeManager.
Note that the NameNode must be formatted before it is started for the first time, and pay attention to how the standby NameNode is started.
Before HDFS is started, the NameNode has to be formatted.
mkdir -p /home/tuzq/software/hadoop-2.8.0/tmp/dfs/name
./zkServer.sh start
Start ZooKeeper before starting anything else.
On one of the NameNodes (hadoop1), run:
cd $HADOOP_HOME
bin/hdfs zkfc -formatZK    (not needed on subsequent runs)
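To confirm that the ZooKeeper namespace was created, the ZooKeeper CLI can be used; a hedged check, assuming one of the quorum addresses configured earlier and running from the ZooKeeper bin directory:
# connect to any ZooKeeper server in the quorum
./zkCli.sh -server hadoop11:2181
# inside the CLI, the HA parent znode should now exist
ls /hadoop-ha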
10.4. Start all JournalNodes (run on hadoop1, hadoop2 and hadoop3)
The NameNode records its metadata edit log on the JournalNodes; the active and standby NameNodes synchronize metadata through the edits stored on the JournalNodes.
On every JournalNode, run:
cd $HADOOP_HOME
sbin/hadoop-daemon.sh start journalnode
After that, verify with the following command:
[root@hadoop2 hadoop-2.8.0]# jps
3314 Jps
3267 JournalNode
[root@hadoop2 hadoop-2.8.0]#
Note: the JournalNodes must be running before "hdfs namenode -format" is executed, and the format must be done before the NameNode is started.
10.5. Format the NameNode
On hadoop1, run the following command (this formats the NameNode; if it has already been formatted, do not format it again):
hdfs namenode -format    (not needed on subsequent runs)
The next step is needed only when converting a non-HA setup to HA. On one of the JournalNodes (hadoop1), run:
bin/hdfs namenode -initializeSharedEdits    (not needed on subsequent runs)
This command is interactive by default; add the -force option to make it non-interactive.
Create the following directory on every JournalNode (not needed on subsequent runs):
mkdir -p /home/tuzq/software/hadoop-2.8.0/journal/mycluster/current
10.7. Start the active NameNode
The commands below are run on hadoop1; starting the NameNode on hadoop2 is covered further down.
1) Change to the $HADOOP_HOME directory
2) Start the active NameNode:
sbin/hadoop-daemon.sh start namenode
If the error shown below appears during startup, it means the NameNode cannot SSH to itself without a password. If passwordless login already works using the IP address, the usual cause is that the machine has never been logged into by hostname; the fix is to SSH to it once using the hostname.
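A minimal sketch for checking and fixing passwordless SSH to the local hostname (assuming the hostname is hadoop1 and an SSH key pair already exists):
# confirm whether login by hostname works without a password
ssh hadoop1 hostname
# if it prompts for a password, install the public key for the hostname as well
ssh-copy-id root@hadoop1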
10.8. Start the standby NameNode
On hadoop2, run the following commands:
bin/hdfs namenode -bootstrapStandby
When prompted with "Re-format", always answer N.
sbin/hadoop-daemon.sh start namenode
If step 1 is skipped and the NameNode is started directly, the following error occurs:
No valid image files found
or the NameNode log will contain an error such as:
2016-04-08 14:08:39,745 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Encountered exception loading fsimage
java.io.IOException: NameNode is not formatted.
Start the failover controller on every NameNode (that is, run the command on both hadoop1 and hadoop2):
sbin/hadoop-daemon.sh start zkfc
HDFS can only fail over between active and standby automatically when the DFSZKFailoverController process is running.
Note: zkfc stands for ZooKeeper Failover Controller.
On each DataNode (that is, on hadoop3, hadoop4 and hadoop5), run:
sbin/hadoop-daemon.sh start datanode
If a DataNode process does not come up, try deleting the DataNode log files under the logs directory and starting it again.
1) Use the jps command shipped with the JDK to check whether the expected processes are running (a cluster-wide check is sketched after this list)
2) Check the log and out files under $HADOOP_HOME/logs for exceptions.
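A minimal sketch for checking every node in one pass, assuming passwordless SSH and the hostnames used in this guide:
for host in hadoop1 hadoop2 hadoop3 hadoop4 hadoop5; do
    echo "== ${host} =="
    # list the Java processes running on each node
    ssh root@${host} jps
done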
After startup both nn1 and nn2 are in standby state; switch nn1 to active (run the following on hadoop1):
bin/hdfs haadmin -transitionToActive nn1
Run jps (note: jps is a JDK command, not a JRE command); the DataNode process should be visible:
$ jps
18669 DataNode
24542 Jps
Run jps again; the NameNode process should be visible:
$ jps
18669 NameNode
24542 Jps
Run some HDFS commands to further verify that the installation and configuration are correct. For usage details, simply run hdfs or hdfs dfs to see the built-in help.
hdfs dfsadmin -report
Note that if the core-site.xml property fs.default.name is set to file:///, the command reports:
report: FileSystem file:/// is not an HDFS file system
Usage: hdfs dfsadmin [-report] [-live] [-dead] [-decommissioning]
To fix this, set fs.default.name to the same value as fs.defaultFS.
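A quick way to confirm which filesystem URI the client actually resolves (the key names are the standard ones used above):
# print the effective default filesystem seen by the client
hdfs getconf -confKey fs.defaultFS
# the deprecated fs.default.name should resolve to the same URI
hdfs getconf -confKey fs.default.name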
10.12.2. Start HDFS and YARN (run on both hadoop1 and hadoop2)
On hadoop1, run:
[root@hadoop1 sbin]# sbin/start-dfs.sh
cd $HADOOP_HOME
# sbin/start-yarn.sh    (note: start YARN on both hadoop1 and hadoop2)
Open http://hadoop1:50070/ in a browser; the page looks like this:
The page above is the active NameNode.
Then open http://hadoop2:50070/:
It shows that hadoop2 is in standby state.
Open the YARN web UI (the address can be found in yarn-site.xml), e.g. http://hadoop1:8088/cluster:
To check whether NameNode1 and NameNode2 are currently active or standby:
$ hdfs haadmin -getServiceState nn1
standby
$ hdfs haadmin -getServiceState nn2
active
10.12.3. hdfs dfs -ls
Note: the commands below are usable only after YARN has been started.
"hdfs dfs -ls" takes one argument: if it starts with "hdfs://URI" the command accesses HDFS, otherwise it behaves like a local ls. URI is the NameNode's IP address or hostname, optionally with a port, i.e. the value of "dfs.namenode.rpc-address" in hdfs-site.xml.
"hdfs dfs -ls" assumes the default port 8020; if the NameNode is configured on 9000 the port must be specified, otherwise it can be omitted, much like a browser visiting a URL. Example:
> hdfs dfs -ls hdfs://hadoop1:8020/
The trailing slash after 8020 is required; without it the path is treated as a file. If the port is omitted, the default 8020 is used; "hadoop1:8020" comes from "dfs.namenode.rpc-address" in hdfs-site.xml.
Clearly, "hdfs dfs -ls" can operate on different HDFS clusters simply by giving different URIs.
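For illustration, assuming a second cluster whose active NameNode listens on hadoop9:8020 (a hypothetical host), the same client can list both:
# list the root of this cluster
hdfs dfs -ls hdfs://hadoop1:8020/
# list the root of another cluster by pointing the URI at its NameNode
hdfs dfs -ls hdfs://hadoop9:8020/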
To browse file listings or files through the hdfs protocol, the following can be used:
After a file is uploaded, it is stored under the DataNode's data directory (specified by the "dfs.datanode.data.dir" property in the DataNode's hdfs-site.xml),
e.g. $HADOOP_HOME/data/data/current/BP-472842913-192.168.106.91-1497065109036/current/finalized/subdir0/subdir0/blk_1073741825
The "blk" in the file name stands for block. By default blk_1073741825 is one complete block of the file; Hadoop applies no extra processing to it.
Example of uploading a file:
> hdfs dfs -put /etc/SuSE-release hdfs://192.168.106.91/
Example of deleting a file:
> hdfs dfs -rm hdfs://192.168.106.91/SuSE-release
Deleted hdfs://192.168.106.91/SuSE-release
When a NameNode machine fails, a new NameNode must take its place. Point the configuration at the new NameNode, then start it as a standby; the new NameNode then joins the cluster:
1) bin/hdfs namenode -bootstrapStandby
2) sbin/hadoop-daemon.sh start namenode
10.12.7. HDFS allows only two NameNodes: one active and one standby
An attempt to configure three NameNodes, such as:
<property>
    <name>dfs.ha.namenodes.test</name>
    <value>nm1,nm2,nm3</value>
    <description>The prefix for a given nameservice, contains a comma-separated list of namenodes for a given nameservice (eg EXAMPLENAMESERVICE).</description>
</property>
causes "hdfs namenode -bootstrapStandby" to fail with the error below, which states that a namespace may not contain more than two NameNodes:
16/04/11 09:51:57 ERROR namenode.NameNode: Failed to start namenode.
java.io.IOException: java.lang.IllegalArgumentException: Expected exactly 2 NameNodes in namespace 'test'. Instead, got only 3 (NN ids were 'nm1','nm2','nm3'
    at org.apache.hadoop.hdfs.server.namenode.ha.BootstrapStandby.run(BootstrapStandby.java:425)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1454)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1554)
Caused by: java.lang.IllegalArgumentException: Expected exactly 2 NameNodes in namespace 'test'. Instead, got only 3 (NN ids were 'nm1','nm2','nm3'
    at com.google.common.base.Preconditions.checkArgument(Preconditions.java:115)
10.12.8. Storage balancing with start-balancer.sh
Example: start-balancer.sh -t 10%
The 10% means the cluster is considered balanced when the difference in disk usage between machines is below 10%; otherwise blocks are moved. "start-balancer.sh" invokes "hdfs balancer" to do the work, and stop-balancer.sh stops a running balance.
Balancing is very slow, but HDFS remains fully usable while it runs, including uploading files.
[VM2016@hadoop-030 /data4/hadoop/sbin]$ hdfs balancer        # start-balancer.sh can be called instead
16/04/08 14:26:55 INFO balancer.Balancer: namenodes = [hdfs://test]        // test is the name of the HDFS cluster
16/04/08 14:26:55 INFO balancer.Balancer: parameters = Balancer.Parameters[BalancingPolicy.Node, threshold=10.0, max idle iteration = 5, number of nodes to be excluded = 0, number of nodes to be included = 0]
Time Stamp               Iteration#  Bytes Already Moved  Bytes Left To Move  Bytes Being Moved
16/04/08 14:26:56 INFO net.NetworkTopology: Adding a new node: /default-rack/192.168.1.231:50010
16/04/08 14:26:56 INFO net.NetworkTopology: Adding a new node: /default-rack/192.168.1.229:50010
16/04/08 14:26:56 INFO net.NetworkTopology: Adding a new node: /default-rack/192.168.1.213:50010
16/04/08 14:26:56 INFO net.NetworkTopology: Adding a new node: /default-rack/192.168.1.208:50010
16/04/08 14:26:56 INFO net.NetworkTopology: Adding a new node: /default-rack/192.168.1.232:50010
16/04/08 14:26:56 INFO net.NetworkTopology: Adding a new node: /default-rack/192.168.1.207:50010
16/04/08 14:26:56 INFO balancer.Balancer: 5 over-utilized: [192.168.1.231:50010:DISK, 192.168.1.229:50010:DISK, 192.168.1.213:50010:DISK, 192.168.1.208:50010:DISK, 192.168.1.232:50010:DISK]
16/04/08 14:26:56 INFO balancer.Balancer: 1 underutilized: [192.168.1.207:50010:DISK]        # data will be moved to this node
16/04/08 14:26:56 INFO balancer.Balancer: Need to move 816.01 GB to make the cluster balanced.        # 816.01 GB must be moved to reach balance
16/04/08 14:26:56 INFO balancer.Balancer: Decided to move 10 GB bytes from 192.168.1.231:50010:DISK to 192.168.1.207:50010:DISK        # move 10 GB from 192.168.1.231 to 192.168.1.207
16/04/08 14:26:56 INFO balancer.Balancer: Will move 10 GB in this iteration

16/04/08 14:32:58 INFO balancer.Dispatcher: Successfully moved blk_1073749366_8542 with size=77829046 from 192.168.1.231:50010:DISK to 192.168.1.207:50010:DISK through 192.168.1.213:50010
16/04/08 14:32:59 INFO balancer.Dispatcher: Successfully moved blk_1073749386_8562 with size=77829046 from 192.168.1.231:50010:DISK to 192.168.1.207:50010:DISK through 192.168.1.231:50010
16/04/08 14:33:34 INFO balancer.Dispatcher: Successfully moved blk_1073749378_8554 with size=77829046 from 192.168.1.231:50010:DISK to 192.168.1.207:50010:DISK through 192.168.1.231:50010
16/04/08 14:34:38 INFO balancer.Dispatcher: Successfully moved blk_1073749371_8547 with size=134217728 from 192.168.1.231:50010:DISK to 192.168.1.207:50010:DISK through 192.168.1.213:50010
16/04/08 14:34:54 INFO balancer.Dispatcher: Successfully moved blk_1073749395_8571 with size=134217728 from 192.168.1.231:50010:DISK to 192.168.1.207:50010:DISK through 192.168.1.231:50010
Apr 8, 2016 2:35:01 PM            0            478.67 MB           816.01 GB              10 GB
16/04/08 14:35:10 INFO net.NetworkTopology: Adding a new node: /default-rack/192.168.1.213:50010
16/04/08 14:35:10 INFO net.NetworkTopology: Adding a new node: /default-rack/192.168.1.229:50010
16/04/08 14:35:10 INFO net.NetworkTopology: Adding a new node: /default-rack/192.168.1.232:50010
16/04/08 14:35:10 INFO net.NetworkTopology: Adding a new node: /default-rack/192.168.1.231:50010
16/04/08 14:35:10 INFO net.NetworkTopology: Adding a new node: /default-rack/192.168.1.208:50010
16/04/08 14:35:10 INFO net.NetworkTopology: Adding a new node: /default-rack/192.168.1.207:50010
16/04/08 14:35:10 INFO balancer.Balancer: 5 over-utilized: [192.168.1.213:50010:DISK, 192.168.1.229:50010:DISK, 192.168.1.232:50010:DISK, 192.168.1.231:50010:DISK, 192.168.1.208:50010:DISK]
16/04/08 14:35:10 INFO balancer.Balancer: 1 underutilized: [192.168.1.207:50010:DISK]
16/04/08 14:35:10 INFO balancer.Balancer: Need to move 815.45 GB to make the cluster balanced.
16/04/08 14:35:10 INFO balancer.Balancer: Decided to move 10 GB bytes from 192.168.1.213:50010:DISK to 192.168.1.207:50010:DISK
16/04/08 14:35:10 INFO balancer.Balancer: Will move 10 GB in this iteration

16/04/08 14:41:18 INFO balancer.Dispatcher: Successfully moved blk_1073760371_19547 with size=77829046 from 192.168.1.213:50010:DISK to 192.168.1.207:50010:DISK through 192.168.1.213:50010
16/04/08 14:41:19 INFO balancer.Dispatcher: Successfully moved blk_1073760385_19561 with size=77829046 from 192.168.1.213:50010:DISK to 192.168.1.207:50010:DISK through 192.168.1.213:50010
16/04/08 14:41:22 INFO balancer.Dispatcher: Successfully moved blk_1073760393_19569 with size=77829046 from 192.168.1.213:50010:DISK to 192.168.1.207:50010:DISK through 192.168.1.213:50010
16/04/08 14:41:23 INFO balancer.Dispatcher: Successfully moved blk_1073760363_19539 with size=77829046 from 192.168.1.213:50010:DISK to 192.168.1.207:50010:DISK through 192.168.1.213:50010
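A typical balancing session, sketched end to end (the bandwidth value is only an example):
# optionally raise the per-DataNode balancing bandwidth, here to 10 MB/s
hdfs dfsadmin -setBalancerBandwidth 10485760
# start balancing with a 10% threshold
start-balancer.sh -threshold 10
# stop it at any time; HDFS stays available throughout
stop-balancer.sh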
Pick a machine that already runs a JournalNode and edit its hdfs-site.xml so that the new JournalNodes are included; for example, starting from
qjournal://hadoop1:8485;hadoop2:8485;hadoop3:8485/mycluster
add hadoop6 and hadoop7 as JournalNodes:
qjournal://hadoop1:8485;hadoop2:8485;hadoop3:8485;hadoop6:8485;hadoop7:8485/mycluster
Then copy both the installation directory and the data directory (the directory given by dfs.journalnode.edits.dir in hdfs-site.xml) to the new nodes.
If the JournalNode data directory is not copied, the JournalNode on the new node fails with "Journal Storage Directory /data/journal/test not formatted"; a future release may synchronize it automatically.
Next, start the JournalNode on the new nodes (no initialization is needed) and restart the NameNodes. Watch the JournalNode log to confirm the start succeeded; an INFO-level line like the following indicates success:
2016-04-26 10:31:11,160 INFO org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Finalizing edits file /data/journal/test/current/edits_inprogress_0000000000000194269 -> /data/journal/test/current/edits_0000000000000194269-0000000000000194270
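A condensed sketch of those steps, assuming hadoop6 is the new JournalNode, /data/journal is dfs.journalnode.edits.dir, and the installation path used elsewhere in this guide:
# copy the installation and the JournalNode data directory to the new node
scp -r /home/tuzq/software/hadoop-2.8.0 root@hadoop6:/home/tuzq/software/
scp -r /data/journal root@hadoop6:/data/
# start the JournalNode on the new node; no extra initialization is needed
ssh root@hadoop6 '/home/tuzq/software/hadoop-2.8.0/sbin/hadoop-daemon.sh start journalnode'
# finally restart the NameNodes so they pick up the new shared edits URI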
11. Starting YARN
1) Change to the $HADOOP_HOME/sbin directory
2) Run start-yarn.sh on both the active and standby nodes to start YARN
If the start succeeds, jps on the master node shows the ResourceManager:
> jps
24689 NameNode
30156 Jps
28861 ResourceManager
jps on the slave nodes shows the NodeManager:
$ jps
14019 NodeManager
23257 DataNode
15115 Jps
To start only the ResourceManager on a particular node:
sbin/yarn-daemon.sh start resourcemanager
and likewise for the NodeManager:
sbin/yarn-daemon.sh start nodemanager
List all NodeManagers in the YARN cluster (mind the spaces between arguments; running yarn with no arguments prints the usage help):
[root@hadoop1 sbin]# yarn node -list
Check the status of a particular NodeManager (using a Node-Id from the listing above), e.g.:
[root@hadoop1 hadoop]# yarn node -status hadoop5:59894
Node Report :
        Node-Id : hadoop5:59894
        Rack : /default-rack
        Node-State : RUNNING
        Node-Http-Address : hadoop5:8042
        Last-Health-Update : 星期六 10/六月/17 12:30:38:20CST
        Health-Report :
        Containers : 0
        Memory-Used : 0MB
        Memory-Capacity : 8192MB
        CPU-Used : 0 vcores
        CPU-Capacity : 8 vcores
        Node-Labels :
        Resource Utilization by Node : PMem:733 MB, VMem:733 MB, VCores:0.0
        Resource Utilization by Containers : PMem:0 MB, VMem:0 MB, VCores:0.0

[root@hadoop1 hadoop]#
11.2.3. yarn rmadmin -getServiceState rm1
Shows whether rm1 is currently active or standby.
11.2.4. yarn rmadmin -transitionToStandby rm1
Switches rm1 from active to standby.
More yarn commands are documented at:
https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YarnCommands.html
Ready-made example programs ship under share/hadoop/mapreduce in the installation directory:
hadoop@VM-40-171-sles10-64:~/hadoop> ls share/hadoop/mapreduce
hadoop-mapreduce-client-app-2.7.2.jar          hadoop-mapreduce-client-jobclient-2.7.2-tests.jar
hadoop-mapreduce-client-common-2.7.2.jar       hadoop-mapreduce-client-shuffle-2.7.2.jar
hadoop-mapreduce-client-core-2.7.2.jar         hadoop-mapreduce-examples-2.7.2.jar
hadoop-mapreduce-client-hs-2.7.2.jar           lib
hadoop-mapreduce-client-hs-plugins-2.7.2.jar   lib-examples
hadoop-mapreduce-client-jobclient-2.7.2.jar    sources
Try running one of the example programs:
hdfs dfs -put /etc/hosts hdfs://test/in/
hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount hdfs://test/in/ hdfs://test/out/
While the job runs, the JDK's jps command shows a YARN-launched process named YarnChild.
When wordcount finishes, the result is saved under the out directory in files with names like "part-r-00000". Two things to watch when running this example:
1) the in directory must contain text files, or in may itself be the text file to count; it can be a file or directory on HDFS, or a local file or directory
2) the out directory must not already exist; the program creates it itself and reports an error if it does exist (a rerun sketch follows this list)
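A minimal rerun sketch under the paths used above: clear the previous output directory, then submit the job again.
# remove the old output directory (wordcount refuses to overwrite it)
hdfs dfs -rm -r hdfs://test/out/
# run the example again
hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount hdfs://test/in/ hdfs://test/out/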
The jar hadoop-mapreduce-examples-2.7.2.jar contains several example programs; running it without arguments prints the usage:
> hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount
Usage: wordcount
> hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar
An example program must be given as the first argument.
Valid program names are:
  aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
  aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
  bbp: A map/reduce program that uses Bailey-Borwein-Plouffe to compute exact digits of Pi.
  dbcount: An example job that count the pageview counts from a database.
  distbbp: A map/reduce program that uses a BBP-type formula to compute exact bits of Pi.
  grep: A map/reduce program that counts the matches of a regex in the input.
  join: A job that effects a join over sorted, equally partitioned datasets
  multifilewc: A job that counts words from several files.
  pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
  pi: A map/reduce program that estimates Pi using a quasi-Monte Carlo method.
  randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
  randomwriter: A map/reduce program that writes 10GB of random data per node.
  secondarysort: An example defining a secondary sort to the reduce.
  sort: A map/reduce program that sorts the data written by the random writer.
  sudoku: A sudoku solver.
  teragen: Generate data for the terasort
  terasort: Run the terasort
  teravalidate: Checking results of terasort
  wordcount: A map/reduce program that counts the words in the input files.
  wordmean: A map/reduce program that counts the average length of the words in the input files.
  wordmedian: A map/reduce program that counts the median length of the words in the input files.
  wordstandarddeviation: A map/reduce program that counts the standard deviation of the length of the words in the input files.
To change the log level to DEBUG and print it to the console:
export HADOOP_ROOT_LOGGER=DEBUG,console
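The variable only affects commands run from the current shell; a hedged example of using it for a single diagnostic run:
# turn on DEBUG logging for client commands in this shell only
export HADOOP_ROOT_LOGGER=DEBUG,console
# any subsequent client command now prints its debug log to the terminal
hdfs dfs -ls /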
13. HDFS permission configuration
dfs.permissions.enabled = true
dfs.permissions.superusergroup = supergroup
dfs.cluster.administrators = ACL-for-admins
dfs.namenode.acls.enabled = true
dfs.web.ugi = webuser,webgroup
fs.permissions.umask-mode = 022
hadoop.security.authentication = simple        # authentication mode, either simple or kerberos
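With permissions enabled, ownership and modes on HDFS are managed much like on a local filesystem; a small hedged example (the user name alice and the paths are illustrative):
# create a home directory for a user and hand it over
hdfs dfs -mkdir -p /user/alice
hdfs dfs -chown alice:supergroup /user/alice
hdfs dfs -chmod 750 /user/alice
# verify owner, group and mode
hdfs dfs -ls /user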
// g++ -g -o x x.cpp -L$JAVA_HOME/lib/amd64/jli -ljli -L$JAVA_HOME/jre/lib/amd64/server -ljvm -I$HADOOP_HOME/include $HADOOP_HOME/lib/native/libhdfs.a -lpthread -ldl
#include "hdfs.h"
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>

int main(int argc, char **argv)
{
#if 0
    // write path: connect to the default filesystem and create a file
    hdfsFS fs = hdfsConnect("default", 0); // HA style
    const char* writePath = "hdfs://mycluster/tmp/testfile.txt";
    hdfsFile writeFile = hdfsOpenFile(fs, writePath, O_WRONLY|O_CREAT, 0, 0, 0);
    if (!writeFile)
    {
        fprintf(stderr, "Failed to open %s for writing!\n", writePath);
        exit(-1);
    }
    const char* buffer = "Hello, World!\n";
    tSize num_written_bytes = hdfsWrite(fs, writeFile, (void*)buffer, strlen(buffer)+1);
    if (hdfsFlush(fs, writeFile))
    {
        fprintf(stderr, "Failed to 'flush' %s\n", writePath);
        exit(-1);
    }
    hdfsCloseFile(fs, writeFile);
#else
    // read path: connect via a builder and list a directory
    struct hdfsBuilder* bld = hdfsNewBuilder();
    hdfsBuilderSetNameNode(bld, "default"); // HA style
    hdfsFS fs = hdfsBuilderConnect(bld);
    if (NULL == fs)
    {
        fprintf(stderr, "Failed to connect hdfs\n");
        exit(-1);
    }
    int num_entries = 0;
    hdfsFileInfo* entries;
    if (argc < 2)
        entries = hdfsListDirectory(fs, "/", &num_entries);
    else
        entries = hdfsListDirectory(fs, argv[1], &num_entries);
    fprintf(stdout, "num_entries: %d\n", num_entries);
    for (int i=0; i<num_entries; ++i)
    {
        fprintf(stdout, "%s\n", entries[i].mName);
    }
    hdfsFreeFileInfo(entries, num_entries);
    hdfsDisconnect(fs);
    //hdfsFreeBuilder(bld);
#endif

    return 0;
}
The CLASSPATH must be set correctly before running the program; otherwise all kinds of trouble can follow, e.g. operations intended for HDFS files and directories silently act on local ones, or errors such as "java.net.UnknownHostException" appear.
To avoid this, it is strongly recommended to obtain the correct CLASSPATH with the command "hadoop classpath --glob".
In addition, LD_LIBRARY_PATH must include the directories containing libjli.so and libjvm.so, e.g.:
export LD_LIBRARY_PATH=$JAVA_HOME/lib/amd64/jli:$JAVA_HOME/jre/lib/amd64/server:$LD_LIBRARY_PATH
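Putting the two settings together, a hedged build-and-run sketch for the program above (the binary name x and source file x.cpp follow the compile command in the source comment):
# libraries needed by the embedded JVM at run time
export LD_LIBRARY_PATH=$JAVA_HOME/lib/amd64/jli:$JAVA_HOME/jre/lib/amd64/server:$LD_LIBRARY_PATH
# let libhdfs locate the Hadoop jars
export CLASSPATH=$(hadoop classpath --glob)
# compile exactly as in the comment at the top of the source
g++ -g -o x x.cpp -L$JAVA_HOME/lib/amd64/jli -ljli -L$JAVA_HOME/jre/lib/amd64/server -ljvm -I$HADOOP_HOME/include $HADOOP_HOME/lib/native/libhdfs.a -lpthread -ldl
# list the root directory of the HDFS cluster
./x /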
15.1. ConnectException when running "hdfs dfs -ls"
A likely cause is that the port 9000 being used is wrong; the NameNode RPC port is given by the "dfs.namenode.rpc-address" property in hdfs-site.xml.
After a file is uploaded it is stored under the DataNode's data directory (given by "dfs.datanode.data.dir" in the DataNode's hdfs-site.xml), e.g.:
$HADOOP_HOME/data/current/BP-139798373-192.168.106.91-1397735615751/current/finalized/blk_1073741825
The "blk" in the file name stands for block; by default blk_1073741825 is one complete block of the file, and Hadoop applies no extra processing to it.
hdfs dfs -ls hdfs://192.168.106.91:9000
14/04/17 12:04:02 WARN conf.Configuration: mapred-site.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
14/04/17 12:04:02 WARN conf.Configuration: mapred-site.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
14/04/17 12:04:02 WARN conf.Configuration: mapred-site.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
14/04/17 12:04:02 WARN conf.Configuration: mapred-site.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
14/04/17 12:04:02 WARN conf.Configuration: mapred-site.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
14/04/17 12:04:02 WARN conf.Configuration: mapred-site.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
Java HotSpot(TM) 64-Bit Server VM warning: You have loaded library /home/tuzq/software/hadoop-2.8.0/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c ', or link it with '-z noexecstack'.
14/04/17 12:04:02 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/04/17 12:04:03 WARN conf.Configuration: mapred-site.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
14/04/17 12:04:03 WARN conf.Configuration: mapred-site.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
ls: Call From VM-40-171-sles10-64/192.168.106.91 to VM-40-171-sles10-64:9000 failed on connection exception: java.net.ConnectException: 拒絕鏈接; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
15.2. Initialization failed for Block pool
The likely cause is that the DataNode data directories were not cleared before the NameNode was formatted.
The "Incompatible clusterIDs" error means that "hdfs namenode -format" was run without first clearing the DataNode data directories.
Some articles and posts online say the directory to clear is tmp; that may be true for their setups, but for Hadoop 2.7.2 it is the data directory. The log in fact points this out with "/home/tuzq/software/hadoop-2.8.0/data", so do not blindly copy solutions found online; read the output carefully.
As the message suggests, the fix is to empty the data directory of every DataNode, taking care not to delete the data directory itself. The data directory is specified by the "dfs.datanode.data.dir" property in hdfs-site.xml.
2014-04-17 19:30:33,075 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on /home/tuzq/software/hadoop-2.8.0/data/in_use.lock acquired by nodename 28326@localhost
2014-04-17 19:30:33,078 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for block pool Block pool (Datanode Uuid unassigned) service to /192.168.106.91:9001
java.io.IOException: Incompatible clusterIDs in /home/tuzq/software/hadoop-2.8.0/data: namenode clusterID = CID-50401d89-a33e-47bf-9d14-914d8f1c4862; datanode clusterID = CID-153d6fcb-d037-4156-b63a-10d6be224091
    at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:472)
    at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:225)
    at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:249)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:929)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:900)
    at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:274)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:220)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:815)
    at java.lang.Thread.run(Thread.java:744)
2014-04-17 19:30:33,081 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Ending block pool service for: Block pool (Datanode Uuid unassigned) service to /192.168.106.91:9001
2014-04-17 19:30:33,184 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool ID needed, but service not yet registered with NN
java.lang.Exception: trace
    at org.apache.hadoop.hdfs.server.datanode.BPOfferService.getBlockPoolId(BPOfferService.java:143)
    at org.apache.hadoop.hdfs.server.datanode.BlockPoolManager.remove(BlockPoolManager.java:91)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.shutdownBlockPool(DataNode.java:859)
    at org.apache.hadoop.hdfs.server.datanode.BPOfferService.shutdownActor(BPOfferService.java:350)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.cleanUp(BPServiceActor.java:619)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:837)
    at java.lang.Thread.run(Thread.java:744)
2014-04-17 19:30:33,184 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Removed Block pool (Datanode Uuid unassigned)
2014-04-17 19:30:33,184 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool ID needed, but service not yet registered with NN
java.lang.Exception: trace
    at org.apache.hadoop.hdfs.server.datanode.BPOfferService.getBlockPoolId(BPOfferService.java:143)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.shutdownBlockPool(DataNode.java:861)
    at org.apache.hadoop.hdfs.server.datanode.BPOfferService.shutdownActor(BPOfferService.java:350)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.cleanUp(BPServiceActor.java:619)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:837)
    at java.lang.Thread.run(Thread.java:744)
2014-04-17 19:30:35,185 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Exiting Datanode
2014-04-17 19:30:35,187 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 0
2014-04-17 19:30:35,189 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at localhost/127.0.0.1
************************************************************/
15.4. Inconsistent checkpoint fields
The "Inconsistent checkpoint fields" error on the SecondaryNameNode is probably caused by "hadoop.tmp.dir" not being set properly in the SecondaryNameNode's core-site.xml.
2014-04-17 11:42:18,189 INFO org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Log Size Trigger    :1000000 txns
2014-04-17 11:43:18,365 ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Exception in doCheckpoint
java.io.IOException: Inconsistent checkpoint fields.
LV = -56 namespaceID = 1384221685 cTime = 0 ; clusterId = CID-319b9698-c88d-4fe2-8cb2-c4f440f690d4 ; blockpoolId = BP-1627258458-192.168.106.91-1397735061985.
Expecting respectively: -56; 476845826; 0; CID-50401d89-a33e-47bf-9d14-914d8f1c4862; BP-2131387753-192.168.106.91-1397730036484.
    at org.apache.hadoop.hdfs.server.namenode.CheckpointSignature.validateStorageInfo(CheckpointSignature.java:135)
    at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:518)
    at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doWork(SecondaryNameNode.java:383)
    at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$1.run(SecondaryNameNode.java:349)
    at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:415)
    at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:345)
    at java.lang.Thread.run(Thread.java:744)
Also set "hadoop.tmp.dir" on the SecondaryNameNode to a suitable value, e.g.:
<property>
    <name>hadoop.tmp.dir</name>
    <value>/home/tuzq/software/current/tmp</value>
    <description>A base for other temporary directories.</description>
</property>
15.5. fs.defaultFS is file:///
In core-site.xml, when only fs.defaultFS is set and fs.default.name keeps its default value of file:///, this error appears. The fix is to set both to the same value.
15.6. a shared edits dir must not be specified if HA is not enabled
This error is probably because dfs.nameservices or dfs.ha.namenodes.mycluster is not configured in hdfs-site.xml.
15.7. /tmp/dfs/name is in an inconsistent state: storage directory does not exist or is not accessible.
Simply create the directory named in the log message.
15.8. The auxService:mapreduce_shuffle does not exist
The cause is that "yarn.nodemanager.aux-services" is not set in yarn-site.xml; set it to mapreduce_shuffle and restart YARN. Remember that every YARN node must be changed, the ResourceManager as well as the NodeManagers; if a NodeManager is left unchanged, the error is still reported.
15.9. org.apache.hadoop.ipc.Client: Retrying connect to server
This can happen when the NodeManager's yarn-site.xml is inconsistent with the ResourceManager's, for example when the NodeManager lacks yarn.resourcemanager.ha.rm-ids.
15.10. mapreduce.Job: Running job: job_1445931397013_0001
When a MapReduce job is submitted, Hadoop hangs at "mapreduce.Job: Running job: job_1445931397013_0001".
A likely cause is that the YARN NodeManager is not running; confirm with the JDK's jps command.
It can also be caused by the NodeManager's yarn-site.xml being inconsistent with the ResourceManager's, e.g. the NodeManager missing yarn.resourcemanager.ha.rm-ids.
15.11. Could not format one or more JournalNodes
Running "./hdfs namenode -format" reports "Could not format one or more JournalNodes".
The likely cause is a mistake in dfs.namenode.shared.edits.dir in hdfs-site.xml, for example a duplicated entry such as:
qjournal://hadoop-168-254:8485;hadoop-168-254:8485;hadoop-168-253:8485;hadoop-168-252:8485;hadoop-168-251:8485/mycluster
After fixing the value and restarting the JournalNodes, the problem should be gone.
15.12. org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Already in standby state
This error is usually caused by a misconfigured yarn.resourcemanager.webapp.address in yarn-site.xml, for example two entries both named yarn.resourcemanager.webapp.address.rm1 where there should be yarn.resourcemanager.webapp.address.rm1 and yarn.resourcemanager.webapp.address.rm2.
15.13. No valid image files found
If this happens on the standby NameNode, run "hdfs namenode -bootstrapStandby" and then start it.
2015-12-01 15:24:39,535 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: Failed to start namenode.
java.io.FileNotFoundException: No valid image files found
at org.apache.hadoop.hdfs.server.namenode.FSImageTransactionalStorageInspector.getLatestImages(FSImageTransactionalStorageInspector.java:165)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:623)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:294)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:975)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:681)
at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:584)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:644)
at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:811)
at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:795)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1488)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1554)
2015-12-01 15:24:39,536 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
2015-12-01 15:24:39,539 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
15.14. xceivercount 4097 exceeds the limit of concurrent xcievers 4096
The cause of this error is that the hdfs-site.xml property "dfs.datanode.max.xcievers" is set to 4096, which is too small; increase it. The error also makes HBase report "NotServingRegionException".
16/04/06 14:30:34 ERROR namenode.NameNode: Failed to start namenode.
15.15. java.lang.IllegalArgumentException: Unable to construct journal, qjournal://hadoop-030:8485;hadoop-031:8454;hadoop-032
This error from "hdfs namenode -format" means that dfs.namenode.shared.edits.dir in hdfs-site.xml is wrong: the ":8454" part was left off hadoop-032.
15.16. Bad URI 'qjournal://hadoop-030:8485;hadoop-031:8454;hadoop-032:8454': must identify journal in path component
The cause is that the "dfs.namenode.shared.edits.dir" value in hdfs-site.xml is missing the cluster name in its path component.
15.17. 16/04/06 14:48:19 INFO ipc.Client: Retrying connect to server: hadoop-032/10.143.136.211:8454. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
Check the "dfs.namenode.shared.edits.dir" value in hdfs-site.xml; the JournalNode default port is 8485, not 8454, so make sure it is not mistyped. The JournalNode port is determined by the hdfs-site.xml property dfs.journalnode.rpc-address.
15.18. Exception in thread "main" org.apache.hadoop.HadoopIllegalArgumentException: Could not get the namenode ID of this node. You may run zkfc on the node other than namenode.
This error from "hdfs zkfc -formatZK" means that "hdfs namenode -format" has not been run yet; the NameNode ID is generated by "hdfs namenode -format".
15.19. 2016-04-06 17:08:07,690 INFO org.apache.hadoop.hdfs.server.common.Storage: Storage directory [DISK]file:/data3/datanode/data/ has already been used.
A DataNode started as a non-root user fails to come up, and its log file contains the following errors:
2016-04-06 17:08:07,707 INFO org.apache.hadoop.hdfs.server.common.Storage: Analyzing storage directories for bpid BP-418073539-10.143.136.207-1459927327462
2016-04-06 17:08:07,707 WARN org.apache.hadoop.hdfs.server.common.Storage: Failed to analyze storage directories for block pool BP-418073539-10.143.136.207-1459927327462
java.io.IOException: BlockPoolSliceStorage.recoverTransitionRead: attempt to load an used block storage: /data3/datanode/data/current/BP-418073539-10.143.136.207-1459927327462
Looking further, the following error messages also appear:
Invalid dfs.datanode.data.dir /data3/datanode/data:
EPERM: Operation not permitted
Checking the permissions of /data3/datanode/data with "ls -l" shows that the owner is root; the reason is that the DataNode had previously been started as root. Changing the owner back fixes the problem.
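A hedged fix sketch, assuming the DataNode runs as user hdfs in group hadoop (substitute the actual service account):
# inspect the current ownership of the data directory
ls -l /data3/datanode
# give the directory back to the account that runs the DataNode
chown -R hdfs:hadoop /data3/datanode/data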
15.20. 2016-04-06 18:00:26,939 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to server: hadoop-031/10.143.136.208:8020
The DataNode log keeps printing the messages below because the DataNode treats 10.143.136.208 as the active NameNode, while in fact that node has not been started and is not the active NameNode. This does not mean the DataNode failed to start: a DataNode heartbeats to both the active and the standby NameNode, so these messages are normal while the standby NameNode is down.
2016-04-06 18:00:32,940 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop-031/10.143.136.208:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-04-06 17:55:44,555 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Namenode Block pool BP-418073539-10.143.136.207-1459927327462 (Datanode Uuid 2d115d45-fd48-4e86-97b1-e74a1f87e1ca) service to hadoop-030/10.143.136.207:8020 trying to claim ACTIVE state with txid=1
"trying to claim ACTIVE state" comes from updateActorStatesFromHeartbeat() in hadoop/hdfs/server/datanode/BPOfferService.java.
2016-04-06 17:55:49,893 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop-031/10.143.136.208:8020. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
"Retrying connect to server" comes from handleConnectionTimeout() and handleConnectionFailure() in hadoop/ipc/Client.java.
15.21. ERROR cluster.YarnClientSchedulerBackend: Yarn application has already exited with state FINISHED!
If this error appears, check the NodeManager log; if it contains a message like the following:
WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Container [pid=26665,containerID=container_1461657380500_0020_02_000001] is running beyond virtual memory limits. Current usage: 345.0 MB of 1 GB physical memory used; 2.2 GB of 2.1 GB virtual memory used. Killing container.
then increase the yarn-site.xml property yarn.nodemanager.vmem-pmem-ratio, whose default value is 2.1.
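A quick way to confirm this on the affected NodeManager (the log file name assumes the default $HADOOP_HOME/logs naming scheme):
# search the NodeManager log for containers killed over the virtual memory limit
grep "running beyond virtual memory limits" $HADOOP_HOME/logs/*nodemanager*.log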
16/10/13 10:23:19 ERROR client.TransportClient: Failed to send RPC 7614640087981520382 to /10.143.136.231:34800: java.nio.channels.ClosedChannelException
java.nio.channels.ClosedChannelException
16/10/13 10:23:19 ERROR cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Sending RequestExecutors(0,0,Map()) to AM was unsuccessful
java.io.IOException: Failed to send RPC 7614640087981520382 to /10.143.136.231:34800: java.nio.channels.ClosedChannelException
at org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:249)
at org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:233)
at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680)
at io.netty.util.concurrent.DefaultPromise$LateListeners.run(DefaultPromise.java:845)
at io.netty.util.concurrent.DefaultPromise$LateListenerNotifier.run(DefaultPromise.java:873)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
at java.lang.Thread.run(Thread.java:745)