# Starting the App Timeline Server
No new entries appeared in the YARN logs; the start attempt failed with the following error:
```
resource_management.core.exceptions.Fail: Execution of 'curl -sS -L -w '%{http_code}' -X PUT 'http://ab-01:50070/webhdfs/v1/ats/done?op=SETPERMISSION&user.name=hdfs&permission=755'' returned status_code=403.
{
"RemoteException": {
"exception": "SafeModeException",
"javaClassName": "org.apache.hadoop.hdfs.server.namenode.SafeModeException",
"message": "Cannot set permission for /ats/done. Name node is in safe mode.\nThe reported blocks 819 has reached the threshold 1.0000 of total blocks 819. The number of live datanodes 5 has reached the minimum number 0. Name node detected blocks with generation stamps in future. This means that Name node metadata is inconsistent.This can happen if Name node metadata files have been manually replaced. Exiting safe mode will cause loss of 664 byte(s). Please restart name node with right metadata or use \"hdfs dfsadmin -safemode forceExitif you are certain that the NameNode was started with thecorrect FsImage and edit logs. If you encountered this duringa rollback, it is safe to exit with -safemode forceExit."
}
}
```
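To confirm the failure really comes from safe mode rather than from Ambari itself, the NameNode can be queried directly and the same WebHDFS call replayed by hand (the curl line is copied from the error above; `ab-01:50070` is this cluster's NameNode web address):
```
# Ask the NameNode for its safe mode status
hdfs dfsadmin -safemode get

# Replay the WebHDFS call Ambari issued; in safe mode it returns 403
curl -sS -L -w '%{http_code}' -X PUT \
  'http://ab-01:50070/webhdfs/v1/ats/done?op=SETPERMISSION&user.name=hdfs&permission=755'
```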
- Checking the ambari-agent log revealed an alert on the NameNode:
```
ERROR 2017-03-27 10:11:23,325 script_alert.py:119 - [Alert][namenode_last_checkpoint] Failed with result CRITICAL: ['Last Checkpoint: [92 hours, 42 minutes, 1 transactions]']
```
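The alert means the NameNode had not written an fsimage checkpoint in almost four days. As a minimal sketch of forcing a manual checkpoint (run as the hdfs user; `-saveNamespace` itself requires safe mode, which HDFS was already in here):
```
# Force a manual fsimage checkpoint to clear the stale-checkpoint alert
hdfs dfsadmin -safemode enter
hdfs dfsadmin -saveNamespace
hdfs dfsadmin -safemode leave
```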
Checking the HDFS status confirmed that HDFS was in safe mode.
In safe mode, HDFS is read-only: files can be read but not written. References:
1. http://bbs.csdn.net/topics/390657293
2. Hadoop: The Definitive Guide, 4th Edition, p. 317
HDFS safe mode commands:
Check safe mode status:
% hdfs dfsadmin -safemode get
Enter safe mode:
% hdfs dfsadmin -safemode enter
Leave safe mode:
% hdfs dfsadmin -safemode leave
Block until safe mode is exited:
% hdfs dfsadmin -safemode wait
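The SafeModeException above also names a fifth subcommand, `forceExit`, for exactly this case (blocks with generation stamps in the future after metadata was replaced). It abandons the inconsistent edits (664 bytes here), so only use it if you are certain the NameNode started with the correct FsImage and edit logs:
```
# Force the NameNode out of safe mode, discarding the inconsistent edits
hdfs dfsadmin -safemode forceExit
```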
Running the leave command did not take HDFS out of safe mode. References:
1. http://bbs.csdn.net/topics/390657293
2. http://www.360doc.com/content/11/1201/09/3294720_168811892.shtml
Possible causes:
1. A disk is full;
2. Too many block replicas in HDFS have been lost.
Since the data in HDFS had not filled the disks, the conclusion was that lost replicas were keeping HDFS from exiting safe mode.
Check the HDFS data:
hdfs fsck /
```
Total symlinks: 0 (Files currently being written: 3)
Total blocks (validated): 819 (avg. block size 852783 B) (Total open file blocks (not validated): 2)
Minimally replicated blocks: 819 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 819 (100.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 2.0
Corrupt blocks: 0
Missing replicas: 819 (33.333332 %)
Number of data-nodes: 5
Number of racks: 1
FSCK ended at Mon Mar 27 10:05:31 CST 2017 in 157 milliseconds
```
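The report shows no corrupt blocks, but every block is under-replicated and a third of the expected replicas are missing, which matches the lost-replica theory. As a sketch of how to drill down with standard `hdfs fsck` options (`/ats` is simply the path from the original error):
```
# List files that have corrupt or missing blocks
hdfs fsck / -list-corruptfileblocks

# Show block and replica placement for a suspect path
hdfs fsck /ats -files -blocks -locations

# Last resort: -move quarantines damaged files under /lost+found,
# -delete removes them outright (irreversible)
hdfs fsck / -delete
```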
Even after this, HDFS could not exit safe mode.
- Tried to work around the safe mode problem by migrating the NameNode. After the migration HDFS did exit safe mode, but the DataNodes would not start. The log output:
```
2017-03-29 10:59:45,884 WARN common.Storage (DataStorage.java:loadDataStorage(449)) - Failed to add storage directory [DISK]file:/hadoop/hdfs/data/
java.io.IOException: Incompatible clusterIDs in /hadoop/hdfs/data: namenode clusterID = CID-5a76cc80-cf88-45ce-ac13-d96fe83e8696; datanode clusterID = CID-c9db625f-6c6e-4562-9d98-5eec95a2121f
at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:799)
at org.apache.hadoop.hdfs.server.datanode.DataStorage.loadStorageDirectory(DataStorage.java:322)
at org.apache.hadoop.hdfs.server.datanode.DataStorage.loadDataStorage(DataStorage.java:438)
at org.apache.hadoop.hdfs.server.datanode.DataStorage.addStorageLocations(DataStorage.java:417)
at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:595)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1483)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1448)
at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:319)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:267)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:740)
at java.lang.Thread.run(Thread.java:745)
2017-03-29 10:59:45,887 ERROR datanode.DataNode (BPServiceActor.java:run(752)) - Initialization failed for Block pool <registering> (Datanode Uuid 316d12f9-5812-4d82-a71c-90e393c9452f) service to ab-05/192.168.1.105:8020. Exiting.
java.io.IOException: All specified directories are failed to load.
at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:596)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1483)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1448)
at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:319)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:267)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:740)
at java.lang.Thread.run(Thread.java:745)
2017-03-29 10:59:45,887 WARN datanode.DataNode (BPServiceActor.java:run(776)) - Ending block pool service for: Block pool <registering> (Datanode Uuid 316d12f9-5812-4d82-a71c-90e393c9452f) service to ab-05/192.168.1.105:8020
2017-03-29 10:59:45,991 INFO datanode.DataNode (BlockPoolManager.java:remove(103)) - Removed Block pool <registering> (Datanode Uuid 316d12f9-5812-4d82-a71c-90e393c9452f)
2017-03-29 10:59:47,992 WARN datanode.DataNode (DataNode.java:secureMain(2637)) - Exiting Datanode
2017-03-29 10:59:47,994 INFO util.ExitUtil (ExitUtil.java:terminate(124)) - Exiting with status 0
2017-03-29 10:59:47,997 INFO datanode.DataNode (LogAdapter.java:info(47)) - SHUTDOWN_MSG:
```
- Cause analysis: the DataNode clusterID does not match the NameNode clusterID. Comparing the NameNode VERSION files confirms this:
```
# Original node
namespaceID=1746540282
clusterID=CID-c9db625f-6c6e-4562-9d98-5eec95a2121f
cTime=0
storageType=NAME_NODE
blockpoolID=BP-331978251-192.168.1.105-1489657175277
layoutVersion=-63
# New node
namespaceID=1413170473
clusterID=CID-5a76cc80-cf88-45ce-ac13-d96fe83e8696
cTime=0
storageType=NAME_NODE
blockpoolID=BP-1251469915-192.168.1.105-1490755308251
layoutVersion=-63
```
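Each storage directory keeps its VERSION file under `current/`. A quick way to compare them, assuming the DataNode path from the log above and a hypothetical NameNode path (the real one is `dfs.namenode.name.dir` in hdfs-site.xml):
```
# On the NameNode host (path is an assumption; check dfs.namenode.name.dir)
cat /hadoop/hdfs/namenode/current/VERSION

# On each DataNode host (path taken from the DataNode log above)
cat /hadoop/hdfs/data/current/VERSION
```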
- In other words, the NameNode and DataNodes no longer belonged to the same cluster, so the DataNodes could not start against the new NameNode, and the NameNode could not manage the original DataNodes.
- Fix: change the clusterID back to the original clusterID and the blockpoolID to the ID corresponding to the new node, stop the HDFS service, and run `hadoop namenode -format` to re-initialize HDFS, as sketched below.
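A minimal sketch of the clusterID edit, using the two IDs from the VERSION comparison above; the path is an assumption, HDFS must be stopped first, and `hadoop namenode -format` wipes the NameNode metadata, so back everything up beforehand:
```
# Back up the metadata directory before touching VERSION
cp /hadoop/hdfs/namenode/current/VERSION /hadoop/hdfs/namenode/current/VERSION.bak

# Point the new NameNode back at the original clusterID
sed -i 's/CID-5a76cc80-cf88-45ce-ac13-d96fe83e8696/CID-c9db625f-6c6e-4562-9d98-5eec95a2121f/' \
    /hadoop/hdfs/namenode/current/VERSION
```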
- Restart the related services in turn.
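The restarts were done from Ambari; as an illustrative manual equivalent using the stock Hadoop 2.x daemon scripts, the order is HDFS first, YARN next, App Timeline Server last:
```
hadoop-daemon.sh start namenode        # on the NameNode host
hadoop-daemon.sh start datanode        # on each DataNode host
yarn-daemon.sh start resourcemanager   # on the ResourceManager host
yarn-daemon.sh start nodemanager       # on each NodeManager host
yarn-daemon.sh start timelineserver    # the App Timeline Server
```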