Command overview
Command hdfs dfsadmin -safemode get: show the safe mode status
Command hdfs dfsadmin -safemode enter: enter safe mode
Command hdfs dfsadmin -safemode leave: leave safe mode
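The safe mode status check can be wrapped in a small helper for use in scripts. The following is a minimal sketch, not part of the original post: the function name is_safemode_on is hypothetical, and it assumes the status command (hdfs dfsadmin -safemode get) prints a line containing "Safe mode is ON" or "Safe mode is OFF".

```shell
# Hypothetical helper: reads the output of `hdfs dfsadmin -safemode get`
# on stdin and succeeds (exit status 0) only if it reports safe mode ON.
# Assumption: the report contains the text "Safe mode is ON" in that case.
is_safemode_on() {
    grep -q 'Safe mode is ON'
}

# Usage on a cluster node (not executed here):
#   if hdfs dfsadmin -safemode get | is_safemode_on; then
#       echo "NameNode is still in safe mode"
#   fi
```

Keeping the parsing separate from the hdfs call means the helper can be tried against saved output without touching the cluster.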
Use hadoop fsck to locate corrupt and missing files
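hadoop fsck
Usage: DFSck <path> [-move | -delete | -openforwrite] [-files [-blocks [-locations | -racks]]]
<path>          check whether the files under this path are intact
-move           move corrupt files to the /lost+found directory
-delete         delete corrupt files
-openforwrite   print files that are currently open for writing
-files          print the name of each file being checked
-blocks         print the block report (must be used together with -files)
-locations      print the location of every block (must be used together with -files -blocks)
-racks          print the network topology of the block locations (must be used together with -files -blocks)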
[root@node03 export]# hadoop fsck /
....................................................................................................
.............Status: CORRUPT    # Hadoop status: unhealthy
Total size: 273821489 B
Total dirs: 403
Total files: 213
Total symlinks: 0
Total blocks (validated): 201 (avg. block size 1362295 B)
********************************
UNDER MIN REPL'D BLOCKS: 2 (0.99502486 %)
dfs.namenode.replication.min: 1
CORRUPT FILES: 2    # two files are corrupt
MISSING BLOCKS: 2    # two blocks are missing
MISSING SIZE: 6174 B
CORRUPT BLOCKS: 2
********************************
Minimally replicated blocks: 199 (99.004974 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 2.8208954
Corrupt blocks: 2
Missing replicas: 0 (0.0 %)
Number of data-nodes: 3
Number of racks: 1
FSCK ended at Fri Aug 23 10:43:11 CST 2019 in 12 milliseconds
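The following lines show that the Hadoop cluster is unhealthy and that files are missing:
.............Status: CORRUPT    # Hadoop status: unhealthy
CORRUPT FILES: 2    # two files are corrupt
MISSING BLOCKS: 2    # two blocks are missing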
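A script can pick the status field out of the fsck summary instead of reading it by eye. This is a minimal sketch: the helper name fsck_status is hypothetical, and it assumes the summary contains a "Status: <WORD>" field in the form shown in the report above.

```shell
# Hypothetical helper: reads `hadoop fsck` output on stdin and prints the
# value of the first "Status:" field it finds (for example HEALTHY or CORRUPT).
# Assumption: the field looks like "...Status: CORRUPT", as in the report above.
fsck_status() {
    sed -n 's/.*Status: \([A-Z]*\).*/\1/p' | head -n 1
}

# Usage on a cluster node (not executed here):
#   if [ "$(hadoop fsck / | fsck_status)" != "HEALTHY" ]; then
#       echo "HDFS reports problems; inspect the full fsck output"
#   fi
```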
The output is long, so only part of it is shown below.
hadoop fsck / -files -blocks -locations -racks >/export/missingFile.txt writes the results of the check to the file /export/missingFile.txt
[root@node03 export]# hadoop fsck / -files -blocks -locations -racks >/export/missingFile.txt
/flink-checkpoint/11748bc079799f330078967fbf018a48/chk-74/_metadata 452 bytes, 1 block(s): OK
0. BP-2135962035-192.168.52.100-1562110398602:blk_1073742825_2005 len=452 Live_repl=1 [/default-rack/192.168.52.110:50010]
/flink-checkpoint/11748bc079799f330078967fbf018a48/shared <dir>
/flink-checkpoint/11748bc079799f330078967fbf018a48/taskowned <dir>
/flink-checkpoint/42d81db182771fe71932120fa8933612 <dir>
/flink-checkpoint/42d81db182771fe71932120fa8933612/chk-950 <dir>
/flink-checkpoint/42d81db182771fe71932120fa8933612/chk-950/_metadata 337 bytes, 1 block(s): OK
0. BP-2135962035-192.168.52.100-1562110398602:blk_1073745657_4837 len=337 Live_repl=1 [/default-rack/192.168.52.120:50010]
/flink-checkpoint/42d81db182771fe71932120fa8933612/chk-950/f59c63a0-a35d-4d4b-8e73-72c2aa1dd383 5657 bytes, 1 block(s): OK
0. BP-2135962035-192.168.52.100-1562110398602:blk_1073745656_4836 len=5657 Live_repl=1 [/default-rack/192.168.52.100:50010]
/flink-checkpoint/42d81db182771fe71932120fa8933612/shared <dir>
/flink-checkpoint/42d81db182771fe71932120fa8933612/taskowned <dir>
/flink-checkpoint/50aebc9e7aac85fd33bff905972a6e01 <dir>
/flink-checkpoint/50aebc9e7aac85fd33bff905972a6e01/chk-9 <dir>
/flink-checkpoint/50aebc9e7aac85fd33bff905972a6e01/chk-9/_metadata 451 bytes, 1 block(s): OK
0. BP-2135962035-192.168.52.100-1562110398602:blk_1073742843_2023 len=451 Live_repl=1 [/default-rack/192.168.52.100:50010]
/flink-checkpoint/50aebc9e7aac85fd33bff905972a6e01/chk-9/c58c8c49-8782-41b4-a3df-2fa7ff1d1eba 5663 bytes, 1 block(s): OK
0. BP-2135962035-192.168.52.100-1562110398602:blk_1073742842_2022 len=5663 Live_repl=1 [/default-rack/192.168.52.120:50010]
/flink-checkpoint/50aebc9e7aac85fd33bff905972a6e01/shared <dir>
/flink-checkpoint/50aebc9e7aac85fd33bff905972a6e01/taskowned <dir>
/flink-checkpoint/626ea65de810a2ec3b1799b605a6a995 <dir>
/flink-checkpoint/626ea65de810a2ec3b1799b605a6a995/chk-175 <dir>
/flink-checkpoint/626ea65de810a2ec3b1799b605a6a995/chk-175/19195239-a205-4462-921d-09e0483a4080 5663 bytes, 1 block(s):
/flink-checkpoint/626ea65de810a2ec3b1799b605a6a995/chk-175/19195239-a205-4462-921d-09e0483a4080: CORRUPT blockpool BP-2135962035-192.168.52.100-1562110398602 block blk_1073743749
MISSING 1 blocks of total size 5663 B
0. BP-2135962035-192.168.52.100-1562110398602:blk_1073743749_2929 len=5663 MISSING!
/flink-checkpoint/626ea65de810a2ec3b1799b605a6a995/chk-175/_metadata 511 bytes, 1 block(s):
/flink-checkpoint/626ea65de810a2ec3b1799b605a6a995/chk-175/_metadata: CORRUPT blockpool BP-2135962035-192.168.52.100-1562110398602 block blk_1073743750
MISSING 1 blocks of total size 511 B
0. BP-2135962035-192.168.52.100-1562110398602:blk_1073743750_2930 len=511 MISSING!
Healthy files are followed by OK, while entries marked MISSING! are the lost files.
/flink-checkpoint/626ea65de810a2ec3b1799b605a6a995/chk-175/19195239-a205-4462-921d-09e0483a4080: CORRUPT blockpool BP-2135962035-192.168.52.100-1562110398602 block blk_1073743749
MISSING 1 blocks of total size 5663 B
/flink-checkpoint/626ea65de810a2ec3b1799b605a6a995/chk-175/_metadata: CORRUPT blockpool BP-2135962035-192.168.52.100-1562110398602 block blk_1073743750
MISSING 1 blocks of total size 511 B
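When the saved report is long, the affected paths can be filtered out of it rather than searched for manually. This is a minimal sketch under stated assumptions: list_corrupt_paths is a hypothetical helper name, and it assumes each damaged file appears on its own line in the form "<path>: CORRUPT blockpool ...", as in the two entries above.

```shell
# Hypothetical helper: reads `hadoop fsck / -files -blocks ...` output on
# stdin and prints the path of every file flagged as CORRUPT, one per line.
# Assumption: flagged lines look like "<path>: CORRUPT blockpool <id> block <blk>".
list_corrupt_paths() {
    sed -n 's|^\(/.*\): CORRUPT blockpool .*|\1|p' | sort -u
}

# Usage against the saved report (not executed here):
#   list_corrupt_paths < /export/missingFile.txt
```

The sort -u step collapses repeats, since a file with several damaged blocks may be reported more than once.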
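Using these paths, you can locate the corresponding files in the file browser of the Hadoop web UI. Then try to recover the lease on each affected file: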
[root@node03 conf]# hdfs debug recoverLease -path /flink-checkpoint/626ea65de810a2ec3b1799b605a6a995/chk-175/19195239-a205-4462-921d-09e0483a4080 -retries 10
[root@node03 conf]# hdfs debug recoverLease -path /flink-checkpoint/626ea65de810a2ec3b1799b605a6a995/chk-175/_metadata -retries 10
[root@node03 conf]# hdfs debug recoverLease -path /flink-checkpoint/626ea65de810a2ec3b1799b605a6a995/chk-175/19195239-a205-4462-921d-09e0483a4080 -retries 10
recoverLease SUCCEEDED on /flink-checkpoint/626ea65de810a2ec3b1799b605a6a995/chk-175/19195239-a205-4462-921d-09e0483a4080
[root@node03 conf]# hdfs debug recoverLease -path /flink-checkpoint/626ea65de810a2ec3b1799b605a6a995/chk-175/_metadata -retries 10
recoverLease SUCCEEDED on /flink-checkpoint/626ea65de810a2ec3b1799b605a6a995/chk-175/_metadata
[root@node03 conf]#
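With more than a couple of affected files, the same recoverLease call can be run in a loop over a list of paths. The sketch below is illustrative only: recover_leases and the HDFS_CMD variable are hypothetical names introduced here, and the -retries 10 value simply mirrors the commands above.

```shell
# HDFS_CMD is a hypothetical override so the loop can be dry-run with
# another command (for example `echo`); it defaults to the real hdfs CLI.
HDFS_CMD="${HDFS_CMD:-hdfs}"

# Hypothetical helper: reads one HDFS path per line on stdin and runs
# `hdfs debug recoverLease` on each, skipping empty lines.
recover_leases() {
    while IFS= read -r path; do
        [ -n "$path" ] || continue
        # </dev/null keeps the command from consuming the remaining path list.
        "$HDFS_CMD" debug recoverLease -path "$path" -retries 10 </dev/null
    done
}

# Dry run that only prints the arguments that would be passed (not executed here):
#   HDFS_CMD=echo; printf '%s\n' /some/hdfs/path | recover_leases
```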
Running fsck again now reports:
...........Status: HEALTHY
Total size: 273815315 B
Total dirs: 403
Total files: 211
Total symlinks: 0
Total blocks (validated): 199 (avg. block size 1375956 B)
Minimally replicated blocks: 199 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 2.8492463
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 3
Number of racks: 1
FSCK ended at Fri Aug 23 11:15:01 CST 2019 in 11 milliseconds
...........Status: HEALTHY    # cluster status: healthy
Hadoop no longer stays stuck in safe mode after a restart, and hiveserver2 starts normally again.
First, back up the corrupt files you have identified.
Then, run hadoop fsck / -delete to delete the corrupt files:
[root@node03 export]# hadoop fsck / -delete
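If this command does not succeed the first time, try it a few more times. Only do this if the missing or corrupt files are not important!
You can also first run hdfs dfsadmin -safemode leave to force the NameNode out of safe mode:
[root@node03 export]# hdfs dfsadmin -safemode leave
This does not fully solve the problem; it only lets the cluster work temporarily.
Moreover, you will have to run this command every time the Hadoop cluster starts, until the underlying problem is fully resolved.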
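If this workaround has to be repeated after every start, it can be scripted so that safe mode is only left when it is actually on. The following is a rough sketch, not a fix for the underlying problem: leave_safemode_if_on and HDFS_CMD are hypothetical names, and it assumes the status command prints "Safe mode is ON" while safe mode is active.

```shell
# HDFS_CMD is a hypothetical override for testing; it defaults to the hdfs CLI.
HDFS_CMD="${HDFS_CMD:-hdfs}"

# Hypothetical helper: checks the safe mode status and forces the NameNode
# out of safe mode only if the status report says it is ON.
# Assumption: the report contains "Safe mode is ON" while safe mode is active.
leave_safemode_if_on() {
    if "$HDFS_CMD" dfsadmin -safemode get | grep -q 'Safe mode is ON'; then
        "$HDFS_CMD" dfsadmin -safemode leave
    else
        echo "Safe mode is already OFF; nothing to do"
    fi
}

# Usage after starting the cluster (not executed here):
#   leave_safemode_if_on
```

Forcing the NameNode out of safe mode hides missing blocks rather than repairing them, so the fsck report should still be checked afterwards.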