Repairing a RAID5 array with MegaCli

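Background: during an IDC relocation to another site, the storage was hauled by truck to the new data center and racked. Quite a few disks had either failed on their own or been shaken to death on the road, so I picked a machine whose disk had been swapped but whose array had not yet been repaired, and played with it~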

Note: carry out the following operations with as little I/O activity on the array as possible.

1. List the state of all disks. Nothing much to explain here

./MegaCli64 -PDList -a0

2. One disk has Firmware state Unconfigured(bad). This is the one we are going to rescue today

Enclosure Device ID: 0
Slot Number: 9
Device Id: 8
Sequence Number: 7
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SATA
Raw Size: 3.638 TB [0x1d1c0beb0 Sectors]
Non Coerced Size: 3.637 TB [0x1d1b0beb0 Sectors]
Coerced Size: 3.637 TB [0x1d1b00000 Sectors]
Firmware state: Unconfigured(bad)
SAS Address(0): 0x5001c4500077d8a9
Connected Port Number: 0(path0) 
Inquiry Data:             (manually redacted)
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Foreign State: None 
Device Speed: Unknown 
Link Speed: Unknown 
Media Type: Hard Disk Device
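
With many disks, scanning the -PDList output by eye gets tedious. Below is a minimal Python sketch that picks out drives in a given Firmware state from saved -PDList text. It assumes every drive record starts with an "Enclosure Device ID" line, as in the output above; the function names find_drives_by_state and physdrv_arg are made up for this illustration and are not part of MegaCli.

```python
def find_drives_by_state(pdlist_text, wanted="Unconfigured(bad)"):
    """Return (enclosure, slot) pairs for drives whose Firmware state starts with `wanted`.

    Assumption: each drive record begins with an "Enclosure Device ID" line.
    """
    drives = []
    current = {}
    for line in pdlist_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a "key: value" line
        key, value = key.strip(), value.strip()
        if key == "Enclosure Device ID":
            current = {"enclosure": value}  # a new drive record starts here
        elif key == "Slot Number":
            current["slot"] = value
        elif key == "Firmware state" and value.startswith(wanted):
            drives.append((current.get("enclosure"), current.get("slot")))
    return drives


def physdrv_arg(enclosure, slot):
    """Format the [Enclosure:Slot] selector used by the MegaCli commands in this post."""
    return f"-PhysDrv[{enclosure}:{slot}]"


# Trimmed copy of the record shown above.
sample = """Enclosure Device ID: 0
Slot Number: 9
Device Id: 8
Firmware state: Unconfigured(bad)
"""

for enc, slot in find_drives_by_state(sample):
    print(physdrv_arg(enc, slot))  # prints -PhysDrv[0:9]
```

The resulting selector is the Enclosure Device ID and Slot Number pair that the later commands take.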

3. First, mark this disk as good

./MegaCli64 -PDMakeGood -PhysDrv[0:9] -a0

Here -PhysDrv[0:9] corresponds to the Enclosure Device ID and Slot Number above, and -a0 selects Adapter #0. Now check the disk state again

Enclosure Device ID: 0
Slot Number: 9
Device Id: 8
Sequence Number: 8
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SATA
Raw Size: 3.638 TB [0x1d1c0beb0 Sectors]
Non Coerced Size: 3.637 TB [0x1d1b0beb0 Sectors]
Coerced Size: 3.637 TB [0x1d1b00000 Sectors]
Firmware state: Unconfigured(good), Spun Up
SAS Address(0): 0x5001c4500077d8a9
Connected Port Number: 0(path0) 
Inquiry Data:             (manually redacted)
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Foreign State: Foreign 
Foreign Secure: Drive is not secured by a foreign lock key
Device Speed: Unknown 
Link Speed: Unknown 
Media Type: Hard Disk Device

4. Now find out which member dropped out of the original RAID array, i.e. the position in the array that the replaced bad disk used to occupy

./MegaCli64 -pdgetmissing -a0

    Adapter 0 - Missing Physical drives

    No.   Array   Row   Size Expected
    0     1       0     3814912 MB

Exit Code: 0x00

5. Note that it is Array 1, Row 0. Next, put the new disk into that position

./MegaCli64 -PdReplaceMissing -physdrv[0:9] -array1 -row0 -a0   

Adapter: 0: Missing PD at Array 1, Row 0 is replaced.

Exit Code: 0x00
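
Steps 4 and 5 can be tied together in a script. The sketch below parses the missing-drive table and assembles the same -PdReplaceMissing command line used above. The regular expression assumes the four-column layout shown in this post (No., Array, Row, size in MB); parse_missing and replace_missing_cmd are hypothetical helper names, and the code only builds the argument list without running anything.

```python
import re

# One table row: No.  Array  Row  <size> MB  (layout assumed from the output above)
ROW_RE = re.compile(r"^\s*(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s*MB\s*$")


def parse_missing(text):
    """Return one dict per row of the -pdgetmissing table."""
    missing = []
    for line in text.splitlines():
        m = ROW_RE.match(line)
        if m:
            no, array, row, size_mb = map(int, m.groups())
            missing.append({"no": no, "array": array, "row": row, "size_mb": size_mb})
    return missing


def replace_missing_cmd(enclosure, slot, array, row, adapter=0):
    """Build the argument list for the -PdReplaceMissing call shown in step 5."""
    return [
        "./MegaCli64", "-PdReplaceMissing",
        f"-physdrv[{enclosure}:{slot}]",
        f"-array{array}", f"-row{row}", f"-a{adapter}",
    ]


sample = """    Adapter 0 - Missing Physical drives

    No.   Array   Row   Size Expected
    0     1       0     3814912 MB

Exit Code: 0x00
"""

entry = parse_missing(sample)[0]
print(" ".join(replace_missing_cmd(0, 9, entry["array"], entry["row"])))
# prints ./MegaCli64 -PdReplaceMissing -physdrv[0:9] -array1 -row0 -a0
```

Printing the command instead of executing it leaves room to double-check the array and row before touching the controller.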

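6. As you can see, that succeeded, but the array is not ready for normal use yet: all we did was put a blank disk in the place of the failed disk that used to hold data, so the data has to be restored first. How? RAID5 can reconstruct the failed disk's contents from the data and parity on the other disks, and that process is called a rebuild. Start the rebuild now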

./MegaCli64 -PDRbld -Start -PhysDrv[0:9] -a0    

Started rebuild progress on device(Encl-0 Slot-9)

Exit Code: 0x00

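7. The rebuild is now running. It takes a very long time and puts heavy I/O pressure on the disks, so avoid reading or writing data as far as possible. I have also had the bad luck of a rebuild that still had not finished after 2 days and ended up killing other disks instead. So if you have a spare moment, go pray and burn some incense; it might improve the odds. How do you check the rebuild progress?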

./MegaCli64 -pdrbld -showprog -physdrv[0:9] -a0   

Rebuild Progress on Device at Enclosure 0, Slot 9 Completed 1% in 6 Minutes.

Exit Code: 0x00

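This means 6 minutes have elapsed and 1% is done... At this rate it should finish in roughly 10 hours, so heading off after work to pray and burn incense, then checking the result at work tomorrow, is actually quite scientific~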
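
The 10-hour figure is a simple linear extrapolation: 6 minutes per 1% gives 6 × 100 = 600 minutes. A small Python sketch of the same arithmetic, parsing the progress line shown above, follows. The helper name estimate_total_minutes is invented for this example, and it assumes the rebuild keeps a constant pace, which is not guaranteed; because the percentage is a whole number, the estimate is very rough early on.

```python
import re

# Matches e.g. "... Completed 1% in 6 Minutes."
PROGRESS_RE = re.compile(r"Completed\s+(\d+)%\s+in\s+(\d+)\s+Minutes")


def estimate_total_minutes(progress_line):
    """Linearly extrapolate the total rebuild time in minutes, or None if unknown."""
    m = PROGRESS_RE.search(progress_line)
    if not m:
        return None
    percent, minutes = int(m.group(1)), int(m.group(2))
    if percent == 0:
        return None  # no basis for an estimate yet
    return minutes * 100 / percent


line = "Rebuild Progress on Device at Enclosure 0, Slot 9 Completed 1% in 6 Minutes."
total = estimate_total_minutes(line)
print(total / 60)  # prints 10.0 (hours)
```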
