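Background: our IDC moved to a different site, and the storage was trucked to the new machine room and racked. Quite a few disks had either failed on their own or been shaken to death on the road, so I grabbed a machine whose disk had been swapped but not yet rebuilt to play with~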
Note: as far as possible, run the operations below while there is no I/O activity on the array.
1. List the state of all disks. Nothing much to say here.
./MegaCli64 -PDList -a0
2. One disk has Firmware state Unconfigured(bad). That is today's rescue target.
Enclosure Device ID: 0
Slot Number: 9
Device Id: 8
Sequence Number: 7
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SATA
Raw Size: 3.638 TB [0x1d1c0beb0 Sectors]
Non Coerced Size: 3.637 TB [0x1d1b0beb0 Sectors]
Coerced Size: 3.637 TB [0x1d1b00000 Sectors]
Firmware state: Unconfigured(bad)
SAS Address(0): 0x5001c4500077d8a9
Connected Port Number: 0(path0)
Inquiry Data: (manually redacted)
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Foreign State: None
Device Speed: Unknown
Link Speed: Unknown
Media Type: Hard Disk Device
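With many slots, picking the bad drive out of the listing by eye gets tedious. The following is a minimal Python sketch for filtering saved `-PDList` output. It assumes the tool prints one `Key: value` field per line and starts each drive with an `Enclosure Device ID` line; the function names and the second drive in the sample are made up for illustration.

```python
def parse_pdlist(text):
    """Split -PDList style output into one dict per physical drive."""
    drives, current = [], None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a "Key: value" line
        key, value = key.strip(), value.strip()
        if key == "Enclosure Device ID":
            # Assumption: this field opens each drive's record.
            current = {}
            drives.append(current)
        if current is not None:
            current[key] = value
    return drives


def bad_drives(drives):
    """Return (enclosure, slot) pairs whose state is Unconfigured(bad)."""
    return [
        (d["Enclosure Device ID"], d["Slot Number"])
        for d in drives
        if d.get("Firmware state", "").startswith("Unconfigured(bad)")
    ]


# First record is abbreviated from the listing above; the second is hypothetical.
SAMPLE = """Enclosure Device ID: 0
Slot Number: 9
Firmware state: Unconfigured(bad)

Enclosure Device ID: 0
Slot Number: 10
Firmware state: Online, Spun Up
"""

print(bad_drives(parse_pdlist(SAMPLE)))
```

The returned pair is exactly what goes inside the square brackets of `-PhysDrv[...]` in the next step.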
3. First, flip this disk to good
./MegaCli64 -PDMakeGood -PhysDrv[0:9] -a0
Here -PhysDrv[0:9] corresponds to the Enclosure Device ID and Slot Number shown above, and -a0 is of course Adapter #0. Now look at the disk state again:
Enclosure Device ID: 0
Slot Number: 9
Device Id: 8
Sequence Number: 8
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SATA
Raw Size: 3.638 TB [0x1d1c0beb0 Sectors]
Non Coerced Size: 3.637 TB [0x1d1b0beb0 Sectors]
Coerced Size: 3.637 TB [0x1d1b00000 Sectors]
Firmware state: Unconfigured(good), Spun Up
SAS Address(0): 0x5001c4500077d8a9
Connected Port Number: 0(path0)
Inquiry Data: (manually redacted)
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Foreign State: Foreign
Foreign Secure: Drive is not secured by a foreign lock key
Device Speed: Unknown
Link Speed: Unknown
Media Type: Hard Disk Device
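Since the bracketed argument is just the enclosure ID and slot number joined by a colon, it is easy to build programmatically. A small sketch follows; the helper names are hypothetical, and only flags that already appear in this post are used.

```python
def physdrv(enclosure, slot):
    """Format the -PhysDrv[E:S] argument from enclosure ID and slot number."""
    return f"-PhysDrv[{enclosure}:{slot}]"


def make_good_cmd(enclosure, slot, adapter=0, megacli="./MegaCli64"):
    """Build the argument list for the PDMakeGood call shown above."""
    return [megacli, "-PDMakeGood", physdrv(enclosure, slot), f"-a{adapter}"]


print(" ".join(make_good_cmd(0, 9)))
```

The list form can be handed to `subprocess.run` as-is, which avoids shell quoting problems with the square brackets.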
4. Now check which member dropped out of the original RAID array, that is, the position in the array that the replaced bad disk used to occupy
./MegaCli64 -pdgetmissing -a0
Adapter 0 - Missing Physical drives
No.   Array   Row   Size Expected
0     1       0     3814912 MB
Exit Code: 0x00
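The Array and Row columns are the two numbers the replace command needs. A minimal sketch for pulling them out of saved `-pdgetmissing` output is below. It assumes each data row is four whitespace-separated integers followed by `MB`, as in the listing above; the function name and dict keys are invented for illustration.

```python
import re

# One data row: No., Array, Row, then the expected size in MB.
ROW = re.compile(r"^\s*(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s*MB\s*$")


def parse_missing(text):
    """Return one dict per missing-drive row found in the output."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            no, array, row, size_mb = map(int, m.groups())
            rows.append({"no": no, "array": array, "row": row, "size_mb": size_mb})
    return rows


SAMPLE = """Adapter 0 - Missing Physical drives

No.   Array   Row   Size Expected
0     1       0     3814912 MB

Exit Code: 0x00
"""

for entry in parse_missing(SAMPLE):
    # These become the -arrayN and -rowN flags of the replace command.
    print(f"-array{entry['array']} -row{entry['row']}")
```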
5. Remember these: Array 1, Row 0. Now put the new disk into that position
./MegaCli64 -PdReplaceMissing -physdrv[0:9] -array1 -row0 -a0
Adapter: 0: Missing PD at Array 1, Row 0 is replaced.
Exit Code: 0x00
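6. As you can see, that worked, but the RAID is still not usable: all we did was put a blank disk where the bad disk holding data used to be, so the data has to be restored first. How? RAID5 can reconstruct the failed disk's contents from the data and parity on the remaining disks. This process is called Rebuild. Let's kick off the Rebuild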
./MegaCli64 -PDRbld -Start -PhysDrv[0:9] -a0
Started rebuild progress on device(Encl-0 Slot-9)
Exit Code: 0x00
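To make the parity idea concrete, here is a toy Python illustration of how one RAID5 stripe can be reconstructed with XOR. It is only a sketch of the principle: the block contents are made up, and a real controller does this in firmware, stripe by stripe, with parity rotated across the member disks.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equally sized byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe spread over three data disks (made-up contents).
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)  # what the parity block of this stripe would hold

# Pretend the disk holding data[1] died: XOR the survivors with the parity.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt == data[1])
```

Every surviving disk has to be read in full to do this for the whole array, which is why a rebuild is slow and hard on the remaining drives.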
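7. The rebuild has started. It takes a very long time and puts heavy load on disk I/O, so try not to read or write data in the meantime. I have also had the rotten luck of a Rebuild that still was not done after 2 days and instead took out other disks. So if you have the time, go pray to Buddha and burn a stick of incense; the odds of success might go up. How do you check the Rebuild progress?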
./MegaCli64 -pdrbld -showprog -physdrv[0:9] -a0
Rebuild Progress on Device at Enclosure 0, Slot 9 Completed 1% in 6 Minutes.
Exit Code: 0x00
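This means 6 minutes have elapsed and 1% is done... At this rate it should finish in roughly 10 hours, so heading out after work to pray and burn incense, then checking the result tomorrow morning, is perfectly scientific~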
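The 10-hour figure is a straight linear extrapolation: 6 minutes per 1% leaves 6 × 99 = 594 minutes, about 9.9 hours. A minimal Python sketch of the same arithmetic is below. It assumes the progress line always has the `Completed N% in M Minutes` shape shown above; the function name is hypothetical, and the estimate is coarse because the percentage is a whole number.

```python
import re

PROGRESS = re.compile(r"Completed\s+(\d+)%\s+in\s+(\d+)\s+Minutes")


def estimate_remaining_minutes(text):
    """Linearly extrapolate the remaining rebuild time, or None if unknown."""
    m = PROGRESS.search(text)
    if not m:
        return None
    percent, minutes = int(m.group(1)), int(m.group(2))
    if percent == 0:
        return None  # no progress yet, nothing to extrapolate from
    return minutes * (100 - percent) / percent


LINE = "Rebuild Progress on Device at Enclosure 0, Slot 9 Completed 1% in 6 Minutes."
remaining = estimate_remaining_minutes(LINE)
print(f"about {remaining / 60:.1f} hours left")
```

Rebuild speed is rarely constant under load, so treat the number as a rough guide rather than a deadline.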