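MegaCli is the official tool from LSI for managing its RAID controller cards. LSI was acquired and is now part of Broadcom, so to download MegaCli today you have to go to the Broadcom website, look under Legacy product support, and search for MegaRAID.
The current official tool is storcli, which covers all LSI and 3ware products. Personally I still find MegaCli more comfortable to use, and on the servers from several domestic vendors that we run in production, MegaCli manages the RAID just fine, so whether to switch does not really matter.
View adapter information: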
./MegaCli64 -AdpAllInfo -aALL
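The output is long and much of it is hard to follow, but that does not matter. Beginners only need to remember the first line, which says that my machine has an adapter numbered 0. Many MegaCli64 commands require an adapter to be specified at the end with -a. I only have Adapter #0, so from now on I will just write -a0; -a0,1,2 and -aALL are also accepted.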
Adapter #0

==============================================================================
                    Versions
                ================
Product Name    : PERC H710 Adapter
Serial No       : 31P003R
FW Package Build: 21.1.0-0007

                    Mfg. Data
                ================
Mfg. Date       : 01/26/13
Rework Date     : 01/26/13
Revision No     : A00
Battery FRU     : N/A
...
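As a small illustration of the -a convention described above, here is a minimal Python sketch that reads the adapter numbers from captured -AdpAllInfo output and builds the trailing -a argument. The function names `adapter_ids` and `adapter_arg` are made up for this example, and the sketch assumes each adapter section starts with a line of the form "Adapter #N", as in the output shown here.

```python
import re


def adapter_ids(adp_all_info):
    """Return the adapter numbers found in lines such as 'Adapter #0'."""
    found = re.findall(r"^Adapter #(\d+)[ \t]*$", adp_all_info, flags=re.MULTILINE)
    return [int(n) for n in found]


def adapter_arg(ids=None):
    """Build the trailing -a argument: -aALL, -a0 or -a0,1,2."""
    if not ids:
        return "-aALL"
    return "-a" + ",".join(str(i) for i in ids)


# Abridged from the -AdpAllInfo output above.
sample = """\
Adapter #0

==============================================================================
Product Name    : PERC H710 Adapter
Serial No       : 31P003R
"""

ids = adapter_ids(sample)
print(ids)                       # [0]
print(adapter_arg(ids))          # -a0
print(adapter_arg([0, 1, 2]))    # -a0,1,2
print(adapter_arg())             # -aALL
```

The text could come from running ./MegaCli64 -AdpAllInfo -aALL and capturing its standard output; here it is hard-coded so the sketch runs without the tool installed.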
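View the adapter's detailed configuration. This machine has 12 disks installed: one is set up as RAID 0 for the operating system, and the remaining disks form a RAID 5: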
./MegaCli64 -CfgDsply -aALL
==============================================================================
Adapter: 0
Product Name: PERC H710 Adapter
Memory: 512MB
BBU: Present
Serial No: 31P003R
==============================================================================
Number of DISK GROUPS: 2    # two disk groups

DISK GROUPS: 0    # disk group 0
Number of Spans: 1
SPAN: 0
Span Reference: 0x00
Number of PDs: 1
Number of VDs: 1
Number of dedicated Hotspares: 0
Virtual Disk Information:
Virtual Disk: 0 (Target Id: 0)
Name:
RAID Level: Primary-0, Secondary-0, RAID Level Qualifier-0    # RAID 0
Size:2.728 TB
State: Optimal
Stripe Size: 64 KB
Number Of Drives:1
Span Depth:1
Default Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Access Policy: Read/Write
Disk Cache Policy: Disk's Default
Encryption Type: None
Physical Disk Information:
Physical Disk: 0
Enclosure Device ID: 32
Slot Number: 0
Device Id: 0
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SATA
Raw Size: 2.728 TB [0x15d50a3b0 Sectors]
Non Coerced Size: 2.728 TB [0x15d40a3b0 Sectors]
Coerced Size: 2.728 TB [0x15d400000 Sectors]
Firmware state: Online
SAS Address(0): 0x500056b37789abee
Connected Port Number: 0(path0)
Inquiry Data: (manually redacted)    # this is the serial number
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Foreign State: None
Device Speed: 3.0Gb/s
Link Speed: 3.0Gb/s
Media Type: Hard Disk Device

DISK GROUPS: 1    # disk group 1
Number of Spans: 1
SPAN: 0
Span Reference: 0x01
Number of PDs: 11    # 11 physical disks
Number of VDs: 1    # presented as 1 virtual disk
Number of dedicated Hotspares: 0
Virtual Disk Information:
Virtual Disk: 0 (Target Id: 1)
Name:
RAID Level: Primary-5, Secondary-0, RAID Level Qualifier-3    # RAID 5
Size:27.285 TB
State: Optimal
Stripe Size: 64 KB
Number Of Drives:11
Span Depth:1
Default Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Access Policy: Read/Write
Disk Cache Policy: Disk's Default
Encryption Type: None
Physical Disk Information:
Physical Disk: 0    # the first physical disk
Enclosure Device ID: 32
Slot Number: 1
Device Id: 1
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SATA
Raw Size: 2.728 TB [0x15d50a3b0 Sectors]
Non Coerced Size: 2.728 TB [0x15d40a3b0 Sectors]
Coerced Size: 2.728 TB [0x15d400000 Sectors]
Firmware state: Online
SAS Address(0): 0x500056b37789abec
Connected Port Number: 0(path0)
Inquiry Data: (manually redacted)    # the disk's serial number, same as on the disk label
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Foreign State: None
Device Speed: 3.0Gb/s
Link Speed: 3.0Gb/s
Media Type: Hard Disk Device
...
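Since the -CfgDsply output is long, it can help to boil it down to one line per disk group. The following is a rough Python sketch, not an official parser: the function name `summarize_disk_groups` is invented for this example, and it assumes the field labels look exactly like the ones in the output above ("DISK GROUPS: N", "Number of PDs", "RAID Level: Primary-N", "Size", "State").

```python
import re


def summarize_disk_groups(cfg_dsply):
    """Collect RAID level, disk count, size and state for each disk group."""
    groups = []
    for raw in cfg_dsply.splitlines():
        line = raw.strip()
        # A new group starts at "DISK GROUPS: N" ("Number of DISK GROUPS" does not match).
        m = re.match(r"DISK GROUPS?:\s*(\d+)", line)
        if m:
            groups.append({"group": int(m.group(1))})
            continue
        if not groups:
            continue  # still in the adapter header
        current = groups[-1]
        m = re.match(r"Number of PDs:\s*(\d+)", line)
        if m:
            current["pds"] = int(m.group(1))
        m = re.match(r"RAID Level\s*:\s*Primary-(\d+)", line)
        if m:
            current["raid_level"] = int(m.group(1))
        # re.match anchors at the start, so "Stripe Size" and "Raw Size" are skipped.
        m = re.match(r"Size\s*:\s*(.+)", line)
        if m:
            current["size"] = m.group(1).strip()
        # Likewise "Firmware state" and "Foreign State" are skipped.
        m = re.match(r"State\s*:\s*(\S+)", line)
        if m:
            current["state"] = m.group(1)
    return groups


# Abridged from the -CfgDsply output above.
sample = """\
Number of DISK GROUPS: 2
DISK GROUPS: 0
Number of PDs: 1
RAID Level: Primary-0, Secondary-0, RAID Level Qualifier-0
Size:2.728 TB
State: Optimal
Stripe Size: 64 KB
Firmware state: Online
DISK GROUPS: 1
Number of PDs: 11
RAID Level: Primary-5, Secondary-0, RAID Level Qualifier-3
Size:27.285 TB
State: Optimal
"""

for g in summarize_disk_groups(sample):
    print(g)
```

As a sanity check on the numbers, RAID 5 keeps the capacity of n-1 disks, so 11 disks of about 2.728 TB give roughly 10 x 2.728 = 27.28 TB, which agrees with the reported 27.285 TB.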
View the information and state of every physical disk. The output is the same as above, just without the adapter information.
./MegaCli64 -PDList -a0
Adapter #0

Enclosure Device ID: 32
Slot Number: 0
Device Id: 0
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SATA
Raw Size: 2.728 TB [0x15d50a3b0 Sectors]
Non Coerced Size: 2.728 TB [0x15d40a3b0 Sectors]
Coerced Size: 2.728 TB [0x15d400000 Sectors]
Firmware state: Online
SAS Address(0): 0x500056b37789abee
Connected Port Number: 0(path0)
Inquiry Data: (manually redacted)    # the disk's serial number, same as on the disk label
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Foreign State: None
Device Speed: 3.0Gb/s
Link Speed: 3.0Gb/s
Media Type: Hard Disk Device

Enclosure Device ID: 32
Slot Number: 1
Device Id: 1
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SATA
Raw Size: 2.728 TB [0x15d50a3b0 Sectors]
Non Coerced Size: 2.728 TB [0x15d40a3b0 Sectors]
Coerced Size: 2.728 TB [0x15d400000 Sectors]
Firmware state: Online
SAS Address(0): 0x500056b37789abec
Connected Port Number: 0(path0)
Inquiry Data: (manually redacted)    # the disk's serial number, same as on the disk label
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Foreign State: None
Device Speed: 3.0Gb/s
Link Speed: 3.0Gb/s
Media Type: Hard Disk Device
...
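Because -PDList prints one "Key: Value" line per field, the output can be split into one record per disk with very little code. The sketch below is only an illustration: the function name `parse_pdlist` is made up, it assumes that every disk record begins with an "Enclosure Device ID" line as in the output above, and the Inquiry Data values in the sample are placeholders rather than real serial numbers.

```python
def parse_pdlist(text):
    """Split -PDList output into a list of dicts, one dict per physical disk."""
    disks = []
    for raw in text.splitlines():
        line = raw.strip()
        if ":" not in line:
            continue  # skips "Adapter #0", blank lines and "..."
        key, value = line.split(":", 1)
        key, value = key.strip(), value.strip()
        if key == "Enclosure Device ID":
            disks.append({})  # assumption: this field opens each disk record
        if disks:
            disks[-1][key] = value
    return disks


# Abridged from the -PDList output above; serial numbers are placeholders.
sample = """\
Adapter #0

Enclosure Device ID: 32
Slot Number: 0
Predictive Failure Count: 0
Firmware state: Online
Inquiry Data: PLACEHOLDER-SERIAL-A

Enclosure Device ID: 32
Slot Number: 1
Predictive Failure Count: 0
Firmware state: Online
Inquiry Data: PLACEHOLDER-SERIAL-B
"""

for disk in parse_pdlist(sample):
    print(disk["Slot Number"], disk["Firmware state"], disk["Inquiry Data"])
```

Keeping every field as a string keeps the sketch simple; a caller that needs numbers can convert individual fields such as Slot Number with int().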
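The -PDList output contains a lot of useful information:
1. Slot Number: the slot number, which should match the label on the chassis. If the machine holds many disks, just tell the on-site engineer that the disk in slot X has a problem and the engineer will replace that disk directly.
2. Inquiry Data: the disk's serial number, identical to the one printed on the disk label. The label is only visible after the disk is pulled out, so the serial number on a disk pulled by slot should match Inquiry Data.
3. Firmware state: the state of the disk. Online is the best state and the one we hope to see. Other states include Unconfigured, Offline, Failed and so on, most of which convey one sad fact: you will be working overtime to file a repair ticket or fix them.
4. Pay special attention to these counters: Media Error Count, Other Error Count, Predictive Failure Count and Last Predictive Failure Event Seq Number, any of which may be non-zero. A non-zero value means that the disk still works but is no longer reliable and very likely has bad clusters, bad sectors or similar problems. It must be replaced as soon as possible; if you keep using it, it is not far from failing completely. A claim circulating online is that the larger the first three counts are, the worse the disk's condition. That is not actually the case, as the following two screenshots show.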
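A colleague took this question to the server, RAID card and disk vendors. Their feedback was as follows.

According to earlier material, the absolute values of Medium error and Other error do not directly reflect the state of a disk. Based on discussions with the RAID card and disk vendors, the recommended practice is to monitor the Predictive Failure value: a non-zero value means the disk has a problem. In addition, a disk that is already failed can be reported for repair directly.

Command for Predictive Failure Count: storcli /c0/eall/sall show all
Monitor the keyword Predictive Failure Count. The criterion is that it must not be greater than 0; if there is any count, replace the corresponding disk.

Predictive Failure already covers media error, and its scope is broader and more comprehensive than media error:
The disk's SMART subsystem already has a complete set of algorithms for assessing the disk's health.
The SMART algorithms take into account parameters from every aspect of the disk's operation, and media error is one of them.
SMART evaluates media error based on its growth per unit of time.
When any evaluation item in the SMART subsystem reaches its threshold, the disk reports Sense Code: 01 5D 00 (FAILURE PREDICTION THRESHOLD EXCEEDED).
A host that follows the SCSI protocol standard (OS SCSI subsystem, SAS controller, RAID card, etc.) can parse this Sense Code correctly.

In summary, because media error is already covered by the disk's SMART subsystem and a predictive failure is reported according to the SCSI protocol standard, for disks it is enough to monitor Predictive Failure under the RAID card, with the criterion that it must not be greater than 0.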
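The vendor guidance above boils down to a simple rule: flag a disk when Predictive Failure Count is greater than 0 or when the disk is already Failed, and do not alert on the absolute values of the media and other error counters. Here is a minimal Python sketch of that rule. It scans text in the MegaCli -PDList format shown earlier (not storcli output, whose layout differs); the function name `disks_to_replace` and all counter values in the sample are made up for illustration.

```python
import re


def disks_to_replace(pdlist_text):
    """Map slot number -> list of reasons the disk should be replaced."""
    flagged = {}
    slot = None
    for raw in pdlist_text.splitlines():
        line = raw.strip()
        m = re.match(r"Slot Number:\s*(\d+)", line)
        if m:
            slot = int(m.group(1))
            continue
        if slot is None:
            continue  # no disk record seen yet
        # Anchored at line start, so "Last Predictive Failure Event Seq Number" is ignored.
        m = re.match(r"Predictive Failure Count:\s*(\d+)", line)
        if m and int(m.group(1)) > 0:
            flagged.setdefault(slot, []).append("Predictive Failure Count=" + m.group(1))
        m = re.match(r"Firmware state:\s*(\S+)", line)
        if m and m.group(1).lower().startswith("failed"):
            flagged.setdefault(slot, []).append("Firmware state=" + m.group(1))
        # Media Error Count and Other Error Count are deliberately not checked.
    return flagged


# Hypothetical values, chosen only to exercise the rule.
sample = """\
Enclosure Device ID: 32
Slot Number: 0
Media Error Count: 12
Other Error Count: 3
Predictive Failure Count: 0
Firmware state: Online
Enclosure Device ID: 32
Slot Number: 3
Media Error Count: 0
Predictive Failure Count: 2
Firmware state: Online
Enclosure Device ID: 32
Slot Number: 5
Media Error Count: 0
Predictive Failure Count: 0
Firmware state: Failed
"""

for slot, reasons in sorted(disks_to_replace(sample).items()):
    print("slot", slot, "->", ", ".join(reasons))
```

In this sample, slot 0 is not flagged despite its non-zero media error count, while slots 3 and 5 are, which mirrors the advice to key the alert on Predictive Failure and on failed disks only.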