MegaCli: A Beginner's Guide

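MegaCli is LSI's official command-line tool for managing its RAID controller cards. LSI has since been acquired and is now part of Broadcom, so to download MegaCli today you have to go to the Broadcom website, look under support for legacy products, and search for MegaRAID.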

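The current official tool is storcli, which covers the whole LSI and 3ware product range. Personally I still find MegaCli more comfortable to use, and on the servers from several Chinese vendors that we run in production, MegaCli manages the RAID just fine, so whether to switch doesn't really matter.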

View adapter information:

./MegaCli64 -AdpAllInfo -aALL

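The output is long and much of it is hard to understand, but that's fine. As a beginner, just note the first line, which says my machine has an adapter numbered 0. Many MegaCli64 commands take a trailing -a option that selects the adapter. I only have Adapter #0, so from here on I just write -a0; -a0,1,2 and -aALL are also accepted.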
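To make the three forms of the -a option concrete, here is a minimal Python sketch that builds the argument list for a MegaCli64 call. The helper names `adapter_arg` and `build_command` are made up for this illustration and are not part of MegaCli; the binary path is just the one used in this post.

```python
from typing import Iterable, List, Union


def adapter_arg(adapters: Union[str, int, Iterable[int]] = "ALL") -> str:
    """Build the trailing adapter selector: -a0, -a0,1,2 or -aALL."""
    if isinstance(adapters, str):
        if adapters.upper() != "ALL":
            raise ValueError("the only string form supported here is 'ALL'")
        return "-aALL"
    if isinstance(adapters, int):
        adapters = [adapters]
    ids = list(adapters)
    if not ids or any(i < 0 for i in ids):
        raise ValueError("adapter numbers must be non-negative")
    return "-a" + ",".join(str(i) for i in ids)


def build_command(action: str,
                  adapters: Union[str, int, Iterable[int]] = "ALL",
                  binary: str = "./MegaCli64") -> List[str]:
    """Return an argv list suitable for subprocess.run (hypothetical helper)."""
    return [binary, action, adapter_arg(adapters)]


if __name__ == "__main__":
    print(build_command("-AdpAllInfo"))       # all adapters
    print(build_command("-PDList", 0))        # adapter 0 only
    print(build_command("-PDList", [0, 1, 2]))
```

Passing the resulting list to `subprocess.run` avoids shell quoting issues; the sketch only builds the arguments and does not run anything.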

Adapter #0
==============================================================================
                    Versions
                ================
Product Name    : PERC H710 Adapter
Serial No       : 31P003R
FW Package Build: 21.1.0-0007
                    Mfg. Data
                ================
Mfg. Date       : 01/26/13
Rework Date     : 01/26/13
Revision No     : A00
Battery FRU     : N/A
...

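View the adapter's detailed configuration. This machine has 12 disks: one is set up as RAID 0 for the operating system, and the rest form a RAID 5 array: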

./MegaCli64 -CfgDsply -aALL
==============================================================================
Adapter: 0
Product Name: PERC H710 Adapter
Memory: 512MB
BBU: Present
Serial No: 31P003R
==============================================================================
Number of DISK GROUPS: 2 # two disk groups
DISK GROUPS: 0 # disk group 0
Number of Spans: 1
SPAN: 0
Span Reference: 0x00
Number of PDs: 1
Number of VDs: 1
Number of dedicated Hotspares: 0
Virtual Disk Information:
Virtual Disk: 0 (Target Id: 0)
Name:
RAID Level: Primary-0, Secondary-0, RAID Level Qualifier-0 # RAID 0
Size:2.728 TB
State: Optimal
Stripe Size: 64 KB
Number Of Drives:1
Span Depth:1
Default Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Access Policy: Read/Write
Disk Cache Policy: Disk's Default
Encryption Type: None
Physical Disk Information:
Physical Disk: 0
Enclosure Device ID: 32
Slot Number: 0
Device Id: 0
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SATA
Raw Size: 2.728 TB [0x15d50a3b0 Sectors]
Non Coerced Size: 2.728 TB [0x15d40a3b0 Sectors]
Coerced Size: 2.728 TB [0x15d400000 Sectors]
Firmware state: Online
SAS Address(0): 0x500056b37789abee
Connected Port Number: 0(path0) 
Inquiry Data:             (manually redacted) # the serial number appears here
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Foreign State: None 
Device Speed: 3.0Gb/s 
Link Speed: 3.0Gb/s 
Media Type: Hard Disk Device

DISK GROUPS: 1 # disk group 1
Number of Spans: 1
SPAN: 0
Span Reference: 0x01
Number of PDs: 11 # 11 physical disks
Number of VDs: 1 # combined into 1 virtual disk
Number of dedicated Hotspares: 0
Virtual Disk Information:
Virtual Disk: 0 (Target Id: 1)
Name:
RAID Level: Primary-5, Secondary-0, RAID Level Qualifier-3 # RAID 5
Size:27.285 TB
State: Optimal
Stripe Size: 64 KB
Number Of Drives:11
Span Depth:1
Default Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Access Policy: Read/Write
Disk Cache Policy: Disk's Default
Encryption Type: None
Physical Disk Information:
Physical Disk: 0 # the first physical disk
Enclosure Device ID: 32
Slot Number: 1
Device Id: 1
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SATA
Raw Size: 2.728 TB [0x15d50a3b0 Sectors]
Non Coerced Size: 2.728 TB [0x15d40a3b0 Sectors]
Coerced Size: 2.728 TB [0x15d400000 Sectors]
Firmware state: Online
SAS Address(0): 0x500056b37789abec
Connected Port Number: 0(path0) 
Inquiry Data:             (manually redacted) # the disk's serial number, same as on the disk label
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Foreign State: None 
Device Speed: 3.0Gb/s 
Link Speed: 3.0Gb/s 
Media Type: Hard Disk Device
...
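The sizes in this output can be cross-checked with a little arithmetic. The sketch below assumes 512-byte sectors and that MegaCli's "TB" means 2^40 bytes; both are assumptions inferred from the numbers above, not something the tool states. The function names are made up for this example. RAID 5 spends one drive's worth of space on parity, so usable capacity is (n - 1) times the per-drive size.

```python
SECTOR_BYTES = 512   # assumption: 512-byte logical sectors
TIB = 2 ** 40        # assumption: the "TB" in the output is 2**40 bytes


def sectors_to_tib(hex_sectors: str, sector_bytes: int = SECTOR_BYTES) -> float:
    """Convert a hex sector count such as '0x15d400000' to units of 2**40 bytes."""
    return int(hex_sectors, 16) * sector_bytes / TIB


def raid5_usable(n_drives: int, drive_size: float) -> float:
    """Usable RAID 5 capacity: one drive's worth of space goes to parity."""
    if n_drives < 3:
        raise ValueError("RAID 5 needs at least 3 drives")
    return (n_drives - 1) * drive_size


if __name__ == "__main__":
    coerced = sectors_to_tib("0x15d400000")   # Coerced Size from the output
    print(f"per-drive coerced size: {coerced:.4f}")
    print(f"RAID 5 over 11 drives:  {raid5_usable(11, coerced):.4f}")
```

Under these assumptions the per-drive figure comes out at about 2.7285 and the 11-drive RAID 5 figure at about 27.285, which lines up with the "Size:27.285 TB" reported for disk group 1.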

View the information and state of every physical disk. The fields are the same as above, just without the adapter information.

./MegaCli64 -PDList -a0
 
Adapter #0
Enclosure Device ID: 32
Slot Number: 0
Device Id: 0
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SATA
Raw Size: 2.728 TB [0x15d50a3b0 Sectors]
Non Coerced Size: 2.728 TB [0x15d40a3b0 Sectors]
Coerced Size: 2.728 TB [0x15d400000 Sectors]
Firmware state: Online
SAS Address(0): 0x500056b37789abee
Connected Port Number: 0(path0) 
Inquiry Data:             (manually redacted) # the disk's serial number, same as on the disk label
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Foreign State: None 
Device Speed: 3.0Gb/s 
Link Speed: 3.0Gb/s 
Media Type: Hard Disk Device
Enclosure Device ID: 32
Slot Number: 1
Device Id: 1
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SATA
Raw Size: 2.728 TB [0x15d50a3b0 Sectors]
Non Coerced Size: 2.728 TB [0x15d40a3b0 Sectors]
Coerced Size: 2.728 TB [0x15d400000 Sectors]
Firmware state: Online
SAS Address(0): 0x500056b37789abec
Connected Port Number: 0(path0) 
Inquiry Data:             (manually redacted) # the disk's serial number, same as on the disk label
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Foreign State: None 
Device Speed: 3.0Gb/s 
Link Speed: 3.0Gb/s 
Media Type: Hard Disk Device
...
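Because -PDList prints one flat run of "Key: value" lines, it is easy to split into per-disk records. The following is a minimal sketch, not an official parser: it assumes every disk's block starts with an "Enclosure Device ID" line, as in the output above, and the function name `parse_pdlist` is made up for this example.

```python
from typing import Dict, List

# Shortened sample in the same shape as the -PDList output shown above.
SAMPLE = """\
Adapter #0
Enclosure Device ID: 32
Slot Number: 0
Media Error Count: 0
Firmware state: Online
Enclosure Device ID: 32
Slot Number: 1
Media Error Count: 0
Firmware state: Online
"""


def parse_pdlist(text: str) -> List[Dict[str, str]]:
    """Split -PDList output into one dict per physical disk."""
    disks: List[Dict[str, str]] = []
    current = None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # lines such as "Adapter #0" carry no key/value pair
        key, value = key.strip(), value.strip()
        if key == "Enclosure Device ID":
            # Assumption: this field opens every disk's block.
            current = {}
            disks.append(current)
        if current is not None:
            current[key] = value
    return disks


if __name__ == "__main__":
    for disk in parse_pdlist(SAMPLE):
        print(disk["Slot Number"], disk["Firmware state"])
```

In practice the text would come from running `./MegaCli64 -PDList -a0` and capturing its standard output.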

This output contains a lot of useful information:

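1. Slot Number: the slot number, which should match the labeling on the chassis. If the machine has several disks, just tell the on-site engineer that the disk in slot X is faulty and they can swap it directly.

2. Inquiry Data: the disk's serial number, which matches the label on the disk. The label is only visible once the disk is pulled; the serial number on a disk pulled from a given slot should match the Inquiry Data reported for that slot.

3. Firmware state: the disk's state. Online is the state we hope to see. Other states include Unconfigured, Offline, Failed and so on, and most of them convey the same sad fact: you'll be working overtime to report or repair those disks.

4. Pay special attention to these counters: Media Error Count, Other Error Count, Predictive Failure Count, and Last Predictive Failure Event Seq Number. Any of them can be non-zero. That means the disk still works but is no longer reliable, very likely has bad clusters or bad sectors, and should be replaced as soon as possible; keep using it and it won't be long before it fails completely. A claim that circulates online is that the larger the first three counts are, the worse the disk's condition. That is not actually the case, as the following two screenshots show.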

[Screenshot 1: image not preserved]
[Screenshot 2: image not preserved]

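A colleague raised this question with the server, RAID card, and disk vendors. Their feedback was as follows.

According to earlier documentation, the absolute values of the Media Error and Other Error counts do not directly reflect the state of a disk. The RAID card and disk vendors recommend monitoring the Predictive Failure value instead: a non-zero value means the disk has a problem. A disk in the Failed state can also be reported for repair directly.

Command for checking Predictive Failure Count:

storcli /c0/eall/sall show all

Monitor the keyword Predictive Failure Count. The criterion is that it must not be greater than 0; if there is any count, replace the corresponding disk. Predictive Failure already covers media errors, and its scope is broader and more complete.

The disk's SMART subsystem already has a complete set of algorithms for assessing disk health. Those algorithms take many runtime parameters of the disk into account, and media errors are one of them. SMART evaluates media errors by their growth per unit of time. When any item evaluated by the SMART subsystem reaches its threshold, the disk reports Sense Code: 01 5D 00 (FAILURE PREDICTION THRESHOLD EXCEEDED). Any host that follows the SCSI standard (the OS SCSI subsystem, a SAS controller, a RAID card, and so on) can parse this sense code correctly.

In summary, because media errors are already covered by the disk's SMART subsystem, and predictive failures are reported according to the SCSI standard, for disks behind a RAID card it is enough to monitor Predictive Failure, with the criterion that it must not be greater than 0.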
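The vendor's rule can be turned into a small check. The sketch below is an illustration only: it reads text in the MegaCli -PDList format shown earlier in this post (not storcli output, whose layout differs), and flags a slot when Predictive Failure Count is greater than 0 or the firmware state contains "Failed". The function name `disks_to_replace` and the sample data are made up for this example.

```python
from typing import List, Optional, Tuple

# Made-up sample data in the -PDList field layout; not real output.
SAMPLE = """\
Slot Number: 0
Media Error Count: 12
Predictive Failure Count: 0
Firmware state: Online
Slot Number: 1
Media Error Count: 0
Predictive Failure Count: 3
Firmware state: Online
Slot Number: 2
Media Error Count: 0
Predictive Failure Count: 0
Firmware state: Failed
"""


def disks_to_replace(pdlist_text: str) -> List[Tuple[Optional[str], str]]:
    """Return (slot, reason) pairs for disks that meet the replacement rule."""
    flagged: List[Tuple[Optional[str], str]] = []
    slot: Optional[str] = None
    for line in pdlist_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "Slot Number":
            slot = value
        elif key == "Predictive Failure Count":
            # Vendor rule quoted above: any count above 0 means replace.
            if value.isdigit() and int(value) > 0:
                flagged.append((slot, f"Predictive Failure Count = {value}"))
        elif key == "Firmware state":
            if "failed" in value.lower():
                flagged.append((slot, f"Firmware state = {value}"))
    return flagged


if __name__ == "__main__":
    for slot, reason in disks_to_replace(SAMPLE):
        print(f"slot {slot}: {reason}")
```

Note that slot 0 in the sample has a non-zero Media Error Count but is not flagged, which follows the vendor's point that the absolute media error value alone is not the criterion.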
