[Linux Ops -- Hardware] Using smartctl

1. What it is

smartctl is a commonly used disk-checking tool. It reads the SMART (Self-Monitoring, Analysis and Reporting Technology) data that a drive keeps about itself.

2. Installation

(1) Ubuntu

$ sudo apt-get install smartmontools

(2) RHEL & CentOS

$ sudo yum install smartmontools

3. Usage

(1) Check whether the disk supports SMART

$ sudo smartctl -i /dev/sda1 
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-2.6.18-164.11.1.el5] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Constellation ES (SATA 6Gb/s)
Device Model:     ST1000NM0011
Serial Number:    Z1N0EVRZ
LU WWN Device Id: 5 000c50 03f123968
Firmware Version: SN02
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    7200 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Sun Aug 23 23:27:54 2015 CST
SMART support is: Available - device has SMART capability.          
SMART support is: Enabled

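The last two lines show whether the disk supports SMART and whether it is currently enabled. SMART data belongs to the whole drive rather than to a partition, so the whole-disk device (for example /dev/sda) is the usual argument.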
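
The check on those two lines can be scripted. The sketch below is a hypothetical helper (the name smart_support is not part of smartmontools) that reads `smartctl -i` output on standard input and reduces the "SMART support is:" lines to a single word. The "Available" and "Enabled" strings are taken from the output above; "Disabled" and "Unavailable" are assumed wordings for the other cases.

```shell
# smart_support: read `smartctl -i` output on stdin and print one of
# enabled, disabled, available, unavailable or unknown.
# Hypothetical helper; only "Available" and "Enabled" appear in the
# sample output above, the other strings are assumptions.
smart_support() {
    awk '
        /^SMART support is:/ {
            if ($0 ~ /Unavailable/)      state = "unavailable"
            else if ($0 ~ /Enabled/)     state = "enabled"
            else if ($0 ~ /Disabled/)    state = "disabled"
            else if ($0 ~ /Available/ && state == "") state = "available"
        }
        END { print (state == "" ? "unknown" : state) }
    '
}

# Possible usage (needs root and a real device):
#   sudo smartctl -i /dev/sda | smart_support
```

Reading from standard input keeps the helper independent of the device name and makes it easy to try against saved output.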

(2) Enable SMART support manually

$ sudo smartctl --smart=on --offlineauto=on --saveauto=on /dev/sda1

The options mean the following:

-s VALUE, --smart=VALUE
Enable/disable SMART on device (on/off)

-o VALUE, --offlineauto=VALUE (ATA)
Enable/disable automatic offline testing on device (on/off)

-S VALUE, --saveauto=VALUE (ATA)
Enable/disable Attribute autosave on device (on/off)
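
The same three settings can be written with the short options listed above. As a small sketch, the hypothetical function below (enable_smart_cmd is an illustrative name, and the device paths in the usage comment are placeholders) prints the short-option command for each device instead of running it, so the lines can be reviewed first.

```shell
# enable_smart_cmd: print (not run) the short-option command that turns on
# SMART, automatic offline testing and attribute autosave for each device.
# Hypothetical helper; device names are supplied by the caller.
enable_smart_cmd() {
    for dev in "$@"; do
        printf 'smartctl -s on -o on -S on %s\n' "$dev"
    done
}

# Possible usage: review the printed commands, then run them with sudo.
#   enable_smart_cmd /dev/sda /dev/sdb
```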

(3) Check the disk's health

$ sudo smartctl -H /dev/sda1 
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-2.6.18-164.11.1.el5] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
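
In scripts it is often easier to look at the exit status of smartctl than to parse the PASSED line. The exit status is a bit mask, and the sketch below decodes it following the bit meanings described in the smartctl man page. The function name is hypothetical and the messages are paraphrased, so check them against the man page of the installed version.

```shell
# decode_smartctl_status: print one line per bit set in a smartctl exit
# status (hypothetical helper; messages paraphrase the man page).
decode_smartctl_status() {
    status=$1
    [ "$status" -eq 0 ] && { echo "ok"; return 0; }
    [ $((status & 1))   -ne 0 ] && echo "command line did not parse"
    [ $((status & 2))   -ne 0 ] && echo "device open failed"
    [ $((status & 4))   -ne 0 ] && echo "a SMART command failed"
    [ $((status & 8))   -ne 0 ] && echo "SMART status: DISK FAILING"
    [ $((status & 16))  -ne 0 ] && echo "prefail attribute at or below threshold"
    [ $((status & 32))  -ne 0 ] && echo "attribute was at or below threshold in the past"
    [ $((status & 64))  -ne 0 ] && echo "error log contains errors"
    [ $((status & 128)) -ne 0 ] && echo "self-test log contains errors"
    return 0
}

# Possible usage (needs root and a real device):
#   sudo smartctl -H /dev/sda >/dev/null; decode_smartctl_status $?
```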

(4) Show the disk's attribute values

$ sudo smartctl -A /dev/sdl1
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-2.6.18-164.11.1.el5] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   084   063   044    Pre-fail  Always       -       238687534
  3 Spin_Up_Time            0x0003   099   099   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       3
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   087   060   030    Pre-fail  Always       -       573183052
  9 Power_On_Hours          0x0032   063   063   000    Old_age   Always       -       33120
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       3
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   075   049   045    Old_age   Always       -       25 (Min/Max 20/30)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       1
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       567
194 Temperature_Celsius     0x0022   025   051   000    Old_age   Always       -       25 (0 20 0 0 0)
195 Hardware_ECC_Recovered  0x001a   120   099   000    Old_age   Always       -       238687534
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       2
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       2
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0

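Essentially, the SMART attribute table lists the attributes that the manufacturer has defined in the drive, together with the failure thresholds for those attributes. The table is generated and updated automatically by the drive firmware.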

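  • ID: the attribute ID, usually a decimal number between 1 and 255.
  • ATTRIBUTE_NAME: the manufacturer-defined attribute name.
  • VALUE: one of the most important pieces of information in the table. It is the normalized value of the attribute, in the range 1 to 253, where 253 is the best condition and 1 the worst. Depending on the attribute and the manufacturer, the initial VALUE may be set to 100 or 200.
  • WORST: the lowest VALUE ever recorded for the attribute.
  • FLAG: the attribute handling flag.
  • THRESH: the lowest value WORST may reach before the disk is reported as FAILED.
  • TYPE: the attribute type (Pre-fail or Old_age). A Pre-fail attribute is treated as critical and takes part in the disk's overall SMART health assessment (PASSED/FAILED). If any Pre-fail attribute fails, the disk should be considered about to fail. An Old_age attribute, on the other hand, is treated as non-critical (for example, normal wear), and its failure does not by itself mean the disk is failing.
  • UPDATED: how often the attribute is updated. Offline means it is updated only when offline tests are run on the disk.
  • WHEN_FAILED: set to "FAILING_NOW" if VALUE is less than or equal to THRESH, to "In_the_past" if WORST is less than or equal to THRESH, and to "-" otherwise. With "FAILING_NOW", back up important files as soon as possible, especially when the attribute is of type Pre-fail. "In_the_past" means the attribute failed at some point but was fine when the test ran. "-" means the attribute has never failed.
  • RAW_VALUE: the manufacturer-defined raw value, from which VALUE is derived.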
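
The WHEN_FAILED rule can be reproduced from the VALUE, WORST and THRESH columns. The sketch below assumes the column layout shown in the table above (ID in column 1, name in column 2, VALUE, WORST and THRESH in columns 4 to 6); attr_status is a hypothetical helper name. It recomputes the state for every attribute row read from standard input.

```shell
# attr_status: read `smartctl -A` output on stdin and print
# "ID NAME STATE" for each attribute row, where STATE follows the
# WHEN_FAILED rule: FAILING_NOW, In_the_past or "-".
# Hypothetical helper; assumes the column layout shown above.
attr_status() {
    awk '
        # Attribute rows start with a numeric ID and have at least 10 fields.
        $1 ~ /^[0-9]+$/ && NF >= 10 {
            value = $4 + 0; worst = $5 + 0; thresh = $6 + 0
            if (value <= thresh)      state = "FAILING_NOW"
            else if (worst <= thresh) state = "In_the_past"
            else                      state = "-"
            print $1, $2, state
        }
    '
}

# Possible usage (needs root and a real device):
#   sudo smartctl -A /dev/sda | attr_status | grep -v ' -$'
```

The grep in the usage line hides attributes that have never failed, leaving only the rows that need attention.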

(5) Test the disk

  • Short test
$ sudo smartctl -t short /dev/sda
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-2.6.18-164.11.1.el5] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Short self-test routine immediately in off-line mode".
Drive command "Execute SMART Short self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 1 minutes for test to complete.
Test will complete after Mon Aug 24 00:01:22 2015

Use smartctl -X to abort test.
  • Long test
$ sudo smartctl -t long /dev/sda
  • View test progress and results
$ sudo smartctl -l selftest /dev/sda
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-2.6.18-164.11.1.el5] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     33120         -
  • Abort a test
$ sudo smartctl -X /dev/sda
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-2.6.18-164.11.1.el5] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Abort SMART off-line mode self-test routine".
Self-testing aborted!
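
To use the self-test log in a script, the most recent entry can be reduced to a single word. The sketch below is a hypothetical helper (latest_selftest is an illustrative name) that reads `smartctl -l selftest` output on standard input. Only the "Completed without error" status appears in the log above; the "in progress", "Aborted" and "Interrupted" patterns are assumptions about other status texts, and anything else is reported as "check" for manual inspection.

```shell
# latest_selftest: read `smartctl -l selftest` output on stdin and classify
# the first (most recent) log entry as passed, running, aborted or check;
# print "none" when the log has no entries.
# Hypothetical helper; only "Completed without error" comes from the
# sample log, the other patterns are assumptions.
latest_selftest() {
    awk '
        /^# *[0-9]+ / && !seen {
            seen = 1
            if ($0 ~ /Completed without error/)   print "passed"
            else if ($0 ~ /in progress/)          print "running"
            else if ($0 ~ /Aborted|Interrupted/)  print "aborted"
            else                                  print "check"
        }
        END { if (!seen) print "none" }
    '
}

# Possible usage (needs root and a real device):
#   sudo smartctl -l selftest /dev/sda | latest_selftest
```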

