Integrating MegaCLI with Zabbix for automatic discovery and monitoring of physical disks

MegaCLI is a user-space tool from LSI for managing RAID controllers built on LSI chips. It works with most Dell servers.

Introductions to MegaCLI:

http://zh.community.dell.com/techcenter/b/weblog/archive/2013/03/07/megacli-command-share

http://blog.chinaunix.net/uid-25135004-id-3139293.html

 

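Zabbix provides the low-level discovery (LLD) mechanism for automatically discovering monitoring targets and adding items for them. Out of the box, Zabbix already uses low-level discovery to discover and monitor filesystem mount points and network interfaces.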
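For reference, a low-level discovery item returns JSON whose "data" array holds one object per discovered entity, keyed by LLD macros such as {#FSNAME} for mount points or {#IFNAME} for network interfaces. The sketch below builds that classic {"data": [...]} payload from plain Python rows; the function name build_lld_payload and the sample interfaces are illustrative only and not part of Zabbix.

```python
import json


def build_lld_payload(rows, macro_map):
    """Convert a list of dicts into Zabbix low-level discovery JSON.

    macro_map maps each LLD macro (e.g. '{#IFNAME}') to the key in a row
    whose value should be reported for that macro.
    """
    data = []
    for row in rows:
        # One discovery object per entity; values are reported as strings
        data.append({macro: str(row[key]) for macro, key in macro_map.items()})
    return json.dumps({'data': data})


interfaces = [{'name': 'eth0'}, {'name': 'eth1'}]
print(build_lld_payload(interfaces, {'{#IFNAME}': 'name'}))
# prints {"data": [{"{#IFNAME}": "eth0"}, {"{#IFNAME}": "eth1"}]}
```

The disk discovery in raid.py follows the same shape, using {#DISK_ID} and {#WWN} as its macros.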

 

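Automatic discovery and monitoring of physical disks is therefore also built on Zabbix low-level discovery; all that is needed is a Python script that bridges Zabbix and MegaCLI. I won't go into the underlying principles and details further. The steps are as follows: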

 

1. Install MegaCLI

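Download the latest MegaCLI release from the LSI website, paying attention to whether your operating system is 32-bit or 64-bit.

The package is an RPM by default, so it installs easily on CentOS and similar systems.

On Ubuntu or Debian, it can be installed with the following steps: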

mkdir /opt/MegaCLI
cd /opt/MegaCLI
wget -c http://xxx/8.07.14_MegaCLI.zip

unzip 8.07.14_MegaCLI.zip
cd /opt/MegaCLI/Linux
apt-get install rpm2cpio cpio unzip
rpm2cpio MegaCli-8.07.14-1.noarch.rpm | cpio -idmv
mv opt/MegaRAID /opt/

root@controller:~# ls -lh /opt/MegaRAID/MegaCli
total 5.7M
-rw-r--r-- 1 root root 296 Sep 24 19:10 CmdTool.log
-rwx------ 1 root root 528K Dec 16 2013 libstorelibir-2.so.14.07-0
-rwxr-xr-x 1 root root 2.4M Dec 16 2013 MegaCli
-rwsr-sr-x 1 root root 2.6M Dec 16 2013 MegaCli64
-rw-r--r-- 1 root root 139K Oct 10 17:43 MegaSAS.log

The rest of this post assumes MegaCli is installed in /opt/MegaRAID/MegaCli.
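The package ships both a 32-bit MegaCli and a 64-bit MegaCli64 binary, while raid.py hard-codes MegaCli64. A small helper can pick the binary that matches the machine and confirm it is executable. This is a sketch under assumptions: the function names are hypothetical, and the architecture strings checked are the common values reported by platform.machine() on 64-bit x86.

```python
import os
import platform

MEGACLI_DIR = '/opt/MegaRAID/MegaCli'

# Values platform.machine() commonly reports on 64-bit x86 systems
X86_64_NAMES = ('x86_64', 'amd64', 'AMD64')


def megacli_path(machine=None, directory=MEGACLI_DIR):
    """Return the MegaCLI binary that matches the CPU architecture."""
    if machine is None:
        machine = platform.machine()
    name = 'MegaCli64' if machine in X86_64_NAMES else 'MegaCli'
    return os.path.join(directory, name)


def megacli_ready(path):
    """True if the binary exists and the current user may execute it."""
    return os.path.isfile(path) and os.access(path, os.X_OK)


if __name__ == '__main__':
    path = megacli_path()
    state = 'ready' if megacli_ready(path) else 'missing or not executable'
    print('%s: %s' % (path, state))
```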

2. Create raid.py

https://gist.github.com/AlexYangYu/14161ce866417f817508

Save the script below as /opt/DiskMonitoring/raid.py and make it executable (chmod +x /opt/DiskMonitoring/raid.py):

#!/usr/bin/env python
# -*- coding: utf-8 -*-
#
# Description:
#   This application discovers physical disks by using the MegaCLI tool.
#
# Author: Alex Yang <alex890714@gmail.com>
#


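try:
    import commands  # Python 2
except ImportError:
    # Python 3 moved getstatusoutput() into the subprocess module
    import subprocess as commands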
import os
import sys
import json
from optparse import OptionParser


MEGACLI_EXEC = '/opt/MegaRAID/MegaCli/MegaCli64'
LIST_DISK_OPT = '-PDList -aALL'

SLOT_NUMBER = 'Slot Number'
DEVICE_ID = 'Device Id'
WWN = 'WWN'
MEC = 'Media Error Count'
OEC = 'Other Error Count'
PFC = 'Predictive Failure Count'
PD_TYPE = 'PD Type'
RAW_SIZE = 'Raw Size'
FIRMWARE_STATE = 'Firmware state'
INQUIRY_DATA = 'Inquiry Data'


class Disk(object):
    def __init__(self, dev_id, slot_number, wwn, mec, oec, pfc, pd_type,
                 raw_size, firmware_state, inquiry_data):
        self.dev_id = dev_id
        self.slot_number = slot_number
        self.wwn = wwn
        # Media Error Count
        self.mec = mec
        # Other Error Count
        self.oec = oec
        # Predictive Failure Count
        self.pfc = pfc
        # PD Type
        self.pd_type = pd_type
        # Size
        self.raw_size = raw_size
        # Firmware State ("Failed", "Online, Spun Up", "Online, Spun Down", "Unconfigured(bad)", "Unconfigured(good), Spun down", "Hotspare, Spun down", "Hotspare, Spun up" or "not Online")
        self.firmware_state = firmware_state
        # Inquiry data
        self.inquiry_data = inquiry_data

    def jsonify(self):
        # Serialize all disk attributes as a JSON object
        return json.dumps(self.__dict__)

    def __str__(self):
        return '%s %s %s %s %s %s %s %s %s %s' % (
            self.dev_id, self.slot_number, self.wwn, self.mec, self.oec,
            self.pfc, self.pd_type, self.raw_size, self.firmware_state,
            self.inquiry_data
        )


def check_megacli(cli_path):
    # Abort early if the MegaCLI binary is missing or not executable
    if not os.path.exists(cli_path) or not os.access(cli_path, os.X_OK):
        print('MegaCLI is needed in %s with executable privilege.' % cli_path)
        sys.exit(1)


def line_generator(string):
    # Yield the output line by line; splitlines() also returns the last
    # line when there is no trailing newline (getstatusoutput strips it)
    for line in string.splitlines():
        yield line


def get_value(line):
    # Split on the first colon only, so values containing ':' stay intact
    return line.split(':', 1)[1].strip()


def make_disk_array(mega_output):
    disk_array = []
    for line in line_generator(mega_output):
        if line.startswith(SLOT_NUMBER):
            slot_number = get_value(line)
        elif line.startswith(DEVICE_ID):
            dev_id = get_value(line)
        elif line.startswith(WWN):
            wwn = get_value(line)
        elif line.startswith(MEC):
            mec = get_value(line)
        elif line.startswith(OEC):
            oec = get_value(line)
        elif line.startswith(PFC):
            pfc = get_value(line)
        elif line.startswith(PD_TYPE):
            pd_type = get_value(line)
        elif line.startswith(RAW_SIZE):
            raw_size = get_value(line)
        elif line.startswith(FIRMWARE_STATE):
            fw_state = get_value(line)
        elif line.startswith(INQUIRY_DATA):
            inquiry_data = get_value(line)

            disk = Disk(dev_id, slot_number, wwn, mec, oec, pfc, pd_type,
                        raw_size, fw_state, inquiry_data)
            disk_array.append(disk)
    return disk_array


def discovery_physical_disk(disk_array):
    array = []
    for d in disk_array:
        disk = {}
        disk['{#DISK_ID}'] = d.dev_id
        disk['{#WWN}'] = d.wwn
        array.append(disk)
    return json.dumps({'data': array}, indent=4, separators=(',',':'))


def count_media_error(disk_array, disk_id):
    for disk in disk_array:
        if int(disk.dev_id) == int(disk_id):
            return disk.mec
    return '-1'

def count_other_error(disk_array, disk_id):
    for disk in disk_array:
        if int(disk.dev_id) == int(disk_id):
            return disk.oec
    return '-1'

def count_predictive_error(disk_array, disk_id):
    for disk in disk_array:
        if int(disk.dev_id) == int(disk_id):
            return disk.pfc
    return '-1'


def get_disk_array():
    check_megacli(MEGACLI_EXEC)
    (status, output) = commands.getstatusoutput('%s %s' % (MEGACLI_EXEC, LIST_DISK_OPT))
    if status != 0:
        print('Exec MegaCLI failed, please check the log.')
        sys.exit(1)
    disk_array = make_disk_array(output)
    return disk_array


def init_option():
    usage = """%prog pd_discovery
       %prog {mec|oec|pfc} DISK_ID"""
    parser = OptionParser(usage=usage, version="0.1")
    return parser


parser = init_option()


if __name__ == '__main__':
    (options, args) = parser.parse_args()

    if len(args) < 1:
        parser.print_help()
        sys.exit(1)

    disk_array = get_disk_array()

    command = args.pop(0)
    if command == 'pd_discovery':
        print(discovery_physical_disk(disk_array))
    elif command == 'mec':
        print(count_media_error(disk_array, args.pop()))
    elif command == 'oec':
        print(count_other_error(disk_array, args.pop()))
    elif command == 'pfc':
        print(count_predictive_error(disk_array, args.pop()))
    else:
        # Unknown sub-command
        parser.print_help()
        sys.exit(1)
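To exercise the parsing approach without a RAID controller, the sketch below applies the same startswith-based field matching to a hand-written, abridged sample shaped like -PDList output. The sample text and its values are invented for illustration; real MegaCLI output has many more lines per disk, which this approach simply skips. Unlike make_disk_array, the hypothetical parse_pdlist collects fields into a fresh dict for each disk, so a field missing on one disk cannot carry over from the previous one.

```python
import json

# Hand-written, abridged sample shaped like `MegaCli64 -PDList -aALL` output.
# All values are invented for illustration.
SAMPLE = """Adapter #0

Enclosure Device ID: 32
Slot Number: 0
Device Id: 0
WWN: 5000C500AAAAAAAA
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
PD Type: SAS
Raw Size: 279.396 GB [0x22ecb25c Sectors]
Firmware state: Online, Spun Up
Inquiry Data: SEAGATE ST300MM0006

Enclosure Device ID: 32
Slot Number: 1
Device Id: 1
WWN: 5000C500BBBBBBBB
Media Error Count: 3
Other Error Count: 1
Predictive Failure Count: 0
PD Type: SAS
Raw Size: 279.396 GB [0x22ecb25c Sectors]
Firmware state: Online, Spun Up
Inquiry Data: SEAGATE ST300MM0006
"""

# The same field labels raid.py matches with str.startswith()
FIELDS = ('Slot Number', 'Device Id', 'WWN', 'Media Error Count',
          'Other Error Count', 'Predictive Failure Count', 'PD Type',
          'Raw Size', 'Firmware state', 'Inquiry Data')


def parse_pdlist(output):
    """Return one dict per disk; the 'Inquiry Data' line closes a record."""
    disks, current = [], {}
    for line in output.splitlines():
        for field in FIELDS:
            if line.startswith(field):
                # Split on the first colon only, as values may contain ':'
                current[field] = line.split(':', 1)[1].strip()
                break
        if line.startswith('Inquiry Data'):
            disks.append(current)
            current = {}
    return disks


def to_discovery(disks):
    """Build the same LLD structure that `raid.py pd_discovery` prints."""
    return json.dumps({'data': [{'{#DISK_ID}': d.get('Device Id', ''),
                                 '{#WWN}': d.get('WWN', '')} for d in disks]})


disks = parse_pdlist(SAMPLE)
print(to_discovery(disks))
```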

3. Configure the Zabbix Agent

Edit zabbix_agentd.conf and make sure the following two settings are correct:

Include=/etc/zabbix/zabbix_agentd.conf.d/
UnsafeUserParameters=1
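To verify those two settings programmatically, a hypothetical checker can read the agent configuration as KEY=VALUE lines. It is only a sketch: it looks for the exact Include path used in this post and does not expand globs or follow the included files.

```python
def parse_agent_conf(text):
    """Collect KEY=VALUE settings, skipping blank lines and # comments.

    Values are kept in lists because keys such as Include may repeat.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith('#'):
            continue
        key, sep, value = line.partition('=')
        if sep:
            settings.setdefault(key.strip(), []).append(value.strip())
    return settings


def check_agent_conf(text, include_dir='/etc/zabbix/zabbix_agentd.conf.d/'):
    """Return a list of problems with the settings this post relies on."""
    settings = parse_agent_conf(text)
    problems = []
    if include_dir not in settings.get('Include', []):
        problems.append('missing Include=%s' % include_dir)
    # Look at the last value if the parameter appears more than once
    if settings.get('UnsafeUserParameters', ['0'])[-1] != '1':
        problems.append('UnsafeUserParameters is not set to 1')
    return problems


sample = """# sample agent config
Include=/etc/zabbix/zabbix_agentd.conf.d/
UnsafeUserParameters=1
"""
print(check_agent_conf(sample))  # prints []
```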

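Add the zabbix user to sudoers, because raid.py runs MegaCLI, which needs root privileges. The rule below grants the zabbix user passwordless sudo for every command; restricting it to the script, for example zabbix ALL=(root) NOPASSWD: /opt/DiskMonitoring/raid.py, is a safer alternative: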

echo "zabbix ALL=(ALL) NOPASSWD: ALL" > /etc/sudoers.d/zabbix

Edit /etc/zabbix/zabbix_agentd.conf.d/disk.conf to add the custom user parameters, then restart the Zabbix agent so it loads them:

UserParameter=raid.phy.discovery,sudo /opt/DiskMonitoring/raid.py pd_discovery
UserParameter=raid.phy.mec[*],sudo /opt/DiskMonitoring/raid.py mec $1
UserParameter=raid.phy.oec[*],sudo /opt/DiskMonitoring/raid.py oec $1
UserParameter=raid.phy.pfc[*],sudo /opt/DiskMonitoring/raid.py pfc $1
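The [*] suffix makes these flexible user parameters: when the server requests a key such as raid.phy.mec[5], the agent substitutes the bracketed parameters for $1, $2 and so on in the command. The sketch below models that mapping for the four entries above. It is a simplified illustration (no quoting rules or other special cases), not the agent's actual parser, and resolve_command is a hypothetical name.

```python
import re

# The four user parameters configured above (key pattern -> command)
USER_PARAMETERS = {
    'raid.phy.discovery': 'sudo /opt/DiskMonitoring/raid.py pd_discovery',
    'raid.phy.mec[*]': 'sudo /opt/DiskMonitoring/raid.py mec $1',
    'raid.phy.oec[*]': 'sudo /opt/DiskMonitoring/raid.py oec $1',
    'raid.phy.pfc[*]': 'sudo /opt/DiskMonitoring/raid.py pfc $1',
}

# key name, optionally followed by [comma-separated parameters]
KEY_RE = re.compile(r'^([^\[\]]+)(?:\[(.*)\])?$')


def resolve_command(item_key, user_parameters=USER_PARAMETERS):
    """Simplified model of how a requested key maps to a shell command."""
    match = KEY_RE.match(item_key)
    if match is None:
        raise ValueError('malformed key: %s' % item_key)
    name, params = match.group(1), match.group(2)
    if params is None:
        return user_parameters[name]
    command = user_parameters[name + '[*]']
    args = [p.strip() for p in params.split(',')]

    def substitute(m):
        # $1 is the first bracketed parameter; missing ones become empty
        index = int(m.group(1)) - 1
        return args[index] if index < len(args) else ''

    return re.sub(r'\$([1-9])', substitute, command)


print(resolve_command('raid.phy.mec[5]'))
# prints sudo /opt/DiskMonitoring/raid.py mec 5
```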

4. Configure the Zabbix Server

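Create a template, add a discovery rule to it with the key raid.phy.discovery, and then create three item prototypes under that rule for raid.phy.mec, raid.phy.oec and raid.phy.pfc, each taking {#DISK_ID} as its parameter.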
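Once discovery runs, the server creates one real item per discovered disk by substituting the LLD macros in each item prototype key. The sketch below models that expansion, assuming prototype keys of the form raid.phy.mec[{#DISK_ID}] (derived from the user parameters and the {#DISK_ID} macro that raid.py emits) and a discovery result in the pd_discovery format with made-up WWN values. expand_prototypes is a hypothetical helper, not a Zabbix API.

```python
import json

# Assumed item prototype keys, one per user parameter
PROTOTYPES = (
    'raid.phy.mec[{#DISK_ID}]',
    'raid.phy.oec[{#DISK_ID}]',
    'raid.phy.pfc[{#DISK_ID}]',
)


def expand_prototypes(discovery_json, prototypes=PROTOTYPES):
    """Return the concrete item keys produced for each discovered entity."""
    keys = []
    for entity in json.loads(discovery_json)['data']:
        for prototype in prototypes:
            key = prototype
            # Replace every LLD macro present in this entity
            for macro, value in entity.items():
                key = key.replace(macro, value)
            keys.append(key)
    return keys


# Made-up discovery result in the format `raid.py pd_discovery` prints
discovery = json.dumps({'data': [
    {'{#DISK_ID}': '0', '{#WWN}': '5000C500AAAAAAAA'},
    {'{#DISK_ID}': '1', '{#WWN}': '5000C500BBBBBBBB'},
]})

for key in expand_prototypes(discovery):
    print(key)
```

With two disks and three prototypes, the host ends up with six items, and disks added later are picked up on the next discovery run.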

(Figure: reference configuration of the Media Error Count item prototype.)

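After that, simply link the template to the relevant hosts and deploy the monitoring script on them. Alerting can be configured according to your own needs.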
