A First Look at Zabbix Alerting Logic

To start, here is a diagram found online that shows the Zabbix alert-related tables and how they relate to one another:

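The actions table

The actions table corresponds to the Actions configuration in the frontend.

An action consists of conditions and operations: when the specified conditions are met, the operations are executed. The message content is configured in the action (for example, the default message does not include the time the alert was raised, but you can add it yourself).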

MariaDB [rtm]> desc actions;
+---------------+---------------------+------+-----+---------+-------+
| Field         | Type                | Null | Key | Default | Extra |
+---------------+---------------------+------+-----+---------+-------+
| actionid      | bigint(20) unsigned | NO   | PRI | NULL    |       |
| name          | varchar(255)        | NO   | UNI |         |       |
| eventsource   | int(11)             | NO   | MUL | 0       |       |
| evaltype      | int(11)             | NO   |     | 0       |       |
| status        | int(11)             | NO   |     | 0       |       |
| esc_period    | int(11)             | NO   |     | 0       |       |
| def_shortdata | varchar(255)        | NO   |     |         |       |
| def_longdata  | text                | NO   |     | NULL    |       |
| recovery_msg  | int(11)             | NO   |     | 0       |       |
| r_shortdata   | varchar(255)        | NO   |     |         |       |
| r_longdata    | text                | NO   |     | NULL    |       |
| formula       | varchar(255)        | NO   |     |         |       |
+---------------+---------------------+------+-----+---------+-------+

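actionid:       ID of the action
name:           name of the action
eventsource:    corresponds to source in the events table
evaltype:       how the conditions are evaluated (0: and/or, 1: and, 2: or, 3: custom expression)
status:         whether the action is enabled (0: enabled, 1: disabled)
esc_period:     duration of each operation step
def_shortdata:  default subject
def_longdata:   default message body
recovery_msg:   whether the recovery message is enabled (1: enabled)
r_shortdata:    recovery subject
r_longdata:     recovery message
formula:        custom expression defined in the conditions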
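
To make the numeric columns easier to read when inspecting rows by hand, the field notes above can be turned into a small lookup. This is only a sketch: the function name describe_action and the dict-shaped row are assumptions made for illustration, and the value meanings are exactly those listed above, nothing more.

```python
# Value meanings copied from the field notes above.
EVALTYPE = {0: "and/or", 1: "and", 2: "or", 3: "custom expression"}
ACTION_STATUS = {0: "enabled", 1: "disabled"}


def describe_action(row):
    """Summarize one actions row (a dict keyed by column name) in readable form.

    Hypothetical helper for illustration; it is not part of Zabbix.
    """
    summary = {
        "name": row["name"],
        "status": ACTION_STATUS.get(row["status"], "unknown"),
        "evaltype": EVALTYPE.get(row["evaltype"], "unknown"),
        "recovery_message": row["recovery_msg"] == 1,
    }
    # The formula column only matters when a custom expression is selected.
    if row["evaltype"] == 3:
        summary["formula"] = row["formula"]
    return summary
```

A row fetched with a dict cursor could be passed in directly, for example describe_action({"name": "Report problems", "status": 0, "evaltype": 0, "recovery_msg": 1, "formula": ""}).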

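Zabbix events

Zabbix has four kinds of events: trigger events, discovery events, internal events, and auto-registration events.

  • Zabbix internal events

    • An item changes state from normal to unsupported, or from unsupported back to normal
    • A low-level discovery rule changes state from normal to unsupported, or from unsupported back to normal
    • A trigger changes state from normal to unknown, or from unknown back to normal
  • Zabbix discovery events

    • Once a network discovery rule is configured, Zabbix periodically scans the IP range according to that rule and generates an event whenever it finds a host or service
    • Zabbix auto discovery
  • Zabbix trigger events

    A change in trigger state generates a trigger event that carries the detailed state information

  • Zabbix auto-registration events

    An active agent initiates communication with the server; the Zabbix server uses the IP address and port the agent connected from to add the host, and generates an auto-registration event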

The events table

MariaDB [rtm]> select * from events where source=0;
+---------+--------+--------+----------+------------+-------+--------------+-----------+--------------------------------------------+
| eventid | source | object | objectid | clock      | value | acknowledged | ns        | description                                |
+---------+--------+--------+----------+------------+-------+--------------+-----------+--------------------------------------------+
|     317 |      0 |      0 |    13075 | 1548827260 |     0 |            0 | 399008160 | 99.9512<5                                  |
|     318 |      0 |      0 |    13467 | 1548827312 |     0 |            0 | 696464358 | (0=0 and 0.1854>75) or (0=1 and 0.1854>65) |
|     308 |      0 |      0 |    13468 | 1548827253 |     0 |            1 | 367035016 | (0=0 and 0>75) or (0=1 and 0>65)           |
|     309 |      0 |      0 |    13469 | 1548827254 |     0 |            0 | 352296205 | (0=0 and 0>75) or (0=1 and 0>65)           |
|     310 |      0 |      0 |    13470 | 1548827255 |     0 |            0 | 363172506 | (0=0 and 0>75) or (0=1 and 0>65)           |
|     311 |      0 |      0 |    13471 | 1548827256 |     0 |            0 | 375124809 | (0=0 and 0.0169>75) or (0=1 and 0.0169>65) |
|     319 |      0 |      0 |    13472 | 1548827257 |     0 |            0 | 373863748 | (0=0 and 2.5554>75) or (0=1 and 2.5554>65) |
|     320 |      0 |      0 |    13473 | 1548827258 |     0 |            0 | 381757318 | (0=0 and 0.0846>75) or (0=1 and 0.0846>65) |
|     321 |      0 |      0 |    13474 | 1548827259 |     0 |            0 | 388674314 | (0=0 and 0.2199>75) or (0=1 and 0.2199>65) |
|     322 |      0 |      0 |    13475 | 1548827260 |     0 |            0 | 398635590 | (0=0 and 0>75) or (0=1 and 0>65)           |
|     323 |      0 |      0 |    13479 | 1548827264 |     0 |            0 | 425321837 | (0=0 and 3.1495>75) or (0=1 and 3.1495>65) |
|     324 |      0 |      0 |    13480 | 1548827265 |     0 |            0 | 429536321 | (0=0 and 0>75) or (0=1 and 0>65)           |
|     325 |      0 |      0 |    13481 | 1548827266 |     0 |            0 | 439574519 | (0=0 and 0>75) or (0=1 and 0>65)           |
|     326 |      0 |      0 |    13482 | 1548827267 |     0 |            0 | 441541684 | (0=0 and 0>75) or (0=1 and 0>65)           |
|     327 |      0 |      0 |    13483 | 1548827268 |     0 |            0 | 448121449 | (0=0 and 0>75) or (0=1 and 0>65)           |
|     328 |      0 |      0 |    13484 | 1548827269 |     0 |            1 | 460702185 | (0=0 and 0.0406>75) or (0=1 and 0.0406>65) |


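Actions are created per event source (the four event types), and those sources correspond to the source column here.
For trigger events, objectid corresponds to triggerid in the triggers table.
value=0 means OK; value=1 means PROBLEM.
acknowledged=0 means unacknowledged; acknowledged=1 means acknowledged.

source=0: trigger events
source=1: discovery events
source=2: auto-registration events
source=3: internal events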
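
The mappings above, together with the clock column (a Unix timestamp in seconds), can be decoded in a few lines. The following is a minimal sketch; describe_event and the dict-shaped row are hypothetical names, and the objectid-to-triggerid link is applied only to source=0 rows, as described above.

```python
from datetime import datetime, timezone

# Source and value meanings as listed above.
EVENT_SOURCE = {0: "trigger", 1: "discovery", 2: "auto-registration", 3: "internal"}
TRIGGER_VALUE = {0: "OK", 1: "PROBLEM"}


def describe_event(row):
    """Decode one events row (a dict keyed by column name).

    Hypothetical helper for illustration; it is not part of Zabbix.
    """
    when = datetime.fromtimestamp(row["clock"], tz=timezone.utc)
    out = {
        "eventid": row["eventid"],
        "source": EVENT_SOURCE.get(row["source"], "unknown"),
        "time_utc": when.isoformat(),
        "acknowledged": row["acknowledged"] == 1,
    }
    # Only trigger events are interpreted here; other sources keep the raw columns.
    if row["source"] == 0:
        out["triggerid"] = row["objectid"]
        out["value"] = TRIGGER_VALUE.get(row["value"], "unknown")
    return out
```

Applied to the first row of the output above (eventid 317, clock 1548827260), the decoded time is 2019-01-30T05:47:40+00:00.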

Customizing Zabbix alert media

Zabbix media types include email, SMS, and custom scripts.

The media_type table

MariaDB [rtm]> select * from media_type\G;
*************************** 1. row ***************************
        mediatypeid: 1
               type: 0
        description: Email
        smtp_server: mail.company.com
          smtp_helo: company.com
         smtp_email: rtm@company.com
          exec_path: 
          gsm_modem: 
           username: 
             passwd: 
             status: 0
          smtp_port: 25
      smtp_security: 0
   smtp_verify_peer: 0
   smtp_verify_host: 0
smtp_authentication: 0
        exec_params: 
*************************** 2. row ***************************
        mediatypeid: 2
               type: 3
        description: Jabber
        smtp_server: 
          smtp_helo: 
         smtp_email: 
          exec_path: 
          gsm_modem: 
           username: jabber@company.com
             passwd: rtm
             status: 0
          smtp_port: 25
      smtp_security: 0
   smtp_verify_peer: 0
   smtp_verify_host: 0
smtp_authentication: 0
        exec_params: 
*************************** 3. row ***************************
        mediatypeid: 3
               type: 2
        description: SMS
        smtp_server: 
          smtp_helo: 
         smtp_email: 
          exec_path: 
          gsm_modem: /dev/ttyS0
           username: 
             passwd: 
             status: 0
          smtp_port: 25
      smtp_security: 0
   smtp_verify_peer: 0
   smtp_verify_host: 0
smtp_authentication: 0
        exec_params: 
*************************** 4. row ***************************
        mediatypeid: 4
               type: 1
        description: 智能告警
        smtp_server: 
          smtp_helo: 
         smtp_email: 
          exec_path: sr_event/sr_event_client/sr_event_client.py
          gsm_modem: 
           username: 
             passwd: 
             status: 0
          smtp_port: 25
      smtp_security: 0
   smtp_verify_peer: 0
   smtp_verify_host: 0
smtp_authentication: 0
        exec_params: {ALERT.SUBJECT}\n
4 rows in set (0.00 sec)
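
Which columns matter depends on the type value of each row. The sketch below reads the type codes off the four rows shown above (0 for Email, 1 for the custom script, 2 for SMS, 3 for Jabber); treat that mapping as an inference from this output rather than a full list, and delivery_target as a hypothetical helper name.

```python
# Type codes inferred from the four rows in the output above.
MEDIA_TYPE = {0: "email", 1: "script", 2: "sms", 3: "jabber"}


def delivery_target(row):
    """Return (kind, target) for one media_type row (a dict keyed by column name).

    The target is the column that is actually filled in for that kind of media.
    """
    kind = MEDIA_TYPE.get(row["type"], "unknown")
    if kind == "email":
        target = f'{row["smtp_server"]}:{row["smtp_port"]}'
    elif kind == "script":
        target = row["exec_path"]
    elif kind == "sms":
        target = row["gsm_modem"]
    elif kind == "jabber":
        target = row["username"]
    else:
        target = ""
    return kind, target
```

For the Email row above this yields the SMTP server and port; for the custom media type it yields the script path stored in exec_path.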

The media table

MariaDB [rtm]> desc media;
+-------------+---------------------+------+-----+-----------------+-------+
| Field       | Type                | Null | Key | Default         | Extra |
+-------------+---------------------+------+-----+-----------------+-------+
| mediaid     | bigint(20) unsigned | NO   | PRI | NULL            |       |
| userid      | bigint(20) unsigned | NO   | MUL | NULL            |       |
| mediatypeid | bigint(20) unsigned | NO   | MUL | NULL            |       |
| sendto      | varchar(100)        | NO   |     |                 |       |
| active      | int(11)             | NO   |     | 0               |       |
| severity    | int(11)             | NO   |     | 63              |       |
| period      | varchar(100)        | NO   |     | 1-7,00:00-24:00 |       |
+-------------+---------------------+------+-----+-----------------+-------+

Rows in the media table come from the alert media configured for each user.

The alerts table

MariaDB [rtm]> show create table alerts\G;
*************************** 1. row ***************************
       Table: alerts
Create Table: CREATE TABLE `alerts` (
  `alertid` bigint(20) unsigned NOT NULL,
  `actionid` bigint(20) unsigned NOT NULL,
  `eventid` bigint(20) unsigned NOT NULL,
  `userid` bigint(20) unsigned DEFAULT NULL,
  `clock` int(11) NOT NULL DEFAULT '0',
  `mediatypeid` bigint(20) unsigned DEFAULT NULL,
  `sendto` varchar(100) COLLATE utf8_bin NOT NULL DEFAULT '',
  `subject` varchar(255) COLLATE utf8_bin NOT NULL DEFAULT '',
  `message` text COLLATE utf8_bin NOT NULL,
  `status` int(11) NOT NULL DEFAULT '0',
  `retries` int(11) NOT NULL DEFAULT '0',
  `error` varchar(128) COLLATE utf8_bin NOT NULL DEFAULT '',
  `esc_step` int(11) NOT NULL DEFAULT '0',
  `alerttype` int(11) NOT NULL DEFAULT '0',
  PRIMARY KEY (`alertid`),
  KEY `alerts_1` (`actionid`),
  KEY `alerts_2` (`clock`),
  KEY `alerts_3` (`eventid`),
  KEY `alerts_4` (`status`,`retries`),
  KEY `alerts_5` (`mediatypeid`),
  KEY `alerts_6` (`userid`),
  CONSTRAINT `c_alerts_1` FOREIGN KEY (`actionid`) REFERENCES `actions` (`actionid`) ON DELETE CASCADE,
  CONSTRAINT `c_alerts_2` FOREIGN KEY (`eventid`) REFERENCES `events` (`eventid`) ON DELETE CASCADE,
  CONSTRAINT `c_alerts_3` FOREIGN KEY (`userid`) REFERENCES `users` (`userid`) ON DELETE CASCADE,
  CONSTRAINT `c_alerts_4` FOREIGN KEY (`mediatypeid`) REFERENCES `media_type` (`mediatypeid`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin
1 row in set (0.00 sec)

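I list this table structure because I once ran into a blocked Zabbix email queue. There is very little material online about how to fix it, so I went to the backend and looked at the tables myself. Some people online report solving it by deleting rows from the database. The action log shown in the Zabbix frontend also comes entirely from this table.

I have not tried the deletion approach. Instead, I drained the blocked queue by rerouting: I switched the blocked email media over to a custom-script media type (a different mediatypeid).

Having studied the table structure and event logic afterwards, though, I think the following is worth trying:

  • Back up the events and alerts tables
  • Find the events raised during the blocked period, then use their eventid values to delete the matching rows from events and alerts.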
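
The two steps above could be sketched as follows. Like the idea itself, this is untested against a real Zabbix database: cleanup_statements is a hypothetical helper that only builds the statements and executes nothing, the %s placeholders assume a MySQL driver that uses that parameter style, and the backup is assumed to have been taken beforehand.

```python
from datetime import datetime, timezone


def cleanup_statements(start, end):
    """Build (sql, params) pairs for clearing events raised between start and end.

    start and end must be timezone-aware datetimes; events.clock holds Unix seconds.
    Nothing is executed here, so the statements can be reviewed first.
    """
    if start.tzinfo is None or end.tzinfo is None:
        raise ValueError("start and end must be timezone-aware")
    lo, hi = int(start.timestamp()), int(end.timestamp())
    if lo > hi:
        raise ValueError("start must not be later than end")
    in_window = "SELECT eventid FROM events WHERE clock BETWEEN %s AND %s"
    return [
        # 1. Count how many alerts would be affected before touching anything.
        (f"SELECT COUNT(*) FROM alerts WHERE eventid IN ({in_window})", (lo, hi)),
        # 2. Delete the alerts that belong to events in the window.
        (f"DELETE FROM alerts WHERE eventid IN ({in_window})", (lo, hi)),
        # 3. Delete the events themselves.
        ("DELETE FROM events WHERE clock BETWEEN %s AND %s", (lo, hi)),
    ]
```

The alerts definition above declares its eventid foreign key with ON DELETE CASCADE, so the third statement alone would already remove the matching alerts; the explicit second statement is kept only to make the intent visible during review.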

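Another approach is to set the storage period for event and alert data to one day under Housekeeping in Zabbix's general administration settings. This also solves the problem by deleting database rows, except that the Zabbix housekeeper does it for us.

I am out of time for now; more to come later.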
