Using Monit Instead of Supervisor to Automate Service Management and Monitoring: A Summary

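Preface

The most common requirement in process monitoring is bringing a process back up automatically after it dies. Today this can be handled by container orchestration such as Kubernetes, but what about processes that still run on physical servers or virtual machines? The answer is a third-party tool such as Monit or Supervisor. In our production environment, Monit monitors the Core Logical Service and Supervisor runs the Codis Dashboard/FE/Proxy. Our experience matches the comparative analyses found online (quoted later in this article), and I recommend using Monit instead of Supervisor to automate service management and monitoring.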

Changelog

2020-01-15 - First draft

Read the original - https://wsgzao.github.io/post...

Further reading

Monit
Supervisor


Introduction to Monit

NAME

Monit - utility for monitoring services on a Unix system

SYNOPSIS

monit [options] <arguments>

DESCRIPTION

Monit is a utility for managing and monitoring processes, programs, files, directories and filesystems on a Unix system. Monit conducts automatic maintenance and repair and can execute meaningful causal actions in error situations. E.g. Monit can start a process if it does not run, restart a process if it does not respond and stop a process if it uses too much resources. You can use Monit to monitor files, directories and filesystems for changes, such as timestamps changes, checksum changes or size changes.

Monit is controlled via an easy to configure control file based on a free-format, token-oriented syntax. Monit logs to syslog or to its own log file and notifies you about error conditions via customisable alert messages. Monit can perform various TCP/IP network checks, protocol checks and can utilise SSL for such checks. Monit provides a HTTP(S) interface and you may use a browser to access the Monit program.

WHAT TO MONITOR?

You can use Monit to monitor daemon processes or similar programs running on localhost. Monit is particularly useful for monitoring daemon processes, such as those started at system boot time. For instance sendmail, sshd, apache and mysql. In contrast to many other monitoring systems, Monit can act if an error situation should occur, e.g.; if sendmail is not running, monit can start sendmail again automatically or if apache is using too many resources (e.g. if a DoS attack is in progress) Monit can stop or restart apache and send you an alert message. Monit can also monitor process characteristics, such as how much memory or cpu cycles a process is using.

You can also use Monit to monitor files, directories and filesystems on localhost. Monit can monitor these items for changes, such as timestamps changes, checksum changes or size changes. This is also useful for security reasons - you can monitor the md5 or sha1 checksum of files that should not change and get an alert or perform an action if they should change.

Monit can monitor network connections to various servers, either on localhost or on remote hosts. TCP, UDP and Unix Domain Sockets are supported. Network test can be performed on a protocol level; Monit has built-in tests for the main Internet protocols, such as HTTP, SMTP etc. Even if a protocol is not supported you can still test the server because you can configure Monit to send any data and test the response from the server.

Monit can be used to test programs or scripts at certain times, much like cron, but in addition, you can test the exit value of a program and perform an action or send an alert if the exit value indicates an error. This means that you can use Monit to perform any type of check you can write a script for.

Finally, Monit can be used to monitor general system resources on localhost such as overall CPU usage, Memory and System Load.

https://mmonit.com/monit/docu...

Introduction to Supervisor

Supervisor: A Process Control System

Supervisor is a client/server system that allows its users to monitor and control a number of processes on UNIX-like operating systems.

It shares some of the same goals of programs like launchd, daemontools, and runit. Unlike some of these programs, it is not meant to be run as a substitute for init as "process id 1". Instead it is meant to be used to control processes related to a project or a customer, and is meant to start like any other program at boot time.

http://supervisord.org/

Monit VS Supervisor

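What is Monit

  • Monit is a small open-source utility for managing and monitoring Unix systems.
  • When errors occur, Monit can perform automatic maintenance and repair and take meaningful actions.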

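Why choose Monit

Besides Monit there are other third-party monitoring options (e.g. Supervisor). We chose Monit for the following reasons:

  • Very lightweight, stable, and highly available
  • Few dependencies and easy to install and configure, which keeps operations and learning costs low (even someone with no Monit background can easily read most monitoring files)
  • Non-intrusive: a monitored program does not need to know that the monitor exists (with Supervisor, the service must be started by Supervisor)
  • Complete basic functionality (9 types of checks, email alerts, and user-defined shell extensions)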

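Supervisor pros and cons

Pros

  1. Lightweight, feature-rich, and memory-friendly (fairly generic praise...)
  2. Fast and accurate status for monitored processes: because it manages them as child processes, it can hardly be inaccurate

Cons

  1. Monitored processes must run in the foreground (you can think of this as having their own controlling terminal), which is the most serious limitation
  2. No dependency management, so the start order of services cannot be controlled
  3. Cannot manage child processes created by a monitored process: when the service is restarted, those children may not exit cleanly, which is a serious hidden risk
  4. Cannot control the interval between retries after a failure: some processes need time to clean up resources, though this is a minor issue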
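Monit pros and cons

Pros

  1. Simple to install and configure, and equally lightweight (also fairly generic praise)
  2. Can monitor both foreground and non-foreground processes, which neatly covers Supervisor's most serious limitation
  3. Besides processes, it can monitor files, filesystems, and even system resources such as CPU utilization and memory usage
  4. Dependencies can be set between monitored processes to control start order

Cons

  1. Cannot monitor processes that have no pid file, such as shell scripts
  2. Status detection is delayed, i.e. less precise: because it polls, it cannot sense the state of a monitored process in real time the way Supervisor does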
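
The polling delay in the last point can be made concrete with a little arithmetic. The sketch below is a simplified model, not Monit's actual scheduler: it assumes checks fire at fixed multiples of the poll interval (the value given to set daemon), and the function name is made up for illustration.

```python
import math


def polling_detection_delay(exit_time: float, poll_interval: float) -> float:
    """Seconds between a process dying and the next poll noticing it.

    Simplified model (an assumption): polls happen at t = 0, interval,
    2 * interval, ... and a poll always detects a dead process.
    """
    if poll_interval <= 0:
        raise ValueError("poll_interval must be positive")
    next_poll = math.ceil(exit_time / poll_interval) * poll_interval
    return next_poll - exit_time


if __name__ == "__main__":
    # With a 30-second cycle, a process that dies 1 second after a poll
    # is only noticed 29 seconds later.
    print(polling_detection_delay(exit_time=31, poll_interval=30))
```

Under this model the worst case approaches one full interval, whereas a parent that waits on its own child process (the approach attributed to Supervisor above) is notified as soon as the child exits.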

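Seen this way, Monit is the more generally applicable of the two.

However, this suggests a bold idea: use Supervisor to manage multiple processes inside a container, with Monit itself running as a monitored process under Supervisor. Programs that cannot run in the foreground would then be monitored through Monit, while services that are sensitive to interruption would sit directly under Supervisor. It looks like a good approach and is worth trying when there is a chance.

Judging from actual behavior inside containers, Monit frequently ran into various unknown errors, while Supervisor was very stable.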

Basic Monit usage

Common Monit commands

# monit -h
Usage: monit [options]+ [command]
Options are as follows:
 -c file       Use this control file
 -d n          Run as a daemon once per n seconds
 -g name       Set group name for monit commands
 -l logfile    Print log information to this file
 -p pidfile    Use this lock file in daemon mode
 -s statefile  Set the file monit should write state information to
 -I            Do not run in background (needed when run from init)
 --id          Print Monit's unique ID
 --resetid     Reset Monit's unique ID. Use with caution
 -B            Batch command line mode (do not output tables or colors)
 -t            Run syntax check for the control file
 -v            Verbose mode, work noisy (diagnostic output)
 -vv           Very verbose mode, same as -v plus log stacktrace on error
 -H [filename] Print SHA1 and MD5 hashes of the file or of stdin if the
               filename is omited; monit will exit afterwards
 -V            Print version number and patchlevel
 -h            Print this text
Optional commands are as follows:
 start all             - Start all services
 start <name>          - Only start the named service
 stop all              - Stop all services
 stop <name>           - Stop the named service
 restart all           - Stop and start all services
 restart <name>        - Only restart the named service
 monitor all           - Enable monitoring of all services
 monitor <name>        - Only enable monitoring of the named service
 unmonitor all         - Disable monitoring of all services
 unmonitor <name>      - Only disable monitoring of the named service
 reload                - Reinitialize monit
 status [name]         - Print full status information for service(s)
 summary [name]        - Print short status information for service(s)
 report [up|down|..]   - Report state of services. See manual for options
 quit                  - Kill the monit daemon process
 validate              - Check all services and start if not running
 procmatch <pattern>   - Test process matching pattern

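Getting Monit to work reliably for us has a very low learning cost: you only need to learn a few Monit command-line options and how to write the configuration file.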

# options
- monit
- monit -t  # syntax-check the control file
- monit -c /var/monit/monitrc  # specify the control file
- monit -g <groupname> start/stop # Monit can group checks; use this to operate on a whole group at once

# arguments
- monit reload
- monit quit
- monit start/stop/restart/monitor/unmonitor <name>/all  # <name>: each check has a unique name, described later; all: every monitored service
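
When these commands are driven from a deployment script, it can help to build the argument list in one place. The following is a minimal sketch under that assumption; monit_argv is a hypothetical helper, and it only uses the -c and -g options and the service actions listed in the help output above.

```python
from typing import List, Optional

ACTIONS = {"start", "stop", "restart", "monitor", "unmonitor"}


def monit_argv(action: str, name: str = "all",
               group: Optional[str] = None,
               control_file: Optional[str] = None) -> List[str]:
    """Build an argv list for a monit service command (hypothetical helper)."""
    if action not in ACTIONS:
        raise ValueError(f"unsupported action: {action}")
    argv = ["monit"]
    if control_file:
        argv += ["-c", control_file]
    if group:
        # With -g the action applies to the whole group, so no name is added.
        argv += ["-g", group, action]
    else:
        argv += [action, name]
    return argv


if __name__ == "__main__":
    # "web" is a placeholder group name.
    print(monit_argv("stop", group="web", control_file="/var/monit/monitrc"))
```

On a host where Monit is installed, the resulting list can be passed to subprocess.run(argv, check=True).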

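Monit service check configuration format

There are 9 kinds of checks in total. The following rules apply to all of them:

  • If the specified path does not exist and the check block contains a start method, that start method is called
  • If the file at path is of the wrong type, Monit cannot monitor the item

  1. Process
CHECK PROCESS <unique name> <PIDFILE <path> | MATCHING <regex>>

<path> is the absolute path of the pid file. If the pid file does not exist, or it does not correspond to a running program, Monit runs the start method.

<regex> is a regular expression matched against the process name. You can test the pattern from the command line: monit procmatch "regex-pattern"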
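
The pid-file rule for the Process check can be illustrated with a short sketch. This shows the general technique (read the pid, then probe it with signal 0) on a POSIX system; it is not Monit's implementation, and the function name is an assumption.

```python
import os


def pidfile_process_alive(pidfile: str) -> bool:
    """Return True if pidfile exists and names a running process (POSIX only)."""
    try:
        with open(pidfile) as f:
            pid = int(f.read().strip())
    except (OSError, ValueError):
        # Missing, unreadable, or malformed pid file: treat as not running.
        return False
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 probes for existence without sending a signal
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True
```

A monitor built on this check would call the start method whenever the function returns False.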
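  2. File
CHECK FILE <unique name> PATH <path>

<path> is the absolute path of the file.

  3. Fifo
CHECK FIFO <unique name> PATH <path>

<path> is the absolute path of the fifo.

  4. Filesystem
CHECK FILESYSTEM <unique name> PATH <path>

<path> is a device/disk, a mount point path, or an NFS/CIFS/FUSE connection string. If the filesystem is unavailable, Monit runs the start method.

  5. Directory
CHECK DIRECTORY <unique name> PATH <path>

<path> is the absolute path of the directory.

  6. Remote host
CHECK HOST <unique name> ADDRESS <host>

<host> can be a domain name or an IP address, e.g. "tildeslash.com" or "64.87.72.95".

  7. System
CHECK SYSTEM <unique name>

<unique name> is usually the local hostname (you can use $HOST), but it can be any other name. It is used in email alerts and as the initial name in M/Monit.
This check type can monitor system resources (CPU, memory, load average...).

  8. Program
CHECK PROGRAM <unique name> PATH <executable file> [TIMEOUT <number> SECONDS]

<executable file> is the absolute path of an executable program or script. This check lets you test the program's exit status. If the program does not finish within <number> seconds, Monit terminates it; the default is 300 s.
The program's output is recorded for the user interface and for alerts, 512 bytes by default (configurable via set limits).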
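
The behavior described for the Program check (run, enforce a timeout, read the exit status, keep a bounded amount of output) can be sketched as follows. This is an illustration of the idea using Python's standard subprocess module, not how Monit itself is written; check_program and its parameter names are assumptions, with defaults mirroring the 300 s and 512 bytes mentioned above.

```python
import subprocess
import sys
from typing import List, Optional, Tuple


def check_program(argv: List[str], timeout: float = 300.0,
                  output_limit: int = 512) -> Tuple[Optional[int], bytes]:
    """Run argv and return (exit status, truncated output).

    The exit status is None when the program was killed on timeout.
    """
    try:
        result = subprocess.run(argv, stdout=subprocess.PIPE,
                                stderr=subprocess.STDOUT, timeout=timeout)
    except subprocess.TimeoutExpired as exc:
        # subprocess.run kills the child before raising TimeoutExpired.
        return None, (exc.output or b"")[:output_limit]
    return result.returncode, result.stdout[:output_limit]


if __name__ == "__main__":
    # A stand-in "check script" that reports a problem through exit status 3.
    script = "import sys; print('disk almost full'); sys.exit(3)"
    status, output = check_program([sys.executable, "-c", script])
    print(status, output)
```

A non-zero status (or None on timeout) is the point where an alert or corrective action would be triggered.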
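  9. Network
CHECK NETWORK <unique name> <ADDRESS <ipaddress> | INTERFACE <name>>

<ipaddress> is the IPv4/IPv6 address of the network interface to monitor. An interface name such as eth0 can be given instead.

For more configuration details, see the official Monit documentation and examples: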

https://mmonit.com/documentat...

https://mmonit.com/wiki/Monit...

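Monit configuration in practice

  1. Create templates and use Python to generate the Monit configuration files
  2. Use Ansible to push them to the target servers
# Create the common configuration: logging and email alerts
vim basic.j2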

# log to monit.log
set logfile /var/log/monit.log

set daemon {{ monit_poll_interval }}

set eventqueue basedir /var/lib/monit/events slots 5000

set mailserver smtp.xxx.com port 465

set alert xxx@xxx.com { nonexist, timeout, resource }

set mail-format {
  from: xxx@xxx.com
  subject: monit alert -- $SERVICE $EVENT at $DATE
  message: $EVENT Service $SERVICE
                Date:        $DATE
                Action:      $ACTION
                Host:        $HOST
                Description: $DESCRIPTION

           Your faithful employee,
           Monit
}

# Create checks for standard applications
vim daemon_set.j2

check process xxx with pidfile /run/xxx/daemon.pid
    start program = "/usr/bin/python2  /bin/xxx restart"
    stop program = "/usr/bin/python2 /bin/xxx stop"

    if 10 restarts within 10 cycles then unmonitor

check process xxxx with matching xxxx
    start program = "/etc/init.d/xxxx start"
    stop program = "/etc/init.d/xxxx stop"

    if 10 restarts within 10 cycles then unmonitor

# Create checks for non-standard applications
vim logic_service.j2

check process {{ service_name }} with pidfile {{ root_dir }}/{{ service_name }}/deploy/{{ monit_name }}.pid
    start program = "/bin/bash -c 'cd {{ root_dir }}/{{ service_name }}/deploy && ./start.sh &>start.log '"
    stop program = "/bin/bash -c 'cd {{ root_dir }}/{{ service_name }}/deploy && ./stop.sh &>stop.log '"

    if 5 restarts within 15 cycles then unmonitor
    {% if memory_usage is defined %}
    if total memory usage > {{ memory_usage }} for 10 cycles then restart
    {% endif %}
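
The article says the templates are rendered from Python but does not show the script. The sketch below is one possible stand-in for the logic_service.j2 step: to stay dependency-free it builds the same text with f-strings rather than a template engine, and the function name and the sample values are placeholders, not taken from the original setup.

```python
from typing import Optional


def render_logic_service(service_name: str, root_dir: str, monit_name: str,
                         memory_usage: Optional[str] = None) -> str:
    """Render a check block equivalent to logic_service.j2 (stand-in renderer)."""
    deploy = f"{root_dir}/{service_name}/deploy"
    lines = [
        f"check process {service_name} with pidfile {deploy}/{monit_name}.pid",
        f"    start program = \"/bin/bash -c 'cd {deploy} && ./start.sh &>start.log '\"",
        f"    stop program = \"/bin/bash -c 'cd {deploy} && ./stop.sh &>stop.log '\"",
        "",
        "    if 5 restarts within 15 cycles then unmonitor",
    ]
    if memory_usage is not None:
        # Mirrors the "{% if memory_usage is defined %}" branch of the template.
        lines.append(
            f"    if total memory usage > {memory_usage} for 10 cycles then restart"
        )
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(render_logic_service("orders", "/data/app", "orders", memory_usage="2 GB"))
```

Running monit -t against the generated file before pushing it with Ansible catches syntax errors early.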

References

Monit Document

Monit notes

Multi-process management inside Docker containers: supervisor VS monit
