藤原豆腐店
2017.06.29 21:08
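This article is also available on my personal GitHub.
Email and shell + bearychat
Besides Monit there are other third-party monitoring tools (e.g. Supervisor). Our reasons for choosing Monit as our monitoring tool are:
The learning cost of getting Monit to work reliably for us is very low: you only need to learn a few Monit commands and the configuration file syntax.
3.2.1 Common commands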
    # options
    - monit
    - monit -t
    - monit -c /var/monit/monitrc          # specify the configuration file
    - monit -g <groupname> start/stop      # Monit can group its checks; use this to act on a whole group at once
    # arguments
    - monit reload
    - monit quit
    - monit start/stop/restart/monitor/unmonitor <name>/all
      # <name>: every check has a unique name (covered later); all: every monitored service
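A common way to combine two of the commands above is to run the syntax check before reloading. The sketch below assumes `monit -t` exits non-zero when the control file has errors; the function name `safe_reload` is made up for this example.

```shell
# safe_reload: reload the Monit daemon only if the control file parses cleanly.
# Assumption: "monit -t" returns a non-zero exit status on a syntax error.
safe_reload() {
    if monit -t; then
        monit reload
    else
        echo "monitrc failed the syntax check; not reloading" >&2
        return 1
    fi
}
```

Calling `safe_reload` after editing the control file keeps a typo from reaching the running daemon.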
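3.2.2 Full command documentation
3.3.1 Overview
1. Lookup order for the configuration file monitrc
    # Monit looks for its configuration file in the following order
    # given on the command line
    command-line
    # configuration files
    ~/.monitrc
    /etc/monitrc
    @sysconfdir@/monitrc    # path chosen at build time, eg: ./configure --sysconfdir /var/monit/etc
    ./monitrc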
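The lookup order above amounts to "take the first candidate that exists". A minimal sketch of that idea follows; `first_existing` is a hypothetical helper, not part of Monit, and `SYSCONFDIR` stands in for whatever was passed to `./configure --sysconfdir`.

```shell
# first_existing PATH...
# Prints the first argument that names a readable regular file.
# Returns 1 if none of the candidates exist.
first_existing() {
    for candidate in "$@"; do
        if [ -f "$candidate" ] && [ -r "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Example call mirroring the order listed above (SYSCONFDIR is an assumption):
# first_existing "$HOME/.monitrc" /etc/monitrc "${SYSCONFDIR:-/usr/local/etc}/monitrc" ./monitrc
```

This only helps you see which file Monit would likely pick up; a file given with `-c` on the command line still takes precedence.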
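2. Configuration file permissions
The configuration file must have permissions of no more than 0700 (u=xrw,g=,o=); otherwise Monit prints a warning and exits.
3. Characters supported in the configuration file
4. Configuration file layering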
3.3.2 Service check configuration format
There are 9 kinds of check in total, and all of them follow the rules below.
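1. Process
    CHECK PROCESS <unique name> <PIDFILE <path> | MATCHING <regex>>
    <path>   Absolute path of the pid file. If the pid file does not exist, or no running program corresponds to it, Monit runs the start method.
    <regex>  Regular expression matched against the process name. You can test the pattern from the command line: monit procmatch "regex-pattern"
2. File
    CHECK FILE <unique name> PATH <path>
    <path>   Absolute path of the file.
3. Fifo
    CHECK FIFO <unique name> PATH <path>
    <path>   Absolute path of the fifo.
4. Filesystem
    CHECK FILESYSTEM <unique name> PATH <path>
    <path>   Path of a device/disk or mount point, or an NFS/CIFS/FUSE connection string. If the filesystem is unavailable, Monit runs the start method.
5. Directory
    CHECK DIRECTORY <unique name> PATH <path>
    <path>   Absolute path of the directory.
6. Remote host
    CHECK HOST <unique name> ADDRESS <host>
    <host>   A domain name or an IP address. eg: "tildeslash.com" or "64.87.72.95".
7. System
    CHECK SYSTEM <unique name>
    <unique name>   Usually the local host name (you can use $HOST), but any other name works. It is used in email alerts and as the initial name in M/Monit.
    This kind of check monitors system resources (CPU, memory, load average...)
8. Program
    CHECK PROGRAM <unique name> PATH <executable file> [TIMEOUT <number> SECONDS]
    <path>   Absolute path of an executable program or script.
    Lets you check the program's exit status. If the program does not finish within <number> seconds, Monit terminates it; the default is 300s.
    The program's output is recorded for the user interface and for alerts; the default is 512 bytes (can be changed with set limits).
9. Network
    CHECK NETWORK <unique name> <ADDRESS <ipaddress> | INTERFACE <name>>
    # <ipaddress> is the IPv4/IPv6 address of the monitored network interface. An interface name such as eth0 also works.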
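3.3.3 Global configuration
1. Log file path:
    SET LOGFILE
2. Daemon mode:
    SET DAEMON <seconds> [[WITH] START DELAY <seconds>]
    First <seconds>:  the polling cycle
    Second <seconds>: how long to wait before monitoring starts; useful at boot
    Command line:
    monit       If a Monit daemon is already running in the background, sends it a wake-up signal so it starts checking immediately
    monit quit  Stops the background Monit daemon
3. HTTPD
    - Lets you perform some operations through a web page (view status / control monitored services). Default: http://localhost:2812/
    - Turning it off may affect some features, such as monit status
    - Enabling it is strongly recommended. If security is a concern, bind it to local access only. A read-only account (`read-only`) can also be set up.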
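4. Alerts
Monit provides email alerts. If you use another alerting channel, you can extend Monit with your own shell script. For example, we use a shell script to send messages to bearychat, a third-party instant messaging tool, which gives us push notifications on our phones. This section mainly covers the configuration for email alerts.
    set alert foo@bar     # default alert address; add one set alert line per extra address
    set alert foo1@bar
    check ....
        noalert foo@bar
    set mail-format {
        from: Monit Support <monit@foo.bar>
        reply-to: support@domain.com
        subject: $SERVICE $EVENT at $DATE
        message: Monit $ACTION $SERVICE at $DATE on $HOST: $DESCRIPTION.
                 Yours sincerely,
                 monit
    }
Available variables
    $EVENT $SERVICE $DATE $HOST $ACTION $DESCRIPTION
A mail server and an event queue need to be configured:
    SET MAILSERVER <hostname|ip-address> [PORT number] [USERNAME string] [PASSWORD string] [using SSL [with options {...}] [CERTIFICATE CHECKSUM [MD5|SHA1] <hash>], ... [with TIMEOUT X SECONDS] [using HOSTNAME hostname]
Multiple servers can be separated by commas; Monit tries them in turn until it finds one that works.
If no mail server is available, the alert is stored in a queue on the local file system until the next attempt (the queue contents are persistent).
    SET EVENTQUEUE BASEDIR <path> [SLOTS <number>]
    <path>    Directory that stores the queue
    <number>  Maximum length of the queue
If several Monit instances run on one machine, make sure each queue uses a different directory.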
5. start/stop methods inside a service
    <START | STOP | RESTART> [PROGRAM] = "program" [[AS] UID <number | string>] [[AS] GID <number | string>] [[WITH] TIMEOUT <number> SECOND(S)]

    check process mmonit with pidfile /usr/local/mmonit/mmonit/logs/mmonit.pid
        start program = "/usr/local/mmonit/bin/mmonit" as uid "mmonit" and gid "mmonit" with timeout 60 seconds
By default Monit allows 30s for a start/stop attempt. If it fails, Monit gives up and prints an error message. The global default is set with SET LIMITS.
6. Check groups
    monit -g <groupname> start/stop/restart

    check process mmonit with pidfile /usr/local/mmonit/mmonit/logs/mmonit.pid
        GROUP groupname
        GROUP groupname1
7. Alert mode
    MODE <ACTIVE | PASSIVE>
ACTIVE: the default; Monit tries to restart the service and sends an alert
PASSIVE: Monit does not try to restart the service and only sends an alert
8. Behaviour on reboot
    ONREBOOT <START | NOSTART | LASTSTATE>
START: Monit always starts every monitored service, whether or not the service was stopped before the reboot
NOSTART: the monitored service is never started automatically. Meant for some high-availability scenarios (XXX)
My understanding of this part may be inaccurate. The original text: In nostart mode, the service is never started automatically after reboot. This mode is intended for a high-availability solutions with active/passive clusters. For example, a service group HA, consisting of e.g. a mobile IP alias and an application server, is started on host H1, host H2 is backup and heartbeat is in place between both hosts. The service group HA must be started on one node only. If H1 dies, H2 takes over the HA group. If H1 reboots, it is important that it won't try to start the HA group also. Even though the group was active on H1 before it crashed, as HA is running on H2 now.
LASTSTATE: keeps the previous state
9. Limit on service restart attempts
    IF <number> RESTART <number> CYCLE(S) THEN <action>

    if 2 restarts within 3 cycles then unmonitor
    if 5 restarts within 5 cycles then exec "/foo/bar"
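The service-level settings covered so far (start/stop programs, groups, restart limits) can be combined into one entry. The sketch below writes such an entry to a file and restricts its permissions. The function name `write_service_check` and the `stop program` command are assumptions made for illustration; the other lines reuse the examples from this article.

```shell
# write_service_check FILE
# Writes a sample service entry to FILE and restricts its permissions.
write_service_check() {
    cat > "$1" <<'EOF'
check process mmonit with pidfile /usr/local/mmonit/mmonit/logs/mmonit.pid
    start program = "/usr/local/mmonit/bin/mmonit" as uid "mmonit" and gid "mmonit" with timeout 60 seconds
    # The stop command below is an assumption; the article only shows the start line.
    stop program = "/usr/local/mmonit/bin/mmonit stop" as uid "mmonit" and gid "mmonit"
    group groupname
    if 2 restarts within 3 cycles then unmonitor
EOF
    # Owner-only access, which stays within the 0700 ceiling mentioned earlier.
    chmod 600 "$1"
}
```

With this entry, `monit -g groupname restart` would act on the service, and two restarts within three cycles would make Monit stop monitoring it.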
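10. Dependencies
    DEPENDS on service[, service [,...]]

    WEB-SERVER(a) -> APPLICATION-SERVER(b) -> DATABASE(c) -> FILESYSTEM(d)
    No service is running:
        start order: d, c, b, a
    All services are running:
        (monit stop all) stop order: a, b, c, d
        (monit stop d)   a, b, c are stopped as well, because they depend on d
    If a is not running, Monit starts a
    If b is not running, Monit stops a, starts b, and finally starts a
    If the dependencies contain a cycle, or refer to a service that is not defined, Monit reports the problem and exits
A service can depend on several services at once
The services named in a DEPENDS statement are checked on stop/start/monitor/unmonitor
If a service is stopped/unmonitored, every service that depends on it is stopped/unmonitored too
If a service is started, all the services it depends on are started first, and then the service itself
If a service is restarted, all services that depend on it are stopped first; once the service is running again, the ones that were active before the restart are started again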
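The start and stop orders in the a/b/c/d example above are a topological ordering of the dependency chain. The sketch below reproduces them with the standard `tsort` utility. It only illustrates the ordering and says nothing about how Monit computes it internally; the function names are made up.

```shell
# Each input line is "dependency dependent": the left service must start first.
start_order() {
    tsort <<'EOF'
d c
c b
b a
EOF
}

# Stopping walks the same chain in reverse.
stop_order() {
    start_order | awk '{ line[NR] = $0 } END { for (i = NR; i > 0; i--) print line[i] }'
}
```

`tsort` also reports an error when the input contains a loop, which corresponds to the cycle case that makes Monit exit.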
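11. Settings for Monit itself: LIMITS
    SET LIMITS {
        PROGRAMOUTPUT: <number> <unit>,
        SENDEXPECTBUFFER: <number> <unit>,
        FILECONTENTBUFFER: <number> <unit>,
        HTTPCONTENTBUFFER: <number> <unit>,
        NETWORKTIMEOUT: <number> <timeunit>
        PROGRAMTIMEOUT: <number> <timeunit>
        STOPTIMEOUT: <number> <timeunit>
        STARTTIMEOUT: <number> <timeunit>
        RESTARTTIMEOUT: <number> <timeunit>
    }
    unit is "B" (byte), "kB" (kilobyte) or "MB" (megabyte)
    timeunit is "MS" (millisecond) or "S" (second)
12. ACTION
Actions Monit can take after a check
    ALERT: sends an alert
    RESTART: restarts the service and sends an alert (uses the registered restart method; if there is none, stop then start)
    START: starts the service and sends an alert (uses the registered start method)
    STOP: stops the service and sends an alert (uses the registered stop method). Once stopped, the service is no longer checked by Monit, even after Monit restarts; it can only be enabled again from the web page or the console
    EXEC: runs the given script and sends an alert. A user can be specified (Monit must be started as root), and the action can be repeated every given number of cycles
        if failed <test> then exec "/usr/local/bin/sms.sh" as uid nobody and gid nobody repeat every 5 cycles
    UNMONITOR: stops monitoring and sends an alert. Afterwards the service is no longer checked by Monit, even after Monit restarts; it can only be enabled again from the web page or the console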
13. Allowing some tolerance in checks
    FOR <X> CYCLES ...
    or
    <Y> [TIMES WITHIN] <Z> CYCLES ...

    if failed port 80 for 3 cycles then alert
    if failed port 80 for 3 times within 5 cycles then alert
<X>: the condition holds for X consecutive cycles
<Y/Z>: the condition holds Y times within Z cycles
The maximum value for cycles is 64
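To make the "Y times within Z cycles" rule concrete, the sketch below evaluates it over a recorded history of check results. It is a model of the semantics described above, not Monit's implementation; `tolerance_verdict` and the 1/0 encoding are assumptions made for this example. "for X cycles" is the special case where Y and Z are both X.

```shell
# tolerance_verdict HISTORY Y Z
# HISTORY: space-separated results, oldest first (1 = test failed, 0 = passed).
# Prints "alert" when at least Y of the last Z results are failures, else "ok".
tolerance_verdict() {
    results=$1
    need=$2
    window=$3
    failures=0
    # $results is intentionally unquoted so each result lands on its own line.
    for r in $(printf '%s\n' $results | tail -n "$window"); do
        if [ "$r" = 1 ]; then
            failures=$((failures + 1))
        fi
    done
    if [ "$failures" -ge "$need" ]; then
        echo alert
    else
        echo ok
    fi
}
```

For instance, `tolerance_verdict "1 0 1 0 1" 3 5` models `for 3 times within 5 cycles`, and `tolerance_verdict "0 1 1 1" 3 3` models `for 3 cycles`.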
14. Syntax of test conditions, written inside a CHECK
    # process/file/directory/filesystem/fifo
    IF [DOES] NOT EXIST THEN <action>
    IF [DOES] EXIST THEN <action>

    # system/process
    IF <resource> <operator> <value> THEN <action>
    <resource>: "CPU", "TOTAL CPU", "CPU([user|system|wait])", "MEMORY", "SWAP", "THREADS", "CHILDREN", "TOTAL MEMORY", "LOADAVG([1min|5min|15min])"
    <operator>: "<", ">", "!=", "==" in C notation, "gt", "lt", "eq", "ne" in shell sh notation and "greater", "less", "equal", "notequal" in human readable form (if not specified, default is EQUAL)

    # process
    IF DISK READ [RATE] <operator> <number> <unit>/S THEN action
    IF DISK READ <operator> <number> operations/S THEN action
    IF DISK WRITE <operator> <number> <unit>/S THEN action
    IF DISK WRITE <operator> <number> operations/S THEN action

    # file
    IF FAILED [MD5|SHA1] CHECKSUM [EXPECT checksum] THEN action
    IF CHANGED [MD5|SHA1] CHECKSUM THEN action

    # test the content of a text file
    IF CONTENT <operator> <regex|path> THEN action    # by default only 511 characters are checked; the size can be changed with limit
    IGNORE CONTENT <operator> <regex|path>            # lines matched by IGNORE are not checked

    # file/fifo/directory
    IF TIMESTAMP [[operator] value [unit]] THEN action
    IF CHANGED TIMESTAMP THEN action

    # file
    IF SIZE [[operator] value [unit]] THEN action
    IF CHANGED SIZE THEN action

    # filesystem
    IF CHANGED FSFLAGS THEN action
    IF SPACE operator value unit THEN action
    IF SPACE FREE operator value unit THEN action

    # filesystem
    IF INODE(S) operator value [unit] THEN action
    IF INODE(S) FREE operator value [unit] THEN action

    # filesystem
    IF READ [RATE] <operator> <number> <unit>/S THEN action
    IF READ [RATE] <operator> <number> operations/S THEN action
    IF WRITE [RATE] <operator> <number> <unit>/S THEN action
    IF WRITE [RATE] <operator> <number> operations/S THEN action
    # Service Time is the time taken to complete a read or a write operation
    IF SERVICE TIME <operator> <number> <unit> THEN action

    # file/fifo/directory/filesystem
    IF FAILED PERM(ISSION) octalnumber THEN action
    IF CHANGED PERM(ISSION) THEN action
    IF FAILED [E]UID user THEN action
    IF FAILED GID group THEN action

    IF CHANGED PID THEN action
    IF CHANGED PPID THEN action

    # process/system
    IF UPTIME [[operator] value [unit]] THEN action

    # program
    IF STATUS operator value THEN action
    IF CHANGED STATUS THEN action

    # network
    IF FAILED LINK THEN action
    IF CHANGED LINK [CAPACITY] THEN action
    IF SATURATION operator value% THEN action
    IF UPLOAD operator value unit/S THEN action
    IF DOWNLOAD operator value unit/S THEN action
    IF TOTAL UPLOADED operator value unit IN LAST number time-unit THEN action
    IF TOTAL DOWNLOADED operator value unit IN LAST number time-unit THEN action
    IF UPLOAD operator value PACKETS/S THEN action
    IF DOWNLOAD operator value PACKETS/S THEN action
    IF TOTAL UPLOADED operator value PACKETS IN LAST number time-unit THEN action
    IF TOTAL DOWNLOADED operator value PACKETS IN LAST number time-unit THEN action
    IF FAILED PING[4|6] [COUNT number] [SIZE number] [TIMEOUT number SECONDS] [ADDRESS string] THEN action

    # process/host
    ## TCP/UDP
    IF FAILED [HOST string] <PORT number> [ADDRESS string] [IPV4 | IPV6] [TYPE <TCP|UDP>] [<SSL|TLS> [with options {...}] [CERTIFICATE CHECKSUM [MD5|SHA1] string] [CERTIFICATE VALID for number DAYS] [PROTOCOL protocol | <SEND|EXPECT> "string",...] [TIMEOUT number SECONDS] [RETRY number] THEN action
    ## SOCKET
    IF FAILED <UNIXSOCKET path> [TYPE <TCP|UDP>] [PROTOCOL protocol | <SEND|EXPECT> "string",...] [TIMEOUT number SECONDS] [RETRY number] THEN action
    ## Specific protocol test options
    [<SEND|EXPECT> "string"]+
    PROTO(COL) HTTP [USERNAME "string"] [PASSWORD "string"] [REQUEST "string"] [STATUS operator number] [CHECKSUM checksum] [HTTP HEADERS list of headers] [CONTENT < "=" | "!=" > STRING]
    ## MySQL
    PROTOCOL MYSQL [USERNAME string PASSWORD string]
    ## SIP
    PROTOCOL SIP [TARGET valid@uri] [MAXFORWARD n]
    ## SMTP
    PROTOCOL SMTP[S] [USERNAME string PASSWORD string]
    PROTOCOL WEBSOCKET [REQUEST string] [HOST string] [ORIGIN string] [VERSION number]
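3.3.4 include configuration
    INCLUDE <globstring>
<globstring> follows the glob(7) rules. A match that is a directory rather than a file is ignored
Included files can include other files in turn
If <globstring> matches several files, they are included in no particular order. If you really need a specific order, include each file explicitly, one statement at a time, in that order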
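Before pointing INCLUDE at a directory pattern, it can help to see which entries a pattern like `DIR/*` would actually pick up once directories are skipped. The sketch below does that with plain shell globbing. `list_include_matches` is a hypothetical helper, and the shell's sorted output says nothing about the order Monit uses, which is unspecified as noted above.

```shell
# list_include_matches DIR
# Prints the regular files directly under DIR and skips directories,
# mirroring how a directory match of an include pattern is ignored.
list_include_matches() {
    for entry in "$1"/*; do
        if [ -f "$entry" ]; then
            printf '%s\n' "$entry"
        fi
    done
    return 0
}
```

Running it against the directory you plan to include shows at a glance whether a stray subdirectory or backup file would be matched.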
3.4.1 Environment variables
When you write your own shell extension, the following environment variables give you the details of the problem Monit detected
    - MONIT_SERVICE
    # only available for service
    - MONIT_DESCRIPTION
    - MONIT_DATE
    - MONIT_HOST
    # only available for process
    - MONIT_PROCESS_PID
    - MONIT_PROCESS_MEMORY
    - MONIT_PROCESS_CHILDREN
    - MONIT_PROCESS_CPU_PERCENT
    # only available for program
    - MONIT_PROGRAM_STATUS
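As a sketch of the kind of shell extension described above, the script below turns these variables into a JSON message that could be posted to a chat webhook. The payload shape (`{"text": ...}`), the `WEBHOOK_URL` placeholder and the function names are assumptions for illustration, not bearychat's documented API. Only backslashes and double quotes are escaped, so a description containing newlines would need extra handling.

```shell
#!/bin/sh
# Hypothetical alert hook: builds a JSON payload from the MONIT_* variables
# that Monit sets for programs started through "exec".

# Escape backslashes and double quotes so the value fits inside a JSON string.
json_escape() {
    printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Assemble one line of text from the variables listed above.
build_payload() {
    text="[${MONIT_HOST:-unknown}] ${MONIT_SERVICE:-unknown}: ${MONIT_DESCRIPTION:-no description} (${MONIT_DATE:-no date})"
    printf '{"text":"%s"}\n' "$(json_escape "$text")"
}

# Sending is left commented out; WEBHOOK_URL is a placeholder, not a real endpoint.
# curl -s -X POST -H 'Content-Type: application/json' -d "$(build_payload)" "$WEBHOOK_URL"
```

A check could then call the script with something like `if failed port 80 for 3 cycles then exec "/usr/local/bin/notify.sh"`, where the script path is again only an example.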
3.4.2 Signals