1 Environment
1.1 Operating system
[root@host-xxx soft]# lsb_release -a
LSB Version: :base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
Distributor ID: CentOS
Description: CentOS release 6.6 (Final)
Release: 6.6
Codename: Final
[root@host-xxx soft]#
1.2 Zabbix version: the agent, the server, and the web frontend are all 2.4.6
[wls81@host-xxxx sbin]$ ./zabbix_agent --version
Zabbix agent v2.4.6 (revision 54796) (10 August 2015)
Compilation time: Nov 2 2015 21:29:13
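1.3 I am currently monitoring 791 virtual machines
2 Problem
Note: this problem is not the red "zabbix server is not running" message that appears on the Zabbix web page.
2.1 Web-side monitoring
The page reports that zabbix_server is not running.
The Zabbix server also reports the following error: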
Less than 25% free in the trends cache
2.2 Agent-side log
28079:20161012:121243.196 active check configuration update from [192.168.176.25:10051] started to fail (cannot connect to [[192.168.176.25]:10051]: [4] Interrupted system call)
28079:20161012:122102.894 active check configuration update from [192.168.176.25:10051] is working again
28079:20161012:130105.458 active check configuration update from [192.168.176.25:10051] started to fail (ZBX_TCP_READ() failed: [4] Interrupted system call)
28079:20161012:153008.930 active check configuration update from [192.168.176.25:10051] is working again
28079:20161012:160811.493 active check configuration update from [192.168.176.25:10051] started to fail (ZBX_TCP_READ() failed: [4] Interrupted system call)
28079:20161013:104855.178 active check configuration update from [192.168.176.25:10051] is working again
28079:20161013:112258.667 active check configuration update from [192.168.176.25:10051] started to fail (cannot connect to [[192.168.176.25]:10051]: [4] Interrupted system call)
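To see how long each interruption lasted, the "started to fail" and "is working again" entries can be paired up. The following is a minimal sketch, assuming the log line layout shown above (PID, then `YYYYMMDD:HHMMSS.mmm`, then the message); the function name `outages` is hypothetical and not part of Zabbix.

```python
import re
from datetime import datetime

# Assumed layout, taken from the log lines above:
# <pid>:<YYYYMMDD>:<HHMMSS.mmm> active check configuration update from [...] <state>
LINE_RE = re.compile(
    r"^\s*\d+:(\d{8}:\d{6}\.\d{3}) active check configuration update "
    r"from \[[^\]]+\] (started to fail|is working again)"
)


def outages(lines):
    """Return (start, end, seconds) for each fail -> working-again pair."""
    result = []
    start = None
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # not an active-check status line
        stamp = datetime.strptime(match.group(1), "%Y%m%d:%H%M%S.%f")
        if match.group(2) == "started to fail":
            if start is None:
                start = stamp  # remember the first failure of this outage
        elif start is not None:
            result.append((start, stamp, (stamp - start).total_seconds()))
            start = None
    return result
```

Applied to the seven lines above, this gives three completed outages of roughly 8 minutes, 2.5 hours, and 18.7 hours; the last "started to fail" entry has no matching recovery and is not reported.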
In addition, telnet from the agent host to port 10051 on the server fails.
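The same reachability test can be scripted instead of typing telnet by hand. This is only a sketch using a plain TCP connect; the helper name `can_connect` and the 3-second default timeout are assumptions chosen for illustration.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # covers refused connections, timeouts, and unreachable hosts
        return False
```

For example, `can_connect("192.168.176.25", 10051)` run on an agent host would be expected to return False while the problem is occurring. A successful connect only shows that the port accepts TCP connections, not that the server is processing requests.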
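2.3 Zabbix server
The zabbix_server process is alive, and port 10051 is listening.
3 Solution approach
As always, start with the logs.
In the end the problem was traced to the following setting: its default value is too small.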
### Option: TrendCacheSize
#	Size of trend cache, in bytes.
#	Shared memory size for storing trends data.
#
# Mandatory: no
# Range: 128K-2G
# Default:
# TrendCacheSize=4M

TrendCacheSize=400M
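Before restarting the server it can help to confirm that the uncommented value is the one intended and that it falls inside the documented 128K-2G range. The sketch below assumes that the K, M, and G suffixes are 1024-based multipliers; the function names are hypothetical and do not come from Zabbix.

```python
import re

# Assumption: suffixes are powers of 1024
UNITS = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(text):
    """Convert a size such as '400M' to a number of bytes."""
    match = re.fullmatch(r"(\d+)([KMG]?)", text.strip())
    if match is None:
        raise ValueError("unrecognised size: %r" % text)
    return int(match.group(1)) * UNITS[match.group(2)]


def active_values(conf_text, option="TrendCacheSize"):
    """Return every uncommented value assigned to option, in file order."""
    values = []
    for line in conf_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments such as '# TrendCacheSize=4M'
        key, sep, value = line.partition("=")
        if sep and key.strip() == option:
            values.append(value.strip())
    return values


def in_range(size_text, low="128K", high="2G"):
    """Check a size against the range given in the option's comment block."""
    return parse_size(low) <= parse_size(size_text) <= parse_size(high)
```

With the fragment above, `active_values` returns `["400M"]` and `in_range("400M")` is True. The new size presumably only takes effect after zabbix_server is restarted, since the cache is shared memory allocated at startup.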