gmond dead but subsys locked

While putting Ganglia under unified Puppet management, I found during a sync that the gmond service would not start properly.

# puppetd --test
info: Retrieving plugin
info: Loading facts in /var/lib/puppet/lib/facter/haproxyrunning.rb
info: Loading facts in /var/lib/puppet/lib/facter/kernel_mod_ip_conntrack.rb
info: Loading facts in /var/lib/puppet/lib/facter/kernel_mod_nf_conntrack.rb
info: Caching catalog for taurus-bj20.internal.gexing.com
info: Applying configuration version '1377502166'
notice: /Stage[main]/Common/Exec[find /var/lib/puppet/clientbucket -name paths -execdir cat {} \; -execdir pwd \; -execdir date -r {} +"%F %T" \; -exec echo \; > /var/log/puppet/clientbucket.log]/returns: executed successfully
notice: /Stage[main]/Ganglia::Service/Service[gmond]/ensure: ensure changed 'stopped' to 'running'
notice: Finished catalog run in 9.40 seconds

The sync succeeded, and the output reported that the service had been started, but no gmond process was actually running.

service gmond status showed:
gmond dead but subsys locked
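
On Red Hat style SysV init systems this message generally means that the service's lock file (by convention under /var/lock/subsys, so presumably /var/lock/subsys/gmond here) still exists although no matching process is running. The sketch below is a simplified illustration of that decision, not the actual code from the init scripts; subsys_state and both of its path arguments are hypothetical names introduced for the example.

```shell
#!/bin/sh
# subsys_state LOCKFILE PIDFILE
# Simplified illustration of how a SysV-style status check can arrive at
# "dead but subsys locked". subsys_state is a hypothetical helper.
subsys_state() {
    lock=$1
    pidfile=$2
    if [ -s "$pidfile" ] && kill -0 "$(cat "$pidfile")" 2>/dev/null; then
        echo "running"                      # pid file names a live process
    elif [ -f "$pidfile" ]; then
        echo "dead but pid file exists"     # stale pid file left behind
    elif [ -f "$lock" ]; then
        echo "dead but subsys locked"       # only the lock file is left
    else
        echo "stopped"
    fi
}
```

It could be called as subsys_state /var/lock/subsys/gmond /var/run/gmond.pid, where both paths are assumptions about this particular installation.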

Checking the log:

# tail -f /var/log/messages 
Aug 26 15:29:27 taurus-bj20 puppet-agent[22030]: Applying configuration version '1377502166'
Aug 26 15:29:33 taurus-bj20 puppet-agent[22030]: (/Stage[main]/Common/Exec[find /var/lib/puppet/clientbucket -name paths -execdir cat {} \; -execdir pwd \; -execdir date -r {} +"F T" \; -exec echo \; > /var/log/puppet/clientbucket.log]/returns) executed successfully
Aug 26 15:29:36 taurus-bj20 /usr/sbin/gmond[22620]: Error creating UDP server on port 8649 bind=taurus-bj20. Exiting.#012
Aug 26 15:29:36 taurus-bj20 puppet-agent[22030]: (/Stage[main]/Ganglia::Service/Service[gmond]/ensure) ensure changed 'stopped' to 'running'
Aug 26 15:29:37 taurus-bj20 puppet-agent[22030]: Finished catalog run in 9.40 seconds
Aug 26 15:30:03 taurus-bj20 /usr/sbin/gmond[22684]: Error creating UDP server on port 8649 bind=taurus-bj20. Exiting.#012
Aug 26 15:30:30 taurus-bj20 /usr/sbin/gmond[22755]: Error creating UDP server on port 8649 bind=taurus-bj20. Exiting.#012
Aug 26 15:30:31 taurus-bj20 /usr/sbin/gmond[22774]: Error creating UDP server on port 8649 bind=taurus-bj20. Exiting.#012
Aug 26 15:30:32 taurus-bj20 /usr/sbin/gmond[22793]: Error creating UDP server on port 8649 bind=taurus-bj20. Exiting.#012
Aug 26 15:30:33 taurus-bj20 /usr/sbin/gmond[22812]: Error creating UDP server on port 8649 bind=taurus-bj20. Exiting.#012

Port 8649 was not in use, so this was most likely a local network problem, because I define the node using the hostname that Puppet's facter reports:


udp_recv_channel {
 port = 8649
 bind = taurus-bj20
}
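
The port check mentioned before the configuration excerpt can be done with netstat -anu | grep 8649. As a hedged alternative that needs no extra tools, the sketch below scans the Linux /proc/net/udp table for a socket bound to a given port. udp_port_in_use is a hypothetical helper, and it only covers IPv4 sockets (IPv6 ones are listed in /proc/net/udp6).

```shell
#!/bin/sh
# udp_port_in_use PORT [TABLE]
# Succeeds (exit status 0) when TABLE, which defaults to /proc/net/udp,
# lists a socket whose local port is PORT.
udp_port_in_use() {
    port_hex=$(printf '%04X' "$1")          # the table stores ports as 4 hex digits
    table=${2:-/proc/net/udp}
    awk -v want="$port_hex" '
        NR > 1 {                            # line 1 is the column header
            n = split($2, addr, ":")        # local_address looks like 00000000:21C9
            if (toupper(addr[n]) == want) found = 1
        }
        END { exit !found }
    ' "$table"
}
```

Running udp_port_in_use 8649 && echo "in use" on the affected host would print nothing if the port is free, which matches what was observed here.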

The hosts file, however, contained:

172.16.2.19   taurus-bj19
172.16.2.20   taurus-bj19     # very careless of me; this entry is wrong (it should read taurus-bj20)
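
The mismatch in the second entry can be confirmed by asking which addresses the hosts file maps a given name to. A minimal sketch, assuming the standard hosts file layout (address first, then one or more names); hosts_lookup is a hypothetical helper, not a Ganglia or Puppet tool.

```shell
#!/bin/sh
# hosts_lookup NAME [FILE]
# Print every address that FILE (default /etc/hosts) maps NAME to.
# hosts_lookup is a hypothetical helper introduced for this example.
hosts_lookup() {
    name=$1
    file=${2:-/etc/hosts}
    awk -v name="$name" '
        { sub(/#.*/, "") }                  # drop trailing comments
        NF >= 2 {
            for (i = 2; i <= NF; i++)       # field 1 is the address, the rest are names
                if ($i == name) { print $1; break }
        }
    ' "$file"
}
```

With the two entries above, hosts_lookup taurus-bj20 prints nothing, while hosts_lookup taurus-bj19 prints both addresses. On a live system, getent hosts taurus-bj20 asks the system resolver the same question.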

Problem found: the hostname could not be resolved. After correcting the hosts entry and syncing again, the process started.


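In summary, the process failed to start because gmond could not bind to the local address when creating its listener on UDP port 8649: the hostname given in bind did not resolve.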
