Ganglia Aggregate Monitoring: Setup and Configuration Explained

Date: 2020-07-06

Tags: ganglia, aggregate monitoring, setup, configuration


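To linuxidc.com, linuxso.com, and other copy-and-paste editors:

As an open-source enthusiast and contributor, and in the spirit of open source, I have no objection to you reposting my articles; I write them precisely to share knowledge and exchange ideas. However, I hope you will act in the same spirit and credit the original author and source when reposting, instead of labeling the source as "Linux公社" (Linux Commune). Even the Linux GPL license comes with copyright protection. I trust you would never label the Linux kernel source code as coming from Linux公社, so why apply a double standard to other content?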

#---------------------------------

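Ganglia is an open-source project from UC Berkeley's Millennium Project, distributed under the BSD license. It is software for aggregate cluster monitoring. Unlike the well-known Cacti, which monitors the state of each individual server in a cluster in detail, Ganglia aggregates the data from all servers in a cluster and monitors the cluster as a whole. Overall cluster load problems that you cannot spot in Cacti or Zabbix often show up clearly in Ganglia. Personally, I think its cluster heat map is a real highlight: one glance tells you the load across the whole cluster. The name itself means "ganglia", i.e. nerve centers, which is apt and to the point.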

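The rest of this article has three parts: compiling and initially configuring Ganglia, deploying the web frontend, and configuring grouped monitoring.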

Part 1: Compiling and Configuring Ganglia

1. Basic Ganglia concepts

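Ganglia has a server side and a client side; the compiled binaries are gmetad and gmond. gmetad is the server, and there is only one; gmond is the client and is installed on every monitored server. Interestingly, Ganglia uses IPv4 Class D (multicast) addresses, presumably to get one-to-many delivery and save bandwidth. Each gmond multicasts its metrics to a shared group address and also listens on that address, so every gmond in a cluster holds the state of the whole cluster. gmetad therefore only needs to poll one gmond per data source (over TCP) to collect data for the entire cluster, instead of querying every server. (With unicast polling, the central server would have to contact every machine separately, and in a large cluster the polling traffic alone would consume considerable bandwidth. Ganglia was built for large HPC clusters, and spending a lot of bandwidth and CPU on network traffic would run against its design.) gmetad stores the collected data in RRD databases, and the web frontend draws graphs from them.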
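You can see what gmetad receives by connecting to a gmond's TCP port yourself: gmond replies with an XML dump of the whole cluster and then closes the connection. The helper below is a minimal sketch (count_hosts is a hypothetical name, and it assumes nc is installed and gmond listens on the default TCP port 8649) that counts the HOST elements in such a dump. If multicast works, the count should equal the number of nodes in the group even though you asked only one of them.

```shell
#!/bin/sh
# count_hosts (hypothetical helper): read a gmond XML dump on stdin and
# print how many <HOST ...> elements it contains.
count_hosts() {
    # Each monitored node appears once as <HOST NAME="..." ...>.
    # grep -o prints every match on its own line, so several elements
    # on one line are still counted; "</HOST>" does not match "<HOST ".
    grep -o '<HOST ' | wc -l | tr -d ' '
}

# Usage (assumes nc and gmond on the default TCP port 8649):
#   nc 127.0.0.1 8649 | count_hosts
```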

2. Compiling Ganglia

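Compiling is somewhat involved because there are quite a few dependencies: mainly rrdtool (familiar to anyone who has used Cacti), expat, libConfuse, Python, the APR development package, and PCRE.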

The build differs between the server and the client.

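Rather than explain every step, here are the scripts; just copy and paste them. They assume you have already compiled and installed rrdtool into /opt/rrdtool-1.4.5 (the path passed to --with-librrd). If yours is elsewhere, adjust the paths in the script.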
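Before running the server script, it can help to confirm that rrdtool really sits at the prefix you pass to --with-librrd. This is a minimal sketch, assuming rrdtool was installed with the usual bin/, include/ and lib/ layout under a single prefix (a lib64/ layout would need adjusting); check_rrdtool_prefix is a hypothetical name.

```shell
#!/bin/sh
# check_rrdtool_prefix PREFIX (hypothetical helper)
# Returns 0 if PREFIX looks like an rrdtool install that ganglia's
# --with-librrd can use: binary, header and library are all present.
check_rrdtool_prefix() {
    prefix=$1
    [ -x "$prefix/bin/rrdtool" ] || { echo "missing $prefix/bin/rrdtool" >&2; return 1; }
    [ -f "$prefix/include/rrd.h" ] || { echo "missing $prefix/include/rrd.h" >&2; return 1; }
    # librrd may be shared (.so) or static (.a); assumes lib/, not lib64/.
    ls "$prefix"/lib/librrd.* >/dev/null 2>&1 || { echo "missing $prefix/lib/librrd.*" >&2; return 1; }
    return 0
}

# Usage: check_rrdtool_prefix /opt/rrdtool-1.4.5 && echo "rrdtool OK"
```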

Server script

#!/bin/sh
yum install -y expat expat-devel pcre pcre-devel
# APR (Apache Portable Runtime), installed to /usr/local/apr by default
wget http://mirror.bit.edu.cn/apache/apr/apr-1.4.6.tar.gz
tar zxf apr-1.4.6.tar.gz
cd apr-1.4.6
./configure && make && make install
cd ..
# libConfuse, installed to /usr/local
wget http://download.savannah.gnu.org/releases/confuse/confuse-2.7.tar.gz
tar zxf confuse-2.7.tar.gz
cd confuse-2.7
./configure CFLAGS=-fPIC --disable-nls && make && make install
cd ..
wget http://downloads.sourceforge.net/project/ganglia/ganglia%20monitoring%20core/3.3.1/ganglia-3.3.1.tar.gz
tar zxf ganglia-3.3.1.tar.gz
cd ganglia-3.3.1
#server (PCRE comes from yum, so its prefix is /usr)
./configure --prefix=/opt/modules/ganglia --with-static-modules --enable-gexec --enable-status --with-gmetad --with-python=/usr --with-librrd=/opt/rrdtool-1.4.5 --with-libapr=/usr/local/apr/bin/apr-1-config --with-libexpat=/usr --with-libconfuse=/usr/local --with-libpcre=/usr
#client
#./configure --prefix=/opt/modules/ganglia --enable-gexec --enable-status --with-python=/usr --with-libapr=/usr/local/apr/bin/apr-1-config --with-libconfuse=/usr/local --with-libexpat=/usr --with-libpcre=/usr
make && make install
cd gmetad
cp gmetad.conf /opt/modules/ganglia/etc/
cp gmetad.init /etc/init.d/gmetad
# point the init script at the binary under /opt/modules/ganglia
sed -i "s/^GMETAD=\/usr\/sbin\/gmetad/GMETAD=\/opt\/modules\/ganglia\/sbin\/gmetad/g" /etc/init.d/gmetad
chkconfig --add gmetad
# route the Ganglia multicast group through the internal NIC (eth1)
ip route add 239.2.11.71 dev eth1
service gmetad start

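Because I installed Ganglia under /opt/modules/ganglia, the script uses sed to replace the binary path in the gmetad init script. The route (the ip route line) is required; I pointed it at the internal-network NIC. Add the same route on every gmond host.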
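The sed substitution in the scripts only works if the init script contains a line starting with GMETAD=/usr/sbin/ (or GMOND=/usr/sbin/). Here is a sketch of the same edit as a reusable function. It uses | as the sed delimiter so the slashes need no escaping, and it fails if the expected line was not found. set_init_binary is a hypothetical name, and GNU sed is assumed for -i.

```shell
#!/bin/sh
# set_init_binary FILE VAR BINARY (hypothetical helper)
# Rewrites the line "VAR=/usr/sbin/..." in FILE to "VAR=BINARY".
# Requires GNU sed (-i without a backup suffix).
set_init_binary() {
    file=$1; var=$2; bin=$3
    # Same idea as the original sed line, with '|' as the delimiter.
    sed -i "s|^$var=/usr/sbin/.*|$var=$bin|" "$file"
    # Return non-zero if the expected line is not there afterwards.
    grep -q "^$var=$bin\$" "$file"
}

# Usage:
#   set_init_binary /etc/init.d/gmetad GMETAD /opt/modules/ganglia/sbin/gmetad
#   set_init_binary /etc/init.d/gmond  GMOND  /opt/modules/ganglia/sbin/gmond
```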

Client script

#!/bin/sh
yum install -y expat expat-devel pcre pcre-devel
# APR (Apache Portable Runtime), installed to /usr/local/apr by default
wget http://mirror.bit.edu.cn/apache/apr/apr-1.4.6.tar.gz
tar zxf apr-1.4.6.tar.gz
cd apr-1.4.6
./configure && make && make install
cd ..
# libConfuse, installed to /usr/local
wget http://download.savannah.gnu.org/releases/confuse/confuse-2.7.tar.gz
tar zxf confuse-2.7.tar.gz
cd confuse-2.7
./configure CFLAGS=-fPIC --disable-nls && make && make install
cd ..
wget http://downloads.sourceforge.net/project/ganglia/ganglia%20monitoring%20core/3.3.1/ganglia-3.3.1.tar.gz
tar zxf ganglia-3.3.1.tar.gz
cd ganglia-3.3.1
#server
#./configure --prefix=/opt/modules/ganglia --with-static-modules --enable-gexec --enable-status --with-gmetad --with-python=/usr --with-librrd=/opt/rrdtool-1.4.5 --with-libapr=/usr/local/apr/bin/apr-1-config --with-libexpat=/usr --with-libconfuse=/usr/local --with-libpcre=/usr
#client
./configure --prefix=/opt/modules/ganglia --enable-gexec --enable-status --with-python=/usr --with-libapr=/usr/local/apr/bin/apr-1-config --with-libconfuse=/usr/local --with-libexpat=/usr --with-libpcre=/usr
make && make install
cd gmond
# generate the default gmond.conf
./gmond -t > /opt/modules/ganglia/etc/gmond.conf
cp gmond.init /etc/init.d/gmond
# point the init script at the binary under /opt/modules/ganglia
sed -i "s/^GMOND=\/usr\/sbin\/gmond/GMOND=\/opt\/modules\/ganglia\/sbin\/gmond/g" /etc/init.d/gmond
chkconfig --add gmond
# route the Ganglia multicast group through the internal NIC (eth1)
ip route add 239.2.11.71 dev eth1
service gmond start

The client does not need rrdtool, and its configuration file has to be generated with a command (gmond -t).

Install the server on just one machine; distribute the client script to the monitored hosts and run it on each of them.
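Distribution can be as simple as a loop over the monitored hosts. This is a dry-run sketch that only prints the commands so you can review them first. It assumes the client script is saved as install_gmond.sh and that root SSH access with keys is already set up; both the file name and the function name print_install_cmds are hypothetical.

```shell
#!/bin/sh
# print_install_cmds SCRIPT HOST... (hypothetical helper)
# Prints one scp + ssh command pair per host; nothing is executed.
print_install_cmds() {
    script=$1; shift
    for host in "$@"; do
        echo "scp $script root@$host:/tmp/"
        echo "ssh root@$host sh /tmp/$(basename "$script")"
    done
}

# Review the output, then pipe it to sh to actually run it:
#   print_install_cmds ./install_gmond.sh 192.168.1.43 192.168.1.51 | sh
```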

Part 2: Deploying the Web Frontend

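All you need is a basic nginx + PHP environment. All the PHP files are in the web folder of the Ganglia source tree: go into the source directory, copy the whole web folder out, and point nginx at it.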
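Here is a minimal sketch of the copy step. It assumes the frontend goes to /opt/modules/ganglia/html (the gmetad_root used in the diff below, which may differ on your machine) and that PHP runs as user nginx (also an assumption; check your PHP-FPM pool). deploy_web is a hypothetical name.

```shell
#!/bin/sh
# deploy_web SRC_WEB_DIR DOCROOT (hypothetical helper)
# Copies the contents of the ganglia source web/ directory into DOCROOT.
deploy_web() {
    src=$1; docroot=$2
    [ -d "$src" ] || { echo "no such directory: $src" >&2; return 1; }
    mkdir -p "$docroot"
    # "$src/." copies the directory's contents (including dotfiles),
    # not the directory itself.
    cp -R "$src/." "$docroot/"
}

# Usage (paths and user are assumptions):
#   deploy_web ganglia-3.3.1/web /opt/modules/ganglia/html
#   chown -R nginx:nginx /opt/modules/ganglia/html
```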

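The first time you open the web interface it may report errors, because a few files need to point at the correct paths. Here is my diff for one of them: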

[root@portal-lc-209 html]# diff conf_default.php conf_default.php.in
29c29
< $conf['gmetad_root'] = "/opt/modules/ganglia/html";
---
> $conf['gmetad_root'] = "@varstatedir@/ganglia";
46c46
< $conf['rrdtool'] = "/opt/rrdtool-1.4.5/bin/rrdtool";
---
> $conf['rrdtool'] = "/usr/bin/rrdtool";

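I think a few other files also need changes, but it was too long ago for me to remember exactly; possibly eval_conf.php and header.php. Just fix them according to the error messages; there is nothing difficult.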
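The two changes in the diff can be applied with sed instead of by hand. This sketch assumes each setting sits on one line in exactly the form shown in the diff, $conf['KEY'] = "...";, and that GNU sed is available. set_php_conf is a hypothetical name.

```shell
#!/bin/sh
# set_php_conf FILE KEY VALUE (hypothetical helper)
# Rewrites a line of the form  $conf['KEY'] = "...";  in FILE so that the
# quoted value becomes VALUE. Requires GNU sed (-i without a suffix).
# VALUE must not contain '|', '&' or '"' (special in this sed command).
set_php_conf() {
    file=$1; key=$2; value=$3
    # [$] matches a literal dollar sign; \1 keeps the  $conf['KEY'] =  prefix.
    sed -i "s|^\([$]conf\['$key'] = \)\".*\";|\1\"$value\";|" "$file"
    # Succeed only if the rewritten line is now present.
    grep -qF "\$conf['$key'] = \"$value\";" "$file"
}

# Usage, reproducing the diff:
#   set_php_conf conf_default.php gmetad_root /opt/modules/ganglia/html
#   set_php_conf conf_default.php rrdtool /opt/rrdtool-1.4.5/bin/rrdtool
```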

Part 3: Grouped Cluster Deployment

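There are plenty of articles online about installing and configuring Ganglia, but few cover grouping, even though it matters a lot. With the default configuration, Ganglia puts everything into a single Grid: one big, ungrouped cluster. Real server clusters, however, have different roles, with each group handling different work. Putting everything together is messy and hard to read, so you need to split the servers into groups.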

Grouping in Ganglia is actually simple: it is done by port. Give each group its own listening port and you are done.

My gmetad.conf looks like this:

gmetad

data_source "Namenode" 192.168.1.28:8653
data_source "Datanode" 192.168.1.27:8649
data_source "Portal" 192.168.1.43:8650
data_source "Collector" 192.168.1.35:8651
data_source "DB" 192.168.1.51:8652

gridname "Hadoop"
rrd_rootdir "/opt/modules/ganglia/html/rrds"
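# Directory for the RRD files read by the web frontend; it must match the web configuration. Best placed under the web directory, with permissions that let gmetad write to it.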
case_sensitive_hostnames 0
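Because each group is identified only by its leader's address and port, the data_source lines can be generated from a simple table. This sketch reads "NAME HOST PORT" lines and reproduces the layout above; the table format and the function name gen_data_sources are hypothetical.

```shell
#!/bin/sh
# gen_data_sources (hypothetical helper): read "NAME HOST PORT" lines on
# stdin and print one gmetad.conf data_source line per group.
gen_data_sources() {
    while read -r name host port; do
        # Skip blank lines and comment lines.
        case $name in ''|'#'*) continue ;; esac
        printf 'data_source "%s" %s:%s\n' "$name" "$host" "$port"
    done
}

# Usage:
#   printf 'Portal 192.168.1.43 8650\nDB 192.168.1.51 8652\n' | gen_data_sources
```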

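There are five data sources, one per group, and each points at the group's "leader" node. The leaders do not need gmetad unless you want multi-level collection; each leader just needs its own port. You might ask: since the IPs already differ, can't the ports be the same? No. The IP here is a unicast address that only tells gmetad where to poll, while Ganglia's actual data exchange happens on the multicast address, and there is only one of those (239.2.11.71). Groups sharing that single multicast address can therefore only be told apart by port. If you need multi-level gmetad, you can configure multiple multicast addresses on the client side.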
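Since every group shares 239.2.11.71, two groups accidentally given the same port would end up mixed together. The sketch below reports port collisions in gmetad.conf. It assumes each data_source line ends with a single host:port, as in the configuration here. dup_ports is a hypothetical name.

```shell
#!/bin/sh
# dup_ports FILE (hypothetical helper): print every port that appears on
# more than one data_source line; prints nothing if all ports are unique.
dup_ports() {
    # Treat everything after the last ':' of a data_source line as the port.
    grep '^data_source' "$1" | sed 's/.*://' | sort | uniq -d
}

# Usage:
#   dup_ports /opt/modules/ganglia/etc/gmetad.conf
```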

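The client configuration is a bit more involved. I am only including the parts that need to change; everything else can stay at its default.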

gmond

cluster {
    name = "Portal"
    # Must match the data_source name "Portal" in gmetad.conf exactly.
    owner = "unspecified"
    latlong = "unspecified"
    url = "unspecified"
}
/* Feel free to specify as many udp_send_channels as you like.    Gmond
     used to only support having a single channel */
udp_send_channel {
    #bind_hostname = yes # Highly recommended, soon to be default.
                                             # This option tells gmond to use a source address
                                             # that resolves to the machine's hostname.    Without
                                             # this, the metrics may appear to come from any
                                             # interface and the DNS names associated with
                                             # those IPs will be used to create the RRDs.
    mcast_join = 239.2.11.71
    port = 8650
    # The port assigned to Portal in gmetad.conf.
    ttl = 1
}

/* You can specify as many udp_recv_channels as you like as well. */
udp_recv_channel {
    mcast_join = 239.2.11.71
    port = 8650
    bind = 239.2.11.71
}

/* You can specify as many tcp_accept_channels as you like to share
     an xml description of the state of the cluster */
tcp_accept_channel {
    port = 8650
}

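The port settings are the key part. gmetad.conf assigns port 8650 to the Portal group, so in that group's gmond.conf the udp_send_channel, udp_recv_channel and tcp_accept_channel ports must all be 8650 as well.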

For another group, use the port configured for that group in gmetad.conf. It may help to think of the port number as the group's ID.
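A quick way to catch a mismatched port is to extract every port = N assignment from a gmond.conf and check that they agree. This is a rough sketch that assumes, as in the configs here, one channel of each type and plain port = N lines (commented-out lines are ignored). group_port is a hypothetical name.

```shell
#!/bin/sh
# group_port FILE (hypothetical helper)
# Prints the port if all "port = N" lines in FILE agree; otherwise prints
# the distinct ports to stderr and returns 1.
group_port() {
    ports=$(sed -n 's/^[[:space:]]*port[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$1" | sort -u)
    # Exactly one distinct, non-empty port value means the file is consistent.
    if [ "$(printf '%s\n' "$ports" | grep -c .)" -eq 1 ]; then
        echo "$ports"
    else
        echo "inconsistent or missing ports: $ports" >&2
        return 1
    fi
}

# Usage:
#   group_port /opt/modules/ganglia/etc/gmond.conf   # expect 8650 on Portal hosts
```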

The gmond config for a member of another group makes this even clearer:

cluster {
    name = "DB"
    owner = "unspecified"
    latlong = "unspecified"
    url = "unspecified"
}

/* The host section describes attributes of the host, like the location */
host {
    location = "unspecified"
}

/* Feel free to specify as many udp_send_channels as you like.    Gmond
     used to only support having a single channel */
udp_send_channel {
    #bind_hostname = yes # Highly recommended, soon to be default.
                                             # This option tells gmond to use a source address
                                             # that resolves to the machine's hostname.    Without
                                             # this, the metrics may appear to come from any
                                             # interface and the DNS names associated with
                                             # those IPs will be used to create the RRDs.
    mcast_join = 239.2.11.71
    port = 8652
    ttl = 1
}

/* You can specify as many udp_recv_channels as you like as well. */
udp_recv_channel {
    mcast_join = 239.2.11.71
    port = 8652
    bind = 239.2.11.71
}

/* You can specify as many tcp_accept_channels as you like to share
     an xml description of the state of the cluster */
tcp_accept_channel {
    port = 8652
}

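Each group's cluster name and ports match its data_source entry in gmetad.conf: Portal with 8650, DB with 8652. It is clear at a glance.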
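Since only the cluster name and the ports differ between groups, each group's gmond.conf can be derived from the default file produced by gmond -t. This sketch assumes that file has a top-level cluster { ... } block with a name = "..." line and plain port = N lines in the channel blocks. It rewrites the name inside the cluster block and every port value. make_group_conf is a hypothetical name.

```shell
#!/bin/sh
# make_group_conf NAME PORT < default_gmond.conf > group_gmond.conf
# (hypothetical helper) Sets the cluster name inside the "cluster { ... }"
# block and rewrites every "port = N" line to PORT; all else is unchanged.
make_group_conf() {
    name=$1; port=$2
    sed -e "/^cluster {/,/^}/s/^\([[:space:]]*name[[:space:]]*=[[:space:]]*\)\".*\"/\1\"$name\"/" \
        -e "s/^\([[:space:]]*port[[:space:]]*=[[:space:]]*\)[0-9][0-9]*/\1$port/"
}

# Usage:
#   /opt/modules/ganglia/sbin/gmond -t | make_group_conf DB 8652 > /opt/modules/ganglia/etc/gmond.conf
```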

Attached: a screenshot of the monitoring dashboard: