Recently, a friend messaged me asking whether there are any good monitoring tools for Hadoop. In fact, there are quite a few. Today I would like to share a veteran monitoring tool, Ganglia, which sees fairly wide enterprise use and is well supported by Hadoop, though its web interface is not particularly pretty. Next time I will introduce another tool, Hue, whose interface is officially billed as a Hadoop UI: it looks good and is rich in features. Today, though, the focus is Ganglia, and the content covers the following:

Now let's get into today's content.

Ganglia is an open-source cluster monitoring project initiated at UC Berkeley and designed to scale to thousands of nodes. Its core consists of gmond, gmetad, and a web front end. It is mainly used to monitor system performance: CPU, memory, disk utilization, I/O load, network traffic, and so on. The curves it plots make it easy to see the working state of every node, which plays an important role in tuning and allocating system resources and improving overall performance.

Ganglia's core is made up of three components:

Next, let's look at Ganglia's architecture, shown in the diagram below:

From the architecture diagram we can see that Ganglia supports failover, and that multiple collector nodes can be configured for the statistics. So when setting it up we can configure Ganglia as needed, using either multicast or unicast, depending on actual requirements and the resources at hand.
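As a sketch of the two modes (the multicast group 239.2.11.71 is the stock default shipped in gmond.conf; the collector address here is this post's NNA node): in multicast mode every gmond joins the same group and hears every peer's metrics, while in unicast mode each gmond sends directly to a designated collector.

```
# Multicast (gmond's default): all nodes join one group and share metrics
udp_send_channel {
  mcast_join = 239.2.11.71   # default multicast group from the stock gmond.conf
  port = 8649
  ttl = 1
}
udp_recv_channel {
  mcast_join = 239.2.11.71
  port = 8649
  bind = 239.2.11.71
}

# Unicast: every node sends only to the designated collector
udp_send_channel {
  host = 10.211.55.26        # the collector (NNA in this post)
  port = 8649
  ttl = 1
}
```

Multicast needs no per-node collector address but requires the network to pass multicast traffic; unicast works everywhere but hard-codes the collector on every node.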
The Ganglia installation in this post targets Apache Hadoop 2.6.0; if you have not set up a Hadoop cluster yet, you can refer to my earlier post "Configuring a Highly Available Hadoop Platform". The operating system is CentOS 6.6. First, we fetch the Ganglia packages, as follows:
[hadoop@nna ~]$ rpm -Uvh http://dl.fedoraproject.org/pub/epel/6/i386/epel-release-6-8.noarch.rpm
[hadoop@nna ~]$ yum -y install httpd-devel automake autoconf libtool ncurses-devel libxslt groff pcre-devel pkgconfig
[hadoop@nna ~]$ yum search ganglia
Then, for simplicity, I just installed everything Ganglia-related; the install command is as follows:
[hadoop@nna ~]$ yum -y install ganglia*
Finally, wait for the installation to finish. Since resources are limited here, I also installed the Ganglia web front end on the NNA node. The other nodes need Ganglia's gmond service, which sends data to gmetad; install it on them following the steps above.

So for this installation, the Ganglia web front end is deployed on the NNA node and the gmond service on the other nodes. The table below lists each node's roles:
| Node | Host         | Role                       |
| ---- | ------------ | -------------------------- |
| NNA  | 10.211.55.26 | Gmetad, Gmond, Ganglia-Web |
| NNS  | 10.211.55.27 | Gmond                      |
| DN1  | 10.211.55.16 | Gmond                      |
| DN2  | 10.211.55.17 | Gmond                      |
| DN3  | 10.211.55.18 | Gmond                      |
The distribution of Ganglia across the Hadoop cluster is shown below:

With Ganglia installed, we need to configure it. On the node that runs the Ganglia-Web service, we configure the web service first.
ganglia.conf
[hadoop@nna ~]$ vi /etc/httpd/conf.d/ganglia.conf
Modify the file as follows:
#
# Ganglia monitoring system php web frontend
#
Alias /ganglia /usr/share/ganglia
<Location /ganglia>
  Order deny,allow
  # Deny from all
  Allow from all
  # Allow from 127.0.0.1
  # Allow from ::1
  # Allow from .example.com
</Location>
Note: the Alias and "Allow from all" lines are the additions; "Deny from all" and the host-specific "Allow" lines are commented out.
[hadoop@nna ~]$ vi /etc/ganglia/gmetad.conf
Modify it as follows:
data_source "hadoop" nna nns dn1 dn2 dn3
Here "hadoop" is the cluster name, and nna nns dn1 dn2 dn3 are the nodes' hostnames or IP addresses.
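Incidentally, gmetad treats multiple hosts on a single data_source line as redundant sources for the same cluster: it polls the first host and only falls back to the next when that one stops answering. An optional polling interval in seconds may precede the host list. A minimal sketch (the 15-second interval is illustrative; 8649 is gmond's default TCP port):

```
# Poll nna every 15 s; fall back to nns if nna is unreachable
data_source "hadoop" 15 nna:8649 nns:8649
```

This is the failover mentioned in the architecture discussion above: as long as one listed gmond is alive, gmetad keeps collecting the cluster's metrics.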
[hadoop@nna ~]$ vi /etc/ganglia/gmond.conf
Modify it as follows:
/*
 * The cluster attributes specified will be used as part of the <CLUSTER>
 * tag that will wrap all hosts collected by this instance.
 */
cluster {
  name = "hadoop"
  owner = "unspecified"
  latlong = "unspecified"
  url = "unspecified"
}

/* Feel free to specify as many udp_send_channels as you like.  Gmond
   used to only support having a single channel */
udp_send_channel {
  #bind_hostname = yes # Highly recommended, soon to be default.
                       # This option tells gmond to use a source address
                       # that resolves to the machine's hostname.  Without
                       # this, the metrics may appear to come from any
                       # interface and the DNS names associated with
                       # those IPs will be used to create the RRDs.
  # mcast_join = 239.2.11.71
  host = 10.211.55.26
  port = 8649
  ttl = 1
}

/* You can specify as many udp_recv_channels as you like as well. */
udp_recv_channel {
  # mcast_join = 239.2.11.71
  port = 8649
  bind = 10.211.55.26
  retry_bind = true
  # Size of the UDP buffer. If you are handling lots of metrics you really
  # should bump it up to e.g. 10MB or even higher.
  # buffer = 10485760
}
Here I use unicast. The name under cluster must match the name set in data_source in gmetad.conf. The send channel's host is NNA's IP, and the receive channel lives on NNA, so the bound IP is NNA's address as well. The configuration above is for the node running the gmetad and Ganglia-Web services; on every other node only gmond.conf needs to be configured, as follows:
/* Feel free to specify as many udp_send_channels as you like.  Gmond
   used to only support having a single channel */
udp_send_channel {
  #bind_hostname = yes # Highly recommended, soon to be default.
                       # This option tells gmond to use a source address
                       # that resolves to the machine's hostname.  Without
                       # this, the metrics may appear to come from any
                       # interface and the DNS names associated with
                       # those IPs will be used to create the RRDs.
  # mcast_join = 239.2.11.71
  host = 10.211.55.26
  port = 8649
  ttl = 1
}

/* You can specify as many udp_recv_channels as you like as well. */
udp_recv_channel {
  # mcast_join = 239.2.11.71
  port = 8649
  # bind = 10.211.55.26
  retry_bind = true
  # Size of the UDP buffer. If you are handling lots of metrics you really
  # should bump it up to e.g. 10MB or even higher.
  # buffer = 10485760
}
Hadoop's support for Ganglia is very good. In the Hadoop directory /hadoop-2.6.0/etc/hadoop we can find the hadoop-metrics2.properties file; we edit it with the command below:
[hadoop@nna hadoop]$ vi hadoop-metrics2.properties
Modify it as follows:
namenode.sink.ganglia.servers=nna:8649
#datanode.sink.ganglia.servers=yourgangliahost_1:8649,yourgangliahost_2:8649
resourcemanager.sink.ganglia.servers=nna:8649
#nodemanager.sink.ganglia.servers=yourgangliahost_1:8649,yourgangliahost_2:8649
mrappmaster.sink.ganglia.servers=nna:8649
jobhistoryserver.sink.ganglia.servers=nna:8649
This is the configuration for the NameNode node; to configure a DataNode node instead, the content looks like this:
#namenode.sink.ganglia.servers=nna:8649
datanode.sink.ganglia.servers=dn1:8649
#resourcemanager.sink.ganglia.servers=nna:8649
nodemanager.sink.ganglia.servers=dn1:8649
#mrappmaster.sink.ganglia.servers=nna:8649
#jobhistoryserver.sink.ganglia.servers=nna:8649
The other DN nodes can be configured using this as a template.
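One caveat: on every node, the servers lines shown above only take effect together with a sink declaration in the same hadoop-metrics2.properties file. A minimal sketch, assuming Ganglia 3.1.x (use GangliaSink30 for 3.0.x; the 10-second period is a common choice, not a requirement):

```
# Which metrics2 sink implementation to load, and how often (seconds) to push metrics
*.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
*.sink.ganglia.period=10
*.sink.ganglia.supportsparse=true
```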
Also, if the Hadoop cluster was already running, its services need to be restarted after these changes.
Ganglia's service commands include start, restart, and stop. Here we start the required services on each node; the services each node needs to start are as follows:
[hadoop@nna ~]$ service gmetad start
[hadoop@nna ~]$ service gmond start
[hadoop@nna ~]$ service httpd start
[hadoop@nns ~]$ service gmond start
[hadoop@dn1 ~]$ service gmond start
[hadoop@dn2 ~]$ service gmond start
[hadoop@dn3 ~]$ service gmond start
At this point Ganglia's services are all up and running. Below is a screenshot of Ganglia monitoring in action:

A few things deserve attention when installing Ganglia as a Hadoop monitoring tool. First, system dependencies: Ganglia relies on a number of packages, so have them in place before installing. Second, be careful with the configuration; understanding Ganglia's architecture goes a long way when deploying its services across a Hadoop cluster. Finally, under Hadoop's configuration directory (/etc/hadoop), set up the Ganglia-related configuration, that is, integrate the hadoop-metrics2.properties file into the Hadoop cluster.

That's all for this post. If you run into problems while studying this material, feel free to join the discussion group or send me an email; I will do my best to answer. Good luck, and keep at it!