Overview
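This Hadoop plugin can be used to monitor the NameNode and JobTracker of a Hadoop cluster. Hadoop is the leading, de facto standard system for distributed big data processing. Yet companies such as Yahoo (reputed to run very large Hadoop clusters), Facebook, and Groupon appear to use only two monitoring solutions: Ganglia and OpenTSDB. If you read their documentation, you will find that both solutions are tightly coupled to Hadoop and quite sensitive to the Hadoop version, libraries, and so on.
The aim of this Hadoop plugin is to work out of the box, without installing any software on an already running Hadoop cluster or Zabbix server. Too good to be true? Read on.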
Installation and Configuration
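The Hadoop plugin scrapes information from the web UIs of the Hadoop NameNode and JobTracker. There is no need to add or change any Hadoop configuration parameter or to restart your Hadoop cluster. Once you have downloaded the plugin, it takes no more than 5 minutes to get it running.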
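To make the scraping idea concrete, here is a minimal sketch in Python. It is not the plugin's code (the plugin uses shell scripts and curl), and the sample HTML and the labels in it are made-up assumptions standing in for a saved web UI page; the real page layout depends on your Hadoop version.

```python
import re
from html import unescape


def page_text(html):
    """Strip tags and collapse whitespace so a label and its value end up side by side."""
    text = re.sub(r"<[^>]+>", " ", html)
    return re.sub(r"\s+", " ", unescape(text)).strip()


def extract_number(html, label):
    """Return the first number that follows `label` in the page, or None if it is absent."""
    pattern = re.escape(label) + r"\s*:?\s*(\d+(?:\.\d+)?)"
    match = re.search(pattern, page_text(html))
    return float(match.group(1)) if match else None


# Hypothetical fragment of a status page, for illustration only.
SAMPLE_PAGE = """
<table>
  <tr><td>Live Nodes</td><td>:</td><td>12</td></tr>
  <tr><td>Dead Nodes</td><td>:</td><td>1</td></tr>
  <tr><td>DFS Used%</td><td>:</td><td>37.5 %</td></tr>
</table>
"""

if __name__ == "__main__":
    for label in ("Live Nodes", "Dead Nodes", "DFS Used%"):
        print(label, "=", extract_number(SAMPLE_PAGE, label))
```

Because the values are read from a page that Hadoop already serves, nothing has to change on the cluster side, which is the point of the approach.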
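The plugin calls a command-line tool named curl, so curl has to be installed first. On the Zabbix side, you can log in as root (default password: zabbix) and run yast -i curl. Note that although the curl package is very small, yast will spend a few minutes refreshing the package repositories. Next, download the Hadoop plugin, which consists of 2 shell scripts and 2 template XML files, from http://mikoomi.googlecode.com/svn/plugins/. On the Zabbix server, create the directory /etc/zabbix/externalscripts and copy the shell scripts into it.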
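The copy step can be sketched as follows. This is only an illustration under a few assumptions: the function name install_scripts is hypothetical, the downloaded scripts are assumed to end in .sh, and marking them executable is my addition (the text above only says to copy them), on the grounds that Zabbix needs to be able to run them.

```python
import shutil
import stat
from pathlib import Path


def install_scripts(src_dir, dest_dir="/etc/zabbix/externalscripts"):
    """Copy every *.sh file from src_dir into dest_dir and mark it executable.

    Returns the list of installed paths. The template XML files are left alone;
    they are imported through the Zabbix front end instead.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)  # create the directory if it is missing
    installed = []
    for script in sorted(Path(src_dir).glob("*.sh")):
        target = dest / script.name
        shutil.copy2(script, target)
        # Add execute permission for user, group, and others (assumption, see above).
        mode = target.stat().st_mode
        target.chmod(mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
        installed.append(target)
    return installed


# Example (run as a user allowed to write to /etc/zabbix):
#   install_scripts("/tmp/mikoomi-download")
```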
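Once that is done, open a browser and download the NameNode and JobTracker template files from http://mikoomi.googlecode.com/svn/plugins/. Then open a new browser window or tab and log in to the Zabbix front end (default username: admin, password: zabbix).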
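Proceed as follows:
Configuration >> Templates
Click the "Import Template" button in the top-right corner of the window.
In the "Import file" dialog, locate and select the template file you just downloaded.
Upload the template.
You can now start monitoring your Hadoop cluster. Instructions follow: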
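Monitoring Your Hadoop Cluster
Follow the steps below:
Monitoring the NameNode
Log in to the Zabbix front end and click Configuration >> Hosts in the navigation bar.
Click the "Create Host" button in the top-right corner.
Fill in the host details as prompted - Name: a name of your choice (in Zabbix every monitored entity is called a host, although it may be a host, a service, an application, or even a cluster).
When done, click the "Add" button on the "Templates" tab.
You will see a list of templates - select "Template_Hadoop_NameNode".
On the "Macros" tab, add the following macros: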
{$HADOOP_NAMENODE_HOST}
{$HADOOP_NAMENODE_METRICS_PORT}
{$ZABBIX_NAME}
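The value of {$HADOOP_NAMENODE_HOST} should be the hostname or fully qualified hostname of the NameNode server (it must be reachable over the network, for example by ping). The value of {$HADOOP_NAMENODE_METRICS_PORT} is the port of the NameNode web UI. Finally, {$ZABBIX_NAME} is the name you gave the NameNode entity when you defined it in the Zabbix front end.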
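Before saving the host, it can help to sanity-check the three macro values. The sketch below is a hypothetical helper, not part of the plugin: check_macros and web_ui_url are names I made up, and the example values (including port 50070, a common NameNode web UI port on older Hadoop releases) are assumptions you should replace with your own.

```python
import re

REQUIRED_MACROS = (
    "{$HADOOP_NAMENODE_HOST}",
    "{$HADOOP_NAMENODE_METRICS_PORT}",
    "{$ZABBIX_NAME}",
)


def check_macros(macros):
    """Return a list of problems found in a macro-name -> value mapping (empty if none)."""
    problems = ["missing " + name for name in REQUIRED_MACROS if not macros.get(name)]
    port = macros.get("{$HADOOP_NAMENODE_METRICS_PORT}", "")
    if port and not (re.fullmatch(r"[0-9]+", port) and 1 <= int(port) <= 65535):
        problems.append("invalid port: " + port)
    return problems


def web_ui_url(macros):
    """Base URL of the NameNode web UI implied by the host and port macros."""
    return "http://{}:{}/".format(
        macros["{$HADOOP_NAMENODE_HOST}"],
        macros["{$HADOOP_NAMENODE_METRICS_PORT}"],
    )


# Example values only; substitute your own host, port, and Zabbix host name.
EXAMPLE = {
    "{$HADOOP_NAMENODE_HOST}": "namenode.example.com",
    "{$HADOOP_NAMENODE_METRICS_PORT}": "50070",
    "{$ZABBIX_NAME}": "Hadoop NameNode",
}

if __name__ == "__main__":
    print(check_macros(EXAMPLE) or "macros look fine")
    print(web_ui_url(EXAMPLE))
```

Opening the resulting URL in a browser from the Zabbix server is a quick way to confirm that the host and port are right.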
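Similarly, the steps to set up JobTracker monitoring are as follows:
Monitoring the JobTracker
Log in to the Zabbix front end and click Configuration >> Hosts in the navigation bar.
Click the "Create Host" button in the top-right corner.
Fill in the host details as prompted - Name: a name of your choice (in Zabbix every monitored entity is called a host, although it may be a host, a service, an application, or even a cluster).
When done, click the "Add" button on the "Templates" tab.
You will see a list of templates - select "Template_Hadoop_JobTracker".
On the "Macros" tab, add the following macros: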
{$HADOOP_JOBTRACKER_HOST}
{$HADOOP_JOBTRACKER_METRICS_PORT}
{$ZABBIX_NAME}
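The value of {$HADOOP_JOBTRACKER_HOST} should be the hostname or fully qualified hostname of the JobTracker server (it must be reachable over the network, for example by ping). The value of {$HADOOP_JOBTRACKER_METRICS_PORT} is the port of the JobTracker web UI. Finally, {$ZABBIX_NAME} is the name you gave the JobTracker entity when you defined it in the Zabbix front end.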
NameNode Metrics
Configured Cluster Storage
Configured Max. Heap Size (GB)
Hadoop Version
NameNode Process Heap Size (GB)
NameNode Start Time
Number of Dead Nodes
Number of Decommissioned Nodes
Number of Files and Directories in HDFS
Number of HDFS Blocks Used
Number of Live Nodes
Number of Under-Replicated Blocks
Ping Check
Storage Unit
Total % of Storage Available
Total % of Storage Used
Total Storage Available
Total Storage Used by DFS
Total Storage Used by non-DFS
Least (min) Node-level non-DFS Storage Used
Least (min) Node-level Storage Configured
Least (min) Node-level Storage Free
Least (min) Node-level Storage Free %
Least (min) Node-level Storage Used
Least (min) Node-level Storage Used %
Most (max) Node-level non-DFS Storage Used
Most (max) Node-level Storage Configured
Most (max) Node-level Storage Free
Most (max) Node-level Storage Free %
Most (max) Node-level Storage Used
Most (max) Node-level Storage Used %
Node-level Storage Unit of Measure
Node with Least (min) Node-level non-DFS Storage Used
Node with Least (min) Node-level Storage Configured
Node with Least (min) Node-level Storage Free
Node with Least (min) Node-level Storage Free %
Node with Least (min) Node-level Storage Used
Node with Least (min) Node-level Storage Used %
Node with Most (max) Node-level non-DFS Storage Used
Node with Most (max) Node-level Storage Configured
Node with Most (max) Node-level Storage Free
Node with Most (max) Node-level Storage Free %
Node with Most (max) Node-level Storage Used
Node with Most (max) Node-level Storage Used %
JobTracker Metrics
Average Task Capacity Per Node
Hadoop Version
JobTracker Start Time
JobTracker State
Map Task Capacity
Number of Blacklisted Nodes
Number of Excluded Nodes
Number of Jobs Completed
Number of Jobs Failed
Number of Jobs Retired
Number of Jobs Running
Number of Jobs Submitted
Number of Map Tasks Running
Number of Nodes in Hadoop Cluster
Number of Reduce Tasks Running
Occupied Map Slots
Occupied Reduce Slots
Reduce Task Capacity
Reserved Map Slots
Reserved Reduce Slots
Pre-canned NameNode Triggers
Less than 20% free space available on the cluster
NameNode was restarted
No monitoring data received for the last 10 minutes
One or more nodes have become alive or restarted
One or more nodes have become dead
One or more nodes have been added to the decommissioned list
One or more nodes have been removed from the decommissioned list
The number of live nodes has been reduced
The number of live nodes has increased
There has been a reduction in the number of under-replicated blocks
There has been an increase in the number of under-replicated blocks
Less than 20% free space available on one or more nodes in the cluster
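Most of the triggers above are either a threshold check or a comparison between two consecutive samples. The Python sketch below illustrates that logic only; it is not how the templates define their triggers (those are Zabbix trigger expressions inside the template XML), and the dictionary keys and the evaluate_triggers name are assumptions made for the example.

```python
# Metrics whose change between two samples is worth reporting (hypothetical key names).
WATCHED = (
    ("live_nodes", "live nodes"),
    ("dead_nodes", "dead nodes"),
    ("under_replicated_blocks", "under-replicated blocks"),
)


def evaluate_triggers(previous, current, min_free_pct=20.0):
    """Compare two metric samples and return the conditions that currently hold."""
    fired = []
    # Threshold check: cluster-wide free space below the limit.
    if current["storage_free_pct"] < min_free_pct:
        fired.append("low free space")
    # Change checks: did a counter move since the previous sample?
    for key, label in WATCHED:
        if current[key] > previous[key]:
            fired.append(label + " increased")
        elif current[key] < previous[key]:
            fired.append(label + " decreased")
    return fired


if __name__ == "__main__":
    before = {"storage_free_pct": 35.0, "live_nodes": 10, "dead_nodes": 0,
              "under_replicated_blocks": 0}
    after = {"storage_free_pct": 18.5, "live_nodes": 9, "dead_nodes": 1,
             "under_replicated_blocks": 42}
    for condition in evaluate_triggers(before, after):
        print(condition)
```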
Pre-canned JobTracker Triggers
No monitoring data received for the last 10 minutes
One or more jobs have failed
One or more nodes have become blacklisted
One or more nodes have been added to the exclude list
One or more nodes have been added to the Hadoop cluster
One or more nodes have been removed from the blacklisted nodes
One or more nodes have been removed from the exclude list
One or more nodes have been removed from the Hadoop cluster
The JobTracker was restarted
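The "No monitoring data received for the last 10 minutes" trigger, which appears in both lists, boils down to a staleness check on the newest sample. A minimal sketch of that idea, with a hypothetical function name and not taken from the templates:

```python
from datetime import datetime, timedelta


def no_data_received(last_sample_at, now, window=timedelta(minutes=10)):
    """True when the newest sample is older than `window`."""
    return now - last_sample_at > window


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0, 0)
    print(no_data_received(datetime(2024, 1, 1, 11, 45, 0), now))  # sample is 15 minutes old
    print(no_data_received(datetime(2024, 1, 1, 11, 55, 0), now))  # sample is 5 minutes old
```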