mikoomi - Zabbix Plugin/Template for Monitoring Apache Hadoop (Translation)

Overview


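   This Hadoop plugin can be used to monitor the NameNode and JobTracker of a Hadoop cluster. Hadoop is the leading, de facto distributed big data processing system "out there". Yet companies such as Yahoo (reportedly running a very large Hadoop cluster), Facebook, and Groupon appear to use only two monitoring solutions: Ganglia and openTSDB. If you read their documentation, you will find that both solutions are tightly coupled to Hadoop and are sensitive to the Hadoop version, libraries, and similar details.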

   The goal of this Hadoop plugin is to work out of the box, without installing any software on the running Hadoop cluster or on the Zabbix server. Too good to be true? Why not read on...


Installation and Configuration


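   This Hadoop plugin scrapes information from the Web UI of the Hadoop NameNode and JobTracker. There is no need to add or modify any Hadoop configuration parameter or to restart your Hadoop cluster. Once the plugin is downloaded, it should take no more than 5 minutes to get it running.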
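To make the scraping idea concrete, here is a minimal Python sketch. It is not the plugin's own code (the plugin ships as shell scripts); the function names `fetch_page` and `extract_number` and the sample HTML fragment are hypothetical, and the real Web UI markup may differ from the fragment shown.

```python
import re
import subprocess


def fetch_page(url, timeout_s=10):
    """Fetch a page with the curl command-line tool and return its body as text."""
    result = subprocess.run(
        ["curl", "--silent", "--fail", "--max-time", str(timeout_s), url],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def extract_number(html, label):
    """Return the first number that follows `label` in the page, or None.

    Tags are stripped first so that markup between the label and its value
    (table cells, links) does not get in the way.
    """
    text = re.sub(r"<[^>]+>", " ", html)
    match = re.search(re.escape(label) + r"\D*?(\d+(?:\.\d+)?)", text)
    return float(match.group(1)) if match else None


# Hypothetical page fragment, for illustration only.
SAMPLE = "<tr><td>Live Nodes</td><td>:</td><td><a href='#'>4</a></td></tr>"
print(extract_number(SAMPLE, "Live Nodes"))  # prints 4.0
```

Because everything is read over HTTP from pages the cluster already serves, nothing has to be installed or changed on the Hadoop side.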

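   The plugin calls a command-line tool named curl, so curl must be installed first. On the Zabbix side, you can log in as root (the default password is zabbix) and run yast -i curl. Note that although the curl package is very small, yast will take a few minutes to refresh the package repositories. Next, download the Hadoop plugin, which consists of 2 shell scripts and 2 template XML files, from: http://mikoomi.googlecode.com/svn/plugins/. On the Zabbix server, create the directory /etc/zabbix/externalscripts and copy the shell scripts into it.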
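The directory and copy steps above can be sketched as follows. This is an illustrative Python version rather than the shell commands you would normally type; the helper names `install_scripts` and `curl_available` are hypothetical, and it assumes the downloaded scripts use a `.sh` extension.

```python
import shutil
from pathlib import Path


def curl_available():
    """Return True if the curl command-line tool is on the PATH."""
    return shutil.which("curl") is not None


def install_scripts(src_dir, dest_dir="/etc/zabbix/externalscripts"):
    """Copy every *.sh file from src_dir into dest_dir and mark it executable."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)  # create the directory if missing
    installed = []
    for script in sorted(Path(src_dir).glob("*.sh")):
        target = dest / script.name
        shutil.copy2(script, target)
        target.chmod(target.stat().st_mode | 0o111)  # add execute bits
        installed.append(target)
    return installed
```

Calling `install_scripts("/path/to/downloaded/plugin")` as root would place the scripts in the default directory; `curl_available()` is a quick check that the prerequisite is met.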

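   After that, open a browser and download the NameNode and JobTracker template files from: http://mikoomi.googlecode.com/svn/plugins/. Open a new browser window or tab and log in to the Zabbix frontend (the default username is admin and the password is zabbix).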

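   Proceed as follows:

   Configuration >> Templates

   Click the "Import Template" button in the upper-right corner of the window

   In the "Import file" dialog, locate and select the template file you just downloaded.

   Upload the template

   You can now start monitoring your Hadoop cluster. Instructions follow: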


Monitoring Your Hadoop Cluster

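   Follow the steps below:

   Monitoring the NameNode

   Log in to the Zabbix frontend and click Configuration >> Hosts in the navigation bar

   Click the "Create Host" button in the upper-right corner

   Fill in the monitoring options as prompted. Name: a name of your choice (in Zabbix every monitored entity is called a host, but it may be a host, a service, a program, or even a cluster).

   When done, click the "Add" button in the "Templates" tab.

   You will see a list of templates. Select "Template_Hadoop_NameNode"

   In the "Macros" tab, add the following macros: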

       {$HADOOP_NAMENODE_HOST}

       {$HADOOP_NAMENODE_METRICS_PORT}

       {$ZABBIX_NAME}

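   The value of {$HADOOP_NAMENODE_HOST} should be the hostname or fully qualified hostname of the NameNode server (it must be reachable by ping over the network). The value of {$HADOOP_NAMENODE_METRICS_PORT} is the port of the NameNode Web UI. Finally, {$ZABBIX_NAME} is the name of the NameNode entity as defined earlier in the Zabbix frontend.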
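As an illustration of how the three macros fit together, the sketch below substitutes macro values into a URL string. It only mimics the idea of macro expansion and is not how Zabbix implements it; `expand_macros` is a hypothetical helper, and the hostname, port, and entity name are example values, not defaults taken from the plugin.

```python
import re

# Matches user-macro references such as {$HADOOP_NAMENODE_HOST}.
MACRO_PATTERN = re.compile(r"\{\$([A-Z0-9_.]+)\}")


def expand_macros(template, macros):
    """Replace each {$NAME} in `template` with its value from `macros`.

    Unknown macros raise KeyError so a missing definition is caught early.
    """
    def lookup(match):
        name = match.group(0)
        if name not in macros:
            raise KeyError(f"macro {name} is not defined on the host")
        return str(macros[name])

    return MACRO_PATTERN.sub(lookup, template)


host_macros = {
    "{$HADOOP_NAMENODE_HOST}": "namenode.example.com",  # example value
    "{$HADOOP_NAMENODE_METRICS_PORT}": 50070,           # example value
    "{$ZABBIX_NAME}": "Hadoop NameNode",                # example value
}
url = expand_macros(
    "http://{$HADOOP_NAMENODE_HOST}:{$HADOOP_NAMENODE_METRICS_PORT}/",
    host_macros,
)
print(url)  # prints http://namenode.example.com:50070/
```

If the resulting URL does not open in a browser from the Zabbix server, the host or port macro is most likely wrong.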

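   Similarly, the steps for setting up JobTracker monitoring are as follows:

   Monitoring the JobTracker

   Log in to the Zabbix frontend and click Configuration >> Hosts in the navigation bar

   Click the "Create Host" button in the upper-right corner

   Fill in the monitoring options as prompted. Name: a name of your choice (in Zabbix every monitored entity is called a host, but it may be a host, a service, a program, or even a cluster).

   When done, click the "Add" button in the "Templates" tab.

   You will see a list of templates. Select "Template_Hadoop_JobTracker"

   In the "Macros" tab, add the following macros: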

       {$HADOOP_JOBTRACKER_HOST}

       {$HADOOP_JOBTRACKER_METRICS_PORT}

       {$ZABBIX_NAME}

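   The value of {$HADOOP_JOBTRACKER_HOST} should be the hostname or fully qualified hostname of the JobTracker server (it must be reachable by ping over the network). The value of {$HADOOP_JOBTRACKER_METRICS_PORT} is the port of the JobTracker Web UI. Finally, {$ZABBIX_NAME} is the name of the JobTracker entity as defined earlier in the Zabbix frontend.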


NameNode Monitored Metrics

Configured Cluster Storage

Configured Max. Heap Size (GB)

Hadoop Version

NameNode Process Heap Size (GB)

NameNode Start Time

Number of Dead Nodes

Number of Decommissioned Nodes

Number of Files and Directories in HDFS

Number of HDFS Blocks Used

Number of Live Nodes

Number of Under-Replicated Blocks

Ping Check

Storage Unit

Total % of Storage Available

Total % of Storage Used

Total Storage Available

Total Storage Used by DFS

Total Storage Used by non-DFS

Least (min) Node-level non-DFS Storage Used

Least (min) Node-level Storage Configured

Least (min) Node-level Storage Free

Least (min) Node-level Storage Free %

Least (min) Node-level Storage Used

Least (min) Node-level Storage Used %

Most (max) Node-level non-DFS Storage Used

Most (max) Node-level Storage Configured

Most (max) Node-level Storage Free

Most (max) Node-level Storage Free %

Most (max) Node-level Storage Used

Most (max) Node-level Storage Used %

Node-level Storage Unit of Measure

Node with Least (min) Node-level non-DFS Storage Used

Node with Least (min) Node-level Storage Configured

Node with Least (min) Node-level Storage Free

Node with Least (min) Node-level Storage Free %

Node with Least (min) Node-level Storage Used

Node with Least (min) Node-level Storage Used %

Node with Most (max) Node-level non-DFS Storage Used

Node with Most (max) Node-level Storage Configured

Node with Most (max) Node-level Storage Free

Node with Most (max) Node-level Storage Free %

Node with Most (max) Node-level Storage Used

Node with Most (max) Node-level Storage Used %
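The paired "Least (min)" / "Most (max)" and "Node with ..." items above amount to finding the extreme value across DataNodes and remembering which node it came from. A minimal sketch of that aggregation, assuming per-node readings are already available as a dictionary (the function name `node_extremes` and the sample readings are hypothetical):

```python
def node_extremes(per_node):
    """Given {node_name: value}, return the min and max values and their nodes."""
    if not per_node:
        raise ValueError("no node data")
    min_node = min(per_node, key=per_node.get)
    max_node = max(per_node, key=per_node.get)
    return {
        "min": per_node[min_node],
        "node_with_min": min_node,
        "max": per_node[max_node],
        "node_with_max": max_node,
    }


# Hypothetical "Storage Used %" readings for three DataNodes.
used_pct = {"dn1": 41.5, "dn2": 78.0, "dn3": 12.25}
print(node_extremes(used_pct))
```

The same helper would apply to each node-level measure in the list (storage configured, free, used, non-DFS used).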


JobTracker Monitored Metrics

Average Task Capacity Per Node

Hadoop Version

JobTracker Start Time

JobTracker State

Map Task Capacity

Number of Blacklisted Nodes

Number of Excluded Nodes

Number of Jobs Completed

Number of Jobs Failed

Number of Jobs Retired

Number of Jobs Running

Number of Jobs Submitted

Number of Map Tasks Running

Number of Nodes in Hadoop Cluster

Number of Reduce Tasks Running

Occupied Map Slots

Occupied Reduce Slots

Reduce Task Capacity

Reserved Map Slots

Reserved Reduce Slots

Pre-canned NameNode Triggers

Less than 20% free space available on the cluster

NameNode was restarted

No monitoring data received for the last 10 minutes

One or more nodes have become alive or restarted

One or more nodes have become dead

One or more nodes have been added to the decommissioned list

One or more nodes have been removed from the decommissioned list

The number of live nodes has been reduced

The number of live nodes has increased

There has been a reduction in the number of under-replicated blocks

There has been an increase in the number of under-replicated blocks

Less than 20% free space available on one or more nodes in the cluster
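Most of these triggers compare the latest sample with the previous one or check a value against a threshold. The sketch below shows that logic for a few of them in plain Python. It is not the Zabbix trigger expression syntax used by the template, and the dictionary keys (`free_pct`, `live_nodes`, `dead_nodes`, `start_time`) are hypothetical names chosen for the example.

```python
FREE_SPACE_THRESHOLD_PCT = 20.0  # threshold named in the trigger description


def fired_triggers(previous, current):
    """Compare two consecutive samples and return the triggers that would fire."""
    fired = []
    if current["free_pct"] < FREE_SPACE_THRESHOLD_PCT:
        fired.append("Less than 20% free space available on the cluster")
    if current["live_nodes"] < previous["live_nodes"]:
        fired.append("The number of live nodes has been reduced")
    elif current["live_nodes"] > previous["live_nodes"]:
        fired.append("The number of live nodes has increased")
    if current["dead_nodes"] > previous["dead_nodes"]:
        fired.append("One or more nodes have become dead")
    if current["start_time"] != previous["start_time"]:
        fired.append("NameNode was restarted")
    return fired


# Hypothetical consecutive samples.
before = {"free_pct": 35.0, "live_nodes": 4, "dead_nodes": 0, "start_time": "t0"}
after = {"free_pct": 18.0, "live_nodes": 3, "dead_nodes": 1, "start_time": "t0"}
for name in fired_triggers(before, after):
    print(name)
```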

Pre-canned JobTracker Triggers

No monitoring data received for the last 10 minutes

One or more jobs have failed

One or more nodes have become blacklisted

One or more nodes have been added to the exclude list

One or more nodes have been added to the Hadoop cluster

One or more nodes have been removed from the blacklisted nodes

One or more nodes have been removed from the exclude list

One or more nodes have been removed from the Hadoop cluster

The JobTracker was restarted
