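I searched online for ways to monitor datanodes, but everything I found was fairly complicated: the approaches dig into the underlying source code and require writing Java to fetch the node status, which I don't know how to do.
My requirement is simple: run jps on each host, and if both the DataNode and NodeManager processes show up, treat the node as healthy.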
#!/bin/bash
############################################################################################
# Script Name: datanodes_alarm.sh
# Script Desc: Monitors Hadoop cluster datanode status by logging in to each host in turn
#              and running jps to get the node state
# crontab entry, once an hour: 6 * * * * . /home/hadoop/lyc/datanodes_alarm.sh>>/home/hadoop/lyc/datanodes_alarm.log 2>&1
# Inputs:
# Author: 林育馳
# Date: 2017-12-26
# History: Version----------Date---------Common-------author---------Desc------------
#          01        2017-12-26   Create       林育馳         Created
#          02
#          03
#          04
#          05
############################################################################################
source /home/hadoop/.bash_profile
echo -e "\n\n===================[EXEC TIME:`date +%c`]==================="
# Take the hadoop entries in /etc/hosts from the third one onward as the datanode list
datanodes=`cat /etc/hosts|grep hadoop|sed -n '3,$p'|awk '{print $2}'`
for datanode in `echo $datanodes `
do
    # Count the jps lines containing "Node"; a healthy host shows at least DataNode and NodeManager
    jps_status=`ssh -t $datanode 'jps'|grep Node|wc -l|tail -1`
    if [[ $jps_status -lt 2 ]]; then
        echo $datanode is lost!
        # "集羣監控" means "cluster monitoring"; it is passed unchanged to the alert script
        sh /home/hadoop/lyc/bonc_warn_job.sh "B" "集羣監控" "Hadoop datanode $datanode is lost!"
    else
        echo $datanode is ok!
    fi
done
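The script above counts every jps line containing "Node", so a host that happens to run, say, NameNode and NodeManager would also pass. If a stricter check is wanted, one possible variation is sketched below. It is not part of the original script: the function name node_is_healthy is hypothetical, and the sketch assumes jps prints its usual "<pid> <class name>" lines.

```shell
#!/bin/bash
# Hypothetical helper (an assumption, not from the original script):
# reads jps output on stdin and succeeds only if BOTH DataNode and
# NodeManager are listed.
node_is_healthy() {
    local jps_output required
    jps_output=$(cat)
    for required in DataNode NodeManager; do
        # Match the class name in column 2 exactly, so NameNode or
        # SecondaryNameNode are not counted by mistake.
        if ! awk -v name="$required" \
            '$2 == name { found = 1 } END { exit !found }' <<<"$jps_output"; then
            return 1
        fi
    done
    return 0
}

# Demo with canned jps output (the PIDs are made up for illustration).
if printf '2101 DataNode\n2250 NodeManager\n2400 Jps\n' | node_is_healthy; then
    echo "node is ok"
fi
if ! printf '1900 NameNode\n2250 NodeManager\n2400 Jps\n' | node_is_healthy; then
    echo "node is lost"
fi
```

Inside the monitoring loop, the same function could be fed from the remote command, for example `ssh "$datanode" jps | node_is_healthy`, in place of the line count.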
The log written by datanodes_alarm.sh looks like this: