HDFS Unexpectedly Fails to Stop Cleanly

Background

About a month after starting HDFS, I needed to restart it, only to find that it would not stop. I had no choice but to kill the processes and start it again. I didn't want to leave the problem unexplained, so I dug into what actually caused it.

Examining the Stop Scripts

[hadoop@hadoop001 sbin]$ vim stop-dfs.sh 
...
# namenodes

NAMENODES=$($HADOOP_PREFIX/bin/hdfs getconf -namenodes)

echo "Stopping namenodes on [$NAMENODES]"

"$HADOOP_PREFIX/sbin/hadoop-daemons.sh" \        <--- calls another script
  --config "$HADOOP_CONF_DIR" \
  --hostnames "$NAMENODES" \
  --script "$bin/hdfs" stop namenode
...

[hadoop@hadoop001 sbin]$ vim hadoop-daemons.sh
...
# This in turn invokes the hadoop-daemon.sh script on each host
exec "$bin/slaves.sh" --config $HADOOP_CONF_DIR cd "$HADOOP_PREFIX" \; "$bin/hadoop-daemon.sh" \
  --config $HADOOP_CONF_DIR "$@"
...

[hadoop@hadoop001 sbin]$ vim hadoop-daemon.sh 
...
# HADOOP_PID_DIR The pid files are stored. /tmp by default.
# HADOOP_IDENT_STRING A string representing this instance of hadoop. $USER by default

pid=$HADOOP_PID_DIR/hadoop-$HADOOP_IDENT_STRING-$command.pid
...
(stop)

    if [ -f $pid ]; then      <-- the pid is looked up here, then killed
      TARGET_PID=`cat $pid`
      if kill -0 $TARGET_PID > /dev/null 2>&1; then
        echo stopping $command
        kill $TARGET_PID
        sleep $HADOOP_STOP_TIMEOUT
        if kill -0 $TARGET_PID > /dev/null 2>&1; then
          echo "$command did not stop gracefully after $HADOOP_STOP_TIMEOUT seconds: killing with kill -9"
          kill -9 $TARGET_PID
        fi
      else
        echo no $command to stop
      fi
      rm -f $pid
    else
      echo no $command to stop
    fi
    ;;

  (*)
    echo $usage
    exit 1
    ;;

esac
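The stop logic above boils down to a small, reusable pid-file pattern. The sketch below extracts it; the function name `stop_daemon` and the 5-second default timeout are illustrative choices, not part of Hadoop:

```shell
# Minimal sketch of the pid-file stop pattern used by hadoop-daemon.sh.
# stop_daemon and its defaults are illustrative, not Hadoop's own names.
stop_daemon() {
  pidfile=$1
  timeout=${2:-5}
  if [ -f "$pidfile" ]; then
    pid=$(cat "$pidfile")
    if kill -0 "$pid" 2>/dev/null; then     # kill -0 only tests existence
      kill "$pid"                           # polite SIGTERM first
      sleep "$timeout"
      if kill -0 "$pid" 2>/dev/null; then   # escalate if still running
        kill -9 "$pid"
      fi
    else
      echo "no process to stop"
    fi
    rm -f "$pidfile"
  else
    echo "no process to stop"               # pid file missing: nothing happens
  fi
}
```

Note the last branch: if the pid file is missing, the function never even looks for a process. The daemon keeps running and the script merely prints a message.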

As the scripts show, stopping HDFS depends entirely on the pid files under /tmp; each file stores nothing but a pid, as below:

[hadoop@hadoop001 tmp]$ pwd
/tmp
[hadoop@hadoop001 tmp]$ ll
-rw-rw-r-- 1 hadoop     hadoop  6 Jul  6 12:54 hadoop-hadoop-datanode.pid
-rw-rw-r-- 1 hadoop     hadoop  6 Jul  6 12:54 hadoop-hadoop-namenode.pid
-rw-rw-r-- 1 hadoop     hadoop  6 Jul  6 12:54 hadoop-hadoop-secondarynamenode.pid
[hadoop@hadoop001 hadoop]$ jps
25330 NameNode
25636 SecondaryNameNode
25463 DataNode
25751 Jps
[root@hadoop001 tmp]# cat hadoop-hadoop-datanode.pid 
25463
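A quick way to see whether these pid files still match live processes is to probe each recorded pid with `kill -0`, the same check the stop script performs. The helper name `check_pidfiles` below is illustrative, not part of Hadoop:

```shell
# Illustrative helper: report whether each hadoop pid file in a directory
# still points at a live process. Not part of Hadoop itself.
check_pidfiles() {
  dir=${1:-/tmp}
  for f in "$dir"/hadoop-*.pid; do
    [ -e "$f" ] || continue                 # glob matched nothing
    pid=$(cat "$f")
    if kill -0 "$pid" 2>/dev/null; then     # signal 0 = existence check only
      echo "$f -> $pid (alive)"
    else
      echo "$f -> $pid (stale)"
    fi
  done
}
```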

Experiment

Delete the datanode pid file under /tmp, then stop HDFS again. Does the stop fail?

[hadoop@hadoop001 tmp]$ rm -rf hadoop-hadoop-datanode.pid 
[hadoop@hadoop001 tmp]$ ll
-rw-rw-r-- 1 hadoop     hadoop  6 Jul  6 12:54 hadoop-hadoop-namenode.pid
-rw-rw-r-- 1 hadoop     hadoop  6 Jul  6 12:54 hadoop-hadoop-secondarynamenode.pid
[hadoop@hadoop001 hadoop]$ sbin/stop-dfs.sh 
19/07/06 14:01:53 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Stopping namenodes on [hadoop001]
hadoop001: stopping namenode
hadoop001: no datanode to stop
Stopping secondary namenodes [hadoop001]
hadoop001: stopping secondarynamenode
19/07/06 14:02:06 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[hadoop@hadoop001 hadoop]$ jps
24906 Jps
17391 DataNode

Sure enough, the DataNode cannot be stopped: the process is still alive, because the stop script could not find the DataNode's pid. So why did the pid file disappear from /tmp?
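Before fixing the root cause, the orphaned DataNode can be handled without a blind `kill -9`: rebuild the missing pid file from the live process and let stop-dfs.sh do its normal work. This sketch bakes in a few assumptions — a single DataNode on this host, the default pid directory /tmp, and daemons running as user `hadoop` (so the file name matches what the script expects):

```shell
# Recovery sketch: recreate the lost pid file from the running JVM.
# Assumptions: one DataNode on this host, pid dir still /tmp, user "hadoop".
# The DataNode JVM's command line contains its main class, so pgrep -f finds it.
pid=$(pgrep -f 'org.apache.hadoop.hdfs.server.datanode.DataNode' | head -n 1)
if [ -n "$pid" ]; then
  echo "$pid" > /tmp/hadoop-hadoop-datanode.pid
fi
```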

Linux Automatically Deletes Files in /tmp

A web search turned up the answer: Linux has a mechanism that periodically cleans the /tmp directory (tmpwatch on older RHEL/CentOS systems, systemd-tmpfiles on newer ones), deleting files every 30 days by default. For details, see the reference article on how Linux cleans the /tmp directory.
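The effect of that cleanup can be approximated with `find`: list (not delete) files under /tmp whose last access time is older than 30 days. The 30-day threshold follows the figure above; real systems configure it in tmpwatch's cron job or in tmpfiles.d:

```shell
# Rough approximation of the periodic /tmp cleanup: list regular files
# whose last access time is more than 30 days old. Appending -delete
# (as root) actually removes them -- which is what ate the pid files.
find /tmp -type f -atime +30 -print 2>/dev/null || true  # some subdirs may be unreadable
```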

Solution

# Stop HDFS first, then edit the file and restart
[hadoop@hadoop001 hadoop]$ vim hadoop-env.sh 
# The directory where pid files are stored. /tmp by default.
# NOTE: this should be set to a directory that can only be written to by 
# the user that will run the hadoop daemons. Otherwise there is the
# potential for a symlink attack.
# export HADOOP_PID_DIR=${HADOOP_PID_DIR}
export HADOOP_PID_DIR=/home/hadoop/data/tmp   <--- change this to the directory where you want the pid files stored
export HADOOP_SECURE_DN_PID_DIR=${HADOOP_PID_DIR}

[hadoop@hadoop001 ~]$ ll data/tmp/
total 12
-rw-rw-r-- 1 hadoop hadoop 6 Jul  6 14:32 hadoop-hadoop-datanode.pid
-rw-rw-r-- 1 hadoop hadoop 6 Jul  6 14:32 hadoop-hadoop-namenode.pid
-rw-rw-r-- 1 hadoop hadoop 6 Jul  6 14:32 hadoop-hadoop-secondarynamenode.pid