About a month after starting HDFS, I needed to restart it, only to find that it could not be stopped; I had no choice but to kill the processes and start it again. Rather than letting the problem go, I wanted to dig into what actually caused it.
[hadoop@hadoop001 sbin]$ vim stop-dfs.sh
....
# namenodes
NAMENODES=$($HADOOP_PREFIX/bin/hdfs getconf -namenodes)
echo "Stopping namenodes on [$NAMENODES]"
"$HADOOP_PREFIX/sbin/hadoop-daemons.sh" \ <--- 調用另外一個腳本
--config "$HADOOP_CONF_DIR" \
--hostnames "$NAMENODES" \
--script "$bin/hdfs" stop namenode
...
[hadoop@hadoop001 sbin]$ vim hadoop-daemons.sh
...
# This in turn invokes the hadoop-daemon.sh script on each host
exec "$bin/slaves.sh" --config $HADOOP_CONF_DIR cd "$HADOOP_PREFIX" \; \
  "$bin/hadoop-daemon.sh" --config $HADOOP_CONF_DIR "$@"
...
[hadoop@hadoop001 sbin]$ vim hadoop-daemon.sh
...
# HADOOP_PID_DIR The pid files are stored. /tmp by default.
# HADOOP_IDENT_STRING A string representing this instance of hadoop. $USER by default
pid=$HADOOP_PID_DIR/hadoop-$HADOOP_IDENT_STRING-$command.pid
...
  (stop)

    if [ -f $pid ]; then        # <-- read the pid from the file, then kill that process
      TARGET_PID=`cat $pid`
      if kill -0 $TARGET_PID > /dev/null 2>&1; then
        echo stopping $command
        kill $TARGET_PID
        sleep $HADOOP_STOP_TIMEOUT
        if kill -0 $TARGET_PID > /dev/null 2>&1; then
          echo "$command did not stop gracefully after $HADOOP_STOP_TIMEOUT seconds: killing with kill -9"
          kill -9 $TARGET_PID
        fi
      else
        echo no $command to stop
      fi
      rm -f $pid
    else
      echo no $command to stop
    fi
    ;;

  (*)
    echo $usage
    exit 1
    ;;

esac
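To make the pid lookup concrete, here is how the path from hadoop-daemon.sh expands with the documented defaults (`HADOOP_PID_DIR=/tmp`, `HADOOP_IDENT_STRING=$USER`). This is a standalone sketch, not part of the Hadoop scripts:

```shell
# Reproduce how hadoop-daemon.sh builds the pid file path from its defaults
HADOOP_PID_DIR=${HADOOP_PID_DIR:-/tmp}
HADOOP_IDENT_STRING=${HADOOP_IDENT_STRING:-$USER}
command=datanode
pid=$HADOOP_PID_DIR/hadoop-$HADOOP_IDENT_STRING-$command.pid
echo "$pid"   # for user "hadoop": /tmp/hadoop-hadoop-datanode.pid
```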
As the script shows, stopping HDFS relies entirely on the pid files under /tmp; each file holds nothing but a process id, like this:
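The key primitive in the stop branch is `kill -0`, which sends no signal at all; it only checks whether the target process exists and can be signalled. Here is a minimal, self-contained replay of the same stop path, using a made-up stand-in daemon and pid file rather than a real HDFS process:

```shell
# Hypothetical stand-in daemon plus pid file, mirroring hadoop-daemon.sh's stop logic
PID_FILE=/tmp/demo-daemon.pid
sleep 300 &                       # stand-in for a long-running daemon
echo $! > "$PID_FILE"

TARGET_PID=$(cat "$PID_FILE")
if kill -0 "$TARGET_PID" > /dev/null 2>&1; then   # does the process exist?
  echo "stopping demo-daemon"
  kill "$TARGET_PID"                              # polite SIGTERM first
else
  echo "no demo-daemon to stop"
fi
rm -f "$PID_FILE"
```

Delete the pid file before running the stop branch and there is no pid to read, so the script prints `no ... to stop` and the process lives on, which is exactly what happens to the DataNode below.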
[hadoop@hadoop001 tmp]$ pwd
/tmp
[hadoop@hadoop001 tmp]$ ll
-rw-rw-r-- 1 hadoop hadoop 6 Jul 6 12:54 hadoop-hadoop-datanode.pid
-rw-rw-r-- 1 hadoop hadoop 6 Jul 6 12:54 hadoop-hadoop-namenode.pid
-rw-rw-r-- 1 hadoop hadoop 6 Jul 6 12:54 hadoop-hadoop-secondarynamenode.pid
[hadoop@hadoop001 hadoop]$ jps
25330 NameNode
25636 SecondaryNameNode
25463 DataNode
25751 Jps
[root@hadoop001 tmp]# cat hadoop-hadoop-datanode.pid
25463
Now delete the datanode pid file under /tmp and stop HDFS again; does it fail to stop?
[hadoop@hadoop001 tmp]$ rm -rf hadoop-hadoop-datanode.pid
[hadoop@hadoop001 tmp]$ ll
-rw-rw-r-- 1 hadoop hadoop 6 Jul 6 12:54 hadoop-hadoop-namenode.pid
-rw-rw-r-- 1 hadoop hadoop 6 Jul 6 12:54 hadoop-hadoop-secondarynamenode.pid
[hadoop@hadoop001 hadoop]$ sbin/stop-dfs.sh
19/07/06 14:01:53 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Stopping namenodes on [hadoop001]
hadoop001: stopping namenode
hadoop001: no datanode to stop
Stopping secondary namenodes [hadoop001]
hadoop001: stopping secondarynamenode
19/07/06 14:02:06 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[hadoop@hadoop001 hadoop]$ jps
24906 Jps
17391 DataNode
Sure enough, the DataNode cannot be stopped: the process is still alive, because the stop script cannot find the DataNode's pid. So why did the pid file disappear from /tmp?
Searching online for information about the /tmp directory turned up the answer: Linux has a mechanism that periodically cleans /tmp, deleting files that have not been used for a configured period (30 days on this system). See your distribution's documentation on how /tmp cleanup works for details.
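Which cleaner is responsible depends on the distribution: older RHEL/CentOS releases run `tmpwatch` from a daily cron job, while systemd-based releases use `systemd-tmpfiles` with age rules under `tmpfiles.d`. A hedged probe for which mechanism a host uses (both file locations are distro-specific assumptions):

```shell
# Probe for the /tmp cleanup mechanism; both paths are distro-specific guesses
if [ -f /etc/cron.daily/tmpwatch ]; then
  echo "tmpwatch (cron-driven, e.g. CentOS/RHEL 6)"
elif [ -f /usr/lib/tmpfiles.d/tmp.conf ]; then
  echo "systemd-tmpfiles (e.g. CentOS/RHEL 7+)"
else
  echo "no standard /tmp cleaner detected"
fi
```

Either way, a pid file that sits untouched in /tmp for a month is a deletion candidate, which is why the fix below moves it somewhere safe.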
# Stop HDFS first, then edit the file and restart
[hadoop@hadoop001 hadoop]$ vim hadoop-env.sh
# The directory where pid files are stored. /tmp by default.
# NOTE: this should be set to a directory that can only be written to by
# the user that will run the hadoop daemons. Otherwise there is the
# potential for a symlink attack.
# export HADOOP_PID_DIR=${HADOOP_PID_DIR}
export HADOOP_PID_DIR=/home/hadoop/data/tmp  # <--- change to wherever you want the pid files stored
export HADOOP_SECURE_DN_PID_DIR=${HADOOP_PID_DIR}
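Before restarting, it is worth making sure the new pid directory exists and is writable by the hadoop user; hadoop-daemon.sh should create it on startup, but creating it up front avoids permission surprises. A small sketch using the article's layout (`$HOME` is /home/hadoop here):

```shell
# Create the new pid directory up front so the daemons can write pid files there
PID_DIR="$HOME/data/tmp"    # matches HADOOP_PID_DIR above for the hadoop user
mkdir -p "$PID_DIR"
chmod 755 "$PID_DIR"
ls -ld "$PID_DIR"
```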
[hadoop@hadoop001 ~]$ ll data/tmp/
total 12
-rw-rw-r-- 1 hadoop hadoop 6 Jul 6 14:32 hadoop-hadoop-datanode.pid
-rw-rw-r-- 1 hadoop hadoop 6 Jul 6 14:32 hadoop-hadoop-namenode.pid
-rw-rw-r-- 1 hadoop hadoop 6 Jul 6 14:32 hadoop-hadoop-secondarynamenode.pid