A Walkthrough of Hadoop's Startup Scripts

When working with Hadoop you frequently need to tweak configuration parameters, and all sorts of problems come up along the way: change one setting and suddenly the NameNode will not start, or you wonder how to get the Hadoop JVM to load an extra jar, or how to set the log directory. Every time, it takes a careful read through the startup scripts to find the cause, which is slow and tedious, so I have written this summary for future reference.


Cloudera's Hadoop startup scripts are unusually complex and scattered: shell scripts are spread across every corner of the system, which is rather exasperating. Below, taking the NameNode startup process as an example, I will trace the call chain between the scripts and explain what each one does.


The entry point for starting Hadoop is /etc/init.d/hadoop-hdfs-namenode; let us follow the NameNode startup sequence from there.


/etc/init.d/hadoop-hdfs-namenode:

#1. Source /etc/default/hadoop and /etc/default/hadoop-hdfs-namenode


#2. Run /usr/lib/hadoop/sbin/hadoop-daemon.sh to start the NameNode


Cloudera starts the NameNode as the hdfs user, and the default configuration directory is /etc/hadoop/conf.


start() {
  [ -x $EXEC_PATH ] || exit $ERROR_PROGRAM_NOT_INSTALLED
  [ -d $CONF_DIR ] || exit $ERROR_PROGRAM_NOT_CONFIGURED
  log_success_msg "Starting ${DESC}: "

  su -s /bin/bash $SVC_USER -c "$EXEC_PATH --config '$CONF_DIR' start $DAEMON_FLAGS"

  # Some processes are slow to start
  sleep $SLEEP_TIME
  checkstatusofproc
  RETVAL=$?

  [ $RETVAL -eq $RETVAL_SUCCESS ] && touch $LOCKFILE
  return $RETVAL
}
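
This start() function is what runs when the service is launched in the usual way:

sudo service hadoop-hdfs-namenode start
# or, equivalently
sudo /etc/init.d/hadoop-hdfs-namenode start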


/etc/default/hadoop  /etc/default/hadoop-hdfs-namenode:


#1. Set the log dir, pid dir, and service user
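
For orientation, these defaults files typically carry settings like the sketch below (values are illustrative; check the actual files on your system):

export HADOOP_PID_DIR=/var/run/hadoop-hdfs     # where the .pid file is written
export HADOOP_LOG_DIR=/var/log/hadoop-hdfs     # where .log/.out files go
export HADOOP_NAMENODE_USER=hdfs               # user the daemon runs as
export HADOOP_IDENT_STRING=hdfs                # used in log/pid file names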


/usr/lib/hadoop/sbin/hadoop-daemon.sh

#1. Source /usr/lib/hadoop/libexec/hadoop-config.sh


DEFAULT_LIBEXEC_DIR="$bin"/../libexec
HADOOP_LIBEXEC_DIR=${HADOOP_LIBEXEC_DIR:-$DEFAULT_LIBEXEC_DIR}
. $HADOOP_LIBEXEC_DIR/hadoop-config.sh

#2. Source hadoop-env.sh


if [ -f "${HADOOP_CONF_DIR}/hadoop-env.sh" ]; then
  . "${HADOOP_CONF_DIR}/hadoop-env.sh"
fi

#3. Set the log directory


# get log directory
if [ "$HADOOP_LOG_DIR" = "" ]; then
  export HADOOP_LOG_DIR="$HADOOP_PREFIX/logs"
fi

#4. Fill in the log file name, the log4j logger defaults, the pid file, and the stop timeout


export HADOOP_LOGFILE=hadoop-$HADOOP_IDENT_STRING-$command-$HOSTNAME.log
export HADOOP_ROOT_LOGGER=${HADOOP_ROOT_LOGGER:-"INFO,RFA"}
export HADOOP_SECURITY_LOGGER=${HADOOP_SECURITY_LOGGER:-"INFO,RFAS"}
export HDFS_AUDIT_LOGGER=${HDFS_AUDIT_LOGGER:-"INFO,NullAppender"}
log=$HADOOP_LOG_DIR/hadoop-$HADOOP_IDENT_STRING-$command-$HOSTNAME.out
pid=$HADOOP_PID_DIR/hadoop-$HADOOP_IDENT_STRING-$command.pid
HADOOP_STOP_TIMEOUT=${HADOOP_STOP_TIMEOUT:-5}

#5. Invoke /usr/lib/hadoop-hdfs/bin/hdfs


hadoop_rotate_log $log
echo starting $command, logging to $log
cd "$HADOOP_PREFIX"
case $command in
  namenode|secondarynamenode|datanode|journalnode|dfs|dfsadmin|fsck|balancer|zkfc)
    if [ -z "$HADOOP_HDFS_HOME" ]; then
      hdfsScript="$HADOOP_PREFIX"/bin/hdfs
    else
      hdfsScript="$HADOOP_HDFS_HOME"/bin/hdfs
    fi
    nohup nice -n $HADOOP_NICENESS $hdfsScript --config $HADOOP_CONF_DIR $command "$@" > "$log" 2>&1 < /dev/null &
  ;;
  (*)
    nohup nice -n $HADOOP_NICENESS $hadoopScript --config $HADOOP_CONF_DIR $command "$@" > "$log" 2>&1 < /dev/null &
  ;;
esac
echo $! > $pid
sleep 1; head "$log"
sleep 3;
if ! ps -p $! > /dev/null ; then
  exit 1
fi

Note that the NameNode's stdout and stderr go to $log, i.e. log=$HADOOP_LOG_DIR/hadoop-$HADOOP_IDENT_STRING-$command-$HOSTNAME.out.
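
The hadoop_rotate_log call at the top rotates the previous .out file before each start; in the stock hadoop-daemon.sh its logic is essentially:

hadoop_rotate_log ()
{
    log=$1;
    num=5;
    if [ -n "$2" ]; then
        num=$2
    fi
    if [ -f "$log" ]; then # rotate logs: log.4 -> log.5, ..., log -> log.1
        while [ $num -gt 1 ]; do
            prev=`expr $num - 1`
            [ -f "$log.$prev" ] && mv "$log.$prev" "$log.$num"
            num=$prev
        done
        mv "$log" "$log.$num"
    fi
}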


/usr/lib/hadoop/libexec/hadoop-config.sh

#1. Source /usr/lib/hadoop/libexec/hadoop-layout.sh

hadoop-layout.sh mainly describes the directory layout of Hadoop's libraries; its main contents are as follows:


HADOOP_COMMON_DIR="./"
HADOOP_COMMON_LIB_JARS_DIR="lib"
HADOOP_COMMON_LIB_NATIVE_DIR="lib/native"
HDFS_DIR="./"
HDFS_LIB_JARS_DIR="lib"
YARN_DIR="./"
YARN_LIB_JARS_DIR="lib"
MAPRED_DIR="./"
MAPRED_LIB_JARS_DIR="lib"

HADOOP_LIBEXEC_DIR=${HADOOP_LIBEXEC_DIR:-"/usr/lib/hadoop/libexec"}
HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/etc/hadoop/conf"}
HADOOP_COMMON_HOME=${HADOOP_COMMON_HOME:-"/usr/lib/hadoop"}
HADOOP_HDFS_HOME=${HADOOP_HDFS_HOME:-"/usr/lib/hadoop-hdfs"}
HADOOP_MAPRED_HOME=${HADOOP_MAPRED_HOME:-"/usr/lib/hadoop-0.20-mapreduce"}
YARN_HOME=${YARN_HOME:-"/usr/lib/hadoop-yarn"}

#2. Set the HDFS and YARN lib directories (these defaults apply only where hadoop-layout.sh has not already set a value)


HADOOP_COMMON_DIR=${HADOOP_COMMON_DIR:-"share/hadoop/common"}
HADOOP_COMMON_LIB_JARS_DIR=${HADOOP_COMMON_LIB_JARS_DIR:-"share/hadoop/common/lib"}
HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_COMMON_LIB_NATIVE_DIR:-"lib/native"}
HDFS_DIR=${HDFS_DIR:-"share/hadoop/hdfs"}
HDFS_LIB_JARS_DIR=${HDFS_LIB_JARS_DIR:-"share/hadoop/hdfs/lib"}
YARN_DIR=${YARN_DIR:-"share/hadoop/yarn"}
YARN_LIB_JARS_DIR=${YARN_LIB_JARS_DIR:-"share/hadoop/yarn/lib"}
MAPRED_DIR=${MAPRED_DIR:-"share/hadoop/mapreduce"}
MAPRED_LIB_JARS_DIR=${MAPRED_LIB_JARS_DIR:-"share/hadoop/mapreduce/lib"}

# the root of the Hadoop installation
# See HADOOP-6255 for directory structure layout
HADOOP_DEFAULT_PREFIX=$(cd -P -- "$common_bin"/.. && pwd -P)
HADOOP_PREFIX=${HADOOP_PREFIX:-$HADOOP_DEFAULT_PREFIX}
export HADOOP_PREFIX

#3. Check the slaves file. CDH's Hadoop does not rely on the slaves file to start the cluster, though; you are expected to write your own cluster startup script (perhaps to nudge you toward Cloudera Manager...). A minimal sketch of such a script follows.
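
The host list file and service name below are assumptions; adapt them to your own layout:

#!/bin/bash
# start the DataNode service on every host listed in a slaves-style file
for host in $(cat /etc/hadoop/conf/slaves); do
  ssh "$host" "sudo service hadoop-hdfs-datanode start"
done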


#4. Source the env file once more


if [ -f "${HADOOP_CONF_DIR}/hadoop-env.sh" ]; then
  . "${HADOOP_CONF_DIR}/hadoop-env.sh"
fi

#5. Determine JAVA_HOME


# Attempt to set JAVA_HOME if it is not set
if [[ -z $JAVA_HOME ]]; then
  # On OSX use java_home (or /Library for older versions)
  if [ "Darwin" == "$(uname -s)" ]; then
    if [ -x /usr/libexec/java_home ]; then
      export JAVA_HOME=($(/usr/libexec/java_home))
    else
      export JAVA_HOME=(/Library/Java/Home)
    fi
  fi

  # Bail if we did not detect it
  if [[ -z $JAVA_HOME ]]; then
    echo "Error: JAVA_HOME is not set and could not be found." 1>&2
    exit 1
  fi
fi
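
Note that only the Darwin branch attempts auto-detection, so on Linux JAVA_HOME must already be set by the environment, typically in hadoop-env.sh (path is illustrative):

export JAVA_HOME=/usr/java/jdk1.7.0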

#6. Set the heap size for the Java process; if HADOOP_HEAPSIZE is set in hadoop-env.sh it overrides the default of 1000m


# some Java parameters
JAVA_HEAP_MAX=-Xmx1000m

# check envvars which might override default args
if [ "$HADOOP_HEAPSIZE" != "" ]; then
  #echo "run with heapsize $HADOOP_HEAPSIZE"
  JAVA_HEAP_MAX="-Xmx""$HADOOP_HEAPSIZE""m"
  #echo $JAVA_HEAP_MAX
fi
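
For example, putting the line below in hadoop-env.sh makes every daemon launched through these scripts start with -Xmx2000m (the value is illustrative):

export HADOOP_HEAPSIZE=2000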

#7. Build the program classpath. This is a long stretch of code, but it boils down to concatenating:

HADOOP_CONF_DIR + HADOOP_CLASSPATH + HADOOP_COMMON_DIR + HADOOP_COMMON_LIB_JARS_DIR + HADOOP_COMMON_LIB_NATIVE_DIR + HDFS_DIR + HDFS_LIB_JARS_DIR + YARN_DIR + YARN_LIB_JARS_DIR + MAPRED_DIR + MAPRED_LIB_JARS_DIR


One thing to note: Hadoop thoughtfully provides the HADOOP_USER_CLASSPATH_FIRST switch. If it is set, HADOOP_CLASSPATH (the user-defined classpath) is placed ahead of Hadoop's own jars, for the case where user jars must be loaded first.
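
A condensed sketch of how that stretch of hadoop-config.sh assembles CLASSPATH (abbreviated to the common and HDFS entries; the YARN and MapReduce ones follow the same pattern):

CLASSPATH="${HADOOP_CONF_DIR}"
# user jars first, if explicitly requested
if [ "$HADOOP_USER_CLASSPATH_FIRST" != "" ] && [ "$HADOOP_CLASSPATH" != "" ]; then
  CLASSPATH=${CLASSPATH}:${HADOOP_CLASSPATH}
fi
CLASSPATH=${CLASSPATH}:$HADOOP_COMMON_HOME/$HADOOP_COMMON_DIR'/*'
CLASSPATH=${CLASSPATH}:$HADOOP_COMMON_HOME/$HADOOP_COMMON_LIB_JARS_DIR'/*'
CLASSPATH=${CLASSPATH}:$HADOOP_HDFS_HOME/$HDFS_DIR'/*'
CLASSPATH=${CLASSPATH}:$HADOOP_HDFS_HOME/$HDFS_LIB_JARS_DIR'/*'
# ... (YARN and MapReduce dirs) ...
# otherwise user jars go last
if [ "$HADOOP_USER_CLASSPATH_FIRST" = "" ] && [ "$HADOOP_CLASSPATH" != "" ]; then
  CLASSPATH=${CLASSPATH}:${HADOOP_CLASSPATH}
fi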


#8. Build HADOOP_OPTS; system properties such as -Dhadoop.log.dir are picked up by the log4j configuration under the conf directory


HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.log.dir=$HADOOP_LOG_DIR"
HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.log.file=$HADOOP_LOGFILE"
HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.home.dir=$HADOOP_PREFIX"
HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.id.str=$HADOOP_IDENT_STRING"
HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.root.logger=${HADOOP_ROOT_LOGGER:-INFO,console}"
if [ "x$JAVA_LIBRARY_PATH" != "x" ]; then
  HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=$JAVA_LIBRARY_PATH"
  export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$JAVA_LIBRARY_PATH
fi
HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.policy.file=$HADOOP_POLICYFILE"

# Disable ipv6 as it can cause issues
HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"
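
These properties are then resolved by conf/log4j.properties: the rolling file appender (the RFA in HADOOP_ROOT_LOGGER=INFO,RFA above) typically writes to ${hadoop.log.dir}/${hadoop.log.file}.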



/usr/lib/hadoop-hdfs/bin/hdfs

#1. Source /usr/lib/hadoop/libexec/hdfs-config.sh, which does not appear to do much here


#2. Pick the Java main class according to the command argument:


if [ "$COMMAND" = "namenode" ] ; then  

 CLASS='org.apache.hadoop.hdfs.server.namenode.NameNode'  

 HADOOP_OPTS="$HADOOP_OPTS $HADOOP_NAMENODE_OPTS"  


#3. Launch the Java process

exec "$JAVA" -Dproc_$COMMAND $JAVA_HEAP_MAX $HADOOP_OPTS $CLASS "$@"  


To wrap up, here are a few small configuration examples.


1. How to set Hadoop's log directory:


From the startup scripts above, the precedence of the configuration sources is hadoop-env.sh > hadoop-config.sh > /etc/default/hadoop, so to pick the log directory you only need to add one line to hadoop-env.sh:


export HADOOP_LOG_DIR=xxxxx


2. How to add your own jars so that the NameNode and DataNode load them:


export HADOOP_CLASSPATH=xxxxx
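
If your jars must take precedence over Hadoop's own, combine this with the switch described earlier (the path below is illustrative):

# in hadoop-env.sh
export HADOOP_CLASSPATH=/opt/myjars/*
export HADOOP_USER_CLASSPATH_FIRST=true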


3. How to set the NameNode's Java heap size independently:


Say you want 10G for the NameNode but 1G for the DataNode; this is where it gets interesting. Setting HADOOP_HEAPSIZE directly applies to the NameNode and DataNode alike. You can instead put an -Xmx in the NameNode-specific options: since HADOOP_OPTS (which includes HADOOP_NAMENODE_OPTS) comes after $JAVA_HEAP_MAX on the java command line and the JVM honors the last -Xmx it sees, the per-daemon value wins. The small wrinkle is that both -Xmx flags then show up on the process command line, but it is basically usable.
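
A sketch of the hadoop-env.sh settings for this split (sizes are illustrative):

export HADOOP_HEAPSIZE=1000                                  # baseline: -Xmx1000m for every daemon
export HADOOP_NAMENODE_OPTS="$HADOOP_NAMENODE_OPTS -Xmx10g"  # NameNode only: the later -Xmx overrides the baseline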


All in all, because Hadoop's startup scripts are so numerous and fragmented, and the HBase and Hive startup scripts follow a similar structure, adding or changing configuration can trigger plenty of puzzling problems. Hopefully this walkthrough makes them easier to trace.
