After using Hadoop for a while, coming back to read its source code turns out to be quite rewarding; "review the old and learn the new" really does apply.

Before using Hadoop we have to configure a few files: hadoop-env.sh, core-site.xml, hdfs-site.xml, and mapred-site.xml. So when does Hadoop actually read these files?

Most of the time Hadoop is started with start-all.sh, so what does that script actually do?
# Start all hadoop daemons.  Run this on master node.
# (note: all of the hadoop daemons are started from the master node)

bin=`dirname "$0"`
bin=`cd "$bin"; pwd`
# bin=$HADOOP_HOME/bin

if [ -e "$bin/../libexec/hadoop-config.sh" ]; then
  . "$bin"/../libexec/hadoop-config.sh
else
  . "$bin/hadoop-config.sh"
fi

# start dfs daemons
"$bin"/start-dfs.sh --config $HADOOP_CONF_DIR

# start mapred daemons
"$bin"/start-mapred.sh --config $HADOOP_CONF_DIR
start-all.sh first sources hadoop-config.sh, and inside hadoop-config.sh there is this check:

if [ -f "${HADOOP_CONF_DIR}/hadoop-env.sh" ]; then
  . "${HADOOP_CONF_DIR}/hadoop-env.sh"
fi
After testing that $HADOOP_HOME/conf/hadoop-env.sh is a regular file, it sources the script via . "${HADOOP_CONF_DIR}/hadoop-env.sh", so the JAVA_HOME we configured in hadoop-env.sh takes effect. Frankly, I think this setting is not strictly necessary: installing Hadoop on Linux means Java has already been installed, and JAVA_HOME is normally set at that point; an environment variable exported in /etc/profile is visible to every shell process.
start-dfs.sh

# Run this on master node.

usage="Usage: start-dfs.sh [-upgrade|-rollback]"

bin=`dirname "$0"`
bin=`cd "$bin"; pwd`

if [ -e "$bin/../libexec/hadoop-config.sh" ]; then
  . "$bin"/../libexec/hadoop-config.sh
else
  . "$bin/hadoop-config.sh"
fi

# get arguments
if [ $# -ge 1 ]; then
  nameStartOpt=$1
  shift
  case $nameStartOpt in
    (-upgrade)
      ;;
    (-rollback)
      dataStartOpt=$nameStartOpt
      ;;
    (*)
      echo $usage
      exit 1
      ;;
  esac
fi

# start dfs daemons
# start namenode after datanodes, to minimize time namenode is up w/o data
# note: datanodes will log connection errors until namenode starts
"$bin"/hadoop-daemon.sh --config $HADOOP_CONF_DIR start namenode $nameStartOpt
"$bin"/hadoop-daemons.sh --config $HADOOP_CONF_DIR start datanode $dataStartOpt
"$bin"/hadoop-daemons.sh --config $HADOOP_CONF_DIR --hosts masters start secondarynamenode
A closer look shows that start-dfs.sh also runs hadoop-config.sh. The reason is that we do not always use start-all.sh to start every Hadoop daemon; sometimes we only need HDFS and not MapReduce, and then start-dfs.sh is run on its own. The variables defined in hadoop-config.sh are needed by the HDFS daemons as well, so hadoop-config.sh (and with it hadoop-env.sh) is executed before the namenode, datanode, and secondarynamenode are started. The last three lines start the namenode, the datanodes, and the secondarynamenode. A running cluster has five daemons in total, three of which are the namenode, datanode, and secondarynamenode; since each runs as its own process, the corresponding class must have a main method, which a quick look at the source confirms. But that is not the point. The point is how those classes load the configuration files: whether it is the namenode, the datanode, or the secondarynamenode, each loads core-*.xml and hdfs-*.xml at startup. Take org.apache.hadoop.hdfs.server.namenode.NameNode as the example; the other two classes, org.apache.hadoop.hdfs.server.datanode.DataNode and org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode, are similar.
public class NameNode implements ClientProtocol, DatanodeProtocol,
                                 NamenodeProtocol, FSConstants,
                                 RefreshAuthorizationPolicyProtocol,
                                 RefreshUserMappingsProtocol {
  static {
    Configuration.addDefaultResource("hdfs-default.xml");
    Configuration.addDefaultResource("hdfs-site.xml");
  }
  ...
}
The static block is where it gets interesting: there are hdfs-default.xml and hdfs-site.xml. This is exactly the point. A static code block runs when the class is initialized by the JVM (class initialization, not object initialization). And before Configuration.addDefaultResource("hdfs-default.xml") can execute, the Configuration class itself has to be loaded and initialized, so Configuration's own static block runs first. The small sketch below illustrates that ordering; after it, let's see what the static code block in org.apache.hadoop.conf.Configuration actually does.
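A minimal, self-contained sketch of the initialization order, using two invented stand-in classes (Registry plays the role of Configuration, Daemon the role of NameNode; this is not Hadoop code):

public class StaticInitOrder {

    static class Registry {                 // stands in for Configuration
        static {
            System.out.println("Registry static block: core resources registered");
        }
        static void addDefaultResource(String name) {
            System.out.println("register " + name);
        }
    }

    static class Daemon {                   // stands in for NameNode
        static {
            // first use of Registry: its static block runs before this call returns
            Registry.addDefaultResource("hdfs-default.xml");
            Registry.addDefaultResource("hdfs-site.xml");
        }
    }

    public static void main(String[] args) {
        new Daemon();   // triggers Daemon's class initialization
        // printed order:
        //   Registry static block: core resources registered
        //   register hdfs-default.xml
        //   register hdfs-site.xml
    }
}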
static {
  // print deprecation warning if hadoop-site.xml is found in classpath
  ClassLoader cL = Thread.currentThread().getContextClassLoader();
  if (cL == null) {
    cL = Configuration.class.getClassLoader();
  }
  if (cL.getResource("hadoop-site.xml") != null) {
    LOG.warn("DEPRECATED: hadoop-site.xml found in the classpath. " +
             "Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, " +
             "mapred-site.xml and hdfs-site.xml to override properties of " +
             "core-default.xml, mapred-default.xml and hdfs-default.xml " +
             "respectively");
  }

  addDefaultResource("core-default.xml");
  addDefaultResource("core-site.xml");
}
So the Configuration class loads core-default.xml and core-site.xml when it is initialized. As a result, by the time the namenode is up it has loaded both core-*.xml and hdfs-*.xml, with the core-*.xml pair contributed by the Configuration class itself.
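The effect can be checked from a small client program. A minimal sketch, assuming a Hadoop 1.x classpath with the conf directory on it; the two explicit addDefaultResource calls stand in for NameNode's static block, and fs.default.name / dfs.replication are the usual 1.x property keys from core-site.xml and hdfs-site.xml:

import org.apache.hadoop.conf.Configuration;

public class ConfLoadCheck {
    public static void main(String[] args) {
        // Touching the Configuration class registers core-default.xml and core-site.xml;
        // these two calls mirror what NameNode's static block does for the hdfs files.
        Configuration.addDefaultResource("hdfs-default.xml");
        Configuration.addDefaultResource("hdfs-site.xml");

        // Every Configuration created after registration sees all four resources,
        // with each *-site.xml overriding the matching *-default.xml.
        Configuration conf = new Configuration();
        System.out.println(conf.get("fs.default.name"));   // from core-site.xml
        System.out.println(conf.get("dfs.replication"));   // from hdfs-site.xml or hdfs-default.xml
    }
}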
start-mapred.sh

# Start hadoop map reduce daemons.  Run this on master node.

bin=`dirname "$0"`
bin=`cd "$bin"; pwd`

if [ -e "$bin/../libexec/hadoop-config.sh" ]; then
  . "$bin"/../libexec/hadoop-config.sh
else
  . "$bin/hadoop-config.sh"
fi

# start mapred daemons
# start jobtracker first to minimize connection errors at startup
"$bin"/hadoop-daemon.sh --config $HADOOP_CONF_DIR start jobtracker
"$bin"/hadoop-daemons.sh --config $HADOOP_CONF_DIR start tasktracker
This script likewise runs hadoop-config.sh, and therefore hadoop-env.sh as well, which is consistent with start-dfs.sh. The last two lines start the jobtracker and tasktracker processes, which again correspond to two classes: org.apache.hadoop.mapred.JobTracker and org.apache.hadoop.mapred.TaskTracker.
public class JobTracker implements MRConstants, InterTrackerProtocol,
    JobSubmissionProtocol, TaskTrackerManager, RefreshUserMappingsProtocol,
    RefreshAuthorizationPolicyProtocol, AdminOperationsProtocol,
    JobTrackerMXBean {
  static {
    Configuration.addDefaultResource("mapred-default.xml");
    Configuration.addDefaultResource("mapred-site.xml");
  }
  ...
}
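The MapReduce side therefore works the same way: once mapred-default.xml and mapred-site.xml are registered as default resources, any Configuration created afterwards sees them too. A minimal sketch under the same assumptions as above (Hadoop 1.x on the classpath; the explicit registrations stand in for JobTracker's static block, and mapred.job.tracker is the 1.x key for the JobTracker address):

import org.apache.hadoop.conf.Configuration;

public class MapredConfCheck {
    public static void main(String[] args) {
        // Mirror JobTracker's static block: register the MapReduce resources.
        Configuration.addDefaultResource("mapred-default.xml");
        Configuration.addDefaultResource("mapred-site.xml");

        Configuration conf = new Configuration();
        System.out.println(conf.get("mapred.job.tracker"));  // JobTracker address from mapred-site.xml
    }
}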