Network topology: reference: http://blog.csdn.net/lastsweetop/article/details/9065667
A distributed cluster usually contains a great many machines. Because rack slots and switch ports are limited, a large cluster typically spans several racks, and the machines on those racks together form one distributed cluster. The network speed between machines within a rack is generally higher than between machines on different racks, and traffic between racks is usually constrained by the bandwidth of the uplink switches.
Hadoop models the network as a tree, where the distance between two nodes is the sum of their distances to their nearest common ancestor. The levels of this tree are not predefined, but levels are usually assigned for the data center, the rack, and the node a process is running on. For example (see the Java sketch after this list):
distance(/D1/R1/H1,/D1/R1/H1)=0 (processes on the same node: the same datanode)
distance(/D1/R1/H1,/D1/R1/H2)=2 (different nodes on the same rack: different datanodes under one rack)
distance(/D1/R1/H1,/D1/R2/H4)=4 (nodes on different racks in the same data center: different datanodes in the same IDC)
distance(/D1/R1/H1,/D2/R3/H7)=6 (nodes in different data centers: datanodes in different IDCs)
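These distances can also be checked in code. Below is a minimal sketch against Hadoop's org.apache.hadoop.net.NetworkTopology API; it assumes hadoop-common is on the classpath, and the hosts are the hypothetical ones from the list above (with H4 placed on rack R2):

import org.apache.hadoop.net.NetworkTopology;
import org.apache.hadoop.net.Node;
import org.apache.hadoop.net.NodeBase;

public class DistanceDemo {
    public static void main(String[] args) {
        NetworkTopology topology = new NetworkTopology();
        // a NodeBase path is /datacenter/rack/host, as in the examples above
        Node h1 = new NodeBase("/D1/R1/H1");
        Node h2 = new NodeBase("/D1/R1/H2");
        Node h4 = new NodeBase("/D1/R2/H4");
        Node h7 = new NodeBase("/D2/R3/H7");
        topology.add(h1);
        topology.add(h2);
        topology.add(h4);
        topology.add(h7);
        System.out.println(topology.getDistance(h1, h1)); // 0: same node
        System.out.println(topology.getDistance(h1, h2)); // 2: same rack
        System.out.println(topology.getDistance(h1, h4)); // 4: same data center, different racks
        System.out.println(topology.getDistance(h1, h7)); // 6: different data centers
    }
}

Run against hadoop-common, this should print 0, 2, 4 and 6, matching the list above.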
Hadoop cannot work out the network topology on its own; it needs us to understand the network and define the topology for it. (If the network is flat, i.e. has only a single level, no configuration is needed.)
The HDFS and the Map/Reduce components are rack-aware.
The NameNode and the JobTracker obtain the rack id of the slaves in the cluster by invoking an API resolve in an administrator-configured module. The API resolves the slave's DNS name (or IP address) to a rack id. The module to use can be configured with the configuration item topology.node.switch.mapping.impl. The default implementation runs a script/command configured using topology.script.file.name. If topology.script.file.name is not set, the rack id /default-rack is returned for any passed IP address. The additional configuration in the Map/Reduce part is mapred.cache.task.levels, which determines the number of levels (in the network topology) of caches. For example, if it has the default value of 2, two levels of caches will be constructed: one for hosts (host -> task mapping) and another for racks (rack -> task mapping).
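To make the topology.node.switch.mapping.impl hook concrete, here is a minimal sketch of a custom mapping module. The class name and its hard-coded rule are hypothetical, and the method set assumes the Hadoop 2.x DNSToSwitchMapping interface; in practice most clusters instead keep the default ScriptBasedMapping and just point topology.script.file.name at a site-specific script.

import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.net.DNSToSwitchMapping;

// Hypothetical module; enable it by setting
// topology.node.switch.mapping.impl to this class's fully qualified name.
public class MyRackMapping implements DNSToSwitchMapping {
    @Override
    public List<String> resolve(List<String> names) {
        List<String> racks = new ArrayList<>();
        for (String name : names) {
            // toy rule: h1*/h2* hosts sit on rack R1 of data center D1;
            // everything else gets the same fallback Hadoop itself uses
            racks.add(name.startsWith("h1") || name.startsWith("h2")
                    ? "/D1/R1" : "/default-rack");
        }
        return racks;
    }

    @Override
    public void reloadCachedMappings() { /* nothing is cached in this sketch */ }

    @Override
    public void reloadCachedMappings(List<String> names) { /* nothing cached */ }
}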
Replica placement:
Replica placement is the process by which the namenode picks datanodes to store block replicas; the strategy behind it is really a trade-off between reliability and read/write bandwidth.
Consider the two extreme cases: writing all replicas to a single node costs the least write bandwidth but gives no real redundancy (lose that node and you lose every copy of the block), while spreading replicas across different data centers maximizes reliability at the price of very expensive cross-data-center writes. Hadoop's default placement sits between these extremes, as the sketch below illustrates.
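Here is a self-contained toy sketch of that default compromise (first replica on the writer's node, second on a different rack, third on a different node of the second replica's rack). The node names, the rack table, and the pick helper are all hypothetical; this is not Hadoop's real BlockPlacementPolicyDefault.

import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

public class ReplicaPlacementSketch {
    // hypothetical cluster map: node -> rack
    static final Map<String, String> RACK_OF = Map.of(
            "H1", "/D1/R1", "H2", "/D1/R1",
            "H3", "/D1/R2", "H4", "/D1/R2");

    public static void main(String[] args) {
        String r1 = "H1"; // replica 1: the client (writer) node itself
        // replica 2: any node on a rack other than r1's rack
        String r2 = pick(n -> !RACK_OF.get(n).equals(RACK_OF.get(r1)));
        // replica 3: a different node on the same rack as replica 2
        String r3 = pick(n -> RACK_OF.get(n).equals(RACK_OF.get(r2)) && !n.equals(r2));
        System.out.println("replicas on: " + List.of(r1, r2, r3));
    }

    static String pick(Predicate<String> wanted) {
        return RACK_OF.keySet().stream().filter(wanted).findFirst().orElseThrow();
    }
}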
Startup scripts: start-dfs.sh drives everything; it calls hadoop-daemons.sh (which runs hadoop-daemon.sh on every slave), and hadoop-daemon.sh in turn invokes the hdfs script.
1.start-dfs.sh
NAMENODES=$($HADOOP_PREFIX/bin/hdfs getconf -namenodes)
echo "Starting namenodes on [$NAMENODES]"
# start a namenode on each configured namenode host
"$HADOOP_PREFIX/sbin/hadoop-daemons.sh" \
  --config "$HADOOP_CONF_DIR" \
  --hostnames "$NAMENODES" \
  --script "$bin/hdfs" start namenode $nameStartOpt
# start a datanode on every slave
"$HADOOP_PREFIX/sbin/hadoop-daemons.sh" \
  --config "$HADOOP_CONF_DIR" \
  --script "$bin/hdfs" start datanode $dataStartOpt
# start the secondary namenode(s)
"$HADOOP_PREFIX/sbin/hadoop-daemons.sh" \
  --config "$HADOOP_CONF_DIR" \
  --hostnames "$SECONDARY_NAMENODES" \
  --script "$bin/hdfs" start secondarynamenode
2.hadoop-daemon.sh
hadoop-daemon.sh daemonizes the command: it starts the hdfs script in the background with nohup, lowers its priority with nice, and redirects all output to the daemon's log file:
nohup nice -n $HADOOP_NICENESS $hdfsScript --config $HADOOP_CONF_DIR $command "$@" > "$log" 2>&1 < /dev/null &
3.hdfs
The hdfs script maps the command name (namenode, datanode, secondarynamenode, getconf, ...) to a Java main class, stores it in $CLASS, and finally replaces itself with the JVM:
exec "$JAVA" $JAVA_HEAP_MAX $HADOOP_OPTS $CLASS "$@"
Running these scripts ultimately launches the main functions of four classes: GetConf, NameNode, DataNode, and SecondaryNameNode.
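That last hop, from shell into Java, can be pictured with a small sketch. The fully qualified class names below are the real ones from the Hadoop source, but this launcher class itself is hypothetical (and would need the Hadoop jars on the classpath to actually run a command):

import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.Map;

// Hypothetical mirror of what bin/hdfs does in shell: choose a main class
// for the command, then pass the remaining arguments to its main().
public class HdfsLauncherSketch {
    static final Map<String, String> MAIN_CLASS = Map.of(
            "getconf",           "org.apache.hadoop.hdfs.tools.GetConf",
            "namenode",          "org.apache.hadoop.hdfs.server.namenode.NameNode",
            "datanode",          "org.apache.hadoop.hdfs.server.datanode.DataNode",
            "secondarynamenode", "org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode");

    public static void main(String[] args) throws Exception {
        String command = args[0];
        String[] rest = Arrays.copyOfRange(args, 1, args.length);
        Method main = Class.forName(MAIN_CLASS.get(command)).getMethod("main", String[].class);
        main.invoke(null, (Object) rest); // the Java-side equivalent of: exec "$JAVA" ... $CLASS "$@"
    }
}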