Environment: 1 NameNode server, 2 DataNode servers
①: Configure the /etc/hosts file. This provides name resolution inside the cluster without querying a DNS server: when a remote host is accessed, the hosts file is consulted first, and if the hostname is listed there the remote host is reached directly at the specified IP. (Larger production Hadoop clusters usually run a dedicated DNS server for unified management.)
Modify the Linux hostname: set the HOSTNAME field in the /etc/sysconfig/network file (note this only takes permanent effect after a reboot). Running hostname newName takes effect immediately but is lost after a reboot.
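For example, on a CentOS 6-style system (a minimal sketch; editing the file with sed is just one option, any editor works):

sudo sed -i 's/^HOSTNAME=.*/HOSTNAME=NameNode/' /etc/sysconfig/network   # persists across reboots
sudo hostname NameNode                                                   # applies immediately, lost on reboot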
hosts file (note: ideally every node shares the same copy of this file):
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.174.142 NameNode
192.168.174.143 DataNode_01
192.168.174.145 DataNode_02
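One way to keep every node on the same copy is to push the master's file out (a hedged sketch; it assumes root SSH is permitted on the DataNodes):

scp /etc/hosts root@DataNode_01:/etc/hosts
scp /etc/hosts root@DataNode_02:/etc/hosts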
Testing the hosts file:
[squirrel@DataNode_02 ~]$ ping DataNode_01
PING DataNode_01 (192.168.174.143) 56(84) bytes of data.
64 bytes from DataNode_01 (192.168.174.143): icmp_seq=1 ttl=64 time=2.24 ms
--- DataNode_01 ping statistics ---
7 packets transmitted, 7 received, 0% packet loss, time 6589ms
rtt min/avg/max/mdev = 0.275/0.733/2.241/0.624 ms
[squirrel@DataNode_02 ~]$ ping DataNode_02
PING DataNode_02 (192.168.174.145) 56(84) bytes of data.
64 bytes from DataNode_02 (192.168.174.145): icmp_seq=1 ttl=64 time=0.029 ms
--- DataNode_02 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2381ms
rtt min/avg/max/mdev = 0.029/0.050/0.062/0.016 ms
Conclusion: the output shows the hosts can be pinged by hostname, so the hosts file is configured correctly.
②: Configure the core Hadoop configuration files: hadoop-env.sh, core-site.xml, hdfs-site.xml, mapred-site.xml
hadoop-env.sh (JDK installation directory):
export JAVA_HOME=/usr/local/java/jdk1.8.0_112
core-site.xml: note that the name node location must be the actual hostname or IP address of the NameNode
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://NameNode:9000</value>
  </property>
</configuration>
hdfs-site.xml:
Note: if the data block storage directory does not exist, the DataNode daemon will not start on that node (see the sketch after the configuration below).
Note: since the cluster has two DataNode nodes, each data block is replicated twice.
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.data.dir</name>
    <value>/home/squirrel/Programme/hadoop-0.20.2/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>
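Given the note above about the data directory, it can be created on each DataNode before the first start; a minimal sketch, run from the master over SSH:

ssh DataNode_01 "mkdir -p /home/squirrel/Programme/hadoop-0.20.2/data"
ssh DataNode_02 "mkdir -p /home/squirrel/Programme/hadoop-0.20.2/data"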
mapred-site.xml:
Note: the JobTracker location must be changed to the actual hostname or IP address
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>NameNode:9001</value>
  </property>
</configuration>
③: Configure the masters and slaves files
Note: each line of these files simply names one server by hostname or IP address.
masters file (master node: NameNode/SecondaryNameNode/JobTracker):
NameNode
slaves file (slave nodes: DataNode/TaskTracker):
DataNode_01
DataNode_02
④: Copy the fully configured Hadoop directory to every node server in the cluster
scp -r /home/squirrel/Programme/hadoop-0.20.2 DataNode_01:/home/squirrel/Programme/hadoop-0.20.2
scp -r /home/squirrel/Programme/hadoop-0.20.2 DataNode_02:/home/squirrel/Programme/hadoop-0.20.2
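Both this scp step and the later start-all.sh run over SSH, so passwordless SSH from the master to each node is normally prepared first; a minimal sketch (an assumed prerequisite, not spelled out in the steps above):

ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa    # generate a key pair with an empty passphrase
ssh-copy-id squirrel@DataNode_01            # append the public key to each node's authorized_keys
ssh-copy-id squirrel@DataNode_02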
⑤: Format the HDFS file system: hadoop namenode -format
16/12/28 23:23:13 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = NameNode/192.168.174.142
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 0.20.2
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
16/12/28 23:23:15 INFO namenode.FSNamesystem: fsOwner=squirrel,squirrel
16/12/28 23:23:15 INFO namenode.FSNamesystem: supergroup=supergroup
16/12/28 23:23:15 INFO namenode.FSNamesystem: isPermissionEnabled=true
16/12/28 23:23:15 INFO common.Storage: Image file of size 98 saved in 0 seconds.
16/12/28 23:23:15 INFO common.Storage: Storage directory /tmp/hadoop-squirrel/dfs/name has been successfully formatted.
16/12/28 23:23:15 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at NameNode/192.168.174.142
************************************************************/
Analysis:
The log line "/tmp/hadoop-squirrel/dfs/name has been successfully formatted." shows that the HDFS file system was formatted successfully. Note that the image landed under /tmp because dfs.name.dir was left at its default of ${hadoop.tmp.dir}/dfs/name, and /tmp is typically cleared on reboot.
⑥: Start Hadoop: run ./start-all.sh from the bin directory of the unpacked Hadoop distribution
starting namenode, logging to /home/squirrel/Programme/hadoop-0.20.2/bin/../logs/hadoop-squirrel-namenode-NameNode.out
DataNode_01: starting datanode, logging to /home/squirrel/Programme/hadoop-0.20.2/bin/../logs/hadoop-squirrel-datanode-DataNode_01.out
DataNode_02: starting datanode, logging to /home/squirrel/Programme/hadoop-0.20.2/bin/../logs/hadoop-squirrel-datanode-DataNode_02.out
NameNode: starting secondarynamenode, logging to /home/squirrel/Programme/hadoop-0.20.2/bin/../logs/hadoop-squirrel-secondarynamenode-NameNode.out
starting jobtracker, logging to /home/squirrel/Programme/hadoop-0.20.2/bin/../logs/hadoop-squirrel-jobtracker-NameNode.out
DataNode_02: starting tasktracker, logging to /home/squirrel/Programme/hadoop-0.20.2/bin/../logs/hadoop-squirrel-tasktracker-DataNode_02.out
DataNode_01: starting tasktracker, logging to /home/squirrel/Programme/hadoop-0.20.2/bin/../logs/hadoop-squirrel-tasktracker-DataNode_01.out
Analysis: the log shows each daemon starting on the expected node: the NameNode, SecondaryNameNode, and JobTracker on NameNode, and a DataNode plus TaskTracker on each of DataNode_01 and DataNode_02.
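As a further check, the stock web interfaces should respond once the daemons are up (0.20's default ports are 50070 for the NameNode UI and 50030 for the JobTracker UI); a minimal sketch:

curl -s http://NameNode:50070/ >/dev/null && echo "NameNode web UI up"
curl -s http://NameNode:50030/ >/dev/null && echo "JobTracker web UI up"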
⑦: Verify the Hadoop daemons. Running jps on the NameNode node:
15825 JobTracker
15622 NameNode
15752 SecondaryNameNode
15935 Jps
Running jps on a DataNode node:
15237 DataNode
15350 Jps
15310 TaskTracker
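Beyond jps, HDFS itself can confirm that both DataNodes have registered; a quick check with the stock admin command (the grep pattern assumes the usual report wording):

hadoop dfsadmin -report | grep "Datanodes available"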
Conclusion: the Hadoop cluster started up completely and successfully.
Note: Hadoop uses SSH for file transfer, so file access permissions between cluster nodes inevitably come into play. It is best to place the Hadoop directory somewhere the login user on each node server has full permissions; otherwise problems appear such as logs failing to be written to the log files.
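If the directory did end up somewhere the login user cannot fully write, one straightforward fix is to hand the whole tree to that user on each node (a sketch, assuming sudo is available):

sudo chown -R squirrel:squirrel /home/squirrel/Programme/hadoop-0.20.2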