For details that repeat earlier material, see the Hadoop 1.x fully distributed cluster deployment guide.

Concretely, a Hadoop cluster consists of two clusters: an HDFS cluster and a YARN cluster. The two are logically separate but usually colocated on the same physical machines.
- HDFS cluster: stores the data. Its main roles are NameNode and DataNode.
- YARN cluster: schedules resources for computation over that data. Its main roles are ResourceManager and NodeManager.
This walkthrough builds a 5-node cluster with the following role assignment:
| Node | Roles | IP |
|---|---|---|
| node1 | NameNode, SecondaryNameNode | 192.168.33.200 |
| node2 | ResourceManager | 192.168.33.201 |
| node3 | DataNode, NodeManager | 192.168.33.202 |
| node4 | DataNode, NodeManager | 192.168.33.203 |
| node5 | DataNode, NodeManager | 192.168.33.204 |
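The configuration files below refer to the nodes by hostname, so every machine needs to resolve these names. A minimal sketch of the /etc/hosts entries, assuming no DNS is available:

```
# /etc/hosts on every node (and on the machine running the browser)
192.168.33.200  node1
192.168.33.201  node2
192.168.33.202  node3
192.168.33.203  node4
192.168.33.204  node5
```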
Deployment diagram (figure not reproduced here; the role table above shows the layout).
This walkthrough builds the Hadoop cluster on virtual machines. Software and versions used:

★ Parallels Desktop 12
★ CentOS 6.5 64-bit
★ Hadoop 2.4.1
★ JDK 1.7.0_65
The minimal configuration is as follows:
vi hadoop-env.sh

(The /home/hd2/tmp directory must be created in advance; a sketch follows the block below.)

```bash
# The java implementation to use.
export JAVA_HOME=/usr/local/jdk1.7.0_65
```
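The configs below expect a handful of local directories to exist on each machine. A minimal sketch of creating them, assuming the hd2 user owns /home/hd2 on every node:

```bash
# Run on every node; paths match core-site.xml and hdfs-site.xml below.
mkdir -p /home/hd2/tmp          # hadoop.tmp.dir
mkdir -p /home/hd2/data/name    # dfs.namenode.name.dir (used on node1)
mkdir -p /home/hd2/data/data    # dfs.datanode.data.dir (used on the DataNodes)
```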
vi core-site.xml

```xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://node1:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hd2/tmp</value>
  </property>
  <property>
    <name>hadoop.logfile.size</name>
    <value>10000000</value>
    <description>The max size of each log file</description>
  </property>
  <property>
    <name>hadoop.logfile.count</name>
    <value>10</value>
    <description>The max number of log files</description>
  </property>
</configuration>
```
vi hdfs-site.xml

```xml
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/hd2/data/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/hd2/data/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.secondary.http.address</name>
    <value>node1:50090</value>
  </property>
</configuration>
```
vi mapred-site.xml

```xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
```
vi yarn-site.xml

```xml
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>node1</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
```

Note: the role table above places the ResourceManager on node2, but this file (and the job log later, which connects to node1:8032) points at node1. To follow the table instead, set yarn.resourcemanager.hostname to node2 and run start-yarn.sh on node2.
vi slaves

The slaves file lists the hosts that should run a DataNode and NodeManager, one hostname per line. Per the role table above, that is node3 through node5 (listing node1 and node2 here would start DataNodes on them as well):

```
node3
node4
node5
```
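start-dfs.sh and start-yarn.sh launch the remote daemons over SSH, so the hd2 user on the master needs passwordless SSH to every node, and the edited config files must be identical everywhere. A sketch of both steps, assuming the same ~/hadoop-2.4.1 install path and hd2 account on all nodes:

```bash
# On node1, as hd2: create a key once and push it to every node.
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
for host in node1 node2 node3 node4 node5; do
  ssh-copy-id hd2@$host
done

# Then sync the edited configs (run from ~/hadoop-2.4.1 on node1).
for host in node2 node3 node4 node5; do
  scp etc/hadoop/{hadoop-env.sh,core-site.xml,hdfs-site.xml,mapred-site.xml,yarn-site.xml,slaves} \
      hd2@$host:~/hadoop-2.4.1/etc/hadoop/
done
```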
Initialize HDFS (run once, on node1; reformatting an existing cluster wipes the NameNode metadata):

```bash
bin/hadoop namenode -format
```
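If the format succeeded, the NameNode metadata directory is populated; a quick check, using the dfs.namenode.name.dir path from hdfs-site.xml above:

```bash
# Should list VERSION, an fsimage file, and seen_txid after a successful format.
ls /home/hd2/data/name/current/
```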
Start HDFS:

```bash
sbin/start-dfs.sh
```

Start YARN:

```bash
sbin/start-yarn.sh
```
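A quick way to confirm the daemons came up is to run jps on each node. With the configuration above (ResourceManager on node1), roughly the following processes are expected:

```bash
# On node1: expect NameNode, SecondaryNameNode, and ResourceManager.
jps
# On node3, node4, node5: expect DataNode and NodeManager.
```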
Open http://192.168.33.200:50070 in a browser to check the HDFS web UI.
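The same liveness information is also available from the command line; for example, to confirm that all three DataNodes have registered:

```bash
# Prints cluster capacity and a per-DataNode report.
bin/hdfs dfsadmin -report
```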
Test the cluster with the bundled WordCount example:

```
[hd2@node1 hadoop-2.4.1]$ hadoop fs -mkdir input
[hd2@node1 hadoop-2.4.1]$ hadoop fs -ls
Found 1 items
drwxr-xr-x   - root supergroup          0 2014-08-18 09:02 input
```
```
[hd2@node1 hadoop-2.4.1]$ vi test.txt
hello hadoop
hello World
Hello Java
Hey man
i am a programmer
[hd2@node1 hadoop-2.4.1]$ hadoop fs -put test.txt input/
[hd2@node1 hadoop-2.4.1]$ hadoop fs -ls input/
Found 1 items
-rw-r--r--   1 root supergroup         62 2014-08-18 09:03 input/test.txt
```
```
[hd2@node1 hadoop-2.4.1]$ hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar wordcount input/ output/
```
Job log:
```
17/04/19 21:07:19 INFO client.RMProxy: Connecting to ResourceManager at node1/192.168.33.200:8032
17/04/19 21:07:19 INFO input.FileInputFormat: Total input paths to process : 2
17/04/19 21:07:20 INFO mapreduce.JobSubmitter: number of splits:2
17/04/19 21:07:20 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1492605823444_0003
17/04/19 21:07:20 INFO impl.YarnClientImpl: Submitted application application_1492605823444_0003
17/04/19 21:07:20 INFO mapreduce.Job: The url to track the job: http://node1:8088/proxy/application_1492605823444_0003/
17/04/19 21:07:20 INFO mapreduce.Job: Running job: job_1492605823444_0003
17/04/19 21:07:26 INFO mapreduce.Job: Job job_1492605823444_0003 running in uber mode : false
17/04/19 21:07:26 INFO mapreduce.Job:  map 0% reduce 0%
17/04/19 21:07:33 INFO mapreduce.Job:  map 100% reduce 0%
17/04/19 21:07:40 INFO mapreduce.Job:  map 100% reduce 100%
17/04/19 21:07:42 INFO mapreduce.Job: Job job_1492605823444_0003 completed successfully
17/04/19 21:07:42 INFO mapreduce.Job: Counters: 50
	File System Counters
		FILE: Number of bytes read=68
		FILE: Number of bytes written=279333
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=246
		HDFS: Number of bytes written=25
		HDFS: Number of read operations=9
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters
		Launched map tasks=2
		Launched reduce tasks=1
		Data-local map tasks=1
		Rack-local map tasks=1
		Total time spent by all maps in occupied slots (ms)=8579
		Total time spent by all reduces in occupied slots (ms)=5101
		Total time spent by all map tasks (ms)=8579
		Total time spent by all reduce tasks (ms)=5101
		Total vcore-seconds taken by all map tasks=8579
		Total vcore-seconds taken by all reduce tasks=5101
		Total megabyte-seconds taken by all map tasks=8784896
		Total megabyte-seconds taken by all reduce tasks=5223424
	Map-Reduce Framework
		Map input records=2
		Map output records=6
		Map output bytes=62
		Map output materialized bytes=74
		Input split bytes=208
		Combine input records=6
		Combine output records=5
		Reduce input groups=3
		Reduce shuffle bytes=74
		Reduce input records=5
		Reduce output records=3
		Spilled Records=10
		Shuffled Maps =2
		Failed Shuffles=0
		Merged Map outputs=2
		GC time elapsed (ms)=430
		CPU time spent (ms)=1550
		Physical memory (bytes) snapshot=339206144
		Virtual memory (bytes) snapshot=1087791104
		Total committed heap usage (bytes)=242552832
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters
		Bytes Read=38
	File Output Format Counters
		Bytes Written=25
```
Result (this listing is from a run whose output directory was out/; with the command above, look under output/ instead):
```
[hd2@node1 hadoop-2.4.1]$ hadoop fs -ls /user/hd2/out/
Found 2 items
-rw-r--r--   3 hd2 supergroup          0 2017-04-19 21:07 /user/hd2/out/_SUCCESS
-rw-r--r--   3 hd2 supergroup         25 2017-04-19 21:07 /user/hd2/out/part-r-00000
[hd2@node1 hadoop-2.4.1]$ hadoop fs -cat /user/hd2/out/part-r-00000
hadoop	2
hello	3
world	1
```
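MapReduce refuses to write into an output directory that already exists, so remove it before rerunning the job:

```bash
# Delete the previous output before launching wordcount again.
hadoop fs -rm -r output
```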