Run: sudo vim /etc/hostname
Assume the three machines are named as follows:
vm-007 as the master
vm-008 as a slave
vm-009 as a slave
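For reference, a minimal sketch of applying the hostname on each node (assuming an Ubuntu-style system; repeat with the matching name on vm-008 and vm-009):
# on vm-007
echo "vm-007" | sudo tee /etc/hostname
sudo hostname vm-007    # applies the name immediately, without a reboot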
Run: sudo vim /etc/hosts
All three machines need this change:
192.168.132.128 vm-007
192.168.132.129 vm-008
192.168.132.130 vm-009
Run ping vm-007, ping vm-008, and ping vm-009; if they all succeed, the hosts entries are correct.
On all three machines, run:
sudo apt-get install ssh    (installs SSH)
ssh-keygen    (press Enter at every prompt; this creates a .ssh directory under the current user's home directory)
cd .ssh
On vm-007, run: cp id_rsa.pub authorized_keys
On vm-008 and vm-009, run:
scp id_rsa.pub lwj@vm-007:/home/lwj/.ssh/id_rsa.pub.vm-008
scp id_rsa.pub lwj@vm-007:/home/lwj/.ssh/id_rsa.pub.vm-009
On vm-007, run:
cat id_rsa.pub.vm-008 >> authorized_keys
cat id_rsa.pub.vm-009 >> authorized_keys
This appends the public keys of vm-008 and vm-009 to vm-007's authorized_keys, so vm-007 now holds the public keys of all three machines.
Run:
scp authorized_keys lwj@vm-008:/home/lwj/.ssh/
scp authorized_keys lwj@vm-009:/home/lwj/.ssh/
On vm-007, run:
ssh vm-008 and ssh vm-009; if no password is asked for, passwordless SSH is working.
This step is also necessary: run ssh localhost on each machine.
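Where ssh-copy-id is available, the same key distribution can be done in one step per target (a sketch, assuming the lwj account exists on all three machines):
# run on each machine; appends the local public key to the target's authorized_keys
ssh-copy-id lwj@vm-007
ssh-copy-id lwj@vm-008
ssh-copy-id lwj@vm-009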
Run:
sudo vim ~/.bashrc or sudo vim /etc/profile, and add:
export JAVA_HOME=/opt/software/jdk1.7.0_80
export CLASSPATH=$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:.
#Hadoop 配置
export HADOOP_PREFIX="/opt/software/hadoop-2.6.5"
export HADOOP_MAPRED_HOME=$HADOOP_PREFIX
export HADOOP_COMMON_HOME=$HADOOP_PREFIX
export HADOOP_HOME=$HADOOP_PREFIX
export HADOOP_HDFS_HOME=$HADOOP_PREFIX
export HADOOP_CONF_DIR=$HADOOP_PREFIX/etc/hadoop
export YARN_CONF_DIR=$HADOOP_PREFIX/etc/hadoop
export YARN_HOME=$HADOOP_PREFIX
export PATH=$JAVA_HOME/bin:$HADOOP_PREFIX/bin:$HADOOP_PREFIX/sbin:$PATH
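After saving, a quick check that the environment took effect (assuming the paths above match the actual install locations):
source ~/.bashrc     # or: source /etc/profile
java -version        # should report 1.7.0_80
hadoop version       # should report Hadoop 2.6.5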
In etc/hadoop/hadoop-env.sh (copied to the slaves later along with the XML files), set:
export JAVA_HOME=/opt/software/jdk1.7.0_80
export HADOOP_HOME=/opt/software/hadoop-2.6.5
In etc/hadoop/core-site.xml:
<property>
  <name>hadoop.tmp.dir</name>
  <value>/disk/hadoop/tempdir</value>
  <description>A base for other temporary directories.</description>
</property>
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://vm-007:9000</value>
</property>
<!-- Enabling Trash -->
<!-- fs.trash.interval: within this 1440-minute window a deleted file is not removed right away;
     it is moved into the trash directory and only really deleted once the interval expires.
     When you run ./bin/hadoop fs -rm -r /xxx you can see it being moved there. Try it. -->
<property>
  <name>fs.trash.interval</name>
  <value>1440</value>
</property>
<property>
  <name>fs.trash.checkpoint.interval</name>
  <value>1440</value>
</property>
<!-- Note: this is compression-related; if Snappy is not installed, do not set this parameter. -->
<property>
  <name>io.compression.codecs</name>
  <value>
    org.apache.hadoop.io.compress.DefaultCodec,
    org.apache.hadoop.io.compress.GzipCodec,
    org.apache.hadoop.io.compress.SnappyCodec
  </value>
</property>
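To see the trash behaviour described in the comment above, a small sketch (the path /data/old.txt is hypothetical; the trash location assumes the default /user/<username>/.Trash layout):
./bin/hadoop fs -rm -r /data/old.txt          # with fs.trash.interval set, this moves the file rather than deleting it
./bin/hadoop fs -ls /user/lwj/.Trash/Current  # the file sits here until the 1440-minute interval expires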
In etc/hadoop/hdfs-site.xml:
<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>
<!-- This parameter avoids the error
     Permission denied: user=root, access=WRITE, inode="":hduser:supergroup:rwxr-xr-x
     Note: for it to take effect the cluster must be restarted; that is how I got it to work. -->
<property>
  <name>dfs.permissions</name>
  <value>false</value>
</property>
In etc/hadoop/mapred-site.xml:
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
<!-- Map task output is written to local disk and then shipped over the network to the reduce tasks.
     Simply using a fast compression codec (such as LZO, LZ4 or Snappy) improves performance,
     because compressing the intermediate data cuts down what has to be transferred between map and reduce tasks.
     Enable it with the job properties below. If Snappy is not installed, do not set these properties;
     remember that Snappy has to be installed separately. -->
<property>
  <name>mapreduce.map.output.compress</name>
  <value>true</value>
</property>
<property>
  <name>mapreduce.map.output.compress.codec</name>
  <value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
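The Snappy settings above (and the SnappyCodec entry in core-site.xml) only work if native Snappy support is present; one way to check before enabling them:
./bin/hadoop checknative -a    # look for "snappy: true" in the output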
In etc/hadoop/yarn-site.xml:
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>vm-007</value>
</property>
In etc/hadoop/slaves, list the hosts that should run DataNode/NodeManager (vm-007 is listed too, so the master also runs a DataNode, matching the jps output below):
vm-007
vm-008
vm-009
Copy the configuration files to both slaves:
scp hadoop-env.sh core-site.xml hdfs-site.xml mapred-site.xml yarn-site.xml slaves vm-008:/opt/software/hadoop-2.6.5/etc/hadoop/
scp hadoop-env.sh core-site.xml hdfs-site.xml mapred-site.xml yarn-site.xml slaves vm-009:/opt/software/hadoop-2.6.5/etc/hadoop/
Go to the Hadoop installation directory.
Run:
hdfs namenode -format
This formats the NameNode.
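Put together, the step looks like this (a sketch; run it only once, on vm-007, since reformatting wipes the existing HDFS metadata):
cd /opt/software/hadoop-2.6.5
./bin/hdfs namenode -format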
Run:
./sbin/start-all.sh
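On Hadoop 2.x, start-all.sh is deprecated and just delegates to the two scripts below, which can also be run separately:
./sbin/start-dfs.sh     # NameNode, SecondaryNameNode and the DataNodes
./sbin/start-yarn.sh    # ResourceManager and the NodeManagers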
Processes started on vm-007 (jps output):
3936 NodeManager
3648 ResourceManager
3516 SecondaryNameNode
5011 Jps
3213 NameNode
3320 DataNode
Processes started on vm-008 and vm-009:
4913 DataNode
5563 Jps
5027 NodeManager
NameNode web UI: http://vm-007:50070/dfshealth.html#tab-overview
Run: ./bin/hadoop fs -mkdir /data    (creates a /data directory)
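For completeness, the a.txt used in the next step can be produced like this (a sketch; the contents are chosen to match the -cat output shown below):
printf "a\na\nb\nb\nc\nc\na\na\n" > a.txt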
Run: ./bin/hadoop fs -put a.txt /data/    (uploads a.txt to HDFS)
Run: lwj@vm-007:/opt/software/hadoop-2.6.5$ ./bin/hadoop fs -ls /
Found 2 items
drwxr-xr-x - lwj supergroup 0 2016-12-22 19:28 /data
drwx------ - lwj supergroup 0 2016-12-22 19:28 /tmp
Run: lwj@vm-007:/opt/software/hadoop-2.6.5$ ./bin/hadoop fs -cat /data/a.txt
a
a
b
b
c
c
a
a
Run: lwj@vm-007:/opt/software/hadoop-2.6.5$ ./bin/yarn jar /opt/software/hadoop-2.6.5/bvc-test-0.0.0.jar com.blueview.hadoop.mr.WordCount /data/a.txt /data/word_count
inputPath: /data/a.txt
outputpath: /data/word_count
16/12/22 20:51:51 INFO client.RMProxy: Connecting to ResourceManager at vm-007/192.168.132.128:8032
16/12/22 20:51:52 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
16/12/22 20:51:53 INFO input.FileInputFormat: Total input paths to process : 1
16/12/22 20:51:53 INFO mapreduce.JobSubmitter: number of splits:1
16/12/22 20:51:54 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1482405394053_0003
16/12/22 20:51:54 INFO impl.YarnClientImpl: Submitted application application_1482405394053_0003
16/12/22 20:51:54 INFO mapreduce.Job: The url to track the job: http://vm-007:8088/proxy/application_1482405394053_0003/
16/12/22 20:51:54 INFO mapreduce.Job: Running job: job_1482405394053_0003
16/12/22 20:52:04 INFO mapreduce.Job: Job job_1482405394053_0003 running in uber mode : false
16/12/22 20:52:04 INFO mapreduce.Job: map 0% reduce 0%
16/12/22 20:52:12 INFO mapreduce.Job: map 100% reduce 0%
16/12/22 20:52:22 INFO mapreduce.Job: map 100% reduce 100%
16/12/22 20:52:22 INFO mapreduce.Job: Job job_1482405394053_0003 completed successfully
16/12/22 20:52:23 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=70
FILE: Number of bytes written=215415
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=110
HDFS: Number of bytes written=12
HDFS: Number of read operations=6
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=5664
Total time spent by all reduces in occupied slots (ms)=6487
Total time spent by all map tasks (ms)=5664
Total time spent by all reduce tasks (ms)=6487
Total vcore-milliseconds taken by all map tasks=5664
Total vcore-milliseconds taken by all reduce tasks=6487
Total megabyte-milliseconds taken by all map tasks=5799936
Total megabyte-milliseconds taken by all reduce tasks=6642688
Map-Reduce Framework
Map input records=8
Map output records=8
Map output bytes=48
Map output materialized bytes=70
Input split bytes=94
Combine input records=0
Combine output records=0
Reduce input groups=3
Reduce shuffle bytes=70
Reduce input records=8
Reduce output records=3
Spilled Records=16
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=176
CPU time spent (ms)=1810
Physical memory (bytes) snapshot=315183104
Virtual memory (bytes) snapshot=1355341824
Total committed heap usage (bytes)=135860224
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=16
File Output Format Counters
Bytes Written=12
Job submitted successfully!
Time taken: 34077ms
Run: lwj@vm-007:/opt/software/hadoop-2.6.5$ ./bin/hadoop fs -cat /data/word_count/part-r-00000
a 4
b 2
c 2
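If a custom jar like bvc-test-0.0.0.jar is not at hand, the wordcount example bundled with the Hadoop distribution gives the same kind of result (a sketch; the jar path assumes the standard 2.6.5 binary layout, and /data/word_count_example is a hypothetical output directory):
./bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.5.jar wordcount /data/a.txt /data/word_count_example
./bin/hadoop fs -cat /data/word_count_example/part-r-00000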
Note: I also referenced this article on Tuicool:
http://www.tuicool.com/articles/BRVjiq