1、Preparation before installing Hadoop
1. Modify the host names
1) vi /etc/sysconfig/network
Set HOSTNAME=master (or slave1/slave2); each host uses its own name.
2) vi /etc/hosts
Append the following entries to the end of this file on all three hosts:
192.168.0.6 master
192.168.0.5 slave1
192.168.0.2 slave2
3) The changes above only take effect after a reboot; to apply them immediately, you can run the command <hostname host-name>.
To verify the configuration, run ping master; if the ping goes through, the configuration is correct (network connectivity between the hosts is required).
The same settings need to be made on the other nodes.
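As a quick sanity check (a minimal sketch, assuming the three /etc/hosts entries above), each node can ping all of the others in one loop:
for h in master slave1 slave2; do ping -c 1 $h; done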
2. Passwordless SSH login
1) Check whether the sshd service is already installed with the following commands:
rpm -qa | grep openssh
rpm -qa | grep rsync
2) If sshd or rsync is not installed, install them with:
yum install openssh-server    install the SSH service (the package that provides sshd on CentOS)
yum install rsync    (rsync is a remote data synchronization tool that can quickly sync files between hosts over a LAN/WAN)
service sshd restart    restart the service
service iptables stop    stop the firewall
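Stopping iptables this way only lasts until the next reboot; to keep the firewall disabled permanently (assuming a SysV-init system such as CentOS 6), you can additionally run:
chkconfig iptables off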
3) Generate the key pair on the master
Run the following commands on the master node to create the .ssh directory under the home directory. We configure everything as root; when root logs in it lands in /root, while other users land under /home:
cd
mkdir .ssh
ssh-keygen -t rsa
This command generates a passphrase-less key pair; keep pressing Enter (when asked for the save path, just press Enter to accept the default). The generated pair, id_rsa and id_rsa.pub, is stored in ~/.ssh by default.
Make a copy of the generated id_rsa.pub named authorized_keys:
cp id_rsa.pub authorized_keys
You can list the .ssh directory to confirm that both files are there.
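If sshd's StrictModes option is enabled (the default), overly loose permissions can silently break key-based login, so it is worth tightening them as a precaution:
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys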
4) Distribute the public key file authorized_keys to each DataNode node:
[root@localhost .ssh]# scp authorized_keys root@192.168.0.5:/root/.ssh/
[root@localhost .ssh]# scp authorized_keys root@192.168.0.2:/root/.ssh/
(The host names slave1 and slave2 can be used in place of the IP addresses, for example:
scp -r hbase-1.0.0 root@slave1:/opt/ )
Note: during this process, answer the host-key question by typing yes; do not just press Enter, otherwise the connection will fail with an error.
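As an alternative to copying authorized_keys by hand (assuming the openssh-clients package, which provides ssh-copy-id, is installed on the master), the key can be appended on each slave in one step:
ssh-copy-id root@slave1
ssh-copy-id root@slave2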
5) Verify that passwordless login works
ssh slave1    (the first login may still prompt for confirmation; after that no password is needed)
ifconfig    (check that the IP shown is now slave1's)
exit    (close the connection)
ssh slave2    (verify in the same way as slave1)
ifconfig
exit
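A compact way to run the same check non-interactively (a sketch; host names as configured above):
for h in slave1 slave2; do ssh $h hostname; done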
If each login succeeds without asking for a password, passwordless login is configured correctly, and Hadoop can now be installed.
2、Installing Hadoop
1. Download and build the installation package
Download the Hadoop 2.5.2 package from the official site. Note that hadoop-2.5.2.tar.gz is the pre-built binary, while hadoop-2.5.2-src.tar.gz is the unbuilt source. Because the official hadoop-2.5.2.tar.gz is built for 32-bit systems, we need to download hadoop-2.5.2-src.tar.gz and build a 64-bit package on our own machine.
A build tutorial is available at http://f.dataguru.cn/forum.php?mod=viewthread&tid=454226
(Our group lead has already built it, so we can simply scp the package from his machine to our own; his host IP is 192.168.0.10.)
1) On master/slave1/slave2, create a hadoop directory to hold the package and the extracted files:
cd /home
mkdir hadoop
2) scp the package from the build machine to each of our machines:
cd /root/download/hadoop-2.5.2-src/hadoop-dist/target
scp hadoop-2.5.2.tar.gz root@192.168.0.6:/home/hadoop/
scp hadoop-2.5.2.tar.gz root@192.168.0.5:/home/hadoop/
scp hadoop-2.5.2.tar.gz root@192.168.0.2:/home/hadoop/
2. Extract the package
cd /home/hadoop
tar -zvxf hadoop-2.5.2.tar.gz
3. Edit the configuration files
cd /home/hadoop/hadoop-2.5.2/etc/hadoop
1) vi core-site.xml
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/tmp</value>
    <description>Abase for other temporary directories.</description>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>4096</value>
  </property>
</configuration>
2) vi hdfs-site.xml
<configuration>
  <property>
    <name>dfs.nameservices</name>
    <value>hadoop-cluster1</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>master:50090</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///home/hadoop/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///home/hadoop/dfs/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>    <!-- number of slaves -->
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
</configuration>
3) vi mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobtracker.http.address</name>
    <value>master:50030</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>master:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>master:19888</value>
  </property>
</configuration>
4) vi yarn-site.xml
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>master:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>master:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>master:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>master:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>master:8088</value>
  </property>
</configuration>
5) vi slaves
Delete localhost
and enter:
master    (so that the master itself also acts as a DataNode)
slave1
slave2
6) vi hadoop-env.sh
Change the line to: export JAVA_HOME=/jdk17
7) vi yarn-env.sh
Remove the leading # to uncomment:
export JAVA_HOME=/jdk17
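Before starting the daemons it is worth confirming that this path is valid on every node (a quick check, assuming the JDK really is installed at /jdk17):
/jdk17/bin/java -version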
8) scp the master's configuration to the two slaves:
cd /home/hadoop/hadoop-2.5.2/etc
scp -r hadoop root@slave1:/home/hadoop/hadoop-2.5.2/etc
scp -r hadoop root@slave2:/home/hadoop/hadoop-2.5.2/etc
4. Format the file system on the master
cd /home/hadoop/hadoop-2.5.2
bin/hdfs namenode -format
(At one point during formatting you will be asked to type yes.)
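If the format succeeded, the NameNode metadata directory configured in hdfs-site.xml should now contain a current/VERSION file; a quick check (path as configured above):
ls /home/hadoop/dfs/name/current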
5. Start the cluster
sbin/start-dfs.sh
sbin/start-yarn.sh
(or simply sbin/start-all.sh)
Check the started processes with jps.
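Since the master is also listed in slaves, its jps output should show roughly the following processes (an illustrative list, not output captured from this cluster; the slaves should show only DataNode and NodeManager besides Jps):
NameNode
SecondaryNameNode
DataNode
ResourceManager
NodeManager
Jps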
PS: to stop the cluster:
cd /home/hadoop/hadoop-2.5.2
sbin/stop-dfs.sh
sbin/stop-yarn.sh
6. Access via a web browser
Open the NameNode UI (http://master:50070 by default) and the YARN ResourceManager UI (http://master:8088, as configured above). The live node count and the cluster node count should both be 3, which basically confirms the configuration succeeded; next, run a few tests.
3、Cluster tests
1. Pi estimation
This test case estimates pi with a Monte Carlo method; the two numbers after pi are the number of map tasks and the number of samples per map (i.e. the precision of the estimate).
cd /home/hadoop/hadoop-2.5.2
bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.2.jar pi 10 1000
The test output is as follows:
[root@master ~]# cd /home/hadoop/hadoop-2.5.2
[root@master hadoop-2.5.2]# bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.2.jar pi 10 1000
Number of Maps  = 10
Samples per Map = 1000
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Wrote input for Map #5
Wrote input for Map #6
Wrote input for Map #7
Wrote input for Map #8
Wrote input for Map #9
Starting Job
15/03/15 18:59:14 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.0.6:8032
15/03/15 18:59:14 INFO input.FileInputFormat: Total input paths to process : 10
15/03/15 18:59:14 INFO mapreduce.JobSubmitter: number of splits:10
15/03/15 18:59:15 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1426367748160_0006
15/03/15 18:59:15 INFO impl.YarnClientImpl: Submitted application application_1426367748160_0006
15/03/15 18:59:15 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1426367748160_0006/
15/03/15 18:59:15 INFO mapreduce.Job: Running job: job_1426367748160_0006
15/03/15 18:59:21 INFO mapreduce.Job: Job job_1426367748160_0006 running in uber mode : false
15/03/15 18:59:21 INFO mapreduce.Job:  map 0% reduce 0%
15/03/15 18:59:34 INFO mapreduce.Job:  map 40% reduce 0%
15/03/15 18:59:41 INFO mapreduce.Job:  map 100% reduce 0%
15/03/15 18:59:42 INFO mapreduce.Job:  map 100% reduce 100%
15/03/15 18:59:43 INFO mapreduce.Job: Job job_1426367748160_0006 completed successfully
15/03/15 18:59:44 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=226
        FILE: Number of bytes written=1070916
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=2610
        HDFS: Number of bytes written=215
        HDFS: Number of read operations=43
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=3
    Job Counters
        Launched map tasks=10
        Launched reduce tasks=1
        Data-local map tasks=10
        Total time spent by all maps in occupied slots (ms)=154713
        Total time spent by all reduces in occupied slots (ms)=5591
        Total time spent by all map tasks (ms)=154713
        Total time spent by all reduce tasks (ms)=5591
        Total vcore-seconds taken by all map tasks=154713
        Total vcore-seconds taken by all reduce tasks=5591
        Total megabyte-seconds taken by all map tasks=158426112
        Total megabyte-seconds taken by all reduce tasks=5725184
    Map-Reduce Framework
        Map input records=10
        Map output records=20
        Map output bytes=180
        Map output materialized bytes=280
        Input split bytes=1430
        Combine input records=0
        Combine output records=0
        Reduce input groups=2
        Reduce shuffle bytes=280
        Reduce input records=20
        Reduce output records=0
        Spilled Records=40
        Shuffled Maps =10
        Failed Shuffles=0
        Merged Map outputs=10
        GC time elapsed (ms)=930
        CPU time spent (ms)=5540
        Physical memory (bytes) snapshot=2623418368
        Virtual memory (bytes) snapshot=9755574272
        Total committed heap usage (bytes)=1940914176
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=1180
    File Output Format Counters
        Bytes Written=97
Job Finished in 29.79 seconds
Estimated value of Pi is 3.14080000000000000000
[root@master hadoop-2.5.2]#
2. Word count
1. Go to the hadoop-2.5.2 directory:
cd /home/hadoop/hadoop-2.5.2
2. Create a tmp directory in HDFS:
bin/hdfs dfs -mkdir /tmp
3. Prepare a test.txt file in the hadoop-2.5.2 directory beforehand as the test input (see the example after this list).
4. Upload the test file into /tmp:
bin/hdfs dfs -copyFromLocal /home/hadoop/hadoop-2.5.2/test.txt /tmp
5. Check that the upload succeeded:
bin/hdfs dfs -ls /tmp
6. Run the wordcount example from hadoop-mapreduce-examples-2.5.2.jar:
bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.2.jar wordcount /tmp/test.txt /tmp-output
7. View the result:
bin/hdfs dfs -ls /tmp-output
bin/hadoop fs -cat /tmp-output/part-r-00000
PS: before running it again, the /tmp-output directory must be removed from HDFS first (it is not empty, so use -rm -r rather than -rmdir):
[root@master hadoop-2.5.2]# bin/hdfs dfs -rm -r /tmp-output
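For reference, one way to create the test file mentioned in step 3 (the contents are arbitrary; this is only an illustration):
echo "hello hadoop hello world hello hdfs" > /home/hadoop/hadoop-2.5.2/test.txt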
4、Problems encountered during the experiment
1. NameNode fails to start
The most common cause of this problem is formatting the NameNode more than once, which leaves the namespaceIDs inconsistent. In that state, even after clearing the logs and restarting, sometimes no DataNode log is produced at all.
Fix: find the VERSION file whose namespaceID is inconsistent and edit it to match.
Or: delete everything under the HDFS data directory and re-initialize the NameNode; note that this wipes all the data (that is what we observed).
PS: another reported cause of DataNodes failing to start is wrong permissions on the data directory; we have not run into that one so far.
After deleting all files in the data directory and re-initializing, the problem was resolved.
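The commands for that second fix, assuming the directories configured in hdfs-site.xml above (this destroys all HDFS data, so only do it on a test cluster):
sbin/stop-dfs.sh
rm -rf /home/hadoop/dfs/data/*    # run on every node
bin/hdfs namenode -format         # on the master only
sbin/start-dfs.sh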
2. NodeManager fails to start
It ran at first and then stopped. After looking it up, we emptied all the name directories and reformatted, after which the NodeManager came back.
3. The node information shows both slaves, but the cluster information only shows the master.
Fix: scp the master's configuration files to the slaves:
cd /home/hadoop/hadoop-2.5.2/etc
scp -r hadoop root@slave1:/home/hadoop/hadoop-2.5.2/etc
scp -r hadoop root@slave2:/home/hadoop/hadoop-2.5.2/etc
4. Word count fails
Cause of the error and the fix