Hadoop installation (repost)

Part 1: Preparation before installing Hadoop

1. Change the hostnames

1) vi /etc/sysconfig/network

Change it to HOSTNAME=master (slave1/slave2 on the other machines); each host uses its own name.

2) vi /etc/hosts

Append the following lines to the end of this file on all three hosts:

192.168.0.6 master
192.168.0.5 slave1
192.168.0.2 slave2

3) The changes above only take effect after a reboot; to apply a new hostname immediately, use the "hostname <name>" command, as in the example below.

To verify the configuration, run ping master; if the ping goes through, the configuration is correct. (The machines must be reachable over the network.)

The same setup and verification are needed on the other nodes.
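A minimal verification session, assuming the hostnames and IPs above (run on master):

hostname master    # takes effect immediately, but is lost on reboot unless /etc/sysconfig/network is also edited
ping -c 3 slave1
ping -c 3 slave2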

2. Passwordless SSH login

1) Check whether the sshd service and rsync are installed:

    rpm -qa | grep openssh
    rpm -qa | grep rsync

2) If sshd and rsync are not installed, install them with:

    yum install openssh-server    (installs the SSH service)
    yum install rsync    (rsync is a remote data-synchronization tool that can quickly sync files between hosts over a LAN/WAN)

    service sshd restart    (start the service)
    service iptables stop    (stop the firewall)
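Note that service iptables stop only turns the firewall off until the next reboot. On CentOS 6 you can additionally keep it disabled across reboots (an extra step beyond the original notes):

chkconfig iptables off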

3) Generate a key pair on the master

Run the following on the master node to create the .ssh directory under the home directory. Everything is configured directly as root; on login, root lands in /root, while other users land under /home:

cd
mkdir .ssh
ssh-keygen -t rsa

The last command generates a passphrase-less key pair; just keep pressing Enter (when asked for a save path, press Enter to accept the default). The resulting keys, id_rsa and id_rsa.pub, are stored under "~/.ssh" by default.

Make a copy of the generated id_rsa.pub named authorized_keys:

cp id_rsa.pub authorized_keys

You can check the result with ls in the .ssh directory.
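If ssh later still asks for a password, overly loose permissions on .ssh are a common culprit, since sshd rejects key files that other users can read. A quick hardening step worth running on every node:

chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys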


4) Distribute the public key file authorized_keys to each DataNode:

[root@localhost .ssh]# scp authorized_keys root@192.168.0.5:/root/.ssh/
[root@localhost .ssh]# scp authorized_keys root@192.168.0.2:/root/.ssh/

(You can also use the hostnames slave1 and slave2 in place of the IPs.)


 

Note: during this process you must type yes at the prompts rather than just pressing Enter; otherwise the login will fail with an error.

 

5) Verify that passwordless login works

ssh slave1   (the first login asks you to confirm the host key; later logins go straight through)
ifconfig     (check that the IP shown is now slave1's)
exit         (close the connection)

ssh slave2   (verify the same way as slave1)
ifconfig
exit

If the logins succeed without a password prompt, passwordless login is configured correctly, and we can move on to installing Hadoop.


Part 2: Installing Hadoop

1. Download and build the installation package

Download the Hadoop 2.5.2 package from the official site. Note that hadoop-2.5.2.tar.gz is the compiled binary, while hadoop-2.5.2-src.tar.gz is the uncompiled source. Since the official hadoop-2.5.2.tar.gz is built for 32-bit systems, we need to download hadoop-2.5.2-src.tar.gz and build a 64-bit package on our own machine.

See the tutorial: http://f.dataguru.cn/forum.php?mod=viewthread&tid=454226

(Since the "village chief" has already built it, we can simply scp the package from his machine to our own. His host IP is 192.168.0.10.)

1) On master/slave1/slave2, create a hadoop directory to hold the package and the extracted files:

cd /home

mkdir hadoop

2) scp the package from the village chief's machine to your own:

cd /root/download/hadoop-2.5.2-src/hadoop-dist/target

scp hadoop-2.5.2.tar.gz root@192.168.0.6:/home/hadoop/

scp hadoop-2.5.2.tar.gz root@192.168.0.5:/home/hadoop/

scp hadoop-2.5.2.tar.gz root@192.168.0.2:/home/hadoop/ 

 

2. Extract the package

cd /home/hadoop

tar -zvxf hadoop-2.5.2.tar.gz

 

3. Edit the configuration files

cd /home/hadoop/hadoop-2.5.2/etc/hadoop

1)  vi core-site.xml

<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>4096</value>
  </property>
</configuration>
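fs.defaultFS is the URI clients use to reach HDFS; the hdfs dfs commands used later prepend it implicitly. Once the cluster is up (Part 2, step 5), a quick sanity check might look like:

bin/hdfs dfs -ls /                      # uses fs.defaultFS implicitly
bin/hdfs dfs -ls hdfs://master:9000/    # the same listing with the URI spelled out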

 

2)  vi hdfs-site.xml

<configuration>
  <property>
    <name>dfs.nameservices</name>
    <value>hadoop-cluster1</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>master:50090</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///home/hadoop/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///home/hadoop/dfs/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>    <!-- replication factor: the number of DataNodes, 3 here -->
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
</configuration>
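The hadoop.tmp.dir, name and data paths above must be writable on every node. Hadoop normally creates them while formatting and starting up, but pre-creating them avoids permission surprises; a small sketch using the paths from these configs:

mkdir -p /home/hadoop/tmp
mkdir -p /home/hadoop/dfs/name    # used by the NameNode (master)
mkdir -p /home/hadoop/dfs/data    # used by every DataNode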

 

3)  vi mapred-site.xml

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobtracker.http.address</name>
    <value>master:50030</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>master:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>master:19888</value>
  </property>
</configuration>
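In a stock Hadoop 2.x distribution this file usually ships only as a template, so you may need to create it first:

cp mapred-site.xml.template mapred-site.xml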

 

4)  vi yarn-site.xml

<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>master:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>master:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>master:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>master:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>master:8088</value>
  </property>
</configuration>

 

5)  vi slaves  

Delete localhost and enter:

master  (this way the master itself also acts as a DataNode)

slave1

slave2

 

6)  vi hadoop-env.sh

Change the JAVA_HOME line to: export JAVA_HOME=/jdk17

 

7)  vi yarn-env.sh

Uncomment (remove the leading #) the line:

export JAVA_HOME=/jdk17
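To confirm the path is correct (assuming the JDK really is installed at /jdk17, as in these notes):

/jdk17/bin/java -version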

 

8) scp the master's configuration to the two slaves:

cd /home/hadoop/hadoop-2.5.2/etc

scp -r  hadoop root@slave1:/home/hadoop/hadoop-2.5.2/etc

scp -r  hadoop root@slave2:/home/hadoop/hadoop-2.5.2/etc

 

4. Format the filesystem on the master

cd /home/hadoop/hadoop-2.5.2

bin/hdfs namenode -format 

(You will be asked to type yes once during formatting.)

 

5. Start the cluster

sbin/start-dfs.sh

sbin/start-yarn.sh

 

(or simply sbin/start-all.sh)

 

Check the started processes with jps; a rough sketch of the expected output follows.
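Roughly what jps should show when everything is up (the PIDs are illustrative, not captured output). On the master, which runs the NameNode, SecondaryNameNode and ResourceManager and, per our slaves file, also a DataNode and NodeManager:

2482 NameNode
2570 DataNode
2648 SecondaryNameNode
2803 ResourceManager
2901 NodeManager
3120 Jps

On slave1 and slave2 only DataNode, NodeManager and Jps should appear.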

 

PS: to stop the cluster:

cd /home/hadoop/hadoop-2.5.2

sbin/stop-dfs.sh 

sbin/stop-yarn.sh

 

6. Access the web UIs in a browser

http://192.168.0.6:50070/   (HDFS NameNode UI)

http://192.168.0.6:8088/    (YARN ResourceManager UI)

 


You should see that both the live node count and the cluster node count are 3, which basically confirms the setup succeeded. Next, let's run some tests.

 

Part 3: Testing the cluster

1. Estimating pi

This test case estimates pi with a Monte Carlo method; the two numbers after pi are the number of map tasks to use and the number of samples per map (which determines the precision).

cd /home/hadoop/hadoop-2.5.2

bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.2.jar pi 10 1000

The test output is as follows:

[root@master ~]# cd /home/hadoop/hadoop-2.5.2
[root@master hadoop-2.5.2]# bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.2.jar pi 10 1000
Number of Maps  = 10
Samples per Map = 1000
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Wrote input for Map #5
Wrote input for Map #6
Wrote input for Map #7
Wrote input for Map #8
Wrote input for Map #9
Starting Job
15/03/15 18:59:14 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.0.6:8032
15/03/15 18:59:14 INFO input.FileInputFormat: Total input paths to process : 10
15/03/15 18:59:14 INFO mapreduce.JobSubmitter: number of splits:10
15/03/15 18:59:15 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1426367748160_0006
15/03/15 18:59:15 INFO impl.YarnClientImpl: Submitted application application_1426367748160_0006
15/03/15 18:59:15 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1426367748160_0006/
15/03/15 18:59:15 INFO mapreduce.Job: Running job: job_1426367748160_0006
15/03/15 18:59:21 INFO mapreduce.Job: Job job_1426367748160_0006 running in uber mode : false
15/03/15 18:59:21 INFO mapreduce.Job:  map 0% reduce 0%
15/03/15 18:59:34 INFO mapreduce.Job:  map 40% reduce 0%
15/03/15 18:59:41 INFO mapreduce.Job:  map 100% reduce 0%
15/03/15 18:59:42 INFO mapreduce.Job:  map 100% reduce 100%
15/03/15 18:59:43 INFO mapreduce.Job: Job job_1426367748160_0006 completed successfully
15/03/15 18:59:44 INFO mapreduce.Job: Counters: 49
        File System Counters
                FILE: Number of bytes read=226
                FILE: Number of bytes written=1070916
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=2610
                HDFS: Number of bytes written=215
                HDFS: Number of read operations=43
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=3
        Job Counters
                Launched map tasks=10
                Launched reduce tasks=1
                Data-local map tasks=10
                Total time spent by all maps in occupied slots (ms)=154713
                Total time spent by all reduces in occupied slots (ms)=5591
                Total time spent by all map tasks (ms)=154713
                Total time spent by all reduce tasks (ms)=5591
                Total vcore-seconds taken by all map tasks=154713
                Total vcore-seconds taken by all reduce tasks=5591
                Total megabyte-seconds taken by all map tasks=158426112
                Total megabyte-seconds taken by all reduce tasks=5725184
        Map-Reduce Framework
                Map input records=10
                Map output records=20
                Map output bytes=180
                Map output materialized bytes=280
                Input split bytes=1430
                Combine input records=0
                Combine output records=0
                Reduce input groups=2
                Reduce shuffle bytes=280
                Reduce input records=20
                Reduce output records=0
                Spilled Records=40
                Shuffled Maps =10
                Failed Shuffles=0
                Merged Map outputs=10
                GC time elapsed (ms)=930
                CPU time spent (ms)=5540
                Physical memory (bytes) snapshot=2623418368
                Virtual memory (bytes) snapshot=9755574272
                Total committed heap usage (bytes)=1940914176
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters
                Bytes Read=1180
        File Output Format Counters
                Bytes Written=97
Job Finished in 29.79 seconds
Estimated value of Pi is 3.14080000000000000000
[root@master hadoop-2.5.2]#

 

2. Word count

1. Go to the hadoop-2.5.2 directory

  cd /home/hadoop/hadoop-2.5.2

2. Create a tmp directory in HDFS

  bin/hdfs dfs -mkdir /tmp

3. Beforehand, create a test.txt file in the hadoop-2.5.2 directory to serve as the test input (a complete end-to-end sketch follows this list)

4. Upload the test file into tmp

  bin/hdfs dfs -copyFromLocal /home/hadoop/hadoop-2.5.2/test.txt /tmp

5. Check that the upload succeeded

  bin/hdfs dfs -ls /tmp

6. Run the wordcount example from hadoop-mapreduce-examples-2.5.2.jar

  bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.2.jar wordcount /tmp/test.txt /tmp-output

7. View the results

bin/hdfs dfs -ls /tmp-output

bin/hadoop fs -cat /tmp-output/part-r-00000

PS: when running it again, first delete the tmp-output directory from HDFS:

[root@master hadoop-2.5.2]# bin/hdfs dfs -rm -r /tmp-output
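Putting steps 3 through 7 together, a minimal end-to-end session might look like this (the test file contents are just an illustrative example):

cd /home/hadoop/hadoop-2.5.2
echo "hello hadoop hello world" > test.txt
bin/hdfs dfs -mkdir /tmp
bin/hdfs dfs -copyFromLocal /home/hadoop/hadoop-2.5.2/test.txt /tmp
bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.2.jar wordcount /tmp/test.txt /tmp-output
bin/hadoop fs -cat /tmp-output/part-r-00000    # expect: hadoop 1, hello 2, world 1
bin/hdfs dfs -rm -r /tmp-output                # clean up before a re-run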

 

 

Part 4: Problems encountered

1. NameNode cannot start

The most common cause is formatting the NameNode multiple times, which leaves the namespaceIDs inconsistent. In that state, clearing the logs and restarting sometimes produces no DataNode log at all.

Fix: find the inconsistent VERSION file and correct its namespaceID (see the sketch below).

Alternatively: delete all files under hdfs/data and re-initialize the NameNode; this wipes all the data (that was the observed result).

PS: another reported cause of a DataNode failing to start is bad permissions on the data directory; we have not run into that one yet.

After deleting all files under data and re-initializing, the problem was resolved.
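With the directories from hdfs-site.xml above, the VERSION files live roughly as follows, and comparing the two namespaceIDs is a quick diagnostic (a sketch assuming the default layout):

cat /home/hadoop/dfs/name/current/VERSION    # namespaceID on the NameNode
cat /home/hadoop/dfs/data/current/VERSION    # each DataNode must carry the same namespaceID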

 

2. NodeManager cannot start

It ran at first and then stopped. Following advice found online, we emptied all the name directories and reformatted; after that the NodeManager came up.

 

3. The node list shows the two slaves, but the cluster information only shows the master.

Fix: scp the master's configuration files to the slaves:

cd /home/hadoop/hadoop-2.5.2/etc

scp -r  hadoop root@slave1:/home/hadoop/hadoop-2.5.2/etc

scp -r  hadoop root@slave2:/home/hadoop/hadoop-2.5.2/etc

 

4. Word count failed

For the cause and the fix, see:

http://www.ituring.com.cn/article/63927
