1: Environment preparation: Ubuntu 12.04 64-bit Server running in VMware Player 4.0.3 on Windows.
2: Base installation:
(1) sudo apt-get update
(2) sudo apt-get upgrade
(3) sudo apt-get install openssh-server
Method 1: automatic installation via the webupd8team PPA. Run the following commands:
(1) sudo apt-get install python-software-properties
(2) sudo add-apt-repository ppa:webupd8team/java
(3) sudo apt-get update
(4) sudo apt-get install oracle-java6-installer
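Once the installer finishes, a quick check that the JDK is active (the exact version string on your system will vary):
$ java -version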
Method 2: manual installation of JDK 1.6
(1) Download JDK 1.6 from http://www.oracle.com/technetwork/java/javase/downloads/jdk6u37-downloads-1859587.html, choosing jdk-6u37-linux-x64.bin.
(2) Run chmod +x jdk-6u37-linux-x64.bin to make the file executable;
(3) Run ./jdk-6u37-linux-x64.bin to extract it in place; putting it under /opt is recommended.
(4) Then add the extracted bin directory to your PATH environment variable, as sketched below.
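A minimal sketch of that PATH change, assuming the archive unpacked to /opt/jdk1.6.0_37 (adjust to wherever you actually extracted it):
$ echo 'export JAVA_HOME=/opt/jdk1.6.0_37' >> ~/.bashrc
$ echo 'export PATH=$JAVA_HOME/bin:$PATH' >> ~/.bashrc
$ source ~/.bashrc
$ java -version    # confirm the manually installed JDK is picked up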
Next, create a dedicated Hadoop group and user:
(1) sudo addgroup hadoop
(2) sudo adduser --ingroup hadoop hduser
Then set up passwordless SSH for hduser:
$ cd /home/hduser
$ ssh-keygen -t rsa -P ""    # just press Enter at the prompts
$ cat .ssh/id_rsa.pub >> .ssh/authorized_keys
Note: you can verify this with the ssh localhost command.
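ssh localhost should now log in without prompting. If it still asks for a password, overly permissive key-file permissions are a common cause; this fix is a general SSH tip rather than part of the original post:
$ chmod 700 ~/.ssh
$ chmod 600 ~/.ssh/authorized_keys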
3: Installing Hadoop:
Note: perform all of the following steps logged in as hduser.
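The download and unpack commands are not shown in the original; here is a minimal sketch consistent with the /home/hduser/hadoop paths used below, assuming Hadoop 2.2.0 from the Apache archive (the mirror URL and archive name are assumptions, inferred from the hadoop-mapreduce-examples-2.2.0.jar used in Section 7):
$ cd /home/hduser
$ wget http://archive.apache.org/dist/hadoop/common/hadoop-2.2.0/hadoop-2.2.0.tar.gz
$ tar -xzf hadoop-2.2.0.tar.gz
$ mv hadoop-2.2.0 hadoop    # the install tree now lives at /home/hduser/hadoop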
4: Configuring Hadoop:
In etc/hadoop/hadoop-env.sh, replace export JAVA_HOME=${JAVA_HOME} with the following:
export JAVA_HOME=/usr/lib/jvm/java-6-oracle
Then add the properties below to etc/hadoop/core-site.xml, inside <configuration>:
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hduser/hadoop/tmp/hadoop-${user.name}</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:8010</value>
<description>The name of the default file system. A URI whose
scheme and authority determine the FileSystem implementation. The
uri's scheme determines the config property (fs.SCHEME.impl) naming
the FileSystem implementation class. The uri's authority is used to
determine the host, port, etc. for a filesystem.</description>
</property>
Note: because /home/hduser/hadoop/tmp/ is configured above, you must create that directory with mkdir /home/hduser/hadoop/tmp/, otherwise later steps will fail with errors.
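For example (using -p so any missing parent directories are also created; the original uses plain mkdir):
$ mkdir -p /home/hduser/hadoop/tmp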
(1) mv /home/hduser/hadoop/etc/hadoop/mapred-site.xml.template /home/hduser/hadoop/etc/hadoop/mapred-site.xml
(2) Add the following inside <configuration>:
<property>
<name>mapred.job.tracker</name>
<value>localhost:54311</value>
<description>The host and port that the MapReduce job tracker runs
at. If "local", then jobs are run in-process as a single map
and reduce task.
</description>
</property>
<property>
<name>mapred.map.tasks</name>
<value>10</value>
<description>As a rule of thumb, use 10x the number of slaves (i.e., number of tasktrackers).
</description>
</property>
<property>
<name>mapred.reduce.tasks</name>
<value>2</value>
<description>As a rule of thumb, use 2x the number of slave processors (i.e., number of tasktrackers).
</description>
</property>
Note: the dfs.replication property below is an HDFS setting; it belongs in etc/hadoop/hdfs-site.xml (inside that file's <configuration> block) rather than in mapred-site.xml:
<property>
<name>dfs.replication</name>
<value>1</value>
<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified in create time.
</description>
</property>
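With the configuration files in place, you can optionally confirm that Hadoop picks up the new settings; hdfs getconf is part of the Hadoop 2.x CLI (this check is a convenience, not a step from the original post):
$ cd /home/hduser/hadoop
$ bin/hdfs getconf -confKey fs.default.name    # should print hdfs://localhost:8010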
5: Running Hadoop
The first time you run Hadoop, you need to format the Hadoop filesystem. The commands are:
$ cd /home/hduser/hadoop/bin
$ ./hdfs namenode -format
If it succeeds, you will find a success message like the following in the last few lines of the log:
common.Storage: Storage directory /home/hduser/hadoop/tmp/hadoop-hduser/dfs/name has been successfully formatted.
Then run the following commands:
$ cd /home/hduser/hadoop/sbin/
$ ./start-dfs.sh
Note: this step asks for your password several times; to avoid that, establish SSH trust first (as in the ssh-keygen step above).
hduser@ubuntu:~/hadoop/sbin$ jps
4266 SecondaryNameNode
4116 DataNode
4002 NameNode
Note: jps shows that three processes have started.
Then start YARN:
$ ./start-yarn.sh
hduser@ubuntu:~/hadoop/sbin$ jps
4688 NodeManager
4266 SecondaryNameNode
4116 DataNode
4002 NameNode
4413 ResourceManager
6: Viewing the Hadoop Resource Manager
Open http://192.168.128.129:8088/, replacing 192.168.128.129 with your machine's actual IP address.
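If you are not sure of the VM's address, ifconfig will show it (eth0 is an assumption; your interface name may differ):
$ ifconfig eth0 | grep 'inet addr'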
7: Testing Hadoop
$ cd /home/hduser
$ wget http://www.gutenberg.org/cache/epub/20417/pg20417.txt
$ cd hadoop
$ bin/hdfs dfs -mkdir /tmp
$ bin/hdfs dfs -copyFromLocal /home/hduser/pg20417.txt /tmp
$ bin/hdfs dfs -ls /tmp
$ bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount /tmp/ /tmp-output
If everything is working properly, the job produces its results, and you can follow the progress in the screen output.
Run bin/hadoop fs -ls /tmp-output to check the job's completion status in /tmp-output; it will show two files:
-rw-r--r-- 1 hadoop supergroup 0 2013-10-28 23:09 /tmp-output/_SUCCESS
-rw-r--r-- 1 hadoop supergroup 196192 2013-10-28 23:09 /tmp-output/part-r-00000
View the results with bin/hadoop fs -cat /tmp-output/part-r-00000.
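Each output line is a word followed by its count. As a convenience (not part of the original post), piping through sort shows the most frequent words first:
$ bin/hadoop fs -cat /tmp-output/part-r-00000 | sort -k2 -nr | head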
8: Stopping Hadoop
To stop Hadoop, run the following commands in order:
$ ./stop-yarn.sh
$ ./stop-dfs.sh
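After both scripts finish, jps is a quick sanity check that the daemons have stopped (only the Jps process itself should remain):
$ jps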