1: Environment preparation: Ubuntu 12.04 64-bit Server running in VMware Player 4.0.3 on Windows.
2: Base installation:
(1) sudo apt-get update
(2) sudo apt-get upgrade
(3) sudo apt-get install openssh-server
Method 1: automatic installation via the webupd8team PPA. Run the following commands:
(1) sudo apt-get install python-software-properties
(2) sudo add-apt-repository ppa:webupd8team/java
(3) sudo apt-get update
(4) sudo apt-get install oracle-java6-installer
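Once the installer finishes, a quick check that the JDK is active (the exact version string on your system will vary):
$ java -version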
Method 2: manual installation of JDK 1.6
(1) Download JDK 1.6 from http://www.oracle.com/technetwork/java/javase/downloads/jdk6u37-downloads-1859587.html, choosing jdk-6u37-linux-x64.bin.
(2) Run chmod +x jdk-6u37-linux-x64.bin to make the file executable;
(3) Run ./jdk-6u37-linux-x64.bin to extract it in place; putting it under /opt is recommended.
(4) Then add the extracted bin directory to your PATH environment variable, as sketched below.
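A minimal sketch of that PATH change, assuming the archive unpacked to /opt/jdk1.6.0_37 (adjust to wherever you actually extracted it):
$ echo 'export JAVA_HOME=/opt/jdk1.6.0_37' >> ~/.bashrc
$ echo 'export PATH=$JAVA_HOME/bin:$PATH' >> ~/.bashrc
$ source ~/.bashrc
$ java -version    # confirm the manually installed JDK is picked up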
Next, create a dedicated Hadoop group and user:
(1) sudo addgroup hadoop
(2) sudo adduser --ingroup hadoop hduser
Then set up passwordless SSH for hduser:
$ cd /home/hduser
$ ssh-keygen -t rsa -P ""    # just press Enter at the prompts
$ cat .ssh/id_rsa.pub >> .ssh/authorized_keys
Note: you can verify this with the ssh localhost command.
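ssh localhost should now log in without prompting. If it still asks for a password, overly permissive key-file permissions are a common cause; this fix is a general SSH tip rather than part of the original post:
$ chmod 700 ~/.ssh
$ chmod 600 ~/.ssh/authorized_keys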
3: Installing Hadoop:
Note: perform all of the following steps logged in as hduser.
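The download and unpack commands are not shown in the original; here is a minimal sketch consistent with the /home/hduser/hadoop paths used below, assuming Hadoop 2.2.0 from the Apache archive (the mirror URL and archive name are assumptions, inferred from the hadoop-mapreduce-examples-2.2.0.jar used in Section 7):
$ cd /home/hduser
$ wget http://archive.apache.org/dist/hadoop/common/hadoop-2.2.0/hadoop-2.2.0.tar.gz
$ tar -xzf hadoop-2.2.0.tar.gz
$ mv hadoop-2.2.0 hadoop    # the install tree now lives at /home/hduser/hadoop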
4: Configuring Hadoop:
In etc/hadoop/hadoop-env.sh, replace export JAVA_HOME=${JAVA_HOME} with the following:
export JAVA_HOME=/usr/lib/jvm/java-6-oracle
Then add the properties below to etc/hadoop/core-site.xml, inside <configuration>:
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hduser/hadoop/tmp/hadoop-${user.name}</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:8010</value>
<description>The name of the default file system. A URI whose
scheme and authority determine the FileSystem implementation. The
uri's scheme determines the config property (fs.SCHEME.impl) naming
the FileSystem implementation class. The uri's authority is used to
determine the host, port, etc. for a filesystem.</description>
</property>
Note: because /home/hduser/hadoop/tmp/ is configured above, you must create that directory with mkdir /home/hduser/hadoop/tmp/, otherwise later steps will fail with errors.
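For example (using -p so any missing parent directories are also created; the original uses plain mkdir):
$ mkdir -p /home/hduser/hadoop/tmp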
(1) mv /home/hduser/hadoop/etc/hadoop/mapred-site.xml.template /home/hduser/hadoop/etc/hadoop/mapred-site.xml
(2) Add the following inside <configuration>:
<property>
<name>mapred.job.tracker</name>
<value>localhost:54311</value>
<description>The host and port that the MapReduce job tracker runs
at. If "local", then jobs are run in-process as a single map
and reduce task.
</description>
</property>
<property>
<name>mapred.map.tasks</name>
<value>10</value>
<description>As a rule of thumb, use 10x the number of slaves (i.e., number of tasktrackers).
</description>
</property>
<property>
<name>mapred.reduce.tasks</name>
<value>2</value>
<description>As a rule of thumb, use 2x the number of slave processors (i.e., number of tasktrackers).
</description>
</property>
Note: the dfs.replication property below is an HDFS setting; it belongs in etc/hadoop/hdfs-site.xml (inside that file's <configuration> block) rather than in mapred-site.xml:
<property>
<name>dfs.replication</name>
<value>1</value>
<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified in create time.
</description>
</property>
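With the configuration files in place, you can optionally confirm that Hadoop picks up the new settings; hdfs getconf is part of the Hadoop 2.x CLI (this check is a convenience, not a step from the original post):
$ cd /home/hduser/hadoop
$ bin/hdfs getconf -confKey fs.default.name    # should print hdfs://localhost:8010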
5: Running Hadoop
The first time you run Hadoop, you need to format the Hadoop filesystem. The commands are:
$ cd /home/hduser/hadoop/bin
$ ./hdfs namenode -format
If it succeeds, you will find a success message like the following in the last few lines of the log:
common.Storage: Storage directory /home/hduser/hadoop/tmp/hadoop-hduser/dfs/name has been successfully formatted.
Then run the following commands:
$ cd /home/hduser/hadoop/sbin/
$ ./start-dfs.sh
Note: this step asks for your password several times; to avoid that, establish SSH trust first (as in the ssh-keygen step above).
hduser@ubuntu:~/hadoop/sbin$ jps
4266 SecondaryNameNode
4116 DataNode
4002 NameNode
Note: jps shows that three processes have started.
Then start YARN:
$ ./start-yarn.sh
hduser@ubuntu:~/hadoop/sbin$ jps
4688 NodeManager
4266 SecondaryNameNode
4116 DataNode
4002 NameNode
4413 ResourceManager
6: Viewing the Hadoop Resource Manager
Open http://192.168.128.129:8088/, replacing 192.168.128.129 with your machine's actual IP address.
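If you are not sure of the VM's address, ifconfig will show it (eth0 is an assumption; your interface name may differ):
$ ifconfig eth0 | grep 'inet addr'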
7: Testing Hadoop
$ cd /home/hduser
$ wget http://www.gutenberg.org/cache/epub/20417/pg20417.txt
$ cd hadoop
$ bin/hdfs dfs -mkdir /tmp
$ bin/hdfs dfs -copyFromLocal /home/hduser/pg20417.txt /tmp
$ bin/hdfs dfs -ls /tmp
$ bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount /tmp/ /tmp-output
If everything is working properly, the job produces its results, and you can follow the progress in the screen output.
Run bin/hadoop fs -ls /tmp-output to check the job's completion status in /tmp-output; it will show two files:
-rw-r--r-- 1 hadoop supergroup 0 2013-10-28 23:09 /tmp-output/_SUCCESS
-rw-r--r-- 1 hadoop supergroup 196192 2013-10-28 23:09 /tmp-output/part-r-00000
View the results with bin/hadoop fs -cat /tmp-output/part-r-00000.
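Each output line is a word followed by its count. As a convenience (not part of the original post), piping through sort shows the most frequent words first:
$ bin/hadoop fs -cat /tmp-output/part-r-00000 | sort -k2 -nr | head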
8: Stopping Hadoop
To stop Hadoop, run the following commands in order:
$ ./stop-yarn.sh
$ ./stop-dfs.sh
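After both scripts finish, jps is a quick sanity check that the daemons have stopped (only the Jps process itself should remain):
$ jps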