Detailed Hadoop Installation and Configuration Guide

Step 1: Basic Environment Setup

 

1. Download and install ubuntukylin-15.10-desktop-amd64.iso

2. Install SSH

sudo apt-get install openssh-server openssh-client

3. Set up vsftpd

sudo apt-get update

sudo apt-get install vsftpd

Configuration references:

http://www.linuxidc.com/Linux/2015-01/111970.htm

http://jingyan.baidu.com/article/67508eb4d6c4fd9ccb1ce470.html

http://zhidao.baidu.com/link?url=vEmPmg5sV6IUfT4qZqivtiHtXWUoAQalGAL7bOC5XrTumpLRDfa-OmFcTzPetNZUqAi0hgjBGGdpnldob6hL5IhgtGVWDGSmS88iLvhCO4C

Starting, stopping, and restarting vsftpd:

$sudo /etc/init.d/vsftpd start     # start
$sudo /etc/init.d/vsftpd stop      # stop
$sudo /etc/init.d/vsftpd restart   # restart
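As an aside, Ubuntu releases that manage services with systemd (including 15.10) also accept the equivalent systemctl commands below; this is an alternative sketch, not part of the original steps:

sudo systemctl start vsftpd      # start
sudo systemctl stop vsftpd       # stop
sudo systemctl restart vsftpd    # restart
sudo systemctl status vsftpd     # check current status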

4. Install JDK 1.7

sudo chown -R hadoop:hadoop /opt

cp /soft/jdk-7u79-linux-x64.gz /opt

sudo vi /etc/profile     # add the alias below at the end of the file

alias untar='tar -zxvf'

source /etc/profile      # note: "sudo source /etc/profile" does not work, since source is a shell builtin

cd /opt && untar jdk*

Configure the environment variables
# vi /etc/profile
● Append the following at the end of the profile file
# set java environment
export JAVA_HOME=/opt/jdk1.7.0_79
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$JAVA_HOME/bin:$PATH
After the configuration is done, save and exit.
● Apply the changes without rebooting
# source /etc/profile
● Test whether the installation succeeded
# java -version
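If java -version does not print the expected version, a quick sanity check of the variables that were just added can help; nothing here goes beyond what was configured above:

echo $JAVA_HOME    # should print /opt/jdk1.7.0_79
which java         # should resolve to $JAVA_HOME/bin/java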

Other issues:

1. Fix for "unable to resolve host" when using sudo

Reference: http://blog.csdn.net/yuzhiyuxia/article/details/19998665

2. Fix for Linux hanging at "Starting sendmail" during boot

Reference: http://blog.chinaunix.net/uid-21675795-id-356995.html

3. "E: Unable to locate package vsftpd" when installing software on Ubuntu

Reference: http://www.ithao123.cn/content-2584008.html

4. [Linux/Ubuntu] How to use vi/vim

Reference: http://www.cnblogs.com/emanlee/archive/2011/11/10/2243930.html

 

Step 2: Clone the Environment

 

1. Clone the master virtual machine to create node1 and node2

Set the hostname of the master machine to master, of node1 to node1, and of node2 to node2.

(When node1 and node2 are started, the system assigns incrementing IPs by default, so no manual change is needed.)

On every machine, edit the IP addresses and hostnames in /etc/hosts (including the IPs and hostnames of the other nodes); a sample is sketched below.
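For example, /etc/hosts might look like the sketch below; the master and node1 addresses are taken from the SSH transcripts later in this article, while the node2 address is an assumption and must match the IP actually assigned on your network:

127.0.0.1        localhost
192.168.219.128  master
192.168.219.129  node1
192.168.219.130  node2    # assumed address; replace with node2's real IP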

---------

Step 3: Configure Passwordless SSH Login

hadoop@node1:~$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa

Generating public/private dsa key pair.

Created directory '/home/hadoop/.ssh'.

Your identification has been saved in /home/hadoop/.ssh/id_dsa.

Your public key has been saved in /home/hadoop/.ssh/id_dsa.pub.

The key fingerprint is:

SHA256:B8vBju/uc3kl/v9lrMqtltttttCcXgRkQPbVoU hadoop@node1

The key's randomart image is:

+---[DSA 1024]----+

| ...o.o. |

| o+.E . |

| . oo + |

| .. + + |

|o +. o ooo +|

|=o. . o. ooo. o.|

|*o... .+=o .+++.+|

+----[SHA256]-----+

hadoop@node1:~$ cd .ssh

hadoop@node1:~/.ssh$ ll

total 16

drwx------ 2 hadoop hadoop 4096 Jul 24 20:31 ./

drwxr-xr-x 18 hadoop hadoop 4096 Jul 24 20:31 ../

-rw------- 1 hadoop hadoop 668 Jul 24 20:31 id_dsa

-rw-r--r-- 1 hadoop hadoop 602 Jul 24 20:31 id_dsa.pub

hadoop@node1:~/.ssh$ cat id_dsa.pub >> authorized_keys

hadoop@node1:~/.ssh$ ll

total 20

drwx------ 2 hadoop hadoop 4096 Jul 24 20:32 ./

drwxr-xr-x 18 hadoop hadoop 4096 Jul 24 20:31 ../

-rw-rw-r-- 1 hadoop hadoop 602 Jul 24 20:32 authorized_keys

-rw------- 1 hadoop hadoop 668 Jul 24 20:31 id_dsa

-rw-r--r-- 1 hadoop hadoop 602 Jul 24 20:31 id_dsa.pub

Step 4: Test Passwordless SSH Login over the Local Loopback

hadoop@node1:~/.ssh$ ssh localhost

The authenticity of host 'localhost (127.0.0.1)' can't be established.

ECDSA key fingerprint is SHA256:daO0dssyqt12tt9yGUauImOh6tt6A1SgxzSfSmpQqJVEiQTxas.

Are you sure you want to continue connecting (yes/no)? yes

Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.

Welcome to Ubuntu 15.10 (GNU/Linux 4.2.0-16-generic x86_64)

* Documentation: https://help.ubuntu.com/

270 packages can be updated.

178 updates are security updates.

New release '16.04 LTS' available.

Run 'do-release-upgrade' to upgrade to it.

Last login: Sun Jul 24 20:21:39 2016 from 192.168.219.1

hadoop@node1:~$ exit

logout

Connection to localhost closed.

hadoop@node1:~/.ssh$

If you see the output above, the operation has succeeded; perform the same steps on the other two nodes.

Next, enable the master node to log in to the two slave nodes over SSH without a password.

hadoop@node1:~/.ssh$ scp hadoop@master:~/.ssh/id_dsa.pub ./master_dsa.pub

The authenticity of host 'master (192.168.219.128)' can't be established.

ECDSA key fingerprint is SHA256:daO0dssyqtt9yGUuImOh646A1SgxzSfatSmpQqJVEiQTxas.

Are you sure you want to continue connecting (yes/no)? yes

Warning: Permanently added 'master,192.168.219.128' (ECDSA) to the list of known hosts.

hadoop@master's password:

id_dsa.pub 100% 603 0.6KB/s 00:00

hadoop@node1:~/.ssh$ cat master_dsa.pub >> authorized_keys

The process above shows the node1 node using scp to reach the master node remotely and copy master's public key file into the current directory;

this step requires password authentication. Master's public key file is then appended to the authorized_keys file. With this done,

if nothing goes wrong, the master node can connect to node1 over SSH without a password. On the master node, proceed as follows:

hadoop@master:~/.ssh$ ssh node1

The authenticity of host 'node1 (192.168.219.129)' can't be established.

ECDSA key fingerprint is SHA256:daO0dssyqt9yGUuImOh3466A1SttgxzSfSmpQqJVEiQTxas.

Are you sure you want to continue connecting (yes/no)? yes

Warning: Permanently added 'node1,192.168.219.129' (ECDSA) to the list of known hosts.

Welcome to Ubuntu 15.10 (GNU/Linux 4.2.0-16-generic x86_64)

* Documentation: https://help.ubuntu.com/

270 packages can be updated.

178 updates are security updates.

New release '16.04 LTS' available.

Run 'do-release-upgrade' to upgrade to it.

Last login: Sun Jul 24 20:39:30 2016 from 192.168.219.1

hadoop@node1:~$ exit

logout

Connection to node1 closed.

hadoop@master:~/.ssh$

As can be seen above, the first time node1 is connected, a "yes" confirmation is required.

This means that when the master node connects to node1, a manual prompt still appears; the connection is not yet fully automatic.

After entering yes, the login succeeds; log out and return to the master node. To make SSH connect to the other node with no interaction at all,

one more step remains: simply run ssh node1 once more. If it no longer asks you to type "yes", the setup has succeeded. The process is as follows:

hadoop@master:~/.ssh$ ssh node1

Welcome to Ubuntu 15.10 (GNU/Linux 4.2.0-16-generic x86_64)

* Documentation: https://help.ubuntu.com/

270 packages can be updated.

178 updates are security updates.

New release '16.04 LTS' available.

Run 'do-release-upgrade' to upgrade to it.

Last login: Sun Jul 24 20:47:20 2016 from 192.168.219.128

hadoop@node1:~$ exit

logout

Connection to node1 closed.

hadoop@master:~/.ssh$

As shown above, master can now log in to node1 over SSH without a password.

The same procedure can be applied to node2.
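As a side note, the scp-and-append sequence above can usually be replaced by a single ssh-copy-id call run on master; this is a convenience shortcut rather than part of the original procedure, and it prompts for the hadoop password one last time:

ssh-copy-id -i ~/.ssh/id_dsa.pub hadoop@node1
ssh-copy-id -i ~/.ssh/id_dsa.pub hadoop@node2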

On the surface, passwordless SSH login to these two nodes is now configured, but the same work still has to be done for the master node itself.

This step may seem puzzling, but there is a reason for it: reportedly it is required on real physical nodes,

because the jobtracker may be scheduled onto other nodes, i.e. there is a possibility that the jobtracker does not run on the master node.

Perform the passwordless SSH login test for master against itself:

hadoop@master:~/.ssh$ scp hadoop@master:~/.ssh/id_dsa.pub ./master_dsa.pub

The authenticity of host 'master (127.0.0.1)' can't be established.

ECDSA key fingerprint is SHA256:daO0dssttqt9yGUuImOahtt166AgxttzSfSmpQqJVEiQTxas.

Are you sure you want to continue connecting (yes/no)? yes

Warning: Permanently added 'master' (ECDSA) to the list of known hosts.

id_dsa.pub 100% 603 0.6KB/s 00:00

hadoop@master:~/.ssh$ cat master_dsa.pub >> authorized_keys

hadoop@master:~/.ssh$ ssh master

Welcome to Ubuntu 15.10 (GNU/Linux 4.2.0-16-generic x86_64)

* Documentation: https://help.ubuntu.com/

270 packages can be updated.

178 updates are security updates.

New release '16.04 LTS' available.

Run 'do-release-upgrade' to upgrade to it.

Last login: Sun Jul 24 20:39:24 2016 from 192.168.219.1

hadoop@master:~$ exit

logout

Connection to master closed.

At this point, passwordless SSH login has been configured successfully.
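A quick way to confirm that passwordless login really works in every needed direction is to loop over the hostnames from the master node (a small sketch; it assumes the hostnames registered in /etc/hosts earlier):

for h in master node1 node2; do ssh "$h" hostname; done   # each hostname should print with no password or "yes" prompt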

-------------------------

Unpack hadoop-2.6.4.tar.gz:

/opt$ untar hadoop-2.6.4.tar.gz

mv hadoop-2.6.4 hadoop

Step 5: Update the Environment Variables

vi /etc/profile

export JAVA_HOME=/opt/jdk1.7.0_79

export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar

export HADOOP_HOME=/opt/hadoop

export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native

export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"

alias untar='tar -zxvf'

alias viprofile='vi /etc/profile'

alias sourceprofile='source /etc/profile'

alias catprofile='cat /etc/profile'

alias cdhadoop='cd /opt/hadoop/'

source /etc/profile
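After sourcing the profile, it is worth checking that the new variables took effect; this only echoes what was just configured:

echo $HADOOP_HOME    # should print /opt/hadoop
hadoop version       # should report Hadoop 2.6.4 once the archive has been unpacked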

------------------

Step 6: Modify the Configuration Files

A total of 7 files need to be modified:

$HADOOP_HOME/etc/hadoop/hadoop-env.sh

$HADOOP_HOME/etc/hadoop/yarn-env.sh

$HADOOP_HOME/etc/hadoop/core-site.xml

$HADOOP_HOME/etc/hadoop/hdfs-site.xml

$HADOOP_HOME/etc/hadoop/mapred-site.xml

$HADOOP_HOME/etc/hadoop/yarn-site.xml

$HADOOP_HOME/etc/hadoop/slaves

$HADOOP_HOME refers to the hadoop root directory.

a) hadoop-env.sh and yarn-env.sh

In these two files, change the directory after JAVA_HOME to the actual JDK location on this machine.

vi etc/hadoop/hadoop-env.sh (and vi etc/hadoop/yarn-env.sh)

Find the line below and change it to your JDK location (adjust according to your actual setup):

export JAVA_HOME=/opt/jdk1.7.0_79

In addition, in hadoop-env.sh it is recommended to add this line:

export HADOOP_PREFIX=/opt/hadoop
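If you prefer not to edit the two files by hand, appending the export lines also works, since a later definition in a shell script overrides an earlier one; this is only a sketch using the paths from this article:

cd /opt/hadoop
for f in etc/hadoop/hadoop-env.sh etc/hadoop/yarn-env.sh; do echo 'export JAVA_HOME=/opt/jdk1.7.0_79' >> "$f"; done
echo 'export HADOOP_PREFIX=/opt/hadoop' >> etc/hadoop/hadoop-env.sh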

b) core-site.xml: modify it with reference to the content below

<?xml version="1.0" encoding="UTF-8"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>

<property>

<name>fs.defaultFS</name>

<value>hdfs://master:9000</value>

</property>

<property>

<name>hadoop.tmp.dir</name>

<value>/opt/hadoop/tmp</value>

</property>

</configuration>

Note: if the /opt/hadoop/tmp directory does not exist, create it manually with mkdir first.
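For example (the -p flag creates missing parent directories and is harmless if the directory already exists):

mkdir -p /opt/hadoop/tmp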

For the full list of core-site.xml parameters, see

http://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-common/core-default.xml

c) hdfs-site.xml

<?xml version="1.0" encoding="UTF-8"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>

<property>

<name>dfs.datanode.ipc.address</name>

<value>0.0.0.0:50020</value>

</property>

<property>

<name>dfs.datanode.http.address</name>

<value>0.0.0.0:50075</value>

</property>

<property>

<name>dfs.replication</name>

<value>2</value>

</property>

</configuration>

Note: dfs.replication is the number of data replicas; it should generally not exceed the number of datanode nodes.

For the full list of hdfs-site.xml parameters, see

http://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
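Once the configuration is in place, the value actually in effect can be double-checked with hdfs getconf; this is optional and assumes $HADOOP_HOME/bin is on PATH as set in Step 5:

hdfs getconf -confKey dfs.replication    # should print 2 with the configuration above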

d) mapred-site.xml

<?xml version="1.0"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>

<property>

<name>mapreduce.framework.name</name>

<value>yarn</value>

</property>

</configuration>

For the full list of mapred-site.xml parameters, see

http://hadoop.apache.org/docs/r2.6.0/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml

e)yarn-site.xml

<?xml version="1.0"?>

<configuration>

<property>

<name>yarn.nodemanager.aux-services</name>

<value>mapreduce_shuffle</value>

</property>

</configuration>

For the full list of yarn-site.xml parameters, see

http://hadoop.apache.org/docs/r2.6.0/hadoop-yarn/hadoop-yarn-common/yarn-default.xml

In addition, compared with hadoop 2.x, many hadoop 1.x parameters have been marked as deprecated; for details see

http://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-common/DeprecatedProperties.html

Leave the last file, slaves, alone for now (you can rename it out of the way first with mv slaves slaves.bak). Once the configuration above is in place, you can start the NameNode on master for a test. The method:

$HADOOP_HOME/bin/hdfs namenode -format      # format the namenode

16/07/25 。。。

16/07/25 20:34:42 INFO namenode.FSImage: Allocated new BlockPoolId: BP-1076359968-127.0.0.1-140082506

16/07/25 20:34:42 INFO common.Storage: Storage directory /opt/hadoop/tmp/dfs/name has been successfully formatted.

16/07/25 20:34:43 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0

16/07/25 20:34:43 INFO util.ExitUtil: Exiting with status 0

16/07/25 20:34:43 INFO namenode.NameNode: SHUTDOWN_MSG:

/************************************************************

SHUTDOWN_MSG: Shutting down NameNode at master/127.0.0.1

************************************************************/

When you see this, the format has completed successfully.

$HADOOP_HOME/sbin/start-dfs.sh

After it starts, run jps (or ps -ef | grep ...) to check the processes. If you see the following two processes:

5161 SecondaryNameNode

4989 NameNode

the master node is basically OK.

Then run $HADOOP_HOME/sbin/start-yarn.sh; when it finishes, run jps again to check the processes:

5161 SecondaryNameNode

5320 ResourceManager

4989 NameNode

If you see these 3 processes, YARN is working as well.

f) Modify /opt/hadoop/etc/hadoop/slaves

If you renamed the file earlier with mv slaves slaves.bak, first run mv slaves.bak slaves to restore the name, then

edit the file with vi slaves and enter:

node1

node2

Save and exit, then finally run

$HADOOP_HOME/sbin/stop-dfs.sh

$HADOOP_HOME/sbin/stop-yarn.sh

to stop the services that were just started.

Step 7: Copy the hadoop Directory from master to node1 and node2

Stay on the master machine.

First change into the directory: cd /opt

zip -r hadoop.zip hadoop

scp -r hadoop.zip hadoop@node1:/opt/

scp -r hadoop.zip hadoop@node2:/opt/

Then, on node1 and node2, unpack it: cd /opt && unzip hadoop.zip

Note: the hadoop temporary directory (tmp) and the data directory (data) on node1 and node2 still need to be created manually first, as sketched below.
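A small sketch of that preparation, run from master; it relies on the passwordless SSH configured earlier, and with the core-site.xml above the HDFS data directories default to ${hadoop.tmp.dir}/dfs/data, so creating tmp covers both:

for h in node1 node2; do ssh "$h" "mkdir -p /opt/hadoop/tmp"; done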

-----

Step 8: Verification

On the master node, start the services again:

$HADOOP_HOME/sbin/start-dfs.sh

$HADOOP_HOME/sbin/start-yarn.sh

------

hadoop@master:/opt/hadoop/sbin$ start-dfs.sh

Starting namenodes on [master]

master: starting namenode, logging to /opt/hadoop/logs/hadoop-hadoop-namenode-master.out

node1: starting datanode, logging to /opt/hadoop/logs/hadoop-hadoop-datanode-node1.out

node2: starting datanode, logging to /opt/hadoop/logs/hadoop-hadoop-datanode-node2.out

Starting secondary namenodes [0.0.0.0]

0.0.0.0: starting secondarynamenode, logging to /opt/hadoop/logs/hadoop-hadoop-secondarynamenode-master.out

------

hadoop@master:/opt/hadoop/sbin$ start-yarn.sh

starting yarn daemons

starting resourcemanager, logging to /opt/hadoop/logs/yarn-hadoop-resourcemanager-master.out

node1: starting nodemanager, logging to /opt/hadoop/logs/yarn-hadoop-nodemanager-node1.out

node2: starting nodemanager, logging to /opt/hadoop/logs/yarn-hadoop-nodemanager-node2.out

------

If everything went well, the following 3 processes will be running on the master node:

ps -ef | grep ResourceManager

ps -ef | grep SecondaryNameNode

ps -ef | grep NameNode

7482 ResourceManager

7335 SecondaryNameNode

7159 NameNode

and the following 2 processes will be running on node1 and node2 (the slave nodes):

ps -ef | grep DataNode

ps -ef | grep NodeManager

2296 DataNode

2398 NodeManager

At the same time, you can browse:

http://master:50070/

http://master:8088/

to check the status.

(Screenshots t1.JPG and t2.JPG of the two status pages omitted.)

 

You can also get an HDFS status report with bin/hdfs dfsadmin -report.
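An optional smoke test is to put a small file into HDFS and list it back; this is only a sketch, and any readable local file will do:

hdfs dfs -mkdir -p /user/hadoop
hdfs dfs -put /etc/profile /user/hadoop/
hdfs dfs -ls /user/hadoop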

Other notes:

a) If master (i.e. the namenode node) needs to be re-formatted, first clear the data directories on every datanode (best to clear the tmp directory along with them); otherwise, after the format completes, the datanodes will fail to start when dfs is brought up.

b) If you feel it is a waste for the master machine to run only the namenode and want master to double as a datanode, simply add a line containing master to the slaves file.

c) For convenience, you can edit /etc/profile to add the lib directories hadoop needs to the CLASSPATH environment variable, and add the hadoop/bin and hadoop/sbin directories to PATH as well; refer to the content below (adjust according to your actual setup):

export HADOOP_HOME=/home/hadoop/hadoop-2.6.0

export JAVA_HOME=/usr/java/jdk1.7.0_51

export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$HADOOP_HOME/share/hadoop/common/hadoop-common-2.6.0.jar:$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.6.0.jar:$HADOOP_HOME/share/hadoop/common/lib/commons-cli-1.2.jar

export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/sbin:$HADOOP_HOME/bin

 

 

by colplay

2016.07.25
