Step 1: Basic Environment Setup
1. Download and install ubuntukylin-15.10-desktop-amd64.iso
2. Install SSH
sudo apt-get install openssh-server openssh-client
3. Set up vsftpd
#sudo apt-get update
#sudo apt-get install vsftpd
For configuration, see http://www.linuxidc.com/Linux/2015-01/111970.htm
http://jingyan.baidu.com/article/67508eb4d6c4fd9ccb1ce470.html
http://zhidao.baidu.com/link?url=vEmPmg5sV6IUfT4qZqivtiHtXWUoAQalGAL7bOC5XrTumpLRDfa-OmFcTzPetNZUqAi0hgjBGGdpnldob6hL5IhgtGVWDGSmS88iLvhCO4C
Starting, stopping, and restarting vsftpd
$sudo /etc/init.d/vsftpd start #start
$sudo /etc/init.d/vsftpd stop #stop
$sudo /etc/init.d/vsftpd restart #restart
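A quick sanity check that vsftpd is actually up and listening (assuming the default FTP port 21):
$sudo service vsftpd status
$netstat -tln | grep :21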
4. Install JDK 1.7
sudo chown -R hadoop:hadoop /opt
cp /soft/jdk-7u79-linux-x64.gz /opt
sudo vi /etc/profile   # add the following line, then save and exit
alias untar='tar -zxvf'
source /etc/profile    # 'source' is a shell builtin; running it under sudo does not affect the current shell
untar jdk*
Configure the environment variables
# vi /etc/profile
● Append the following at the end of the profile file
# set java environment
export JAVA_HOME=/opt/jdk1.7.0_79
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$JAVA_HOME/bin:$PATH
After configuring, save and exit.
● To apply the changes without rebooting:
#source /etc/profile
● Test whether the installation succeeded:
# java -version
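If the JDK is set up correctly, the output should look roughly like this (exact build numbers may differ):
java version "1.7.0_79"
Java(TM) SE Runtime Environment (build 1.7.0_79-b15)
Java HotSpot(TM) 64-Bit Server VM (build 24.79-b02, mixed mode)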
Other issues:
1. Fix for sudo reporting "unable to resolve host"
See http://blog.csdn.net/yuzhiyuxia/article/details/19998665
2. Fix for Linux hanging at "Starting sendmail" during boot
See http://blog.chinaunix.net/uid-21675795-id-356995.html
3. Fix for "E: Unable to locate package vsftpd" when installing packages on Ubuntu
See http://www.ithao123.cn/content-2584008.html
4. [Linux/Ubuntu] How to use vi/vim
See http://www.cnblogs.com/emanlee/archive/2011/11/10/2243930.html
Step 2: Clone the Environment
1. Clone the master VM to create node1 and node2
Set the hostname on each machine: master to master, node1 to node1, node2 to node2
(When node1 and node2 boot, the system assigns incrementing IPs by default, so no manual change is needed)
On each node, update the IPs and hostnames in /etc/hosts (including the IPs and hostnames of the other nodes)
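For reference, using the addresses that appear in the transcripts later in this article (master = 192.168.219.128, node1 = 192.168.219.129; the node2 address below is an assumption for illustration, so substitute the one actually assigned), /etc/hosts on every node would look like:
127.0.0.1       localhost
192.168.219.128 master
192.168.219.129 node1
192.168.219.130 node2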
---------
Step 3: Configure Passwordless SSH Login
hadoop@node1:~$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
Generating public/private dsa key pair.
Created directory '/home/hadoop/.ssh'.
Your identification has been saved in /home/hadoop/.ssh/id_dsa.
Your public key has been saved in /home/hadoop/.ssh/id_dsa.pub.
The key fingerprint is:
SHA256:B8vBju/uc3kl/v9lrMqtltttttCcXgRkQPbVoU hadoop@node1
The key's randomart image is:
+---[DSA 1024]----+
| ...o.o. |
| o+.E . |
| . oo + |
| .. + + |
|o +. o ooo +|
|=o. . o. ooo. o.|
|*o... .+=o .+++.+|
+----[SHA256]-----+
hadoop@node1:~$ cd .ssh
hadoop@node1:~/.ssh$ ll
total 16
drwx------ 2 hadoop hadoop 4096 Jul 24 20:31 ./
drwxr-xr-x 18 hadoop hadoop 4096 Jul 24 20:31 ../
-rw------- 1 hadoop hadoop 668 Jul 24 20:31 id_dsa
-rw-r--r-- 1 hadoop hadoop 602 Jul 24 20:31 id_dsa.pub
hadoop@node1:~/.ssh$ cat id_dsa.pub >> authorized_keys
hadoop@node1:~/.ssh$ ll
total 20
drwx------ 2 hadoop hadoop 4096 Jul 24 20:32 ./
drwxr-xr-x 18 hadoop hadoop 4096 Jul 24 20:31 ../
-rw-rw-r-- 1 hadoop hadoop 602 Jul 24 20:32 authorized_keys
-rw------- 1 hadoop hadoop 668 Jul 24 20:31 id_dsa
-rw-r--r-- 1 hadoop hadoop 602 Jul 24 20:31 id_dsa.pub
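Note that the listing above shows authorized_keys created as -rw-rw-r--; on many systems sshd's StrictModes rejects group-writable key files, in which case the loopback test in the next step keeps prompting for a password. Tightening the permissions to what sshd expects is a safe fix:
$chmod 700 ~/.ssh
$chmod 600 ~/.ssh/authorized_keys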
Step 4: Test Passwordless SSH Loopback Login on a Single Machine
hadoop@node1:~/.ssh$ ssh localhost
The authenticity of host 'localhost (127.0.0.1)' can't be established.
ECDSA key fingerprint is SHA256:daO0dssyqt12tt9yGUauImOh6tt6A1SgxzSfSmpQqJVEiQTxas.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
Welcome to Ubuntu 15.10 (GNU/Linux 4.2.0-16-generic x86_64)
* Documentation: https://help.ubuntu.com/
270 packages can be updated.
178 updates are security updates.
New release '16.04 LTS' available.
Run 'do-release-upgrade' to upgrade to it.
Last login: Sun Jul 24 20:21:39 2016 from 192.168.219.1
hadoop@node1:~$ exit
logout
Connection to localhost closed.
hadoop@node1:~/.ssh$
If you see the output above, the operation succeeded; repeat the same steps on the other two nodes.
Next, allow the master node to log in to the two slave nodes over SSH without a password.
hadoop@node1:~/.ssh$ scp hadoop@master:~/.ssh/id_dsa.pub ./master_dsa.pub
The authenticity of host 'master (192.168.219.128)' can't be established.
ECDSA key fingerprint is SHA256:daO0dssyqtt9yGUuImOh646A1SgxzSfatSmpQqJVEiQTxas.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'master,192.168.219.128' (ECDSA) to the list of known hosts.
hadoop@master's password:
id_dsa.pub 100% 603 0.6KB/s 00:00
hadoop@node1:~/.ssh$ cat master_dsa.pub >> authorized_keys
The steps above show node1 connecting to master remotely with scp and copying master's public key file into the current directory; this step still requires password authentication. master's public key is then appended to the authorized_keys file. With this done, if nothing went wrong, master can connect to node1 over ssh without a password. On the master node, proceed as follows:
hadoop@master:~/.ssh$ ssh node1
The authenticity of host 'node1 (192.168.219.129)' can't be established.
ECDSA key fingerprint is SHA256:daO0dssyqt9yGUuImOh3466A1SttgxzSfSmpQqJVEiQTxas.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'node1,192.168.219.129' (ECDSA) to the list of known hosts.
Welcome to Ubuntu 15.10 (GNU/Linux 4.2.0-16-generic x86_64)
* Documentation: https://help.ubuntu.com/
270 packages can be updated.
178 updates are security updates.
New release '16.04 LTS' available.
Run 'do-release-upgrade' to upgrade to it.
Last login: Sun Jul 24 20:39:30 2016 from 192.168.219.1
hadoop@node1:~$ exit
logout
Connection to node1 closed.
hadoop@master:~/.ssh$
As the transcript above shows, the first connection to node1 requires a "yes" confirmation, which means master still needs manual input to connect to node1 and cannot do so automatically. After typing yes, the login succeeds, and we log out back to master. One step remains to get fully passwordless ssh to the other nodes: run ssh node1 once more, and if you are not asked to type "yes", it has succeeded:
hadoop@master:~/.ssh$ ssh node1
Welcome to Ubuntu 15.10 (GNU/Linux 4.2.0-16-generic x86_64)
* Documentation: https://help.ubuntu.com/
270 packages can be updated.
178 updates are security updates.
New release '16.04 LTS' available.
Run 'do-release-upgrade' to upgrade to it.
Last login: Sun Jul 24 20:47:20 2016 from 192.168.219.128
hadoop@node1:~$ exit
logout
Connection to node1 closed.
hadoop@master:~/.ssh$
As shown above, master can now log in to node1 over ssh without a password.
The same procedure works for node2.
On the surface, passwordless ssh login to the two slave nodes is now configured, but the same work also has to be done for the master node itself. This step can seem puzzling, but there is a reason for it: reportedly it is needed on real physical clusters, because the jobtracker may be placed on some other node and is not guaranteed to live on master; in addition, the startup scripts also use ssh to launch daemons on the local machine.
Test passwordless ssh from master to itself:
hadoop@master:~/.ssh$ scp hadoop@master:~/.ssh/id_dsa.pub ./master_dsa.pub
The authenticity of host 'master (127.0.0.1)' can't be established.
ECDSA key fingerprint is SHA256:daO0dssttqt9yGUuImOahtt166AgxttzSfSmpQqJVEiQTxas.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'master' (ECDSA) to the list of known hosts.
id_dsa.pub 100% 603 0.6KB/s 00:00
hadoop@master:~/.ssh$ cat master_dsa.pub >> authorized_keys
hadoop@master:~/.ssh$ ssh master
Welcome to Ubuntu 15.10 (GNU/Linux 4.2.0-16-generic x86_64)
* Documentation: https://help.ubuntu.com/
270 packages can be updated.
178 updates are security updates.
New release '16.04 LTS' available.
Run 'do-release-upgrade' to upgrade to it.
Last login: Sun Jul 24 20:39:24 2016 from 192.168.219.1
hadoop@master:~$ exit
logout
Connection to master closed.
At this point, passwordless SSH login is configured successfully.
-------------------------
Extract hadoop-2.6.4.tar.gz
/opt$untar hadoop-2.6.4.tar.gz
mv hadoop-2.6.4 hadoop    # rename the extracted directory, not the tarball
Step 5: Update the Environment Variables
vi /etc/profile
export JAVA_HOME=/opt/jdk1.7.0_79
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export HADOOP_HOME=/opt/hadoop
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
alias untar='tar -zxvf'
alias viprofile='vi /etc/profile'
alias sourceprofile='source /etc/profile'
alias catprofile='cat /etc/profile'
alias cdhadoop='cd /opt/hadoop/'
source /etc/profile
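To confirm the updated variables took effect, the hadoop command should now resolve from any directory:
$hadoop version    # the first line of output should read: Hadoop 2.6.4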
------------------
Step 6: Modify the Configuration
Seven files in total need to be modified:
$HADOOP_HOME/etc/hadoop/hadoop-env.sh
$HADOOP_HOME/etc/hadoop/yarn-env.sh
$HADOOP_HOME/etc/hadoop/core-site.xml
$HADOOP_HOME/etc/hadoop/hdfs-site.xml
$HADOOP_HOME/etc/hadoop/mapred-site.xml
$HADOOP_HOME/etc/hadoop/yarn-site.xml
$HADOOP_HOME/etc/hadoop/slaves
where $HADOOP_HOME is the Hadoop root directory.
a) hadoop-env.sh and yarn-env.sh
In these two files, the main change is the directory after JAVA_HOME; set it to the actual JDK location on this machine.
vi etc/hadoop/hadoop-env.sh (and vi etc/hadoop/yarn-env.sh)
Find the corresponding line and change it to the following (adjust the JDK path to your own setup):
export JAVA_HOME=/opt/jdk1.7.0_79
Also, in hadoop-env.sh, it is recommended to add this line:
export HADOOP_PREFIX=/opt/hadoop
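If you prefer to make the JAVA_HOME edit non-interactively, a sed one-liner along these lines should work for hadoop-env.sh (yarn-env.sh typically ships with its JAVA_HOME line commented out, so there you can uncomment and edit it, or simply append the export):
$sed -i 's|^export JAVA_HOME=.*|export JAVA_HOME=/opt/jdk1.7.0_79|' /opt/hadoop/etc/hadoop/hadoop-env.sh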
b) core-site.xml, modified along the lines of the following:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/hadoop/tmp</value>
    </property>
</configuration>
Note: if the /opt/hadoop/tmp directory does not exist, create it manually with mkdir first.
For the full list of core-site.xml parameters, see
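The -p flag makes the command safe to repeat and creates parent directories as needed:
$mkdir -p /opt/hadoop/tmp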
http://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-common/core-default.xml
c) hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>dfs.datanode.ipc.address</name>
        <value>0.0.0.0:50020</value>
    </property>
    <property>
        <name>dfs.datanode.http.address</name>
        <value>0.0.0.0:50075</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
</configuration>
Note: dfs.replication is the number of data replicas; it normally should not exceed the number of datanodes.
For the full list of hdfs-site.xml parameters, see
http://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
d) mapred-site.xml (the distribution ships only mapred-site.xml.template; copy it to mapred-site.xml first if the file does not exist)
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
For the full list of mapred-site.xml parameters, see
http://hadoop.apache.org/docs/r2.6.0/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml
e) yarn-site.xml
<?xml version="1.0"?>
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>
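Note: this minimal yarn-site.xml leaves the ResourceManager address at its default (0.0.0.0), which is fine while everything runs on master alone but leaves the NodeManagers on node1/node2 unable to locate the ResourceManager. On a multi-node cluster you will generally also want to add the following property inside <configuration>:
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>master</value>
    </property>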
For the full list of yarn-site.xml parameters, see
http://hadoop.apache.org/docs/r2.6.0/hadoop-yarn/hadoop-yarn-common/yarn-default.xml
Also note that, compared with 2.x, many Hadoop 1.x parameters have been marked as deprecated; for details see
http://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-common/DeprecatedProperties.html
Leave the last file, slaves, alone for now (you can move it aside with mv slaves slaves.bak). Once the configuration above is in place, you can try bringing up the NameNode on master for a test:
$HADOOP_HOME/bin/hdfs namenode -format    # format first
16/07/25 ...
16/07/25 20:34:42 INFO namenode.FSImage: Allocated new BlockPoolId: BP-1076359968-127.0.0.1-140082506
16/07/25 20:34:42 INFO common.Storage: Storage directory /opt/hadoop/tmp/dfs/name has been successfully formatted.
16/07/25 20:34:43 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
16/07/25 20:34:43 INFO util.ExitUtil: Exiting with status 0
16/07/25 20:34:43 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at master/127.0.0.1
************************************************************/
When you see this, the format completed successfully.
$HADOOP_HOME/sbin/start-dfs.sh
After startup finishes, run jps (or ps -ef | grep ...) to check the processes; if you see the following two:
5161 SecondaryNameNode
4989 NameNode
the master node is basically OK.
Next run $HADOOP_HOME/sbin/start-yarn.sh; when it finishes, run jps again:
5161 SecondaryNameNode
5320 ResourceManager
4989 NameNode
If you see these 3 processes, YARN is OK as well.
f) Modify /opt/hadoop/etc/hadoop/slaves
If you renamed the file earlier with mv slaves slaves.bak, first run mv slaves.bak slaves to restore the name, then
edit it with vi slaves and enter
node1
node2
Save and exit, then run
$HADOOP_HOME/sbin/stop-dfs.sh
$HADOOP_HOME/sbin/stop-yarn.sh
to stop the services started earlier.
Step 7: Copy the hadoop Directory on master to node1 and node2
Still on the master machine:
cd /opt    # change to the parent directory first
zip -r hadoop.zip hadoop
scp hadoop.zip hadoop@node1:/opt/
scp hadoop.zip hadoop@node2:/opt/
unzip hadoop.zip    # run this on node1 and node2, not on master
Note: the Hadoop temporary directory (tmp) and data directory (data) on node1 and node2 still have to be created by hand first.
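Since the passwordless ssh from Step 3 is already in place, the unzip and directory creation can be driven from master in one shot (a sketch assuming unzip is installed on both nodes; the tmp path matches core-site.xml above):
$ssh node1 'cd /opt && unzip -o hadoop.zip && mkdir -p hadoop/tmp'
$ssh node2 'cd /opt && unzip -o hadoop.zip && mkdir -p hadoop/tmp'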
-----
Step 8: Verification
On the master node, start everything again:
$HADOOP_HOME/sbin/start-dfs.sh
$HADOOP_HOME/sbin/start-yarn.sh
------
hadoop@master:/opt/hadoop/sbin$ start-dfs.sh
Starting namenodes on [master]
master: starting namenode, logging to /opt/hadoop/logs/hadoop-hadoop-namenode-master.out
node1: starting datanode, logging to /opt/hadoop/logs/hadoop-hadoop-datanode-node1.out
node2: starting datanode, logging to /opt/hadoop/logs/hadoop-hadoop-datanode-node2.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /opt/hadoop/logs/hadoop-hadoop-secondarynamenode-master.out
------
hadoop@master:/opt/hadoop/sbin$ start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /opt/hadoop/logs/yarn-hadoop-resourcemanager-master.out
node1: starting nodemanager, logging to /opt/hadoop/logs/yarn-hadoop-nodemanager-node1.out
node2: starting nodemanager, logging to /opt/hadoop/logs/yarn-hadoop-nodemanager-node2.out
------
If all goes well, the master node runs the following 3 processes:
ps -ef | grep ResourceManager
ps -ef | grep SecondaryNameNode
ps -ef | grep NameNode
7482 ResourceManager
7335 SecondaryNameNode
7159 NameNode
node1 and node2 each run the following 2 processes:
ps -ef | grep DataNode
ps -ef | grep NodeManager
2296 DataNode
2398 NodeManager
You can also browse:
http://master:50070/
http://master:8088/
to check the status.
Alternatively, run bin/hdfs dfsadmin -report for an HDFS status report.
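As a final end-to-end check, you can submit one of the example jobs bundled with the distribution (the jar path below assumes the standard 2.6.4 tarball layout); if HDFS and YARN are healthy, it prints an estimate of pi once the job completes:
$hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.4.jar pi 2 10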
Other notes:
a) If master (the namenode) needs to be reformatted, first clear the data directory on every datanode (ideally clear the tmp directory as well); otherwise, after formatting, the datanodes will fail to start when dfs is brought up.
b) If running only the namenode on master feels wasteful and you want master to double as a datanode, simply add a line reading master to the slaves file.
c) For convenience, you can edit /etc/profile to put the required Hadoop lib jars on the CLASSPATH and add the hadoop/bin and hadoop/sbin directories to PATH; the following can serve as a reference (adjust to your actual setup):
export HADOOP_HOME=/home/hadoop/hadoop-2.6.0
export JAVA_HOME=/usr/java/jdk1.7.0_51
export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$HADOOP_HOME/share/hadoop/common/hadoop-common-2.6.0.jar:$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.6.0.jar:$HADOOP_HOME/share/hadoop/common/lib/commons-cli-1.2.jar
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
by colplay
2016.07.25