Hadoop Cluster Setup

1 Cluster Plan

Three virtual machines
Operating system: CentOS 7 Minimal
Networked via bridged mode (NAT and host-only should also work)
The IP addresses are:
192.168.1.101
192.168.1.102
192.168.1.103
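
Since CentOS 7 Minimal has no GUI, the static addresses can be set in the NIC's ifcfg file. A minimal sketch, assuming the interface is named enp0s3 and the gateway/DNS are 192.168.1.1 (adjust all of these for your own network):

# /etc/sysconfig/network-scripts/ifcfg-enp0s3   (interface name is an assumption)
TYPE=Ethernet
DEVICE=enp0s3
ONBOOT=yes
BOOTPROTO=none
IPADDR=192.168.1.101        # .102 / .103 on the other two nodes
PREFIX=24
GATEWAY=192.168.1.1         # assumed
DNS1=192.168.1.1            # assumed

systemctl restart network   # apply the change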

2 Basic Cluster Configuration

On all three machines, edit /etc/hosts and add the following:

192.168.1.101 master
192.168.1.102 slave1
192.168.1.103 slave2

Then edit /etc/hostname on each machine so that its contents are master, slave1, or slave2 respectively.
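
On CentOS 7 the same change can also be made with hostnamectl; a small sketch, running the matching command on each node:

hostnamectl set-hostname master    # on 192.168.1.101
hostnamectl set-hostname slave1    # on 192.168.1.102
hostnamectl set-hostname slave2    # on 192.168.1.103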

3 Configure Passwordless SSH Access

On slave1:

su                                     # become root
vim /etc/ssh/sshd_config               # set the three options below
StrictModes no
RSAAuthentication yes
PubkeyAuthentication yes
/bin/systemctl restart sshd.service
mkdir .ssh                             # in the galaxy user's home directory
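
If you prefer to keep StrictModes enabled instead, the usual alternative is to tighten the permissions that StrictModes checks (run as the galaxy user):

chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys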

On master:

ssh-keygen -t dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
cat ~/.ssh/id_dsa.pub | ssh galaxy@slave1 'cat - >> ~/.ssh/authorized_keys'
ssh slave1

Handle slave2 and master the same way (master must also be able to SSH to itself without a password).

References:
http://my.oschina.net/u/1169607/blog/175899
http://segmentfault.com/a/1190000002911599

4 Install Hadoop

Omitted.
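
For completeness, a minimal sketch of what this step typically looks like on each node (the download URL and the /home/galaxy install location are assumptions, though later steps do reference /home/galaxy/hadoop-2.5.1):

cd /home/galaxy
# any Hadoop 2.5.1 binary tarball works; the Apache archive URL below is assumed
curl -O https://archive.apache.org/dist/hadoop/common/hadoop-2.5.1/hadoop-2.5.1.tar.gz
tar -xzf hadoop-2.5.1.tar.gz
cd hadoop-2.5.1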

5 Configure Hadoop

hadoop-env.sh

#export JAVA_HOME=$JAVA_HOME                  # wrong: do not leave this as a variable reference
export JAVA_HOME=/usr/java/jdk1.8.0_45        # give the explicit JDK path instead
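
If you are not sure where the JDK is installed, something like this can locate it (a sketch; the path above corresponds to an Oracle JDK 8u45 RPM install):

readlink -f "$(command -v java)"    # e.g. .../jdk1.8.0_45/jre/bin/java; JAVA_HOME is the directory above (jre/)bin/java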

core-site.xml

<configuration>
	<property>
		<name>fs.default.name</name>
		<value>hdfs://master:9000</value>
	</property>
	<property>
		<name>hadoop.tmp.dir</name>
		<value>/tmp</value>
	</property>
</configuration>
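
Two caveats about this file: fs.default.name is the deprecated 1.x property name (it still works on 2.x, where the current name is fs.defaultFS), and pointing hadoop.tmp.dir at /tmp means the NameNode metadata (written to /tmp/dfs/name by the format step below) is lost whenever /tmp is cleaned, e.g. on reboot. A 2.x-style sketch, where the tmp path is just an example of a more persistent location:

	<property>
		<name>fs.defaultFS</name>
		<value>hdfs://master:9000</value>
	</property>
	<property>
		<name>hadoop.tmp.dir</name>
		<value>/home/galaxy/hadoop-2.5.1/tmp</value>
	</property>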

hdfs-site.xml

<configuration>
	<property>
		<name>dfs.replication</name>
		<value>2</value>
	</property>
</configuration>

mapred-site.xml

<configuration>
	<property>
		<name>mapred.job.tracker</name>
		<value>master:9001</value>
	</property>
</configuration>
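
Note that mapred.job.tracker is the Hadoop 1.x JobTracker address; Hadoop 2.x normally runs MapReduce on YARN instead, selected with mapreduce.framework.name. A sketch of the 2.x-style settings, for mapred-site.xml and yarn-site.xml respectively, should you want jobs submitted to YARN rather than run locally:

	<property>
		<name>mapreduce.framework.name</name>
		<value>yarn</value>
	</property>

	<property>
		<name>yarn.resourcemanager.hostname</name>
		<value>master</value>
	</property>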

masters

master

slaves

slave1
slave2

Copy the configuration to the other two machines:

scp etc/hadoop/* galaxy@slave1:/home/galaxy/hadoop-2.5.1/etc/hadoop/
scp etc/hadoop/* galaxy@slave2:/home/galaxy/hadoop-2.5.1/etc/hadoop/

6 Start the Hadoop Cluster

Format the NameNode

./bin/hadoop namenode -format
You should see: 15/11/09 19:25:59 INFO common.Storage: Storage directory /tmp/dfs/name has been successfully formatted.

Start Hadoop

./sbin/start-dfs.sh
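
The jps output below also shows a ResourceManager on master, which suggests YARN was started as well; if you want that too, the corresponding command is:

./sbin/start-yarn.sh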

Use jps to verify that all daemons are running properly:

[galaxy@master hadoop-2.5.1]$ jps
5924 ResourceManager
6918 SecondaryNameNode
7718 Jps
6743 NameNode

[galaxy@slave1 ~]$ jps
6402 Jps
6345 DataNode

[galaxy@slave2 ~]$ jps
25552 Jps
25495 DataNode

7 Check the Cluster Status

Command line

./bin/hdfs dfsadmin -report
Configured Capacity: 0 (0 B)
Present Capacity: 0 (0 B)
DFS Remaining: 0 (0 B)
DFS Used: 0 (0 B)
DFS Used%: NaN%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0

Web UI
http://192.168.1.101:50070
Note: the CentOS 7 firewall must be stopped first: systemctl stop firewalld
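
Stopping firewalld does not persist across reboots; either disable it permanently or, keeping the firewall on, open just the ports you need. A sketch:

systemctl stop firewalld
systemctl disable firewalld                     # keep it off after reboots
# or, with the firewall left running, open the NameNode web UI port:
firewall-cmd --permanent --add-port=50070/tcp
firewall-cmd --reload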

8 Run a Test Program

Create local test files

mkdir input
vim input/f1
vim input/f2
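
The file contents are not recorded here; purely as an illustration, the following hypothetical contents would be consistent with the file sizes (16 and 24 bytes) and the wordcount result shown at the end:

input/f1 (hypothetical):
hello world
bye

input/f2 (hypothetical):
hello hadoop
bye hadoop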

Create the directories in HDFS

./bin/hadoop fs  -mkdir /tmp
./bin/hadoop fs  -mkdir /tmp/input
./bin/hadoop fs -ls /

Upload the test files

./bin/hadoop fs -put input/ /tmp
Note: the CentOS 7 firewall must be stopped on all nodes (systemctl stop firewalld), otherwise the upload will fail.
./bin/hadoop fs -ls /tmp/input
-rw-r--r--   2 galaxy supergroup         16 2015-11-11 04:30 /tmp/input/f1
-rw-r--r--   2 galaxy supergroup         24 2015-11-11 04:30 /tmp/input/f2

Run wordcount

./bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.1.jar wordcount /tmp/input /output

View the output

[galaxy@master hadoop-2.5.1]$ ./bin/hadoop fs -ls /output
Found 2 items
-rw-r--r--   2 galaxy supergroup          0 2015-11-11 04:44 /output/_SUCCESS
-rw-r--r--   2 galaxy supergroup         31 2015-11-11 04:44 /output/part-r-00000
[galaxy@master hadoop-2.5.1]$ ./bin/hadoop fs -cat /output/*
bye	2
hadoop	2
hello	2
world	1

Author: galaxy

Created: 2015-11-11 Wed 18:00

Emacs 24.5.6 (Org mode 8.2.10)
