Hadoop Cluster Setup
1 Cluster Planning
Three virtual machines
Operating system: CentOS 7 Minimal
Networking via bridged mode (NAT and host-only should also work)
IP addresses:
192.168.1.101
192.168.1.102
192.168.1.103
2 Basic Cluster Configuration
Add the following to /etc/hosts on all three machines:
192.168.1.101 master
192.168.1.102 slave1
192.168.1.103 slave2
Edit /etc/hostname on each machine, setting the contents to master, slave1, and slave2 respectively.
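On CentOS 7 the hostname can also be changed without editing the file or rebooting; a minimal sketch, run as root on each node with its own name (master shown here):
# set the hostname immediately and persistently
hostnamectl set-hostname master
# verify
hostname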
3 Configure Passwordless SSH
On slave1:
su
vim /etc/ssh/sshd_config
# in sshd_config, set:
#   StrictModes no
#   RSAAuthentication yes
#   PubkeyAuthentication yes
/bin/systemctl restart sshd.service
exit
# back as the regular user, create the key directory
mkdir ~/.ssh
On master:
ssh-keygen -t dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
cat ~/.ssh/id_dsa.pub | ssh galaxy@slave1 'cat - >> ~/.ssh/authorized_keys'
ssh slave1
Repeat the same procedure for slave2 and for master itself (master must also be able to ssh to itself without a password).
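A quick check that all three hosts are reachable without a password; a minimal sketch, assuming the galaxy account and the /etc/hosts entries above:
# each command should print the remote hostname with no password prompt
for h in master slave1 slave2; do
    ssh galaxy@$h hostname
done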
References:
http://my.oschina.net/u/1169607/blog/175899
http://segmentfault.com/a/1190000002911599
4 Install Hadoop
Omitted; a typical install is sketched below.
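A minimal sketch of a tarball install, assuming the Apache archive still hosts the 2.5.1 release at this path and the /home/galaxy/hadoop-2.5.1 layout used in the rest of this post:
# download and unpack the release into the galaxy home directory
cd /home/galaxy
curl -O https://archive.apache.org/dist/hadoop/common/hadoop-2.5.1/hadoop-2.5.1.tar.gz
tar -xzf hadoop-2.5.1.tar.gz
cd hadoop-2.5.1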
5 Configure Hadoop
hadoop-env.sh
# export JAVA_HOME=$JAVA_HOME   # wrong: it cannot be set this way
export JAVA_HOME=/usr/java/jdk1.8.0_45
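The explicit path matters because the daemons are launched over ssh and do not inherit the login shell's environment, so $JAVA_HOME expands to nothing. If you are unsure where the JDK lives, a small sketch for locating it (assuming java is on the PATH; on some layouts this resolves to the JRE subdirectory):
# resolve the real directory behind the java binary
readlink -f "$(which java)" | sed 's:/bin/java::'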
core-site.xml
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/tmp</value>
  </property>
</configuration>
Note that pointing hadoop.tmp.dir at /tmp is only suitable for a test cluster: the system may clean /tmp, taking HDFS metadata and data with it.
hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>
Replication is set to 2 to match the two DataNodes.
mapred-site.xml
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>master:9001</value>
  </property>
</configuration>
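The 2.5.1 tarball does not ship a mapred-site.xml, only a template, so the file is usually created first:
# create mapred-site.xml from the bundled template before editing it
cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml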
masters
master
slaves
slave1
slave2
Copy the configuration to the other two machines:
scp etc/hadoop/* galaxy@slave1:/home/galaxy/hadoop-2.5.1/etc/hadoop/
scp etc/hadoop/* galaxy@slave2:/home/galaxy/hadoop-2.5.1/etc/hadoop/
6 Start the Hadoop Cluster
Format the NameNode:
./bin/hadoop namenode -format
On success the log contains:
15/11/09 19:25:59 INFO common.Storage: Storage directory /tmp/dfs/name has been successfully formatted.
Start Hadoop:
./sbin/start-dfs.sh
Verify with jps that everything is running:
[galaxy@master hadoop-2.5.1]$ jps
5924 ResourceManager
6918 SecondaryNameNode
7718 Jps
6743 NameNode
[galaxy@slave1 ~]$ jps
6402 Jps
6345 DataNode
[galaxy@slave2 ~]$ jps
25552 Jps
25495 DataNode
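The ResourceManager on master is a YARN daemon, which start-dfs.sh does not launch; it was presumably started separately with something like:
# start the YARN daemons
./sbin/start-yarn.sh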
7 Check Cluster Status
From the command line:
./bin/hdfs dfsadmin -report
Configured Capacity: 0 (0 B)
Present Capacity: 0 (0 B)
DFS Remaining: 0 (0 B)
DFS Used: 0 (0 B)
DFS Used%: NaN%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
An all-zero report like this means no DataNodes have registered with the NameNode yet; see the firewall note below.
Via the web UI:
http://192.168.1.101:50070
Note: the CentOS 7 firewall must be stopped first: systemctl stop firewalld
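Stopping firewalld only lasts until the next reboot; on a throwaway test cluster it can also be kept off permanently (run on every node):
# stop the firewall now and keep it off across reboots
systemctl stop firewalld
systemctl disable firewalld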
8 Run a Test Program
Create local test files:
mkdir input
vim input/f1
vim input/f2
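The post never shows the file contents, but contents consistent with the byte sizes (16 and 24) and the wordcount result at the end would be the following, created non-interactively (a reconstruction, not the original files):
# hypothetical test inputs matching the sizes and word counts shown later
echo "hello world bye" > input/f1
echo "hello hadoop bye hadoop" > input/f2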
Create the HDFS directories:
./bin/hadoop fs -mkdir /tmp
./bin/hadoop fs -mkdir /tmp/input
./bin/hadoop fs -ls /
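The two mkdir calls can be combined, since the Hadoop 2.x fs shell accepts -p to create missing parent directories:
# create /tmp/input and any missing parents in one call
./bin/hadoop fs -mkdir -p /tmp/input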
Upload the test files:
./bin/hadoop fs -put input/ /tmp
Note: the CentOS 7 firewall must be stopped on all nodes (systemctl stop firewalld), otherwise the upload fails.
./bin/hadoop fs -ls /tmp/input
-rw-r--r--   2 galaxy supergroup         16 2015-11-11 04:30 /tmp/input/f1
-rw-r--r--   2 galaxy supergroup         24 2015-11-11 04:30 /tmp/input/f2
Run wordcount:
./bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.1.jar wordcount /tmp/input /output
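A detail worth knowing for reruns: MapReduce refuses to start if the output directory already exists, so /output has to be removed before running the job again:
# delete the previous output directory before rerunning
./bin/hadoop fs -rm -r /output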
View the output:
[galaxy@master hadoop-2.5.1]$ ./bin/hadoop fs -ls /output
Found 2 items
-rw-r--r--   2 galaxy supergroup          0 2015-11-11 04:44 /output/_SUCCESS
-rw-r--r--   2 galaxy supergroup         31 2015-11-11 04:44 /output/part-r-00000
[galaxy@master hadoop-2.5.1]$ ./bin/hadoop fs -cat /output/*
bye	2
hadoop	2
hello	2
world	1