This series builds a complete Spark cluster on three virtual machines. The cluster environment is as follows:
VirtualBox 5.2, Ubuntu 14.04, SecureCRT 7.3.6_x64 English edition (used to connect to the VMs)
jdk1.7.0, hadoop2.6.5, zookeeper3.4.5, Scala2.12.6, kafka_2.9.2-0.8.1, spark1.3.1-bin-hadoop2.6
Part 1: Prepare the three VM environments, configure static IPs, and set up passwordless SSH login
Part 4: Build the Kafka cluster
The previous posts set up the system environment that the Spark cluster needs; this post builds the Hadoop cluster on top of it.
1. Edit the configuration files
Hadoop only needs to be downloaded and configured on spark1 and then copied to the other two machines, so all of the configuration below is done on spark1.
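The download and unpack step itself isn't shown in this series; a minimal sketch, assuming the Hadoop 2.6.5 binary tarball from the Apache archive:
$ cd /usr/local/bigdata
$ wget https://archive.apache.org/dist/hadoop/common/hadoop-2.6.5/hadoop-2.6.5.tar.gz
$ tar -xzf hadoop-2.6.5.tar.gz
$ mv hadoop-2.6.5 hadoop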
$ cd /usr/local/bigdata/hadoop    # enter the Hadoop installation directory
$ cd ./etc/hadoop
1) core-site.xml
$ vim core-site.xml
Add the following to specify the NameNode address (fs.default.name is the legacy spelling of fs.defaultFS; it is deprecated but still works on Hadoop 2.6):
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://spark1:9000</value>
  </property>
</configuration>
2) hdfs-site.xml
$ vim hdfs-site.xml
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/usr/local/hadoop/data/namenode</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/usr/local/hadoop/data/datanode</value>
  </property>
  <property>
    <name>dfs.tmp.dir</name>
    <value>/usr/local/hadoop/data/tmp</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>
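Note that these data directories live under /usr/local/hadoop/data, not under the install path /usr/local/bigdata/hadoop. Hadoop will create them during format/startup when it can, but pre-creating them on each of the three machines avoids permission surprises:
$ mkdir -p /usr/local/hadoop/data/namenode /usr/local/hadoop/data/datanode /usr/local/hadoop/data/tmp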
3) mapred-site.xml, which tells Hadoop to run MapReduce on YARN
$ mv mapred-site.xml.template mapred-site.xml
$ vim mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
4) yarn-site.xml
$ vim yarn-site.xml
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>spark1</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
5) slaves, which lists the worker hosts, one hostname per line
$ vim slaves
spark1
spark2
spark3
6) hadoop-env.sh
$ vim hadoop-env.sh
Set JAVA_HOME to the full path of the JDK:
export JAVA_HOME=/usr/local/bigdata/jdk
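A quick way to confirm this path is correct (assuming the JDK really is unpacked at /usr/local/bigdata/jdk, as set up earlier in the series):
$ /usr/local/bigdata/jdk/bin/java -version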
2. Copy to the other two machines
Copy the Hadoop directory over with scp:
$ cd /usr/local/bigdata
$ scp -r hadoop root@spark2:/usr/local/bigdata
$ scp -r hadoop root@spark3:/usr/local/bigdata
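To check that both copies landed (passwordless SSH between the machines was set up in part 1):
$ ssh root@spark2 ls /usr/local/bigdata/hadoop
$ ssh root@spark3 ls /usr/local/bigdata/hadoop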
3. Configure the Hadoop environment variables on all three machines
Append the following (typically to ~/.bashrc or /etc/profile) on each machine:
export HADOOP_HOME=/usr/local/bigdata/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
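Then reload the profile and check that the hadoop binary resolves (assuming the lines went into ~/.bashrc):
$ source ~/.bashrc
$ hadoop version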
4. Start the HDFS cluster
Format the NameNode (only needed before the first startup):
$ hdfs namenode -format
$ start-dfs.sh
Startup has succeeded only if jps on the three machines shows the following processes:
spark1
root@spark1:/usr/local/bigdata/hadoop/etc/hadoop# jps
4275 Jps
3859 NameNode
4120 SecondaryNameNode
3976 DataNode
spark2
root@spark2:/usr/local/bigdata/hadoop/etc/hadoop# jps
6546 DataNode
6612 Jps
spark3
root@spark3:/usr/local/bigdata/hadoop/etc/hadoop# jps
4965 DataNode
5031 Jps
Open a browser and visit http://spark1:50070 (the HDFS web UI).
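Optionally, run a small HDFS smoke test from spark1 (the paths here are arbitrary examples):
$ hdfs dfs -mkdir -p /user/root
$ hdfs dfs -put /usr/local/bigdata/hadoop/etc/hadoop/core-site.xml /user/root/
$ hdfs dfs -ls /user/root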
5. Start the YARN cluster
$ start-yarn.sh
Now on spark1:
root@spark1:/usr/local/bigdata/hadoop/etc/hadoop# jps
3859 NameNode
4803 Jps
4120 SecondaryNameNode
3976 DataNode
4443 ResourceManager
4365 NodeManager
spark2
root@spark2:/usr/local/bigdata/hadoop/etc/hadoop# jps
6546 DataNode
6947 Jps
6771 NodeManager
spark3
root@spark3:/usr/local/bigdata/hadoop/etc/hadoop# jps
5249 Jps
4965 DataNode
5096 NodeManager
Enter http://spark1:8088 in the browser to open the YARN ResourceManager UI.
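To confirm that YARN can actually run jobs, the examples jar bundled with Hadoop makes a convenient smoke test (the jar version should match your release; 2.6.5 is assumed here):
$ hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.5.jar pi 2 10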