See the Java installation section in the post Hadoop 1.2.1 Pseudo-Distributed Mode Installation.
As before, we use three machines as an example: spark-master, ubuntu-worker, and spark-worker1.
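All three hosts must be able to resolve each other by hostname. A minimal /etc/hosts sketch, assuming spark-master is 192.168.248.150 (per the web UI URLs below); the worker addresses are placeholders, substitute your actual IPs:

# /etc/hosts on every node (worker IPs are placeholders)
192.168.248.150  spark-master
192.168.248.151  ubuntu-worker
192.168.248.152  spark-worker1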
See the blog post Spark Cluster Setup: Passwordless SSH Login.
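For quick reference, a minimal passwordless-SSH sketch, run on spark-master and assuming the mupeng user exists on all three nodes:

# generate a key pair with an empty passphrase, then push the public key to each worker
ssh-keygen -t rsa -P ""
ssh-copy-id mupeng@ubuntu-worker
ssh-copy-id mupeng@spark-worker1
# verify: this should log in without prompting for a password
ssh mupeng@ubuntu-worker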
Download: http://hadoop.apache.org/releases.html#Download
Extract the archive: tar -zxvf hadoop-2.4.1.tar.gz
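It is convenient to point HADOOP_HOME at the extracted directory and put its scripts on PATH. A sketch, assuming the archive is extracted under /home/mupeng/opt, the prefix used by the configs below (append to ~/.bashrc, then source it):

export HADOOP_HOME=/home/mupeng/opt/hadoop-2.4.1
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin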
Go to the hadoop-2.4.1/etc/hadoop directory; the following seven files need to be configured:
hadoop-env.sh, yarn-env.sh, slaves, core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml
1. hadoop-env.sh: set JAVA_HOME
export JAVA_HOME=/home/mupeng/java/jdk1.6.0_35
2. yarn-env.sh: set JAVA_HOME
# some Java parameters
export JAVA_HOME=/home/mupeng/java/jdk1.6.0_35
3. slaves: list the slave nodes, one hostname per line
ubuntu-worker
spark-worker1
4. core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://spark-master:9000</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/home/mupeng/opt/hadoop-2.4.1/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>hadoop.proxyuser.spark.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.spark.groups</name>
    <value>*</value>
  </property>
</configuration>
5. hdfs-site.xml
<configuration>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>spark-master:9001</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/home/mupeng/opt/hadoop-2.4.1/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/home/mupeng/opt/hadoop-2.4.1/dfs/data</value>
  </property>
  <property>
    <!-- replication cannot exceed the number of datanodes; we have two workers -->
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
</configuration>
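The tmp, name, and data directories referenced above are not created automatically. A sketch of creating them, assuming the same layout on every node:

mkdir -p /home/mupeng/opt/hadoop-2.4.1/tmp
mkdir -p /home/mupeng/opt/hadoop-2.4.1/dfs/name   # used on spark-master (NameNode)
mkdir -p /home/mupeng/opt/hadoop-2.4.1/dfs/data   # used on the two workers (DataNodes)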
6. mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>spark-master:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>spark-master:19888</value>
  </property>
</configuration>
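Note that the Hadoop 2.x distribution ships only mapred-site.xml.template; if mapred-site.xml does not exist yet in etc/hadoop, create it from the template before editing:

cp mapred-site.xml.template mapred-site.xml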
7. yarn-site.xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>spark-master:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>spark-master:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>spark-master:8035</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>spark-master:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>spark-master:8088</value>
  </property>
</configuration>
Finally, copy the configured hadoop-2.4.1 directory to the other two nodes, then format HDFS and start the daemons as sketched below.
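Before the web UIs below will show anything, HDFS has to be formatted and the daemons started. A sketch with scp, assuming the same /home/mupeng/opt path on all three machines:

# copy the configured tree to the two workers
scp -r /home/mupeng/opt/hadoop-2.4.1 mupeng@ubuntu-worker:/home/mupeng/opt/
scp -r /home/mupeng/opt/hadoop-2.4.1 mupeng@spark-worker1:/home/mupeng/opt/

# on spark-master: format HDFS once, then start the DFS and YARN daemons
cd /home/mupeng/opt/hadoop-2.4.1
./bin/hdfs namenode -format
./sbin/start-dfs.sh
./sbin/start-yarn.sh

# jps should show NameNode/SecondaryNameNode/ResourceManager on the master,
# and DataNode/NodeManager on each worker
jps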
Check HDFS at http://192.168.248.150:50070/dfshealth.html#tab-datanode; you should see the two worker nodes listed.
Check YARN at http://192.168.248.150:8088/cluster/nodes
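The same checks can be done from the command line with the standard Hadoop 2.x CLIs:

# list live datanodes
hdfs dfsadmin -report
# list registered NodeManagers
yarn node -list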
OK, our Hadoop 2.4.1 cluster is up and running. To set up the Spark cluster next, see the blog post Spark 1.2.1 Cluster Environment Setup: Standalone Mode.