This article walks through building a Spark cluster, compiling and deploying Spark with both Maven and SBT, and finally installing IDEA on CentOS for development.
CentOS 7
JDK: 1.8
Spark: 2.0.0 / 2.2.0
Scala: 2.11.8
Hadoop: 2.7
Maven: 3.3.9
All of the steps below were carried out personally and captured in detailed screenshots! (This article is original; please credit this page when reposting. Thanks!)
CentOS image: http://mirrors.163.com/centos/7/isos/x86_64/CentOS-7-x86_64-DVD-1708.iso
master: 1 core, 4 GB RAM
slave1: 1 core, 2 GB RAM
slave2: 1 core, 2 GB RAM
Next comes the installation; along the way you will set the language, password, and so on.
Choose a language. I chose English here; you can also scroll down and pick Chinese
Wait a few seconds...
Choose the software to install (important)
A translation of the options is given here for reference when choosing
Wait a few seconds, then set the installation destination
A single click is enough
Click in and set the password and user yourself
Finish and reboot
Then the network can be configured after entering the system
————————————
Set the IP address
Note: the IP, netmask, gateway and DNS here are based on my own laptop's network; set them according to your actual situation
This is the network of my Windows laptop
Set the hostname
Open a terminal and log in as root
[root@localhost spark]# vi /etc/sysconfig/network
Add or modify:
NETWORKING=yes
HOSTNAME=master
[root@localhost spark]# vi /etc/hosts
Add:
192.168.1.191 master
192.168.1.192 slave1
192.168.1.193 slave2
(For slave1 and slave2, see below.)
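To make sure the hostname mappings work, you can ping each name once the corresponding machine is up (slave1 and slave2 will only answer after the clones described below exist):
[root@localhost spark]# ping -c 3 master
[root@localhost spark]# ping -c 3 slave1
[root@localhost spark]# ping -c 3 slave2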
Disable the firewall and SELinux (Hadoop and Spark communicate over ports at runtime; disabling them avoids blocked connections)
Check the OS version
[root@localhost spark]# rpm -q centos-release
centos-release-7-4.1708.el7.centos.x86_64
Check the firewall status on CentOS 7
[root@localhost spark]# firewall-cmd --state
running
Stop the firewall and disable it at boot
[root@localhost spark]# systemctl stop firewalld.service
[root@localhost spark]# systemctl disable firewalld.service
Removed symlink /etc/systemd/system/multi-user.target.wants/firewalld.service.
Removed symlink /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service.
[root@localhost spark]# firewall-cmd --state
not running
Disable SELinux
[root@localhost spark]# vi /etc/selinux/config
Change: SELINUX=disabled
Reboot
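After the reboot you can confirm SELinux is really off:
[root@localhost spark]# getenforce
Disabled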
————————
Update OpenSSL (to avoid SSH connection failures to the nodes during the build)
[root@localhost spark]# yum update openssl
Change the SSH server configuration (make sure RSA public-key authentication is enabled)
[root@localhost spark]# vi /etc/ssh/sshd_config
Set the following three options:
RSAAuthentication yes
PubkeyAuthentication yes
AuthorizedKeysFile .ssh/authorized_keys
Restart sshd
[root@localhost spark]# service sshd restart
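If you want to double-check that sshd came back up:
[root@localhost spark]# systemctl status sshd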
Create a directory tree for the spark user, for easier management (the user and group were created during OS installation)
[root@localhost spark]# mkdir /spark
[root@localhost spark]# mkdir /spark/soft
[root@localhost spark]# mkdir /spark/compile
[root@localhost spark]# chown -R spark:spark /spark/
Download
jdk:http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html
scala:https://downloads.lightbend.com/scala/2.11.8/scala-2.11.8.tgz
[root@localhost Downloads]# mv /home/spark/Downloads/jdk-8u162-linux-x64.tar.gz /spark/soft/
[root@localhost Downloads]# mv /home/spark/Downloads/scala-2.11.8.tgz /spark/soft/
[root@localhost Downloads]# mv /home/spark/Downloads/spark-2.0.0.tgz /spark/soft/
[root@localhost soft]# cd /spark/soft/
[root@localhost soft]# tar -zxvf scala-2.11.8.tgz
[root@localhost soft]# tar -zxvf jdk-8u162-linux-x64.tar.gz
Configure environment variables
[root@localhost soft]# vi /etc/profile
Add at the end of the file:
export JAVA_HOME=/spark/soft/jdk1.8.0_162
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$CLASSPATH
export PATH=$PATH:$HOME/bin:$JAVA_HOME/bin
export SCALA_HOME=/spark/soft/scala-2.11.8
export PATH=$PATH:$SCALA_HOME/bin
Apply the configuration:
[root@localhost soft]# source /etc/profile
Configure the system to use the JDK installed above
[root@localhost soft]# update-alternatives --install /usr/bin/java java /spark/soft/jdk1.8.0_162/bin/java 170130
[root@localhost soft]# update-alternatives --config java
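To confirm that the JDK and Scala under /spark/soft are the ones being picked up, check the versions:
[root@localhost soft]# java -version
[root@localhost soft]# scala -version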
Clone the template VM prepared above into two copies, then change their IPs and hostnames.
Shut the VM down first
Right-click - Manage - Clone
Then repeat the clone operation for the second copy
Change the memory of slave1 and slave2
The steps are the same for slave1 and slave2
Then start master, slave1 and slave2
Modify the network connection (same steps on all 3 nodes)
Open a terminal and log in as root
[root@localhost spark]# vi /etc/sysconfig/network-scripts/ifcfg-ens33
Modify:
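The screenshot is not reproduced here; as a rough illustration only (IPADDR follows the addresses used in this article and differs per node; GATEWAY and DNS1 are placeholders you must replace with your own network's values), the modified ifcfg-ens33 ends up looking something like:
BOOTPROTO=static
ONBOOT=yes
IPADDR=192.168.1.191   (use 192.168.1.192 on slave1 and 192.168.1.193 on slave2)
PREFIX=24
GATEWAY=192.168.1.1
DNS1=192.168.1.1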
Change the hostname (same steps on slave1 and slave2)
[root@slave1 spark]# vi /etc/sysconfig/network
Reboot master, slave1 and slave2
————————————
Configure passwordless SSH login
Open a terminal on all 3 nodes (note: log in as the spark user, not as root)
[spark@master ~]$ ssh-keygen -t rsa
[spark@slave1 ~]$ ssh-keygen -t rsa
[spark@slave2 ~]$ ssh-keygen -t rsa
Then press Enter several times to generate the key pair
Run the following on the corresponding nodes (watch the hostname in the prompt; don't mix them up)
[spark@master ~]$ cd /home/spark/.ssh/
[spark@master .ssh]$ mv id_rsa.pub master.pub
[spark@slave1 ~]$ cd /home/spark/.ssh/
[spark@slave1 .ssh]$ mv id_rsa.pub slave1.pub
[spark@slave1 .ssh]$ scp slave1.pub spark@master:/home/spark/.ssh/   (note: type yes, then enter the password)
[spark@slave2 ~]$ cd /home/spark/.ssh/
[spark@slave2 .ssh]$ mv id_rsa.pub slave2.pub
[spark@slave2 .ssh]$ scp slave2.pub spark@master:/home/spark/.ssh/
As follows
Merge the public keys on the master node
[spark@master .ssh]$ cat master.pub >> authorized_keys
[spark@master .ssh]$ cat slave1.pub >> authorized_keys
[spark@master .ssh]$ cat slave2.pub >> authorized_keys
The result:
Send the merged authorized_keys file to the two slave nodes
[spark@master .ssh]$ scp authorized_keys spark@slave1:/home/spark/.ssh/   (again type yes and the password)
[spark@master .ssh]$ scp authorized_keys spark@slave2:/home/spark/.ssh/
Set permissions (required on all 3 nodes)
[spark@master .ssh]$ chmod 400 authorized_keys
[spark@slave1 .ssh]$ chmod 400 authorized_keys
[spark@slave2 .ssh]$ chmod 400 authorized_keys
Verify SSH (on all 3 nodes)
ssh master
ssh slave1
ssh slave2
If the connections go through, everything is fine; you may be asked to type yes, just do so, e.g.
Perform the following on the master node (if you don't want to compile, skip straight to part 4)
Download the Spark source (2.0.0): https://archive.apache.org/dist/spark/spark-2.0.0/spark-2.0.0.tgz
Download the Spark source (2.2.0): https://www.apache.org/dyn/closer.lua/spark/spark-2.2.0/spark-2.2.0.tgz
After downloading, extract it and configure the Aliyun mirror (for faster downloads in China)
[spark@master ~]$ mv /home/spark/Downloads/spark-2.0.0.tgz /spark/compile/
[spark@master ~]$ cd /spark/compile/
[spark@master compile]$ tar -zxf spark-2.0.0.tgz
[spark@master compile]$ cd spark-2.0.0/
[spark@master spark-2.0.0]$ vi pom.xml
Change the url to: <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
Note: change it in two places; in vi you can type / followed by repo1 to jump to them
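For reference, the affected entries are the repository and plugin-repository blocks that point at repo1.maven.org; after the edit each looks roughly like this (an illustrative sketch, not copied verbatim from the pom):
<repository>
  <id>central</id>
  <name>Maven Repository</name>
  <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
  <releases>
    <enabled>true</enabled>
  </releases>
  <snapshots>
    <enabled>false</enabled>
  </snapshots>
</repository>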
Maven:https://archive.apache.org/dist/maven/maven-3/3.3.9/binaries/apache-maven-3.3.9-bin.tar.gz
[spark@master ~]$ mv /home/spark/Downloads/apache-maven-3.3.9-bin.tar.gz /spark/soft/
[spark@master ~]$ cd /spark/soft/
[spark@master soft]$ tar -zxvf apache-maven-3.3.9-bin.tar.gz
Configure environment variables (as root)
[root@master soft]# vi /etc/profile
Add at the end:
export MAVEN_HOME=/spark/soft/apache-maven-3.3.9
export PATH=$PATH:${MAVEN_HOME}/bin
Verify
[root@master soft]# vi /etc/profile
[root@master soft]# source /etc/profile
[root@master soft]# mvn -version
——————————
Note: for the build below you can pick either 2.2.0 or 2.0.0; my screenshots are of 2.0.0. Compile with root privileges. The first build downloads a lot of dependencies and, depending on your network, can take more than an hour. I had already built several times while testing, so it finished in 19 minutes.
Start compiling (2.0.0)
[root@master ~]# cd /spark/compile/
[root@master compile]# cp -r spark-2.0.0 spark-2.0.0-mvn
[root@master compile]# cd spark-2.0.0-mvn/
[root@master spark-2.0.0-mvn]# ./build/mvn -Pyarn -Phadoop-2.7 -Pspark-ganglia-lgpl -Pkinesis-asl -Phive -DskipTests clean package
Then build the distribution package (this produces a tar.gz)
./dev/make-distribution.sh --name dev --tgz -Pyarn -Phadoop-2.7 -Phadoop-provided -Phive -Phive-thriftserver -Pnetlib-lgpl
Finally you should see
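If the build succeeded, the distribution package is written to the source root; a quick listing confirms it:
[root@master spark-2.0.0-mvn]# ls /spark/compile/spark-2.0.0-mvn/spark-2.0.0-bin-dev.tgz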
——————————
Compiling 2.2.0 the same way yields a corresponding package.
To compile with SBT instead:
[root@master spark]# cd /spark/compile/
[root@master compile]# cp -r spark-2.0.0 spark-2.0.0-sbt
[root@master compile]# cd spark-2.0.0-sbt/
[root@master spark-2.0.0-sbt]# build/sbt assembly -Pyarn -Phadoop-2.7 -Pspark-ganglia-lgpl -Pkinesis-asl -Phive
Note: when compiling with SBT I kept getting a network/proxy error while SBT downloaded its dependencies; I tried several fixes without success and set it aside. I recommend compiling with Maven, or, if you don't need a customized build, simply downloading the prebuilt packages from the official site.
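One workaround I did not get to verify: sbt can be forced onto the Aliyun mirror through a global repositories file plus the standard sbt.override.build.repos switch (both are stock sbt mechanisms, not something specific to this build). A sketch:
[root@master spark-2.0.0-sbt]# vi ~/.sbt/repositories
[repositories]
  local
  aliyun: http://maven.aliyun.com/nexus/content/groups/public/
  maven-central
Then pass -Dsbt.override.build.repos=true to sbt (for example via SBT_OPTS) so this file takes precedence over the resolvers defined in the build.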
Prepare the installation package:
You can use the package built in the previous step (2.0.0): spark-2.0.0-bin-dev.tgz (located in /spark/compile/spark-2.0.0-mvn/)
or the 2.2.0 package built the same way: spark-2.2.0-bin-dev.tgz (located in /spark/compile/spark-2.2.0-mvn/)
You can also download a prebuilt Spark from the official site: http://spark.apache.org/downloads.html
Hadoop:http://archive.apache.org/dist/hadoop/common/hadoop-2.7.2/hadoop-2.7.2.tar.gz
master節點下操做:
[spark@master ~]$ mkdir /spark/hadoop/ [spark@master ~]$ mkdir /spark/hadoop/data [spark@master ~]$ mkdir /spark/hadoop/name [spark@master ~]$ mkdir /spark/hadoop/tmp [spark@master ~]$ mv /home/spark/Downloads/hadoop-2.7.2.tar.gz /spark/soft/ [spark@master soft]$ tar -zxf hadoop-2.7.2.tar.gz配置環境:
[spark@master soft]$ cd hadoop-2.7.2/etc/hadoop/ [spark@master hadoop]$ vi core-site.xml
<configuration> <!-- 指定HDFS(namenode)的通訊地址 --> <property> <name>fs.defaultFS</name> <value>hdfs://master:9000</value> </property> <!-- 指定hadoop運行時產生文件的存儲路徑 --> <property> <name>hadoop.tmp.dir</name> <value>/spark/hadoop/tmp</value> </property> </configuration>
[spark@master hadoop]$ vi hdfs-site.xml
Add:
<configuration>
  <!-- Where the namenode stores its data -->
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/spark/hadoop/name</value>
  </property>
  <!-- Number of HDFS replicas -->
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <!-- Where the datanodes store their data -->
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/spark/hadoop/data</value>
  </property>
</configuration>
[spark@master hadoop]$ vi mapred-site.xml
Add:
<configuration>
  <!-- Tell the MR framework to run on YARN -->
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <!-- MapReduce job history addresses -->
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>master:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>master:19888</value>
  </property>
</configuration>
[spark@master hadoop]$ vi yarn-site.xml
Add:
<configuration>
  <!-- Site specific YARN configuration properties -->
  <!-- Which node runs the resourcemanager -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value>
  </property>
  <!-- Reducers fetch data via mapreduce_shuffle -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>
[spark@master hadoop]$ vi slaves
Add:
master
slave1
slave2
[spark@master hadoop]$ vi hadoop-env.sh
Add or modify:
export JAVA_HOME=/spark/soft/jdk1.8.0_162
export HADOOP_HOME=/spark/soft/hadoop-2.7.2
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_LOG_DIR=$HADOOP_HOME/logs
Add the environment variables as root
[root@master hadoop]# vi /etc/profile
Add at the bottom:
export HADOOP_HOME=/spark/soft/hadoop-2.7.2
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export CLASSPATH=.:$HADOOP_HOME/lib:$CLASSPATH
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_ROOT_LOGGER=INFO,console
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
export HADOOP_CONF_DIR=/spark/soft/hadoop-2.7.2/etc/hadoop
export CLASSPATH=$($HADOOP_HOME/bin/hadoop classpath):$CLASSPATH
[root@master hadoop]# source /etc/profile
Copy Hadoop to the two slave nodes
[spark@master hadoop]$ scp -r /spark/soft/hadoop-2.7.2 spark@slave1:/spark/soft/
[spark@master hadoop]$ scp -r /spark/soft/hadoop-2.7.2 spark@slave2:/spark/soft/
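Before the very first start, format the HDFS namenode on master (needed only once; re-formatting later will wipe HDFS metadata):
[spark@master hadoop]$ /spark/soft/hadoop-2.7.2/bin/hdfs namenode -format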
Start and verify
[spark@master hadoop]$ cd /spark/soft/hadoop-2.7.2/
[spark@master hadoop-2.7.2]$ ./sbin/start-all.sh
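Once start-all.sh returns, jps on each node shows which daemons came up: with the configuration above master should be running NameNode, SecondaryNameNode and ResourceManager (plus DataNode and NodeManager, since master is also listed in slaves), while slave1 and slave2 run DataNode and NodeManager. The HDFS web UI is at master:50070 and the YARN UI at master:8088.
[spark@master hadoop-2.7.2]$ jps
[spark@slave1 ~]$ jps
[spark@slave2 ~]$ jps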
Create a working directory as the spark user
[spark@master ~]$ mkdir /spark/work
Place and extract the installation package
[spark@master ~]$ mkdir /spark/work/mvn1
[spark@master ~]$ cp /spark/compile/spark-2.0.0-mvn/spark-2.0.0-bin-dev.tgz /spark/work/mvn1/
[spark@master ~]$ cd /spark/work/mvn1/
[spark@master mvn1]$ tar -zxf spark-2.0.0-bin-dev.tgz
Configure conf/slaves
[spark@master mvn1]$ cd spark-2.0.0-bin-dev/conf/
[spark@master conf]$ cp slaves.template slaves
[spark@master conf]$ vi slaves
Modify/add:
master
slave1
slave2
Configure conf/spark-env.sh
[spark@master conf]$ cp spark-env.sh.template spark-env.sh
[spark@master conf]$ vi spark-env.sh
Add at the bottom:
export JAVA_HOME=/spark/soft/jdk1.8.0_162
export SCALA_HOME=/spark/soft/scala-2.11.8
export SPARK_HOME=/spark/work/mvn1/spark-2.0.0-bin-dev
export HADOOP_HOME=/spark/soft/hadoop-2.7.2
export HADOOP_CONF_DIR=/spark/soft/hadoop-2.7.2/etc/hadoop
export SPARK_DIST_CLASSPATH=$($HADOOP_HOME/bin/hadoop classpath)
export SPARK_MASTER_IP=master
export SPARK_MASTER_PORT=7077
export SPARK_EXECUTOR_INSTANCES=1
export SPARK_WORKER_INSTANCES=1
export SPARK_WORKER_CORES=1
export SPARK_WORKER_MEMORY=1024M
export SPARK_MASTER_WEBUI_PORT=8080
export SPARK_CONF_DIR=$SPARK_HOME/conf
Copy Spark to the other two nodes
[spark@master conf]$ cd /spark/work/mvn1/
[spark@master mvn1]$ rm -rf spark-2.0.0-bin-dev.tgz
[spark@master mvn1]$ scp -r /spark/work/ spark@slave1:/spark/
[spark@master mvn1]$ scp -r /spark/work/ spark@slave2:/spark/
Start Spark
[spark@master mvn1]$ cd /spark/work/mvn1/spark-2.0.0-bin-dev/
[spark@master spark-2.0.0-bin-dev]$ ./sbin/start-all.sh
Result:
master
slave1
slave2
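As a final check, open the Spark master web UI at master:8080 (the three workers should be registered) and submit the bundled SparkPi example to the cluster; something along these lines works with the package built above (the examples jar name may differ slightly depending on your build):
[spark@master spark-2.0.0-bin-dev]$ ./bin/spark-submit --master spark://master:7077 --class org.apache.spark.examples.SparkPi examples/jars/spark-examples_2.11-2.0.0.jar 10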
That completes compiling, deploying and building the Spark cluster. IDEA development is covered in the next article. Thanks!