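First, prepare three computers or virtual machines named Master, Worker1, and Worker2, and install an operating system on each (this article uses CentOS 7).
1. Configure the cluster. Run the following steps on the Master machine.
1.1 Stop the firewall: systemctl stop firewalld.service (to keep it off after a reboot, also run systemctl disable firewalld.service)
1.2 Give the machine a static IP address
1.2.1 Edit the network configuration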
cd /etc/sysconfig/network-scripts/
vim ifcfg-eno16777736
Change the contents as follows:
BOOTPROTO=static
# Static IP address, netmask, and gateway
IPADDR=192.168.232.133
NETMASK=255.255.255.0
GATEWAY=192.168.232.2
# Do not let NetworkManager manage this interface
NM_CONTROLLED=no
ONBOOT=yes
1.2.2 Restart the network service: systemctl restart network.service
1.3 Set the hostname: hostnamectl set-hostname Master
1.4 Edit /etc/hosts and add the following entries:
192.168.232.133 Master
192.168.232.134 Worker1
192.168.232.135 Worker2
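Because step 1.4 is repeated on every machine, it can help to add the entries with a small script that is safe to run more than once. The following is a minimal sketch, not part of the original procedure: the HOSTS_FILE variable and the add_host function are names invented here, and by default the script writes to a temporary file rather than to /etc/hosts.

```shell
#!/bin/sh
# Sketch: append the cluster entries to a hosts file only if they are missing.
# HOSTS_FILE is an assumption for safe experimentation; point it at /etc/hosts
# (as root) once the output looks right.
HOSTS_FILE="${HOSTS_FILE:-$(mktemp)}"

add_host() {
  # $1 = IP address, $2 = hostname
  if ! grep -qw "$2" "$HOSTS_FILE"; then
    printf '%s %s\n' "$1" "$2" >> "$HOSTS_FILE"
  fi
}

add_host 192.168.232.133 Master
add_host 192.168.232.134 Worker1
add_host 192.168.232.135 Worker2
# Running the same line again must not create a duplicate entry.
add_host 192.168.232.133 Master

cat "$HOSTS_FILE"
```

To apply it for real, run it as root with HOSTS_FILE=/etc/hosts.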
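1.5 Repeat steps 1.1 to 1.4 on Worker1 and Worker2, using each machine's own IP address and hostname.
1.6 Check that the machines in the cluster can ping each other, for example: ping Worker1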
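2. Configure passwordless SSH login
2.1 Let Master log in to every Worker without a password
2.1.1 Generate a key pair on the Master node by running the following command on Master: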
ssh-keygen -t rsa -P ''
This generates the key pair id_rsa and id_rsa.pub, stored by default in the "/root/.ssh" directory.
2.1.2 On the Master node, append id_rsa.pub to the authorized keys:
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
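2.1.3 Edit the following settings in the SSH configuration file "/etc/ssh/sshd_config":
# Enable RSA authentication (SSH protocol 1 only; OpenSSH 7.4 and later report this option as deprecated, and it can be omitted there)
RSAAuthentication yes
# Enable public/private key authentication
PubkeyAuthentication yes
# Path of the public key file (the file written in step 2.1.2)
AuthorizedKeysFile .ssh/authorized_keys
2.1.4 Restart the SSH service so that the new settings take effect: service sshd restart
2.1.5 Verify that passwordless login to the local machine works: ssh Master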
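If step 2.1.5 still asks for a password, one common cause is file permissions: with sshd's StrictModes option (on by default), keys are ignored when the ".ssh" directory or "authorized_keys" is writable by other users. The sketch below sets the usual restrictive modes. The SSH_DIR variable is an assumption added here so the script can be tried on a throwaway directory first.

```shell
#!/bin/sh
# Sketch: tighten permissions the way sshd's StrictModes check expects.
# SSH_DIR defaults to a throwaway directory (an assumption for safe testing);
# set SSH_DIR=/root/.ssh to apply it to the real account.
SSH_DIR="${SSH_DIR:-$(mktemp -d)/.ssh}"

mkdir -p "$SSH_DIR"
touch "$SSH_DIR/authorized_keys"

chmod 700 "$SSH_DIR"                   # only the owner may enter the directory
chmod 600 "$SSH_DIR/authorized_keys"   # only the owner may read or write keys

ls -ld "$SSH_DIR" "$SSH_DIR/authorized_keys"
```

The same two chmod commands apply on each Worker after its "authorized_keys" file has been created.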
2.1.6 Copy the public key to every Worker machine with scp:
scp /root/.ssh/id_rsa.pub root@Worker1:/root/
scp /root/.ssh/id_rsa.pub root@Worker2:/root/
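2.2 Configure the Worker1 machine
2.2.1 Create the ".ssh" directory under "/root/" if it does not already exist.
mkdir -p /root/.ssh
2.2.2 Append Master's public key to Worker1's "authorized_keys" file.
cat /root/id_rsa.pub >> /root/.ssh/authorized_keys
2.2.3 Edit "/etc/ssh/sshd_config" and restart the SSH service, as in steps 2.1.3 and 2.1.4 for Master.
2.2.4 From Master, log in to Worker1 over SSH without a password:
ssh Worker1
2.2.5 Delete the "id_rsa.pub" file from the "/root/" directory on Worker1.
rm /root/id_rsa.pub
2.2.6 Repeat the five steps above on the Worker2 server.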
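2.3 Let every Worker log in to Master without a password
2.3.1 On the Worker1 node, generate a key pair and append its own public key to the "authorized_keys" file:
ssh-keygen -t rsa -P ''
cat /root/.ssh/id_rsa.pub >> /root/.ssh/authorized_keys
2.3.2 Copy Worker1's public key "id_rsa.pub" to the "/root/" directory on the Master node.
scp /root/.ssh/id_rsa.pub root@Master:/root/
2.3.3 On the Master node, append Worker1's public key to Master's "authorized_keys" file.
cat ~/id_rsa.pub >> ~/.ssh/authorized_keys
2.3.4 On the Master node, delete the copied "id_rsa.pub" file.
rm /root/id_rsa.pub
2.3.5 Test passwordless login from Worker1 to Master: ssh Master
2.4 Follow the same steps to set up passwordless login between Worker2 and Master. After that, Master can log in to every Worker without a password, and every Worker can log in to Master without a password.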
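The copy, append, and delete steps above can also be done with ssh-copy-id, which appends a public key to the remote "authorized_keys" file in one command (it still asks for the remote password the first time). The sketch below shows the Master-to-Worker direction. DRY_RUN and copy_key are names invented here: by default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: push Master's public key to each Worker with ssh-copy-id.
# DRY_RUN is a hypothetical switch added for this example: by default the
# commands are only printed, and nothing is contacted until DRY_RUN=0 is set.
DRY_RUN="${DRY_RUN:-1}"
WORKERS="Worker1 Worker2"

copy_key() {
  # $1 = target hostname
  if [ "$DRY_RUN" = "1" ]; then
    echo "ssh-copy-id -i /root/.ssh/id_rsa.pub root@$1"
  else
    ssh-copy-id -i /root/.ssh/id_rsa.pub "root@$1"
  fi
}

for host in $WORKERS; do
  copy_key "$host"
done
```

For the reverse direction, the same command can be run on each Worker with root@Master as the target.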
3. Install Java and Scala on Master by extracting the downloaded packages: tar -xzvf ...
4. Install and configure Hadoop on Master
4.1 Configure hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>Master:50090</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/usr/etc/hadoop-2.7.5/hdfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/usr/etc/hadoop-2.7.5/hdfs/data</value>
  </property>
  <property>
    <name>dfs.namenode.checkpoint.dir</name>
    <value>/usr/etc/hadoop-2.7.5/hdfs/namesecondary</value>
  </property>
</configuration>
4.2 Configure yarn-site.xml
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>Master</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>Master:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>Master:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>Master:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>Master:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>Master:8088</value>
  </property>
</configuration>
4.3 Configure mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
4.4 Configure hadoop-env.sh
export JAVA_HOME=/usr/etc/jdk1.8.0_161
export HADOOP_HOME=/usr/etc/hadoop-2.7.5
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/etc/hadoop"}
4.5 Configure core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://Master:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/etc/hadoop-2.7.5/tmp</value>
  </property>
  <property>
    <name>hadoop.native.lib</name>
    <value>true</value>
  </property>
</configuration>
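The fs.defaultFS address is reused later in the Spark settings (hdfs://Master:9000/...), so it is worth confirming what a file actually contains after editing it. The following is a rough sketch of reading one property from a site file with awk. get_prop and DEMO_XML are names invented for this example, and the function assumes each <name> and <value> tag sits on its own line.

```shell
#!/bin/sh
# Sketch: read one property value from a Hadoop *-site.xml file.
# Assumes each <name> and <value> tag sits on its own line.
get_prop() {
  # $1 = XML file, $2 = property name
  awk -v key="$2" '
    index($0, "<name>" key "</name>") { found = 1 }
    found && match($0, /<value>[^<]*<\/value>/) {
      # Strip the 7-character <value> and 8-character </value> tags.
      print substr($0, RSTART + 7, RLENGTH - 15)
      exit
    }
  ' "$1"
}

# Demo against a throwaway copy of two core-site.xml settings.
DEMO_XML="$(mktemp)"
cat > "$DEMO_XML" <<'EOF'
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://Master:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/etc/hadoop-2.7.5/tmp</value>
  </property>
</configuration>
EOF

get_prop "$DEMO_XML" fs.defaultFS
get_prop "$DEMO_XML" hadoop.tmp.dir
```

On a machine where Hadoop is already on the PATH, hdfs getconf -confKey fs.defaultFS reports the value Hadoop itself resolves, which is the more reliable check.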
4.6 Configure slaves
Worker1
Worker2
5. Install and configure Spark on Master
5.1 Configure spark-env.sh
export JAVA_HOME=/usr/etc/jdk1.8.0_161
export SCALA_HOME=/usr/etc/scala-2.12.4
export HADOOP_HOME=/usr/etc/hadoop-2.7.5
export HADOOP_CONF_DIR=/usr/etc/hadoop-2.7.5/etc/hadoop
# SPARK_MASTER_IP is deprecated in Spark 2.x; SPARK_MASTER_HOST replaces it
export SPARK_MASTER_HOST=Master
export SPARK_WORKER_MEMORY=1g
export SPARK_EXECUTOR_MEMORY=1g
export SPARK_DRIVER_MEMORY=500m
export SPARK_WORKER_CORES=2
export SPARK_HOME=/usr/etc/spark-2.3.0-bin-hadoop2.7
export SPARK_DIST_CLASSPATH=$(/usr/etc/hadoop-2.7.5/bin/hadoop classpath)
5.2 Configure spark-defaults.conf
spark.eventLog.enabled true
spark.eventLog.dir hdfs://Master:9000/historyserverforSpark
spark.yarn.historyServer.address Master:18080
spark.history.fs.logDirectory hdfs://Master:9000/historyserverforSpark
spark.executor.extraJavaOptions -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three"
5.3 Configure slaves
Worker1
Worker2
6. On Master, add the environment variables to /etc/profile and apply them with source /etc/profile
export PATH USER LOGNAME MAIL HOSTNAME HISTSIZE HISTCONTROL
export JAVA_HOME=/usr/etc/jdk1.8.0_161
export JRE_HOME=/usr/etc/jdk1.8.0_161/jre
export SCALA_HOME=/usr/etc/scala-2.12.4
export HADOOP_HOME=/usr/etc/hadoop-2.7.5
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_HOME}/lib/native
export HADOOP_OPTS="-Djava.library.path=${HADOOP_HOME}/lib"
export SPARK_HOME=/usr/etc/spark-2.3.0-bin-hadoop2.7
export HIVE_HOME=/usr/etc/apache-hive-2.1.1-bin
export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib:$SCALA_HOME/lib:$HADOOP_HOME/lib
PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$SPARK_HOME/bin:$SPARK_HOME/sbin:$HIVE_HOME/bin:$SCALA_HOME/bin:$JAVA_HOME/bin:$PATH
export JAVA_HOME PATH
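After running source /etc/profile, a quick check that each *_HOME variable points at a real directory can catch a mistyped path before any service is started. The sketch below is one way to do that. check_homes is a helper name invented here, and the list of variables passed to it is a choice made for this example.

```shell
#!/bin/sh
# Sketch: report which of the *_HOME variables from /etc/profile are unset
# or do not point at an existing directory.
check_homes() {
  # Arguments: names of environment variables to check.
  status=0
  for name in "$@"; do
    eval "dir=\${$name:-}"
    if [ -z "$dir" ]; then
      echo "unset: $name"
      status=1
    elif [ ! -d "$dir" ]; then
      echo "missing: $name=$dir"
      status=1
    else
      echo "ok: $name=$dir"
    fi
  done
  return "$status"
}

check_homes JAVA_HOME SCALA_HOME HADOOP_HOME SPARK_HOME \
  || echo "fix /etc/profile, then run: source /etc/profile"
```

The same check can be repeated on Worker1 and Worker2 once the files have been copied there.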
7. On Master, use scp to copy Java, Scala, Hadoop, Spark, and /etc/profile to the Worker1 and Worker2 machines
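Step 7 can be sketched as a loop over the Workers. The directory paths are the ones used in /etc/profile. DRY_RUN and run are names invented for this example, and the script assumes the /usr/etc directory already exists on each Worker. By default it only prints the scp commands.

```shell
#!/bin/sh
# Sketch: copy the installation directories and /etc/profile to each Worker.
# DRY_RUN is a hypothetical switch added for this example: by default the
# scp commands are only printed. Set DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"
WORKERS="Worker1 Worker2"
DIRS="/usr/etc/jdk1.8.0_161 /usr/etc/scala-2.12.4 /usr/etc/hadoop-2.7.5 /usr/etc/spark-2.3.0-bin-hadoop2.7"

run() {
  # Print the command in dry-run mode, execute it otherwise.
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

for host in $WORKERS; do
  for dir in $DIRS; do
    run scp -r "$dir" "root@$host:/usr/etc/"
  done
  run scp /etc/profile "root@$host:/etc/profile"
done
```

After copying, run source /etc/profile on each Worker so the variables take effect there as well.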
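8. On Master, format the HDFS NameNode: hdfs namenode -format (the older form, hadoop namenode -format, still works in Hadoop 2.x but is deprecated)
9. On Master, run start-dfs.sh to start the HDFS services. The web UI is then available in a browser at Master:50070
10. On Master, change to Spark's sbin directory and run ./start-all.sh to start Spark. The web UI is available at Master:8080. Hadoop ships a script with the same name, and with the PATH set in step 6 the Hadoop one is found first, so run Spark's script by explicit path.
11. On Master, run start-history-server.sh to start the Spark history server. The web UI is available at Master:18080. The event log directory named in spark-defaults.conf must already exist in HDFS; create it if needed with hdfs dfs -mkdir /historyserverforSpark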
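Besides the web pages, the jps tool from the JDK lists the running Java processes, which gives a quick way to see whether a daemon failed to start. With the configuration in this article, Master would be expected to show NameNode, SecondaryNameNode, Master, and HistoryServer, while each Worker would show DataNode and Worker. The sketch below compares jps-style output with a list of expected names. missing_daemons is a helper invented here, and the sample output is made up for illustration.

```shell
#!/bin/sh
# Sketch: list expected daemons that do not appear in jps-style output.
missing_daemons() {
  # stdin = output of jps ("<pid> <name>" per line); arguments = expected names
  seen="$(awk '{ print $2 }')"
  for name in "$@"; do
    echo "$seen" | grep -qx "$name" || echo "$name"
  done
}

# Hypothetical sample of what jps might print on Master; the PIDs are made up.
SAMPLE="2101 NameNode
2290 SecondaryNameNode
2544 Master
2710 Jps"

# Prints the names that are expected but absent from the sample.
echo "$SAMPLE" | missing_daemons NameNode SecondaryNameNode Master HistoryServer
```

On a real node, replace the sample with the live output: jps | missing_daemons NameNode SecondaryNameNode Master HistoryServer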
12. Test running an application on the cluster
12.1 Submit an application with spark-submit (run from Spark's bin directory):
./spark-submit --class org.apache.spark.examples.SparkPi --master spark://Master:7077 ../examples/jars/spark-examples_2.11-2.3.0.jar 100000
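--class: the fully qualified class name (package name plus class name); --master: the master URL of the Spark cluster; the .jar path: the location of the application jar; 100000: the number of tasks (partitions)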
12.2 Start spark-shell and run a word count program:
sc.textFile("/README.md").flatMap(_.split(" ")).map((_,1)).reduceByKey(_+_).map(pair=>(pair._2,pair._1)).sortByKey(false,1).map(pair=>(pair._2,pair._1)).saveAsTextFile("/resdir/wordcount")
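To sanity-check the Spark result on a small input, the same count can be computed locally with standard tools. The sketch below splits on single spaces, counts each word, and sorts by descending count. wordcount and DEMO_TXT are names invented here and the demo text is made up. Unlike split(" ") in the Spark job, this version drops empty tokens produced by consecutive spaces, so the counts for the empty string will differ.

```shell
#!/bin/sh
# Sketch: count words in a local file and sort by descending count, as a
# cross-check for the Spark job on a small input.
wordcount() {
  # $1 = text file; words are separated by single spaces
  tr ' ' '\n' < "$1" \
    | awk 'length($0) > 0 { count[$0]++ } END { for (w in count) print count[w], w }' \
    | sort -k1,1nr -k2,2
}

# Demo input (made up for illustration).
DEMO_TXT="$(mktemp)"
printf 'a b a\nc a b\n' > "$DEMO_TXT"
wordcount "$DEMO_TXT"
```

The output lines have the form "count word", which corresponds to the (word, count) pairs that the Spark job writes under /resdir/wordcount.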