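Prerequisite: Hadoop must already be installed. My version is hadoop-2.3.0-cdh5.1.0.
1. Download the Maven package.
2. Set the M2_HOME environment variable and add Maven's bin directory to PATH.
3. Set MAVEN_OPTS so the build has enough memory:
export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"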
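The Maven environment settings from the steps above can be kept together in a shell profile fragment. This is only a sketch: the install path /home/hadoop/apache-maven is an assumption for illustration and should be replaced with wherever Maven was actually unpacked.

```shell
# Assumed Maven install location (hypothetical; adjust to your unpack directory).
export M2_HOME=/home/hadoop/apache-maven
# Put Maven's bin directory first on the PATH so its mvn is the one found.
export PATH="$M2_HOME/bin:$PATH"
# Heap, PermGen, and code cache sizes for the Spark build, as given in the text.
export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
```

Sourcing this file before the build keeps the three settings consistent across shell sessions.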
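4. Download the spark-1.0.2.gz archive from the official site and extract it.
5. Change into the extracted Spark directory.
6. Run ./make-distribution.sh --hadoop 2.3.0-cdh5.1.0 --with-yarn --tgz
7. Wait; the build takes a long time.
8. When it finishes, spark-1.0.2-bin-2.3.0-cdh5.1.0.tgz is generated in the current directory.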
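Because the build is slow, it can help to check for the expected tarball instead of watching the console. The sketch below derives the file name from the two version strings used above; the name pattern is simply the one seen in the output of this build, not a documented guarantee.

```shell
SPARK_VERSION=1.0.2
HADOOP_VERSION=2.3.0-cdh5.1.0

# The build itself is started from the Spark source directory with:
#   ./make-distribution.sh --hadoop "$HADOOP_VERSION" --with-yarn --tgz

# Tarball name as observed after this build finished.
DIST="spark-${SPARK_VERSION}-bin-${HADOOP_VERSION}.tgz"

if [ -f "$DIST" ]; then
    echo "build finished: $DIST"
else
    echo "not built yet: $DIST"
fi
```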
9. Copy the tarball to the installation directory and extract it.
10. Edit the configuration files under conf:
cp spark-env.sh.template spark-env.sh
vim spark-env.sh
Set the following parameters to match your environment:
export JAVA_HOME=/home/hadoop/jdk
export HADOOP_HOME=/home/hadoop/hadoop-2.3.0-cdh5.1.0
export HADOOP_CONF_DIR=/home/hadoop/hadoop-2.3.0-cdh5.1.0/etc/hadoop
export SPARK_YARN_APP_NAME=spark-on-yarn
export SPARK_EXECUTOR_INSTANCES=1
export SPARK_EXECUTOR_CORES=2
export SPARK_EXECUTOR_MEMORY=3500m
export SPARK_DRIVER_MEMORY=3500m
export SPARK_MASTER_IP=master
export SPARK_MASTER_PORT=7077
export SPARK_WORKER_CORES=2
export SPARK_WORKER_MEMORY=3500m
export SPARK_WORKER_INSTANCES=1
11. Configure the slaves file, one worker hostname per line:
slave01
slave02
slave03
slave04
slave05
12. Distribute
Copy the Spark installation directory to every slave node.
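The copy can be scripted from the slaves file. The following is a minimal sketch that only prints the scp commands, so they can be reviewed first. It assumes the install directory is /home/hadoop/spark (the path used by the example jar later on), that every node uses the same directory layout, and that passwordless SSH is set up. The function name print_copy_commands is made up for this illustration.

```shell
# Print one copy command per host listed in a slaves file.
# Empty lines and lines starting with '#' are skipped.
print_copy_commands() {
    slaves_file=$1
    spark_dir=$2
    while IFS= read -r host || [ -n "$host" ]; do
        case "$host" in
            ''|'#'*) continue ;;
        esac
        # Copy the directory into the same parent directory on the remote host.
        echo "scp -r $spark_dir $host:$(dirname "$spark_dir")/"
    done < "$slaves_file"
}

# Example: print_copy_commands /home/hadoop/spark/conf/slaves /home/hadoop/spark
```

Once the printed commands look right, piping the output to sh runs them one host at a time.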
13. Start the cluster
sbin/start-all.sh
14. Run an example
$SPARK_HOME/bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-client --num-executors 3 --driver-memory 4g --executor-memory 2g --executor-cores 1 /home/hadoop/spark/lib/spark-examples-1.0.2-hadoop2.3.0-cdh5.1.0.jar 100
15. The submitted example did not succeed
Clicking through to the logs in the YARN monitoring UI showed a pile of errors like these:
INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 0 time(s).
INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 0 time(s).
INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 0 time(s).
INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 0 time(s).
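16. Solving the problem
I copied the Spark core jar from the lib directory under the Spark directory to my local machine and found a yarn-default.xml file inside it. Opening that file showed: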
<!-- Resource Manager Configs -->
<property>
  <description>The hostname of the RM.</description>
  <name>yarn.resourcemanager.hostname</name>
  <value>0.0.0.0</value>
</property>
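So the client looks for the ResourceManager on the local machine. Port 8030 in the log is the default ResourceManager scheduler port, and its address is built from this hostname. If the job does not run on the node that hosts YARN's ResourceManager, it can never find it.
17. Change this setting as follows: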
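To confirm which hostname a given yarn-default.xml carries without opening it in an editor, a small helper can print the value. This is a sketch only: get_property is a made-up name, and it assumes the `<value>` element directly follows the `<name>` element, as in the snippet shown above.

```shell
# Print the <value> that directly follows a given <name> in a Hadoop-style XML file.
get_property() {
    name=$1
    file=$2
    # Join the file into one line so name and value can be matched together.
    tr -d '\n' < "$file" |
        sed -n "s|.*<name>$name</name>[^<]*<value>\([^<]*\)</value>.*|\1|p"
}

# Example, after extracting yarn-default.xml from the jar into the current directory:
#   get_property yarn.resourcemanager.hostname yarn-default.xml
```

If the helper prints 0.0.0.0, the jar still carries the default that caused the connection retries.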
<!-- Resource Manager Configs -->
<property>
  <description>The hostname of the RM.</description>
  <name>yarn.resourcemanager.hostname</name>
  <value>master</value>
</property>
18. Repackage the jar and redistribute Spark to every node.
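The edit and repackaging can be sketched in shell as follows. The helper name set_rm_hostname and the ASSEMBLY_JAR placeholder are assumptions for illustration, and the sed expression assumes `<name>` and `<value>` sit on consecutive lines in the extracted file. The unzip and jar invocations are shown as comments because the exact jar name depends on the build.

```shell
# Rewrite the value of yarn.resourcemanager.hostname in an extracted yarn-default.xml.
# Assumes the <value> line comes immediately after the <name> line.
set_rm_hostname() {
    file=$1
    host=$2
    sed "/<name>yarn.resourcemanager.hostname<\/name>/{n;s|<value>[^<]*</value>|<value>$host</value>|;}" "$file" > "$file.new" &&
        mv "$file.new" "$file"
}

# Hypothetical end-to-end use; ASSEMBLY_JAR stands for the Spark core jar under lib/:
#   unzip -o "$ASSEMBLY_JAR" yarn-default.xml
#   set_rm_hostname yarn-default.xml master
#   jar uf "$ASSEMBLY_JAR" yarn-default.xml
```

After the jar is updated, the Spark directory has to be copied to the slave nodes again so that every node uses the patched jar.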