Building and installing Spark on CDH5 [spark-1.0.2 hadoop2.3.0 cdh5.1.0]

 

Prerequisite: Hadoop must already be installed. The version used here is hadoop2.3-cdh5.1.0.
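A quick sanity check that Hadoop is in place (assuming the hadoop command is on PATH):

hadoop version   # should report 2.3.0-cdh5.1.0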

1. Download the Maven package.

2. Set the M2_HOME environment variable and add Maven's bin directory to PATH.

3. export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
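A minimal sketch of steps 2 and 3 as shell exports, assuming Maven was unpacked to /home/hadoop/apache-maven (that path is an assumption; adjust it to your installation):

# append to ~/.bashrc (or your shell profile), then reload with: source ~/.bashrc
export M2_HOME=/home/hadoop/apache-maven   # assumed install path
export PATH=$M2_HOME/bin:$PATH
export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"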

4. Download the spark-1.0.2 source tarball from the official site and extract it.
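For reference, a sketch of the download and extraction, assuming the release is named spark-1.0.2.tgz and is fetched from the Apache archive (both the URL and file name are assumptions; use whichever mirror you prefer):

# download the source release and unpack it
wget https://archive.apache.org/dist/spark/spark-1.0.2/spark-1.0.2.tgz
tar -zxvf spark-1.0.2.tgz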

5. Change into the extracted Spark directory.

6. Run ./make-distribution.sh --hadoop 2.3.0-cdh5.1.0 --with-yarn --tgz

7. Then a long wait while the build runs.

8. When the build finishes, spark-1.0.2-bin-2.3.0-cdh5.1.0.tgz is generated in the current directory.

9. Copy the tarball to the installation directory and extract it.
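A sketch of this step, assuming /home/hadoop as the installation directory and renaming the unpacked folder to spark so it matches the path used in step 14 (both are assumptions):

# copy the generated tarball to the install location and unpack it
cp spark-1.0.2-bin-2.3.0-cdh5.1.0.tgz /home/hadoop/
cd /home/hadoop
tar -zxvf spark-1.0.2-bin-2.3.0-cdh5.1.0.tgz
mv spark-1.0.2-bin-2.3.0-cdh5.1.0 spark   # optional rename, assumed here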

10. Configure the files under conf.

cp spark-env.sh.template spark-env.sh

vim spark-env.sh

Set the parameters below to match your own environment:

export JAVA_HOME=/home/hadoop/jdk
export HADOOP_HOME=/home/hadoop/hadoop-2.3.0-cdh5.1.0
export HADOOP_CONF_DIR=/home/hadoop/hadoop-2.3.0-cdh5.1.0/etc/hadoop
export SPARK_YARN_APP_NAME=spark-on-yarn
export SPARK_EXECUTOR_INSTANCES=1
export SPARK_EXECUTOR_CORES=2
export SPARK_EXECUTOR_MEMORY=3500m
export SPARK_DRIVER_MEMORY=3500m
export SPARK_MASTER_IP=master
export SPARK_MASTER_PORT=7077
export SPARK_WORKER_CORES=2
export SPARK_WORKER_MEMORY=3500m
export SPARK_WORKER_INSTANCES=1

11. Configure the slaves file:

slave01
slave02
slave03
slave04
slave05

12. Distribute the installation

Copy the Spark installation directory to every slave node.
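One way to do the copy, assuming passwordless SSH to the slaves and the same /home/hadoop layout on every node:

# push the whole Spark directory to each slave
for host in slave01 slave02 slave03 slave04 slave05; do
  scp -r /home/hadoop/spark $host:/home/hadoop/
done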

13. Start the cluster

sbin/start-all.sh
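A quick way to confirm the standalone daemons came up (assuming the JDK's jps tool is on PATH) is to check for the expected processes:

jps   # expect a Master process on the master node and a Worker on each slave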

14. Run an example

$SPARK_HOME/bin/spark-submit --class org.apache.spark.examples.SparkPi \
    --master yarn-client \
    --num-executors 3 \
    --driver-memory 4g \
    --executor-memory 2g \
    --executor-cores 1 \
    /home/hadoop/spark/lib/spark-examples-1.0.2-hadoop2.3.0-cdh5.1.0.jar \
    100

15. The submitted example failed, surprisingly

Clicking through to the logs in the YARN monitoring UI showed a pile of errors like these:

INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 0 time(s).

INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 0 time(s).

INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 0 time(s).

INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 0 time(s).

16. Troubleshooting

I pulled the Spark core (assembly) jar out of the lib directory under the Spark installation to my local machine and found a yarn-default.xml file inside it. Opening it showed:

 

  <!-- Resource Manager Configs -->
  <property>
    <description>The hostname of the RM.</description>
    <name>yarn.resourcemanager.hostname</name>
    <value>0.0.0.0</value>
  </property>

Unsurprisingly, the application looks for the ResourceManager on the local host (0.0.0.0); if the node running the job is not the node hosting the YARN ResourceManager, it has no way to find it.
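To see the bundled default for yourself, one option is to read yarn-default.xml straight out of the assembly jar; the jar name below is an assumption based on this build, so adjust it to whatever actually sits in lib:

# print the bundled yarn-default.xml without unpacking the whole jar
unzip -p /home/hadoop/spark/lib/spark-assembly-1.0.2-hadoop2.3.0-cdh5.1.0.jar yarn-default.xml | grep -A 3 resourcemanager.hostname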

17. Change the setting as follows:

  <!-- Resource Manager Configs -->
  <property>
    <description>The hostname of the RM.</description>
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value>
  </property>

18. Repackage and redistribute Spark to every node.
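A rough sketch of the repack and redistribution, assuming the edited yarn-default.xml is in the current directory and the assembly jar name guessed above (both are assumptions):

# put the edited yarn-default.xml back into the assembly jar
jar uf spark-assembly-1.0.2-hadoop2.3.0-cdh5.1.0.jar yarn-default.xml
# push the updated jar to every slave
for host in slave01 slave02 slave03 slave04 slave05; do
  scp spark-assembly-1.0.2-hadoop2.3.0-cdh5.1.0.jar $host:/home/hadoop/spark/lib/
done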
