* vi spark-env.sh — configure the masters for ZooKeeper-based HA
```
export JAVA_HOME=/usr/java/jdk1.7.0_71
export HADOOP_CONF_DIR={hadoop-home}/etc/hadoop
export SPARK_WORKER_CORES=4      # cores each worker may use
export SPARK_WORKER_MEMORY=12g   # memory each worker may use
export SPARK_MASTER_IP={ip_addr} # mainly to avoid multi-NIC binding problems
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER \
  -Dspark.deploy.zookeeper.url={zookeeper-address}:2181,{zookeeper-address}:2181 \
  -Dspark.deploy.zookeeper.dir=/spark"
```
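Once both masters are up (see the start-up steps at the end of this section), you can sanity-check the recovery state in ZooKeeper. A minimal sketch, assuming zkCli.sh from the ZooKeeper distribution ({zookeeper-home} is a placeholder) and a quorum member reachable at {zookeeper-address}:2181; the child znode names shown are what current Spark versions create and may differ across releases:

```
# List the znode configured via spark.deploy.zookeeper.dir
echo "ls /spark" | {zookeeper-home}/bin/zkCli.sh -server {zookeeper-address}:2181
# Expect children such as leader_election and master_status once a master
# has registered; an empty /spark means HA mode never took effect.
```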
* vi log4j.properties — in production, use ERROR level to greatly reduce log volume: change
  >log4j.rootCategory=INFO, console

  to
  >log4j.rootCategory=ERROR, console
* vi slaves — add the hostnames of the other nodes
* vi spark-defaults.conf — uncomment the serializer line (note the property name is spark.serializer)
  >spark.serializer org.apache.spark.serializer.KryoSerializer
* Create a job launch script (**optional** — it only suits the current job layout)
```
#!/bin/bash
# spark:// URL: list both master addresses if HA is enabled, one if not
Spark_Master=spark://{two addresses if HA is enabled, one if not}:7077

if [ -z "$2" ]; then
    echo "jars path is required param."
    echo "Usage: run.sh start <mainJar> <mainClass> <isbackground (true/false)>"
    echo "Usage: run.sh stop <mainJar>"
    exit 1
fi

bin="`dirname "$0"`"
bin="`cd "$bin"; pwd`"
. "$bin/../conf/spark-env.sh"

jardir="`dirname "$2"`"
PID_FILE=$jardir/spark.pid

# Build a comma-separated list of the job's dependency jars
for jarz in $jardir/lib/*.jar; do
    if [ "$libs" != "" ]; then
        libs=$libs,$jarz
    else
        libs=$jarz
    fi
done

case "$1" in
start)
    if [ "$4" != "true" ]; then
        # foreground
        $bin/spark-submit \
            --master $Spark_Master \
            --jars $libs \
            --class $3 $2
    else
        # background: detach and record the pid so that "stop" can find it
        nohup $bin/spark-submit \
            --master $Spark_Master \
            --jars $libs \
            --class $3 $2 \
            > $jardir/stdout.log 2> $jardir/stderr.log &
        echo $! > $PID_FILE
    fi
    ;;
stop)
    if [ -e $PID_FILE ]; then
        pid=`cat $PID_FILE`
        kill -9 $pid
    else
        echo "[ERROR] Cannot find $PID_FILE !"
    fi
    ;;
esac
```
* Copy Spark to master2 and the other nodes (a loop version is sketched below)
  >scp -r {spark-install-dir}/spark-xxx other-node:{spark-install-dir}
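When there are many nodes, the copy can be driven off conf/slaves instead of being typed per host. A minimal sketch, assuming the standby master is named master2, that slaves holds one hostname per line, and the same {spark-home}/{spark-install-dir} placeholders as above:

```
# Push the Spark directory to the standby master and every worker node
for host in master2 $(grep -v '^#' {spark-home}/conf/slaves); do
    scp -r {spark-install-dir}/spark-xxx "$host":{spark-install-dir}
done
```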
* On master2, change the master IP
  > vi {spark-home}/conf/spark-env.sh
```
export SPARK_MASTER_IP={ip_addr}
```
* Start the cluster

  master1
  > {spark-home}/sbin/start-all.sh

  master2
  > {spark-home}/sbin/start-master.sh
* Open the master web UI at master1:8080 (8080 is the default UI port; 7077 in the spark:// URL is the RPC port, not a web page)
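To confirm that failover actually works, stop the active master and watch the standby take over. A rough sketch, assuming master1 currently shows ALIVE and master2 shows STANDBY on their web UIs:

```
# Stop the active master; ZooKeeper elects the standby as the new leader
ssh master1 '{spark-home}/sbin/stop-master.sh'
# After a short election delay (typically well under two minutes),
# master2's web UI status flips from STANDBY to ALIVE and running
# applications reconnect to it without being restarted.
```

Clients should list both masters in the URL, e.g. spark://master1:7077,master2:7077 (as the launch script above does), so submission works regardless of which master is currently alive.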