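When a Spark job is launched on YARN without spark.yarn.archive or spark.yarn.jars configured, Spark uploads its jars on every submission, which is time-consuming. Setting spark.yarn.archive greatly reduces job startup time. The whole procedure is as follows.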
1. Create the zip file locally
silent@bd01:~/env/spark$ cd jars/
silent@bd01:~/env/spark/jars$ zip spark2.0.0.zip ./*
Note: the zip must contain the full set of jars.
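Because the zip is created from inside jars/, the jar files sit at the root of the archive, which, as far as I know, is the layout YARN expects for spark.yarn.archive. The following is a minimal sketch of a sanity check for that layout. The function name check_flat_listing is hypothetical, and the sketch only inspects a list of entry names passed on standard input.

```shell
# check_flat_listing (hypothetical helper):
# reads archive entry names on stdin, one per line, and fails
# if any entry lives inside a subdirectory of the archive.
check_flat_listing() {
  status=0
  while IFS= read -r entry; do
    case "$entry" in
      */*)
        # A slash means the entry is nested (or is a directory entry).
        printf 'nested entry: %s\n' "$entry" >&2
        status=1
        ;;
    esac
  done
  return "$status"
}
```

Assuming the Info-ZIP unzip tool is installed, the entry names can be fed in with: unzip -Z1 spark2.0.0.zip | check_flat_listing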
2. Upload to HDFS and change the permissions
silent@bd01:~/env/spark/jars$ /usr/ndp/current/hdfs_client/bin/hdfs dfs -mkdir /tmp/spark-archive
silent@bd01:~/env/spark/jars$ /usr/ndp/current/hdfs_client/bin/hdfs dfs -put ./spark2.0.0.zip /tmp/spark-archive
silent@bd01:~/env/spark/jars$ /usr/ndp/current/hdfs_client/bin/hdfs dfs -chmod 775 /tmp/spark-archive/spark2.0.0.zip
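The three upload commands can be wrapped so that they are easy to repeat after a Spark upgrade. This is only a sketch: upload_archive and the HDFS_BIN variable are hypothetical names, and the -p and -f flags (create parent directories, overwrite an existing file) are additions that the original commands do not use.

```shell
# Path to the hdfs client; defaults to the one used in this post.
HDFS_BIN=${HDFS_BIN:-/usr/ndp/current/hdfs_client/bin/hdfs}

# upload_archive LOCAL_ZIP HDFS_DIR (hypothetical helper):
# creates the target directory, uploads the zip, and opens up its permissions.
upload_archive() {
  zip_path=$1
  target_dir=$2
  zip_name=$(basename "$zip_path")
  # Each step runs only if the previous one succeeded.
  "$HDFS_BIN" dfs -mkdir -p "$target_dir" &&
    "$HDFS_BIN" dfs -put -f "$zip_path" "$target_dir/" &&
    "$HDFS_BIN" dfs -chmod 775 "$target_dir/$zip_name"
}
```

With that in place, the step above becomes: upload_archive ./spark2.0.0.zip /tmp/spark-archive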
3. Configure spark-defaults.conf
spark.yarn.archive hdfs:///tmp/spark-archive/spark2.0.0.zip
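If the configuration file is maintained by a script, it helps to avoid appending the same key twice. Below is a minimal sketch of an idempotent update. set_spark_conf is a hypothetical helper, and it assumes one "key value" or "key=value" entry per line.

```shell
# set_spark_conf CONF_FILE KEY VALUE (hypothetical helper):
# removes any existing entry for KEY, then appends "KEY VALUE".
set_spark_conf() {
  conf_file=$1
  conf_key=$2
  conf_value=$3
  touch "$conf_file"
  tmp_file=$(mktemp)
  # Keep every line whose first field is not the key; comments are kept too.
  awk -v k="$conf_key" '{
    line = $0
    sub(/^[ \t]+/, "", line)
    split(line, fields, /[ \t=]+/)
    if (fields[1] != k) print $0
  }' "$conf_file" > "$tmp_file"
  printf '%s %s\n' "$conf_key" "$conf_value" >> "$tmp_file"
  # Write back in place so the file keeps its owner and permissions.
  cat "$tmp_file" > "$conf_file"
  rm -f "$tmp_file"
}
```

For example: set_spark_conf "$SPARK_HOME/conf/spark-defaults.conf" spark.yarn.archive hdfs:///tmp/spark-archive/spark2.0.0.zip (the conf path here is an assumption about the installation layout). The same setting can also be passed for a single job with spark-submit --conf spark.yarn.archive=hdfs:///tmp/spark-archive/spark2.0.0.zip.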
spark.yarn.jars can be configured in a similar way.
1. Upload the dependency jars
silent@bd01:~/env/spark$ /usr/ndp/current/hdfs_client/bin/hdfs dfs -mkdir hdfs://bd01/user/asiainfo/jars/
silent@bd01:~/env/spark$ /usr/ndp/current/hdfs_client/bin/hdfs dfs -put ./*.jar hdfs://bd01/user/asiainfo/jars/
silent@bd01:~/env/spark$ /usr/ndp/current/hdfs_client/bin/hdfs dfs -chmod 775 'hdfs://bd01/user/asiainfo/jars/*.jar'
2. Configure spark-defaults.conf
spark.yarn.jars=local:/opt/cloudera/parcels/CDH-6.1.0-1.cdh6.1.0.p0.770702/lib/spark/jars/*,local:/opt/cloudera/parcels/CDH-6.1.0-1.cdh6.1.0.p0.770702/lib/spark/hive/*,hdfs://bd01/user/asiainfo/jars/*.jar
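Note: use the local: scheme for jars that already exist on every node, and the hdfs:// scheme for jars stored in an HDFS directory.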
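A spark.yarn.jars value is a comma-separated list, and each entry's scheme determines where the jars are read from. As a rough illustration, the sketch below splits such a value and labels each entry by scheme. describe_jar_entries is a hypothetical helper that only inspects the string; it does not contact HDFS or check that the paths exist.

```shell
# describe_jar_entries VALUE (hypothetical helper):
# prints one line per comma-separated entry as "<kind> <entry>".
describe_jar_entries() {
  printf '%s\n' "$1" | tr ',' '\n' | while IFS= read -r entry; do
    case "$entry" in
      local:*) printf 'node-local %s\n' "$entry" ;;  # expected on every node
      hdfs:*)  printf 'hdfs %s\n' "$entry" ;;        # read from HDFS
      *)       printf 'other %s\n' "$entry" ;;       # any other scheme or bare path
    esac
  done
}
```

Running it on the value configured above prints two node-local lines for the CDH parcel directories and one hdfs line for hdfs://bd01/user/asiainfo/jars/*.jar.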