(1) The usual Spark development and run cycle is: write the Spark code locally in IDEA or Eclipse, package and deploy it to the driver node, then run spark-submit. But when a runtime problem appears, such as a null pointer or a database connection failure, you have to modify the code again, repackage, redeploy... Is it possible to deploy only once?
(2) When a new version of Spark is released and you want to try its new features right away, but you have no Spark cluster at hand, or your cluster runs an older version, how can you try them out?
(1) No need to package and test repeatedly: test and debug locally until everything passes, then package and deploy just once.
Spark supports a local mode: when initializing SparkConf, simply set the master to "local[*]" (one worker thread per CPU core) or "local[1]" (a single thread).
(2) With local mode, you can debug a new version of Spark even without an existing Spark cluster.
Just add the new version as a dependency in your sbt or Maven configuration file.
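For example, with Maven the dependency might look like the following sketch. The version 2.4.1 matches the one printed in the test run later in this article; the Scala suffix (_2.11) is an assumption and should match your project's Scala version:

```xml
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.4.1</version>
</dependency>
```

The equivalent sbt line would be `libraryDependencies += "org.apache.spark" %% "spark-core" % "2.4.1"`.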
(3) Set Spark's log level
Spark prints INFO-level messages by default. If, for example, I only want to see the small amount of data printed after a take, the Spark calls produce so much logging that I have to hunt for my output in a pile of log lines. So change Spark's default log level via a log4j.properties file on the classpath. The configuration is as follows:
# Set everything to be logged to the console
log4j.rootCategory=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

# Settings to quiet third party logs that are too verbose
log4j.logger.org.spark_project.jetty=ERROR
log4j.logger.org.spark_project=ERROR
log4j.logger.org.apache.spark=ERROR
log4j.logger.org.apache.parquet=ERROR
log4j.logger.parquet=ERROR
log4j.logger.io.netty=ERROR
log4j.logger.org.apache.hadoop=FATAL

# SPARK-9183: Settings to avoid annoying messages when looking up nonexistent UDFs in SparkSQL with Hive support
log4j.logger.org.apache.hadoop.hive.metastore.RetryingHMSHandler=FATAL

# Console output
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss,SSS} %5p %c{1}:%L - %m%n
(4) Test code
import org.apache.spark.{SparkConf, SparkContext}

object Test {
  def main(args: Array[String]): Unit = {
    // Run Spark locally with a single thread; no cluster required
    val sc = new SparkContext(new SparkConf().setMaster("local[1]").setAppName("test"))
    println(sc.version)
    sc.parallelize(List(1, 2, 3, 4)).foreach(println)
    sc.stop()
  }
}
Run output
log4j: Trying to find [log4j.xml] using context classloader sun.misc.Launcher$AppClassLoader@18b4aac2.
log4j: Trying to find [log4j.xml] using sun.misc.Launcher$AppClassLoader@18b4aac2 class loader.
log4j: Trying to find [log4j.xml] using ClassLoader.getSystemResource().
log4j: Trying to find [log4j.properties] using context classloader sun.misc.Launcher$AppClassLoader@18b4aac2.
log4j: Using URL [file:/E:/IntelliJWorkSpace/AIMind-backend/aimind_backend/pipeline-tools/target/classes/log4j.properties] for automatic log4j configuration.
log4j: Reading configuration from URL file:/E:/IntelliJWorkSpace/AIMind-backend/aimind_backend/pipeline-tools/target/classes/log4j.properties
log4j: Parsing for [root] with value=[INFO, console].
log4j: Level token is [INFO].
log4j: Category root set to INFO
log4j: Parsing appender named "console".
log4j: Parsing layout options for "console".
log4j: Setting property [conversionPattern] to [%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n].
log4j: End of parsing for "console".
log4j: Setting property [target] to [System.err].
log4j: Parsed "console" options.
log4j: Parsing for [org.spark_project.jetty] with value=[ERROR].
log4j: Level token is [ERROR].
log4j: Category org.spark_project.jetty set to ERROR
log4j: Handling log4j.additivity.org.spark_project.jetty=[null]
log4j: Parsing for [org.spark_project] with value=[ERROR].
log4j: Level token is [ERROR].
log4j: Category org.spark_project set to ERROR
log4j: Handling log4j.additivity.org.spark_project=[null]
log4j: Parsing for [org.apache.spark] with value=[ERROR].
log4j: Level token is [ERROR].
log4j: Category org.apache.spark set to ERROR
log4j: Handling log4j.additivity.org.apache.spark=[null]
log4j: Parsing for [org.apache.hadoop.hive.metastore.RetryingHMSHandler] with value=[FATAL].
log4j: Level token is [FATAL].
log4j: Category org.apache.hadoop.hive.metastore.RetryingHMSHandler set to FATAL
log4j: Handling log4j.additivity.org.apache.hadoop.hive.metastore.RetryingHMSHandler=[null]
log4j: Parsing for [parquet] with value=[ERROR].
log4j: Level token is [ERROR].
log4j: Category parquet set to ERROR
log4j: Handling log4j.additivity.parquet=[null]
log4j: Parsing for [io.netty] with value=[ERROR].
log4j: Level token is [ERROR].
log4j: Category io.netty set to ERROR
log4j: Handling log4j.additivity.io.netty=[null]
log4j: Parsing for [org.apache.hadoop] with value=[FATAL].
log4j: Level token is [FATAL].
log4j: Category org.apache.hadoop set to FATAL
log4j: Handling log4j.additivity.org.apache.hadoop=[null]
log4j: Parsing for [org.apache.parquet] with value=[ERROR].
log4j: Level token is [ERROR].
log4j: Category org.apache.parquet set to ERROR
log4j: Handling log4j.additivity.org.apache.parquet=[null]
log4j: Finished configuring.
2.4.1
1
2
3
4
(1) https://www.jianshu.com/p/c4b6ed734e72
(2) https://blog.csdn.net/weixin_41122339/article/details/81141913
Following the methods in the two links above, to debug Spark on Windows: download winutils.exe -> configure the environment variables, restart Windows, add the Spark dependencies...
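The environment-variable step from the links above comes down to something like the following. The path C:\hadoop is a placeholder; point HADOOP_HOME at whichever directory has winutils.exe under its bin\ folder:

```
HADOOP_HOME=C:\hadoop
PATH=%PATH%;%HADOOP_HOME%\bin
```

These can be set in System Properties -> Environment Variables; a restart (or at least restarting the IDE) makes sure the JVM picks them up.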
(1) After configuring Spark's log output level as in the first link above, Spark's INFO and DEBUG messages still showed up. Stepping through with the debugger, I found a "Class path contains multiple SLF4J bindings." warning. Go to the local package repository and delete the jars for the bindings other than the slf4j-to-log4j one, and the configured level takes effect.
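Instead of deleting jars from the local repository by hand, the extra binding can also be excluded in the Maven pom. A sketch, assuming the unwanted binding (here logback-classic, as an example) was pulled in transitively by a dependency named some-library, a placeholder for whichever dependency the SLF4J warning points at:

```xml
<dependency>
    <groupId>com.example</groupId>
    <artifactId>some-library</artifactId>
    <version>1.0</version>
    <exclusions>
        <!-- Drop the competing SLF4J binding so the log4j one wins -->
        <exclusion>
            <groupId>ch.qos.logback</groupId>
            <artifactId>logback-classic</artifactId>
        </exclusion>
    </exclusions>
</dependency>
```

The SLF4J warning itself lists the jar paths of the competing bindings, which tells you which dependency needs the exclusion.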