1. Starting Spark with ./bin/spark-shell fails with: java.net.BindException: Can't assign requested address: Service 'sparkDriver' failed after 16 retries!
Solution: add export SPARK_LOCAL_IP="127.0.0.1" to spark-env.sh
2. Java Kafka producer error: ERROR kafka.utils.Utils$ - fetching topic metadata for topics [Set(words_topic)] from broker [ArrayBuffer(id:0,host: xxxxxx,port:9092)] failed
Solution: set 'advertised.host.name' in the Kafka broker's server.properties to the server's real IP (the same value as the producer's 'metadata.broker.list' property)
3. java.net.NoRouteToHostException: No route to host
Solution: make sure the ZooKeeper IP is configured correctly
4. Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer) java.net.UnknownHostException: linux-pic4.site:
Solution: add your hostname to /etc/hosts: 127.0.0.1 localhost linux-pic4.site
5. org.apache.spark.SparkException: A master URL must be set in your configuration
Solution: SparkConf sparkConf = new SparkConf().setAppName("JavaDirectKafkaWordCount").setMaster("local");
6. Failed to locate the winutils binary in the hadoop binary path
Solution: install Hadoop first
7. When starting Spark: Failed to get database default, returning NoSuchObjectException
Solution: 1) Copy winutils.exe from https://github.com/steveloughran/winutils/tree/master/hadoop-2.6.0/bin to some folder, say C:\Hadoop\bin, and set HADOOP_HOME to C:\Hadoop. 2) Open an admin command prompt and run C:\Hadoop\bin\winutils.exe chmod 777 /tmp/hive
8. org.apache.spark.SparkException: Only one SparkContext may be running in this JVM (see SPARK-2243). To ignore this error, set spark.driver.allowMultipleContexts = true.
Solution: use the constructor JavaStreamingContext(sparkContext: JavaSparkContext, batchDuration: Duration) instead of new JavaStreamingContext(sparkConf, Durations.seconds(5));
9. Reconnect due to socket error: java.nio.channels.ClosedChannelException
Solution: make sure the Kafka broker IP is written correctly
10. java.lang.IllegalArgumentException: requirement failed: No output operations registered, so nothing to execute
Solution: the RDD produced by the last transformation step must have a matching action, for example messages.print()
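11. Tip: in Spark, writes to ElasticSearch must run inside an action, one RDD at a time
12. Problem binding to [0.0.0.0:50010] java.net.BindException: Address already in use;
Solution: caused by configuring master and slave with the same IP; give them different IPs
13. CALL TO LOCALHOST/127.0.0.1:9000
Solution: configure the host correctly in /etc/sysconfig/network, /etc/hosts and /etc/sysconfig/network-scripts/ifcfg-eth0
13. On the namenode:50070 page, Datanode Information shows only one node
Solution: caused by a wrong SSH configuration; hostnames must match exactly, so reconfigure passwordless SSH login
14. Tip: when building a cluster, configure the hostnames first and reboot the machines so the hostnames take effect
15. INFO hdfs.DFSClient: Exception in createBlockOutputStream java.net.NoRouteToHostException: No route to host
Solution: if the master and slave nodes can ping each other, turn off the firewall: service iptables stop
16. Tip: do not format HDFS casually; it causes problems such as inconsistent data versions. Empty the data directories before formatting
17. namenode1: ssh: connect to host namenode1 port 22: Connection refused
Solution: sshd is stopped or not installed. Check with which sshd whether it is installed; if it is, restart sshd, then ssh to the local hostname to check that the connection works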
18. Log aggregation has not completed or is not enabled.
Solution: add the corresponding settings to yarn-site.xml to enable log aggregation
19. failed to launch org.apache.spark.deploy.history.HistoryServer full log in
Solution: correctly configure spark-defaults.conf and the SPARK_HISTORY_OPTS property in spark-env.sh
20. Exception in thread "main" org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
Solution: this exception occurs in yarn-client mode; no fix known yet
21. Files in Hadoop cannot be downloaded, and the Tracking UI in YARN cannot open the history logs
Solution: Windows cannot resolve the cluster hostnames; copy the hostname entries from the cluster's hosts file into the Windows hosts file
22. Tip: an HDFS file path is written as hdfs://master:9000/<file path>, where master is the namenode's hostname and 9000 is the HDFS port
23. Yarn JobHistory Error: Failed redirect for container
Solution: configure http://:19888/jobhistory/logs in yarn-site.xml, then restart YARN and the JobHistoryServer
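The path form described in item 22 can be taken apart with the standard java.net.URI class. This is only a small sketch: the file path /data/input/words.txt is a made-up example, while the host master and port 9000 come from the tip itself.

```java
import java.net.URI;

final class HdfsPathParts {
    /** Returns the "host:port" part of an hdfs:// path, i.e. the namenode address. */
    static String namenodeAddress(String hdfsPath) {
        URI uri = URI.create(hdfsPath);
        return uri.getHost() + ":" + uri.getPort();
    }

    /** Returns the path of the file inside HDFS, without scheme, host or port. */
    static String filePath(String hdfsPath) {
        return URI.create(hdfsPath).getPath();
    }

    public static void main(String[] args) {
        // Hypothetical file path; only "master" and 9000 are taken from the tip.
        String path = "hdfs://master:9000/data/input/words.txt";
        System.out.println(namenodeAddress(path));
        System.out.println(filePath(path));
    }
}
```

If the hostname part does not resolve on the client machine, the same hosts-file fix as in item 21 applies.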
24. When browsing an HDFS folder through the Hadoop UI, the message Permission denied: user=dr.who appears
Solution: on the namenode's terminal run: hdfs dfs -chmod -R 755 /
25. Tip: the Spark driver only receives results when an action runs
28. java.lang.NoSuchMethodError: com.google.common.util.concurrent.MoreExecutors.directExecutor()Ljava/util/concurrent/Executor;
Solution: use one consistent ES version, and avoid creating an ES client directly inside Spark where possible
29. returned Bad Request(400) - failed to parse;Compressor detection can only be called on some xcontent bytes or compressed xcontent bytes; Bailing out..
Solution: fix the format of the data written to ES
30. java.util.concurrent.TimeoutException: Cannot receive any reply in 120 seconds
Solution: make sure all nodes can log in to each other without a password
31. In cluster mode, Spark cannot write data to Elasticsearch
Solution: write it this way (passing the Map of ES settings): results.foreachRDD(javaRDD -> {JavaEsSpark.saveToEs(javaRDD, esSchema, cfg);return null;});
32. Tip: all custom classes must implement the Serializable interface, otherwise they will not work on a cluster
34. When reading a resource file via NIO: java.nio.file.FileSystemNotFoundException at com.sun.nio.zipfs.ZipFileSystemProvider.getFileSystem(ZipFileSystemProvider.java:171)
Solution: packaging as a jar changes the URI to the form jar:file:/C:/path/to/my/project.jar!/my-folder, which the default file system cannot resolve. Create a zip file system for the jar with FileSystems.newFileSystem(uri, env) before resolving the path, beginning with:
final Map<String, String> env = new HashMap<>();
37. java.io.NotSerializableException: org.apache.log4j.Logger
Solution: a serializable class must not contain non-serializable objects. You have to keep the logger instance out of the default serialization process by making it either transient or static. Making it static final is the preferred option: if you make it transient, the logger instance will be null after deserialization and any logger.debug() call will throw a NullPointerException, because neither the constructor nor the instance initializer block is called during deserialization. Making it static and final ensures that it is thread-safe and that all instances of the Customer class share the same logger instance. This error is also one of the reasons why a Logger should be declared static final in Java programs.
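The static final advice can be checked with plain Java serialization. The sketch below is an illustration under assumptions: it uses java.util.logging.Logger as a stand-in for log4j's Logger (which is likewise not serializable), and the class names Customer and BadCustomer are invented for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.logging.Logger;

final class LoggerSerializationDemo {
    /** Static final logger: static fields are not part of the serialized state. */
    static class Customer implements Serializable {
        private static final long serialVersionUID = 1L;
        private static final Logger LOG = Logger.getLogger(Customer.class.getName());
        final String name;
        Customer(String name) { this.name = name; }
    }

    /** Instance-field logger: serialization tries to write it and fails. */
    static class BadCustomer implements Serializable {
        private static final long serialVersionUID = 1L;
        private final Logger log = Logger.getLogger("bad");
        final String name;
        BadCustomer(String name) { this.name = name; }
    }

    /** Serializes an object to bytes and reads it back. */
    static Object roundTrip(Object o) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }

    /** True if the object survives a round trip, false on NotSerializableException. */
    static boolean serializes(Object o) {
        try {
            roundTrip(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serializes(new Customer("alice")));
        System.out.println(serializes(new BadCustomer("bob")));
    }
}
```

Spark ships closures and their captured objects with Java serialization by default, so a class shaped like BadCustomer would fail in the same way when used inside a task.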
38. log4j:WARN Unsupported encoding
Solution: 1. change UTF to lowercase utf-8; 2. the line that sets the encoding contains a space, remove it
39. MapperParsingException[Malformed content, must start with an object
Solution: use the JavaEsSpark.saveJsonToEs API, because saveToEs handles objects, not strings
40. ERROR ApplicationMaster: SparkContext did not initialize after waiting for 100000 ms. Please check earlier log output for errors. Failing the application
Solution: either too many resources were requested, or .setMaster("local[*]") was not removed from the code
41. WARN Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
Solution: write the broker id correctly in the config file, and use the real IP in commands
42. User class threw exception: org.apache.spark.SparkException: org.apache.spark.SparkException: Couldn't find leaders for Set([mywaf,7], [mywaf,1])
Solution: configure Kafka correctly and recreate the topic
43. In the ES UI, the shards of some node are not displayed
Solution: that node is short of disk space; clean up the disk or add capacity
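44. The method updateStateByKey(Function2,Optional,Optional>, int) in the type JavaPairDStream is not applicable for the arguments (Function2,Optional,Optional>, int)
Solution: Spark uses com.google.common.base.Optional, not the JDK's java.util.Optional
45. NativeCrc32.nativeComputeChunkedSumsByteArray
Solution: configure hadoop-home in Eclipse, and add the 64-bit hadoop.dll for version 2.6 to the bin and system32 folders
46. Tip: Spark Streaming has three computation modes: stateless, stateful and window
47. Single point of failure of the YARN RM
Solution: set up YARN HA with a three-node ZooKeeper cluster and the yarn-site.xml configuration
48. Tip: Kafka can use its bundled ZooKeeper cluster through its config files
49. Tip: every Spark operation ultimately boils down to operations on RDDs
50. How to guarantee strict ordering in a Kafka message queue
Solution: give the topic that needs strict ordering only one partition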
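51. Setting up mutual SSH trust across many Linux machines in bulk
Solution: use one shared public key on all of them
52. org.apache.spark.SparkException: Failed to get broadcast_790_piece0 of broadcast_790
Solution: remove the spark.cleaner.ttl setting from spark-defaults.conf
53. In a YARN HA environment, opening history logs through the web UI redirects to 8088 and shows nothing
Solution: restore the default YARN HTTP port 8088
54. but got no response. Marking as slave lost
Solution: seen when submitting jobs with yarn-client; no fix known yet
55. Using config: /work/poa/zookeeper-3.4.6/bin/../conf/zoo.cfg Error contacting service. It is probably not running.
Solution: the configuration file is wrong, for example a hostname mismatch
56. Tip: to deploy a Spark job you need not copy the whole jar; copy only the changed files, then compile and package on the target server.
57. Spark setAppName doesn't appear in Hadoop running applications UI
Solution: set it on the spark-submit command line with "--name BetterName"
58. How to monitor whether a Spark Streaming job has died
Solution: monitor the driver port, or write a scheduled Linux script based on yarn commands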
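59. Kafka internal and external network problem
Solution: on Kafka machines with two network cards, do not put an IP in advertised.host.name in server.properties; use a domain name, so that external producers and internal consumers each resolve it to the IP they need.
60. Tip: do not put Kafka's log.dirs under /tmp; the tmp directory seems to have limits on file count and disk capacity
61. After moving Kafka to new machines, topics are auto-created in the new cluster and only one broker carries the load
Solution: add delete.topic.enable=true and auto.create.topics.enable=false to server.properties, delete the old topics, recreate them, and restart Kafka
62. After installing sbt, running the sbt command hangs at Getting org.scala-sbt sbt 0.13.6 ...
Solution: sbt takes some time to download its jars the first time it runs; do not quit, wait until sbt finishes
63. Tip: ES shards are similar to Kafka partitions
64. Kafka hits an OOM exception
Solution: in the Kafka broker start script, raise the JVM heap settings in export KAFKA_HEAP_OPTS="-Xmx24G -Xms1G"
65. A Linux server's disk is full; find files above a given size
Solution: find / -type f -size +10G
66. Rate limiting for Spark direct Kafka streaming
Solution: set spark.streaming.kafka.maxRatePerPartition, the maximum number of records read per second from each Kafka partition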
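To pick a value for the setting in item 66 it helps to see what it implies per batch. The sketch below only does the arithmetic (rate per partition, times partitions, times batch length); the figures 1000, 8 and 5 are invented for the example.

```java
final class KafkaRateLimit {
    /**
     * Upper bound on records in one batch of a direct stream:
     * per-partition rate (records/second) x partitions x batch interval (seconds).
     */
    static long maxRecordsPerBatch(long maxRatePerPartition, int partitions, long batchSeconds) {
        return maxRatePerPartition * partitions * batchSeconds;
    }

    public static void main(String[] args) {
        // Hypothetical job: rate 1000, topic with 8 partitions, 5-second batches.
        System.out.println(maxRecordsPerBatch(1000, 8, 5)); // prints 40000
    }
}
```

If a batch of that size cannot be processed within one batch interval, the rate is still too high for the job.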
67. org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest: Found unrecoverable error returned Not Found(404) - [EngineClosedException CurrentState[CLOSED]
Solution: close and then reopen the index in the kopf plugin. A likely cause is a shard that went bad when the index was created.
68. Job aborted due to stage failure: Task not serializable:
Solution: make the class Serializable; declare the instance only within the lambda function passed to map; make the non-serializable object static and create it once per machine; or call rdd.foreachPartition and create the non-serializable object there
69. Pipeline write will fail on this Pipeline because it contains a stage which does not implement Writable
Solution: this cannot be done as of Spark 1.6; upgrade Spark
70. After importing a Scala project from git into IDEA, every variable is flagged as never used
Solution: mark the src folder with Mark Directory as Sources Root
71. Run configuration in IntelliJ results in "Cannot start compilation: the output path is not specified for module "xxx". Specify the output path in Configure Project."
Solution: In the default IntelliJ options, "Make" was checked as "Before Launch". Unchecking it fixed the issue.
72. UDFRegistration$$anonfun$register$26$$anonfun$apply$2 cannot be cast to scala.Function1
Solution: an aggregate function cannot be written as a UDF; define a UDAF instead
73. SPARK SQL replacement for mysql GROUP_CONCAT aggregate function
Solution: write a custom UDAF
74. In an IntelliJ IDEA Maven project, New Scala file is not available
Solution: add the scala-tools plugin configuration to pom.xml, then download and update
75. Error:scala: Error: org.jetbrains.jps.incremental.scala.remote.ServerException
Solution: edit pom.xml and switch Scala to the latest version
76. Rebalancing Hadoop nodes when disks fill up
Solution: run hdfs balancer -threshold 3, or run the start-balancer.sh script in the form $HADOOP_HOME/bin/start-balancer.sh -threshold 3. The parameter 3 is a percentage: balancing keeps the deviation in disk usage between DataNodes within 3%
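77. Tip: in a Spark SQL UDAF, the second parameter of the update function, input: Row, is not a row of the DataFrame but the row projected by inputSchema
78. Error: No TypeTag available for String sqlContext.udf.register()
Solution: the Scala versions are inconsistent; use one Scala version everywhere
79. How to add a constant column in a Spark DataFrame?
Solution: The second argument for DataFrame.withColumn should be a Column, so you have to use a literal: df.withColumn('new_column', lit(10))
80. Error:scalac:Error:object VolatileDoubleRef does not have a member create
Solution: the Scala versions are inconsistent; align the Scala version of the development environment with that of the system
81. java.lang.NoSuchMethodError: scala.collection.immutable.HashSet$.empty()Lscala/collection/immutable/HashSet
Solution: align your Scala version with the Scala version Spark uses
82. Excluding unneeded dependencies when packaging a Maven project, so the target jar does not grow too large
Solution: add <scope>provided</scope> to the <dependency> to mark that it should not go into the target jar, and package with maven shade
83. Packaging a mixed Scala and Java project with Maven
Solution: use the command mvn clean scala:compile compile package
84. Spark SQL's udf cannot register a UDAF aggregate function
Solution: declare the custom UDAF with the class keyword instead of object
85. Tip: deleting the Hadoop data directory at runtime breaks jobs that depend on HDFS
86. [IllegalArgumentException[Document contains at least one immense term in field=XXX
Solution: when creating the index in ES, make long text fields analyzed (tokenized)
87. maven shade packaging leaves out the resource files
Solution: put the resources folder under src/main/, alongside the scala or java folder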
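88. Tip: Spark Graph builds the graph from the edge collection; the vertex collection only specifies which vertices in the graph are valid
89. When an ES query uses regex matching: Determinizing automaton would result in more than 10000 states.
Solution: the regex string is too long and too complex; keep regex matching concise and do not enumerate alternatives
90. java.lang.StackOverflowError at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:53)
Solution: the WHERE clause of the SQL statement is too long and overflows the stack
91. org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output location for shuffle 0
Solution: increase executor memory, reduce the number of executors, increase executor parallelism
92. ExecutorLostFailure (executor 3 exited caused by one of the running tasks) Reason: Container killed by YARN for exceeding memory limits. 61.0 GB of 61 GB physical memory used
Solution: remove RDD caching, increase the job's spark.storage.memoryFraction, and increase the job's spark.yarn.executor.memoryOverhead
93. EsRejectedExecutionException[rejected execution (queue capacity 1000) on org.elasticsearch.search.action.SearchServiceTransportAction
Solution: reduce Spark parallelism to lower the number of concurrent reads against ES
94. Tip: do not give a single Spark job too many executor cores, or other jobs will be delayed
95. Tip: data skew only occurs during shuffle; operators that may trigger a shuffle include distinct, groupByKey, reduceByKey, aggregateByKey, join, cogroup, repartition and so on
96. How to locate data skew in Spark
Solution: in the Spark Web UI, look at the data volume and run time of each task in the current stage, then use the way stages are split to locate the shuffle operator in the code
97. How to fix data skew in Spark
Solution:
1) Filter out the few keys that cause the skew (only when dropping those keys barely affects the job).
2) Increase the parallelism of the shuffle (limited improvement).
3) Two-stage aggregation (local plus global): add a prefix to identical keys so they become several keys, strip the prefix after the local shuffle, then run the global shuffle (only for aggregation-type shuffles, where the effect is significant; no use for join-type shuffles).
4) Turn a reduce join into a map join: broadcast the small table and, in a map over the large table, look up the small table's data (only when one table or RDD is small).
5) Join with random prefixes and an expanded RDD: tag each record of one RDD with a random prefix below n, use flatMap to expand the other RDD n times, tagging the copies of each record with the prefixes 0 to n-1 in turn, then join the two re-keyed RDDs (greatly relieves join-type skew, but costs a great deal of memory).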
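Option 3 of item 97 is easier to follow with a small model. The sketch below is plain Java over an in-memory list, not Spark code: it counts keys in two stages, first per salted key and then per original key. The "salt_key" format, the class name and the sample keys are assumptions made for the illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

final class SaltedAggregation {
    /** Counts keys in two stages: per salted key first, then per original key. */
    static Map<String, Long> twoStageCount(List<String> keys, int salts, long seed) {
        Random rnd = new Random(seed);

        // Stage 1 (local aggregation): a hot key is spread over up to `salts` salted keys,
        // so in a real shuffle no single task would receive all of its records.
        Map<String, Long> partial = new HashMap<>();
        for (String key : keys) {
            String salted = rnd.nextInt(salts) + "_" + key;
            partial.merge(salted, 1L, Long::sum);
        }

        // Stage 2 (global aggregation): strip the salt and combine the partial counts.
        Map<String, Long> total = new TreeMap<>();
        for (Map.Entry<String, Long> e : partial.entrySet()) {
            String original = e.getKey().substring(e.getKey().indexOf('_') + 1);
            total.merge(original, e.getValue(), Long::sum);
        }
        return total;
    }

    public static void main(String[] args) {
        // "hot" is the skewed key in this made-up data set.
        List<String> keys = Arrays.asList("hot", "hot", "hot", "hot", "hot", "hot", "cold");
        System.out.println(twoStageCount(keys, 3, 42L)); // prints {cold=1, hot=6}
    }
}
```

The final counts do not depend on the salts chosen, which is why the trick is safe for aggregations; a join has no such second stage to merge results, which matches the remark that the method does not help join-type shuffles.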
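98. Tip: shuffle write happens after a stage finishes computing. So that the next stage can run its shuffle operator, the data processed by each task is grouped by key and records with the same key are written to the same disk file; each disk file belongs to exactly one task of the downstream stage. Data is written to an in-memory buffer before it is written to disk. Each task of the current stage creates as many disk files as the next stage has tasks.
99. java.util.regex.PatternSyntaxException: Dangling meta character '?' near index 0
Solution: remember to escape metacharacters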
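A short sketch of the escaping in item 99, using only java.util.regex. Pattern.quote is one way to treat a whole string literally; a backslash before the single metacharacter works as well. The sample text "a?b?c" is invented.

```java
import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

final class EscapeMetaChars {
    /** True if the string is a valid regular expression. */
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    /** Splits on a delimiter taken literally, whatever metacharacters it contains. */
    static String[] splitOnLiteral(String text, String delimiter) {
        return text.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        System.out.println(compiles("?"));    // false: dangling metacharacter
        System.out.println(compiles("\\?"));  // true: escaped question mark
        System.out.println(Arrays.toString(splitOnLiteral("a?b?c", "?"))); // [a, b, c]
    }
}
```

String.split, replaceAll and matches all take regular expressions, so the same escaping applies to their arguments.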
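100. Elastic resource allocation in Spark
Solution: configure the Spark shuffle service and turn on spark.dynamicAllocation.enabled
101. Tip: Kafka's consumer group ID has no effect for Spark direct streaming
102. After starting Hadoop YARN, only the ResourceManager started, not the NodeManager
Solution: yarn-site.xml is misconfigured; check and clean up each setting
103. How to view the Hadoop system logs
Solution: in Hadoop 2.x the YARN service logs are the ResourceManager log and the log of each NodeManager. The ResourceManager log is yarn-*-resourcemanager-*.log in the logs directory under the Hadoop install directory; the NodeManager log is yarn-*-nodemanager-*.log in the logs directory under the Hadoop install directory on each NodeManager node
104. Tip: every file smaller than 128M still takes up a block of its own (one block entry in the NameNode, although on disk it only uses its actual size); merge or delete small files to save resources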
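The cost of small files in item 104 can be put in numbers by counting blocks. The sketch below assumes the 128 MB block size mentioned in the tip and a made-up workload of 1000 files of 1 MB each; it only counts block entries and says nothing about disk usage.

```java
final class BlockCount {
    /** Block size assumed from the tip: 128 MB. */
    static final long BLOCK_BYTES = 128L * 1024 * 1024;

    /** Number of HDFS blocks needed for one file of the given size. */
    static long blocksFor(long fileBytes) {
        return fileBytes == 0 ? 0 : (fileBytes + BLOCK_BYTES - 1) / BLOCK_BYTES;
    }

    public static void main(String[] args) {
        long oneMb = 1024L * 1024;
        // Hypothetical workload: 1000 separate 1 MB files versus one merged 1000 MB file.
        long separate = 1000 * blocksFor(oneMb);
        long merged = blocksFor(1000 * oneMb);
        System.out.println(separate); // prints 1000
        System.out.println(merged);   // prints 8
    }
}
```

Each block is tracked by the NameNode, so the merged layout leaves it far fewer objects to keep in memory.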
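105. how to remove Non DFS Used
Solution: 1) clear the user cache files in the Hadoop data directory: cd /data/hadoop/storage/tmp/nm-local-dir/usercache;du -h;rm -rf `find -type f -size +10M`; 2) clean up junk data in the Linux file system
106. Tip: Non DFS Used refers to all files that are not HDFS files
107. Isolating Linux profile configuration files
Solution: cd /etc/profile.d; create the corresponding configuration script there
108. The reference to entity "autoReconnect" must end with the ';' delimiter
Solution: replace & with &amp;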
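The error in item 108 appears when a JDBC URL containing a bare & is pasted into an XML file: in XML the ampersand has to be written as the entity &amp;. A minimal sketch follows; the JDBC URL in it is a made-up example, not taken from the original post.

```java
final class XmlAmpersand {
    /** Escapes bare ampersands so the value can be placed inside an XML element. */
    static String escapeAmpersands(String raw) {
        return raw.replace("&", "&amp;");
    }

    public static void main(String[] args) {
        // Hypothetical connection URL with two query parameters.
        String url = "jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true&autoReconnect=true";
        System.out.println(escapeAmpersands(url));
    }
}
```

Apply the replacement only to raw values; running it on text that is already escaped would turn &amp; into &amp;amp;.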
109. Service hiveserver not found
Solution: Try to run bin/hive --service hiveserver2 instead of hive --service hiveserver for this version of Apache Hive
110. Failed to execute spark task, with exception 'org.apache.hadoop.hive.ql.metadata.HiveException(Failed to create spark client.)'
Solution: do not use a prebuilt Spark; rebuild Spark and make sure its version matches the one in Hive's pom
111. java.lang.NoSuchFieldError: SPARK_RPC_SERVER_ADDRESS at org.apache.hive.spark.client.rpc.RpcConfiguration.<init>(RpcConfiguration.java:45)
Solution: the Hive and Spark versions must match, and Spark must be built without the -Phive parameter
112. javax.jdo.JDOFatalInternalException: Error creating transactional connection factory
Solution: add the MySQL connector to Hive's lib directory
113. org.apache.hadoop.hive.ql.metadata.HiveException(Failed to create spark client FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.spark.SparkTask
Solution: there are many possible causes; check hive.log to narrow the problem down
114. Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream
Solution: Spark was built with the hadoop-provided parameter, so the Hadoop jars are missing
115. On Linux, pressing the delete key after a mistyped command shows ^H
Solution: run the command stty erase ^H
116. Tip: check the Spark version that Hive supports in the pom.xml of the Hive source; only the major version has to match, for example Spark 1.6.0 and 1.6.2 both match
117. Tip: open the Hive command-line client and check whether the output log prints "SLF4J: Found binding in [jar:file:/work/poa/hive-2.1.0-bin/lib/spark-assembly-1.6.2-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]" to tell whether Hive is bound to Spark
118. After starting YARN, only some of the NodeManagers started
Solution: the nodes that did not start lack the YARN jars; keep the jars identical on all nodes
119. Error: Could not find or load main class org.apache.hive.beeline.BeeLine
Solution: rebuild Hive with the parameter -Phive-thriftserver
120. Tip: when building Spark for Hive on Spark, do not add the -Phive parameter; if Spark SQL needs to support Hive syntax, add -Phive
121. User class threw exception: org.apache.spark.sql.AnalysisException: path hdfs://XXXXXX already exists.;
Solution: df.write.format("parquet").mode("append").save("path.parquet")
122. check the manual that corresponds to your MySQL server version for the right syntax to use near 'OPTION SQL_SELECT_LIMIT=DEFAULT' at line 1
Solution: use a newer mysql-connector
123. org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException): User: root is not allowed to impersonate
Solution: edit core-site.xml (vim core-site.xml): set hadoop.proxyuser.root.hosts to * and hadoop.proxyuser.root.groups to *, then restart YARN
124. java.lang.NoSuchMethodError: org.apache.parquet.schema.Types$MessageTypeBuilder.addFields([Lorg/apache/parquet/schema/Type;)Lorg/apache/parquet/schema/Types$BaseGroupBuilder;
Solution: caused by a version conflict; align the parquet component versions in Hive and Spark
125. Tip: you can tune Hive on Spark performance by changing spark.executor.instances, spark.executor.cores, spark.executor.memory and similar settings in hive-site.xml, though dynamic resource allocation is the better choice.
126. WARN SparkContext: Dynamic Allocation and num executors both set, thus dynamic allocation disabled.
Solution: if you want dynamic resource allocation, do not set the number of executors
127. Invalid configuration property node.environment: is malformed (for class io.airlift.node.NodeConfig.environment)
Solution: the node.environment property (in the node.properties file) is set but fails to match the following regular expression: [a-z0-9][_a-z0-9]*. Rename it so that it matches
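A candidate name for item 127 can be checked against the quoted expression before Presto is restarted. The sketch below uses java.util.regex with the pattern copied from the error message; the sample names are invented.

```java
import java.util.regex.Pattern;

final class NodeEnvironmentName {
    /** Pattern quoted in the error message for node.environment. */
    private static final Pattern VALID = Pattern.compile("[a-z0-9][_a-z0-9]*");

    /** True if the whole name matches the pattern. */
    static boolean isValid(String name) {
        return VALID.matcher(name).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("production")); // true
        System.out.println(isValid("prod_1"));     // true
        System.out.println(isValid("Production")); // false: upper-case letter
        System.out.println(isValid("prod-1"));     // false: hyphen
        System.out.println(isValid("_prod"));      // false: leading underscore
    }
}
```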
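128. com.facebook.presto.server.PrestoServer No factory for connector hive-XXXXXX
Solution: connector.name in hive.properties is wrong; it has to name a specific version so that Presto uses the matching adapter. Change it to: connector.name=hive-hadoop2
129. org.apache.spark.SparkException: Task failed while writing rows Caused by: org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest: null
Solution: the load on ES is too high; repair ES
130. Tip: if Maven downloads are very slow, they are probably being blocked by the GFW; add a domestic mirror under the mirrors tag of settings.xml in the Maven install directory to get around the blocking, for example:
<mirror>
  <id>nexus-aliyun</id>
  <mirrorOf>*</mirrorOf>
  <name>Nexus aliyun</name>
  <url>http://maven.aliyun.com/nexus/content/groups/public</url>
</mirror>
131. ERROR ApplicationMaster: Uncaught exception: java.lang.SecurityException: Invalid signature file digest for Manifest main attributes
Solution: in pom.xml, exclude the signature files from the packaged jar:
<exclude>META-INF/*.SF</exclude>
<exclude>META-INF/*.DSA</exclude>
<exclude>META-INF/*.RSA</exclude>
132. scala.MatchError: Buffer(10.113.80.29, None) (of class scala.collection.convert.Wrappers$JListWrapper)
Solution: clear the dirty data in ES that is incompatible with the Scala data types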
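133. How to recover files deleted from HDFS by mistake
Solution: add the following to core-site.xml
<property>
  <name>fs.trash.interval</name>
  <value>2880</value>
  <description>HDFS trash setting; lets you recover accidental deletions. The value is in minutes; 0 disables the trash</description>
</property>
To restore a file, run hdfs dfs -mv /user/root/.Trash/Current/<deleted file> /<original path>
134. After reordering some tasks in a scheduled Linux script, some tasks did not run and others ran twice
Solution: changes to a Linux script take effect immediately, even while it is running; only edit the script after it has finished executing, to avoid side effects
135. Tip: Spark has two repartitioning methods, coalesce and repartition. The former is a narrow dependency and leaves the data unevenly distributed; the latter is a wide dependency that triggers a shuffle and leaves the data evenly distributed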
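The difference described in item 135 can be pictured with a toy model that tracks only partition sizes. This is not how Spark implements either method; it is a sketch under two simplifying assumptions, namely that coalesce merges neighbouring parent partitions whole and that repartition deals all records out evenly. The partition sizes are invented.

```java
import java.util.Arrays;

final class PartitionModel {
    /** Toy coalesce: whole parent partitions are merged; no record moves on its own. */
    static long[] coalesce(long[] parentSizes, int target) {
        long[] out = new long[target];
        for (int i = 0; i < parentSizes.length; i++) {
            // Neighbouring parents end up in the same output partition.
            out[(int) ((long) i * target / parentSizes.length)] += parentSizes[i];
        }
        return out;
    }

    /** Toy repartition: every record is shuffled and dealt out evenly. */
    static long[] repartition(long[] parentSizes, int target) {
        long total = Arrays.stream(parentSizes).sum();
        long[] out = new long[target];
        for (int p = 0; p < target; p++) {
            out[p] = total / target + (p < total % target ? 1 : 0);
        }
        return out;
    }

    public static void main(String[] args) {
        long[] parents = {100, 100, 5, 5}; // made-up, already uneven partition sizes
        System.out.println(Arrays.toString(coalesce(parents, 2)));    // [200, 10]
        System.out.println(Arrays.toString(repartition(parents, 2))); // [105, 105]
    }
}
```

The even result costs a full shuffle, so the cheaper coalesce is usually the better fit when the partition count only needs to shrink and the input is not badly skewed.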
136. org.apache.spark.SparkException: Task failed while writing rows scala.MatchError: Buffer(10.113.80.29, None) (of class scala.collection.convert.Wrappers$JListWrapper)
Solution: the ES data is incompatible with the Spark SQL type conversion; read the ES data as strings with EsSpark.esJsonRDD, then convert the RDD to a DataFrame
137. Container exited with a non-zero exit code 143 Killed by external signal
Solution: not enough resources were allocated. Increase the memory or change the code to avoid large objects such as JsonObject that consume too much memory, or include the properties below in yarn-site.xml and restart the VM:
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
  <description>Whether virtual memory limits will be enforced for containers</description>
</property>
<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>4</value>
  <description>Ratio between virtual memory to physical memory when setting memory limits for containers</description>
</property>
138. Generating a Maven dependency manually from an existing jar
Solution: mvn install:install-file -Dfile=spark-assembly-1.6.2-hadoop2.6.0.jar -DgroupId=org.apache.repack -DartifactId=spark-assembly-1.6.2-hadoop2.6.0 -Dversion=2.6 -Dpackaging=jar
139. FAILED: SemanticException [Error 10006]: Line 1:122 Partition not found ''2016-08-01''
Solution: the Hive version is too new and has a bug of its own; downgrade Hive from 2.1.0 to 1.2.1
140. ParseException line 1:17 mismatched input 'hdfs' expecting StringLiteral near 'inpath' in load statement
Solution: drop the prefix that starts with hdfs and carries the IP and port; write the absolute path inside HDFS directly, enclosed in single quotes
141. [ERROR] Terminal initialization failed; falling back to unsupported java.lang.IncompatibleClassChangeError: Found class jline.Terminal, but interface was expected
Solution: export HADOOP_USER_CLASSPATH_FIRST=true
142. A shell script started from crontab does not run properly, although running it by hand works
Solution: put source /etc/profile on the first line of the script, because the cron process does not automatically load the .profile file in the user's home directory
143. SparkListenerBus has already stopped! Dropping event SparkListenerStageCompleted
Solution: the cluster is short of resources; make sure the memory actually free exceeds the memory the Spark job requests
144. PrestoException: ROW comparison not supported for fields with null elements
Solution: replace !=null with is not null
145. When starting the Presto servers, some nodes fail to start
Solution: the memory allocated to the JVM must be less than the memory actually free
146. Tip: once the Presto process has started, the JVM server holds on to its memory permanently
147. Error injecting constructor, java.lang.IllegalArgumentException: query.max-memory-per-node set to 20GB, but only 10213706957B of useable heap available
Solution: Presto will claim 0.40 * max heap size for the system pool, so your query.max-memory-per-node must fit in what remains. You can increase the heap or decrease query.max-memory-per-node.
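The rule quoted in item 147 is simple arithmetic, sketched below. It is a rough model of the message (40% of the heap reserved, the rest usable), not Presto's actual code, and the 16 GiB heap is an invented figure.

```java
final class PrestoMemoryCheck {
    static final long GIB = 1L << 30;

    /** Heap left for queries after 40% is set aside for the system pool. */
    static long usableHeapBytes(long maxHeapBytes) {
        return maxHeapBytes - maxHeapBytes * 4 / 10;
    }

    /** True if query.max-memory-per-node fits into the usable part of the heap. */
    static boolean fits(long maxMemoryPerNodeBytes, long maxHeapBytes) {
        return maxMemoryPerNodeBytes <= usableHeapBytes(maxHeapBytes);
    }

    public static void main(String[] args) {
        long heap = 16 * GIB; // hypothetical -Xmx16G
        System.out.println(fits(20 * GIB, heap)); // false: raise the heap or lower the setting
        System.out.println(fits(8 * GIB, heap));  // true
    }
}
```

Read the other way round, a 20GB query.max-memory-per-node needs a heap of at least 20 / 0.6, roughly 33.4 GB, under this model.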
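148. failed: Encountered too many errors talking to a worker node. The node may have crashed or be under too much load. failed java.util.concurrent.CancellationException: Task was cancelled
Solution: such exceptions are caused by timeout limits. Extend the wait time by setting exchange.http-client.request-timeout=50s in the config of the worker nodes
149. What are the mainstream options for big-data ETL visualization
Solution: technology stacks worth considering are ELK (elasticsearch+logstash+kibana) or HPA (hive+presto+airpal)
150. Tip: a Presto cluster does not need to run on YARN. Hadoop depends on HDFS, so machines with small disks are awkward for Hadoop, whereas Presto computes purely in memory and does not depend on disk. Installed standalone it can span multiple clusters; wherever there is memory, there can be Presto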