After a Spark job starts, we usually go through a jump host to the Spark UI to check on it. Once the number of jobs grows, this becomes a headache. It would be ideal to collect all job metrics in one place for monitoring.
According to the official Spark documentation, Spark only supports the following sinks:
Each instance can report to zero or more sinks. Sinks are contained in the org.apache.spark.metrics.sink package:

ConsoleSink: Logs metrics information to the console.
CSVSink: Exports metrics data to CSV files at regular intervals.
JmxSink: Registers metrics for viewing in a JMX console.
MetricsServlet: Adds a servlet within the existing Spark UI to serve metrics data as JSON data.
GraphiteSink: Sends metrics to a Graphite node.
Slf4jSink: Sends metrics to slf4j as log entries.
StatsdSink: Sends metrics to a StatsD node.
The more commonly used InfluxDB and Prometheus are missing~~~
A bit of googling shows that InfluxDB support requires a third-party package. A useful reference is Monitoring Spark Streaming with InfluxDB and Grafana, which passes extra files and a configuration file at submit time. But success never comes that easily...
The data written to InfluxDB is all keyed by application_id, producing series names like application_1533838659288_1030_1_jvm_heap_usage. In other words, every job's metrics land in their own measurement, so in Grafana we would still have to configure each job one by one, wouldn't we?
Clearly that is not the result I want. The end goal: configure once, and every newly submitted job shows up on the monitoring dashboard automatically.
Google cures everything. I eventually found a fairly clean solution: relay the data through graphite_exporter into Prometheus, then visualize it with Grafana.
So there are two approaches that have proven workable in practice.
Approach 1: write the metrics directly to InfluxDB and read them from Grafana for display. Steps:
1. Add the following to conf/metrics.properties under the Spark directory:
master.source.jvm.class=org.apache.spark.metrics.source.JvmSource
worker.source.jvm.class=org.apache.spark.metrics.source.JvmSource
driver.source.jvm.class=org.apache.spark.metrics.source.JvmSource
executor.source.jvm.class=org.apache.spark.metrics.source.JvmSource
*.sink.influx.class=org.apache.spark.metrics.sink.InfluxDbSink
*.sink.influx.protocol=http
*.sink.influx.host=xx.xx.xx.xx
*.sink.influx.port=8086
*.sink.influx.database=sparkonyarn
*.sink.influx.auth=admin:admin
2. Add the following options when submitting the job, and make sure these jars exist:
--files /spark/conf/metrics.properties \
--conf spark.metrics.conf=metrics.properties \
--jars /spark/jars/metrics-influxdb-1.1.8.jar,/spark/jars/spark-influx-sink-0.4.0.jar \
--conf spark.driver.extraClassPath=metrics-influxdb-1.1.8.jar:spark-influx-sink-0.4.0.jar \
--conf spark.executor.extraClassPath=metrics-influxdb-1.1.8.jar:spark-influx-sink-0.4.0.jar
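Put together, a full submit command might look like the sketch below. The metrics flags are exactly the ones listed above; the master, deploy mode, main class, and application jar are placeholders for your own job:

```shell
# Sketch only: com.example.MyJob and /path/to/my-job.jar are placeholders
# for your own main class and application jar; adjust --master and
# --deploy-mode to match your cluster.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class com.example.MyJob \
  --files /spark/conf/metrics.properties \
  --conf spark.metrics.conf=metrics.properties \
  --jars /spark/jars/metrics-influxdb-1.1.8.jar,/spark/jars/spark-influx-sink-0.4.0.jar \
  --conf spark.driver.extraClassPath=metrics-influxdb-1.1.8.jar:spark-influx-sink-0.4.0.jar \
  --conf spark.executor.extraClassPath=metrics-influxdb-1.1.8.jar:spark-influx-sink-0.4.0.jar \
  /path/to/my-job.jar
```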
Drawback: Grafana must be reconfigured whenever the application_id changes.
Approach 2: use graphite_exporter to convert the raw data, via a mapping file, into Prometheus metrics with label dimensions.
1. Download graphite_exporter, unpack it, and run the command below. The graphite_exporter_mapping file has to be created by us; it contains the metric mapping rules:
nohup ./graphite_exporter --graphite.mapping-config=graphite_exporter_mapping &
For example:
mappings:
- match: '*.*.jvm.*.*'
  name: jvm_memory_usage
  labels:
    application: $1
    executor_id: $2
    mem_type: $3
    qty: $4
This converts the data into metrics named jvm_memory_usage with the labels application, executor_id, mem_type, and qty:
application_1533838659288_1030.driver.jvm.heap.usage
-> jvm_memory_usage{application="application_1533838659288_1030",executor_id="driver",mem_type="heap",qty="usage"}
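To make the mapping semantics concrete, here is a small Python sketch (not part of graphite_exporter itself, purely illustrative) of how one such rule works: each '*' in the match pattern matches a single dot-separated component and becomes a capture group referenced as $1, $2, ... in the labels:

```python
import re

def map_metric(metric, pattern, name, label_spec):
    """Mimic one graphite_exporter glob mapping rule (illustration only).

    Each '*' in `pattern` matches exactly one dot-separated component
    and becomes a capture group referenced as $1, $2, ... in `label_spec`.
    Returns (metric_name, labels) on a match, else None.
    """
    regex = "^" + r"\.".join(
        "([^.]+)" if part == "*" else re.escape(part)
        for part in pattern.split(".")
    ) + "$"
    m = re.match(regex, metric)
    if m is None:
        return None
    labels = {key: m.group(int(ref.lstrip("$"))) for key, ref in label_spec.items()}
    return name, labels

result = map_metric(
    "application_1533838659288_1030.driver.jvm.heap.usage",
    "*.*.jvm.*.*",
    "jvm_memory_usage",
    {"application": "$1", "executor_id": "$2", "mem_type": "$3", "qty": "$4"},
)
print(result)
# → ('jvm_memory_usage', {'application': 'application_1533838659288_1030',
#    'executor_id': 'driver', 'mem_type': 'heap', 'qty': 'usage'})
```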
2. Configure Prometheus to pull data from graphite_exporter, then restart Prometheus:
/path/to/prometheus/prometheus.yml
scrape_configs:
  - job_name: 'spark'
    static_configs:
      - targets: ['localhost:9108']
3. Add the following to conf/metrics.properties under the Spark directory:
master.source.jvm.class=org.apache.spark.metrics.source.JvmSource
worker.source.jvm.class=org.apache.spark.metrics.source.JvmSource
driver.source.jvm.class=org.apache.spark.metrics.source.JvmSource
executor.source.jvm.class=org.apache.spark.metrics.source.JvmSource
*.sink.graphite.class=org.apache.spark.metrics.sink.GraphiteSink
*.sink.graphite.protocol=tcp
*.sink.graphite.host=xx.xx.xx.xx
*.sink.graphite.port=9109
*.sink.graphite.period=5
*.sink.graphite.unit=seconds
4. Add --files /spark/conf/metrics.properties when submitting the Spark job.
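For this approach the submit command is simpler, since GraphiteSink ships with Spark and no extra jars are needed. A sketch (the spark.metrics.conf flag is included here by analogy with approach 1, and the main class and jar are placeholders):

```shell
# Sketch only: com.example.MyJob and /path/to/my-job.jar are placeholders.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class com.example.MyJob \
  --files /spark/conf/metrics.properties \
  --conf spark.metrics.conf=metrics.properties \
  /path/to/my-job.jar
```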
5. Finally, create a Prometheus data source in Grafana and build the panels you need. The end result: newly submitted jobs need no extra monitoring configuration; just select the application_id to see the corresponding information.
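A Grafana panel can then select per-job series by label. For instance, a heap-usage query might look like the sketch below, assuming the jvm_memory_usage mapping above and a Grafana template variable named application populated from the application label:

```promql
jvm_memory_usage{application="$application", mem_type="heap", qty="usage"}
```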
Required jars:
https://repo1.maven.org/maven2/com/izettle/metrics-influxdb/1.1.8/metrics-influxdb-1.1.8.jar
https://mvnrepository.com/artifact/com.palantir.spark.influx/spark-influx-sink
Mapping template:
mappings:
- match: '*.*.executor.filesystem.*.*'
  name: filesystem_usage
  labels:
    application: $1
    executor_id: $2
    fs_type: $3
    qty: $4
- match: '*.*.executor.threadpool.*'
  name: executor_tasks
  labels:
    application: $1
    executor_id: $2
    qty: $3
- match: '*.*.executor.jvmGCTime.count'
  name: jvm_gcTime_count
  labels:
    application: $1
    executor_id: $2
- match: '*.*.executor.*.*'
  name: executor_info
  labels:
    application: $1
    executor_id: $2
    type: $3
    qty: $4
- match: '*.*.jvm.*.*'
  name: jvm_memory_usage
  labels:
    application: $1
    executor_id: $2
    mem_type: $3
    qty: $4
- match: '*.*.jvm.pools.*.*'
  name: jvm_memory_pools
  labels:
    application: $1
    executor_id: $2
    mem_type: $3
    qty: $4
- match: '*.*.BlockManager.*.*'
  name: block_manager
  labels:
    application: $1
    executor_id: $2
    type: $3
    qty: $4
- match: '*.driver.DAGScheduler.*.*'
  name: DAG_scheduler
  labels:
    application: $1
    type: $2
    qty: $3
- match: '*.driver.*.*.*.*'
  name: task_info
  labels:
    application: $1
    task: $2
    type1: $3
    type2: $4
    qty: $5
References
https://github.com/palantir/spark-influx-sink
https://spark.apache.org/docs/latest/monitoring.html
https://www.linkedin.com/pulse/monitoring-spark-streaming-influxdb-grafana-christian-g%C3%BCgi
https://github.com/prometheus/prometheus/wiki/Default-port-allocations
https://github.com/prometheus/graphite_exporter
https://prometheus.io/download/
https://rokroskar.github.io/monitoring-spark-on-hadoop-with-prometheus-and-grafana.html