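On Windows, JVisualVM normally ships with the JDK at ${JAVA_HOME}/bin/jvisualvm.exe. It can monitor both local and remote JVMs, and it connects to remote JVMs in two ways: jstatd and JMX.
jstatd (Java Virtual Machine jstat Daemon): a daemon that runs on the remote server so that its JVMs can be monitored remotely (CPU, memory, threads, and so on).
JMX (Java Management Extensions) is a framework for adding management capabilities to applications, devices, systems, and so on. JMX works across heterogeneous operating systems, system architectures, and network transport protocols, which makes it possible to build system, network, and service management applications that integrate seamlessly.
Note: I tried jstatd without success, so I will not cover it here and risk misleading anyone.
Typical configuration: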
-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Djava.rmi.server.hostname=<ip> -Dcom.sun.management.jmxremote.port=<port>
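As a minimal sketch of how these flags might be applied to a plain JVM launch, the helper below fills in <ip> and <port> and prints the flag string. The function name jmx_opts, the address 192.0.2.10, the port 9010, and app.jar are all hypothetical and not part of the original setup.

```shell
# Print the JMX flags from the configuration above for one host and port.
# jmx_opts is a hypothetical helper name; $1 is the host/IP, $2 is the port.
jmx_opts() {
  printf '%s %s %s %s %s\n' \
    "-Dcom.sun.management.jmxremote" \
    "-Dcom.sun.management.jmxremote.ssl=false" \
    "-Dcom.sun.management.jmxremote.authenticate=false" \
    "-Djava.rmi.server.hostname=$1" \
    "-Dcom.sun.management.jmxremote.port=$2"
}

# Example launch (placeholder values, left commented out):
#   java $(jmx_opts 192.0.2.10 9010) -jar app.jar
```

The command substitution is deliberately left unquoted in the example launch so that the shell splits the string into separate -D arguments.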
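To monitor executors in Spark, JMX has to be configured before the Spark application starts. There are three ways to configure it:
1) Set the JMX parameters in spark-defaults.conf
2) Set them in spark-env.sh, as Java options for the master and worker
3) Set them when submitting with spark-submit
This post uses the third approach and configures JMX at spark-submit time: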
spark-submit \
  --class myTest.KafkaWordCount \
  --master yarn \
  --deploy-mode cluster \
  --conf "spark.executor.extraJavaOptions=-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=0 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false" \
  --verbose \
  --executor-memory 1G \
  --total-executor-cores 6 \
  /hadoop/spark/app/spark/20151223/testSpark.jar *.*.*.*:* test3 wordcount 4 kafkawordcount3 checkpoint4
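Notes:
1) Do not specify a fixed IP and port. A single node will very likely run several containers of the same application, and they would all try to bind the same port, which makes the application submitted through spark-submit fail.
2) Because the port is set to 0 rather than a fixed value, each executor JVM is assigned a free port automatically when it starts.
3) The three configuration methods above may differ in monitoring scope. For example, spark-submit applies to a single application only, while spark-env.sh may apply globally to every executor on a node (not verified, so treat this with caution).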
Use yarn applicationattempt -list <applicationId> to find the application attempt ID:
[root@cdh-143 bin]# yarn applicationattempt -list application_1559203334026_0015
19/06/01 17:57:18 INFO client.RMProxy: Connecting to ResourceManager at CDH-143/10.dx.dx.143:8032
Total number of application attempts :1
ApplicationAttempt-Id                 State    AM-Container-Id                         Tracking-URL
appattempt_1559203334026_0015_000001  RUNNING  container_1559203334026_0015_01_000001  http://CDH-143:8088/proxy/application_1559203334026_0015/
Use yarn container -list <applicationAttemptId> to list the container IDs:
[root@cdh-143 bin]# yarn container -list appattempt_1559203334026_0015_000001
19/06/01 17:57:52 INFO client.RMProxy: Connecting to ResourceManager at CDH-143/10.dx.dx.143:8032
Total number of containers :16
Container-Id                            Start Time                      Finish Time  State    Host          LOG-URL
container_1559203334026_0015_01_000012  Sat Jun 01 13:27:52 +0800 2019  N/A          RUNNING  CDH-146:8041  http://CDH-146:8042/node/containerlogs/container_1559203334026_0015_01_000012/dx
container_1559203334026_0015_01_000013  Sat Jun 01 13:27:52 +0800 2019  N/A          RUNNING  CDH-146:8041  http://CDH-146:8042/node/containerlogs/container_1559203334026_0015_01_000013/dx
container_1559203334026_0015_01_000010  Sat Jun 01 13:27:52 +0800 2019  N/A          RUNNING  CDH-146:8041  http://CDH-146:8042/node/containerlogs/container_1559203334026_0015_01_000010/dx
container_1559203334026_0015_01_000011  Sat Jun 01 13:27:52 +0800 2019  N/A          RUNNING  CDH-146:8041  http://CDH-146:8042/node/containerlogs/container_1559203334026_0015_01_000011/dx
container_1559203334026_0015_01_000016  Sat Jun 01 13:27:52 +0800 2019  N/A          RUNNING  CDH-146:8041  http://CDH-146:8042/node/containerlogs/container_1559203334026_0015_01_000016/dx
container_1559203334026_0015_01_000014  Sat Jun 01 13:27:52 +0800 2019  N/A          RUNNING  CDH-146:8041  http://CDH-146:8042/node/containerlogs/container_1559203334026_0015_01_000014/dx
container_1559203334026_0015_01_000015  Sat Jun 01 13:27:52 +0800 2019  N/A          RUNNING  CDH-146:8041  http://CDH-146:8042/node/containerlogs/container_1559203334026_0015_01_000015/dx
container_1559203334026_0015_01_000004  Sat Jun 01 13:27:52 +0800 2019  N/A          RUNNING  CDH-142:8041  http://CDH-142:8042/node/containerlogs/container_1559203334026_0015_01_000004/dx
container_1559203334026_0015_01_000005  Sat Jun 01 13:27:52 +0800 2019  N/A          RUNNING  CDH-142:8041  http://CDH-142:8042/node/containerlogs/container_1559203334026_0015_01_000005/dx
container_1559203334026_0015_01_000002  Sat Jun 01 13:27:52 +0800 2019  N/A          RUNNING  CDH-142:8041  http://CDH-142:8042/node/containerlogs/container_1559203334026_0015_01_000002/dx
container_1559203334026_0015_01_000003  Sat Jun 01 13:27:52 +0800 2019  N/A          RUNNING  CDH-142:8041  http://CDH-142:8042/node/containerlogs/container_1559203334026_0015_01_000003/dx
container_1559203334026_0015_01_000008  Sat Jun 01 13:27:52 +0800 2019  N/A          RUNNING  CDH-142:8041  http://CDH-142:8042/node/containerlogs/container_1559203334026_0015_01_000008/dx
container_1559203334026_0015_01_000009  Sat Jun 01 13:27:52 +0800 2019  N/A          RUNNING  CDH-142:8041  http://CDH-142:8042/node/containerlogs/container_1559203334026_0015_01_000009/dx
container_1559203334026_0015_01_000006  Sat Jun 01 13:27:52 +0800 2019  N/A          RUNNING  CDH-142:8041  http://CDH-142:8042/node/containerlogs/container_1559203334026_0015_01_000006/dx
container_1559203334026_0015_01_000007  Sat Jun 01 13:27:52 +0800 2019  N/A          RUNNING  CDH-142:8041  http://CDH-142:8042/node/containerlogs/container_1559203334026_0015_01_000007/dx
container_1559203334026_0015_01_000001  Sat Jun 01 13:27:38 +0800 2019  N/A          RUNNING  CDH-142:8041  http://CDH-142:8042/node/containerlogs/container_1559203334026_0015_01_000001/dx
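With this many containers, picking out which executor runs on which node by eye gets tedious. The following is a rough sketch, assuming the column layout shown in the listing above, that reads the yarn container -list output on standard input and prints one "container-id host" pair per running container. The function name list_running_containers is hypothetical.

```shell
# Read `yarn container -list <attemptId>` output on stdin and print
# "<container-id> <host>" for every container whose state is RUNNING.
# Assumes the host:port column directly follows the State column.
list_running_containers() {
  awk '$1 ~ /^container_/ {
    for (i = 2; i < NF; i++) {
      if ($i == "RUNNING") {
        split($(i + 1), hostport, ":")   # e.g. CDH-146:8041 -> CDH-146
        print $1, hostport[1]
        break
      }
    }
  }'
}

# Usage against a live cluster (left commented out):
#   yarn container -list appattempt_1559203334026_0015_000001 | list_running_containers
```

The header and log lines are skipped because their first field does not start with container_.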
On the node that hosts the executor, use the following command to find the running process and its PID:
[root@cdh-146 ~]# ps -axu | grep container_1559203334026_0015_01_000013
yarn 8844 0.0 0.0 113144 1496 ? S 13:27 0:00 bash /data6/yarn/nm/usercache/dx/appcache/application_1559203334026_0015/container_1559203334026_0015_01_000013/default_container_executor.sh
yarn 8857 0.0 0.0 113280 1520 ? Ss 13:27 0:00 /bin/bash -c /usr/java/jdk1.8.0_171-amd64/bin/java -server -Xmx6144m '-Dcom.sun.management.jmxremote' '-Dcom.sun.management.jmxremote.port=0' '-Dcom.sun.management.jmxremote.authenticate=false' '-Dcom.sun.management.jmxremote.ssl=false' -Djava.io.tmpdir=/data6/yarn/nm/usercache/dx/appcache/application_1559203334026_0015/container_1559203334026_0015_01_000013/tmp '-Dspark.network.timeout=10000000' '-Dspark.driver.port=47564' '-Dspark.port.maxRetries=32' -Dspark.yarn.app.container.log.dir=/data6/yarn/container-logs/application_1559203334026_0015/container_1559203334026_0015_01_000013 -XX:OnOutOfMemoryError='kill %p' org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url spark://CoarseGrainedScheduler@CDH-143:47564 --executor-id 12 --hostname CDH-146 --cores 2 --app-id application_1559203334026_0015 --user-class-path file:/data6/yarn/nm/usercache/dx/appcache/application_1559203334026_0015/container_1559203334026_0015_01_000013/__app__.jar --user-class-path file:/data6/yarn/nm/usercache/dx/appcache/application_1559203334026_0015/container_1559203334026_0015_01_000013/streaming-dx-perf-3.0.0.jar --user-class-path file:/data6/yarn/nm/usercache/dx/appcache/application_1559203334026_0015/container_1559203334026_0015_01_000013/dx-common-3.0.0.jar --user-class-path file:/data6/yarn/nm/usercache/dx/appcache/application_1559203334026_0015/container_1559203334026_0015_01_000013/spark-sql-kafka-0-10_2.11-2.4.0.jar --user-class-path file:/data6/yarn/nm/usercache/dx/appcache/application_1559203334026_0015/container_1559203334026_0015_01_000013/spark-avro_2.11-3.2.0.jar --user-class-path file:/data6/yarn/nm/usercache/dx/appcache/application_1559203334026_0015/container_1559203334026_0015_01_000013/shc-core-1.1.2-2.2-s_2.11-SNAPSHOT.jar --user-class-path file:/data6/yarn/nm/usercache/dx/appcache/application_1559203334026_0015/container_1559203334026_0015_01_000013/rocksdbjni-5.17.2.jar --user-class-path file:/data6/yarn/nm/usercache/dx/appcache/application_1559203334026_0015/container_1559203334026_0015_01_000013/kafka-clients-0.10.0.1.jar --user-class-path file:/data6/yarn/nm/usercache/dx/appcache/application_1559203334026_0015/container_1559203334026_0015_01_000013/elasticsearch-spark-20_2.11-6.4.1.jar --user-class-path file:/data6/yarn/nm/usercache/dx/appcache/application_1559203334026_0015/container_1559203334026_0015_01_000013/dx_Spark_State_Store_Plugin-1.0-SNAPSHOT.jar --user-class-path file:/data6/yarn/nm/usercache/dx/appcache/application_1559203334026_0015/container_1559203334026_0015_01_000013/bijection-core_2.11-0.9.5.jar --user-class-path file:/data6/yarn/nm/usercache/dx/appcache/application_1559203334026_0015/container_1559203334026_0015_01_000013/bijection-avro_2.11-0.9.5.jar 1>/data6/yarn/container-logs/application_1559203334026_0015/container_1559203334026_0015_01_000013/stdout 2>/data6/yarn/container-logs/application_1559203334026_0015/container_1559203334026_0015_01_000013/stderr
yarn 9000 143 3.3 8736712 4379648 ? Sl 13:27 24:35 /usr/java/jdk1.8.0_171-amd64/bin/java -server -Xmx6144m -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=0 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Djava.io.tmpdir=/data6/yarn/nm/usercache/dx/appcache/application_1559203334026_0015/container_1559203334026_0015_01_000013/tmp -Dspark.network.timeout=10000000 -Dspark.driver.port=47564 -Dspark.port.maxRetries=32 -Dspark.yarn.app.container.log.dir=/data6/yarn/container-logs/application_1559203334026_0015/container_1559203334026_0015_01_000013 -XX:OnOutOfMemoryError=kill %p org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url spark://CoarseGrainedScheduler@CDH-143:47564 --executor-id 12 --hostname CDH-146 --cores 2 --app-id application_1559203334026_0015 --user-class-path file:/data6/yarn/nm/usercache/dx/appcache/application_1559203334026_0015/container_1559203334026_0015_01_000013/__app__.jar --user-class-path file:/data6/yarn/nm/usercache/dx/appcache/application_1559203334026_0015/container_1559203334026_0015_01_000013/dx-domain-perf-3.0.0.jar --user-class-path file:/data6/yarn/nm/usercache/dx/appcache/application_1559203334026_0015/container_1559203334026_0015_01_000013/dx-common-3.0.0.jar --user-class-path file:/data6/yarn/nm/usercache/dx/appcache/application_1559203334026_0015/container_1559203334026_0015_01_000013/spark-sql-kafka-0-10_2.11-2.4.0.jar --user-class-path file:/data6/yarn/nm/usercache/dx/appcache/application_1559203334026_0015/container_1559203334026_0015_01_000013/spark-avro_2.11-3.2.0.jar --user-class-path file:/data6/yarn/nm/usercache/dx/appcache/application_1559203334026_0015/container_1559203334026_0015_01_000013/shc-core-1.1.2-2.2-s_2.11-SNAPSHOT.jar --user-class-path file:/data6/yarn/nm/usercache/dx/appcache/application_1559203334026_0015/container_1559203334026_0015_01_000013/rocksdbjni-5.17.2.jar --user-class-path file:/data6/yarn/nm/usercache/dx/appcache/application_1559203334026_0015/container_1559203334026_0015_01_000013/kafka-clients-0.10.0.1.jar --user-class-path file:/data6/yarn/nm/usercache/dx/appcache/application_1559203334026_0015/container_1559203334026_0015_01_000013/elasticsearch-spark-20_2.11-6.4.1.jar --user-class-path file:/data6/yarn/nm/usercache/dx/appcache/application_1559203334026_0015/container_1559203334026_0015_01_000013/dx_Spark_State_Store_Plugin-1.0-SNAPSHOT.jar --user-class-path file:/data6/yarn/nm/usercache/dx/appcache/application_1559203334026_0015/container_1559203334026_0015_01_000013/bijection-core_2.11-0.9.5.jar --user-class-path file:/data6/yarn/nm/usercache/dx/appcache/application_1559203334026_0015/container_1559203334026_0015_01_000013/bijection-avro_2.11-0.9.5.jar
root 25939 0.0 0.0 112780 956 pts/1 S+ 13:45 0:00 grep --color=auto container_1559203334026_0015_01_000013
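Of the four matches above, only PID 9000 is the executor JVM itself; the others are the two bash wrappers and the grep. A small sketch of how that selection could be automated is given below. It assumes a process listing in "PID COMMAND" form, such as the one produced by ps -eo pid,args, which is a substitution for the ps -axu call used above. The function name executor_pid is hypothetical.

```shell
# Read `ps -eo pid,args` output on stdin and print the PID of every process
# whose executable is java and whose command line mentions the container id
# passed as $1. Shell wrappers and the grep process are skipped because
# their executable is not java.
executor_pid() {
  awk -v cid="$1" '($2 == "java" || $2 ~ /\/java$/) && index($0, cid) > 0 {
    print $1
  }'
}

# Usage on the executor node (left commented out):
#   ps -eo pid,args | executor_pid container_1559203334026_0015_01_000013
```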
Then use the PID to find the corresponding JMX port:
[root@cdh-146 ~]# sudo netstat -antp | grep 9000
tcp        0      0 10.dx.dx.146:9000     0.0.0.0:*             LISTEN       2642/python2.7
tcp6       0      0 :::48169              :::*                  LISTEN       9000/java
tcp6       0      0 :::37692              :::*                  LISTEN       9000/java
tcp6       0      0 10.dx.dx.146:52710    :::*                  LISTEN       9000/java
tcp6       0      0 10.dx.dx.146:55535    10.dx.dx.142:38397    ESTABLISHED  9000/java
tcp6   64088      0 10.dx.dx.146:45410    10.206.186.35:9092    ESTABLISHED  9000/java
tcp6       0      0 10.dx.dx.146:60259    10.dx.dx.143:47564    ESTABLISHED  9000/java
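Judging from the output, the JMX port is probably 48169 or 37692 (the two ports the java process listens on across all interfaces). Trying each of them quickly shows which one connects to the Spark executor.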
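Note that grep 9000 also matched an unrelated python2.7 process that merely listens on port 9000. A slightly stricter filter, sketched below under the assumption that the input has the netstat -antp column layout shown above, matches on the PID column only and prints the ports that PID is listening on. The function name listening_ports is hypothetical.

```shell
# Read `netstat -antp` output on stdin and print, one per line, the local
# ports on which the process with PID $1 is in LISTEN state.
listening_ports() {
  awk -v pid="$1" '$6 == "LISTEN" && $7 ~ ("^" pid "/") {
    n = split($4, parts, ":")   # the port is the last colon-separated piece
    print parts[n]
  }'
}

# Usage on the executor node (left commented out):
#   sudo netstat -antp | listening_ports 9000
```

The result still contains every listening port of the executor, so the candidates have to be tried in turn as described above.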
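On the local Windows machine, go to the JDK directory, find ${JAVA_HOME}/bin/jvisualvm.exe, and run it. Once it has started, right-click "Remote" and add a JMX connection.
Enter the IP of the node that hosts the executor to be monitored, together with the JMX port found in the previous step.
Monitoring can then start: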