A recent project involved file upload and download and used the JUC thread pool ThreadPoolExecutor. In production, the pool would at times run at full capacity, and because the CallerRunsPolicy rejection policy was in use, the application's API calls stopped responding under full load and the service appeared to hang. Since I had previously built a monitoring stack out of micrometer + prometheus + grafana, I decided to use micrometer to actively collect thread pool metrics, so that they could be displayed in near real time on a grafana dashboard.
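The hang mechanism is easy to reproduce in isolation: with CallerRunsPolicy, a rejected task executes on the submitting thread, so under sustained full load the caller threads (e.g. the web container's request threads) end up doing the pool's work instead of serving requests. A minimal standalone sketch (the 1/1/1 sizing is purely illustrative):

```java
import java.util.concurrent.*;

public class CallerRunsDemo {

    // Submits three tasks to a pool sized 1/1 with a queue of length 1:
    // task 1 occupies the single worker, task 2 fills the queue, and task 3
    // is rejected - CallerRunsPolicy then runs it on the submitting thread.
    // Returns the name of the thread that executed the rejected task.
    public static String submitUntilRejected() {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(1),
                new ThreadPoolExecutor.CallerRunsPolicy());
        final String[] lastRunner = new String[1];
        pool.execute(() -> sleep(200));                // runs on the worker thread
        pool.execute(() -> sleep(200));                // parked in the queue
        pool.execute(() -> lastRunner[0] = Thread.currentThread().getName()); // rejected
        pool.shutdown();
        try {
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException ignored) {
            Thread.currentThread().interrupt();
        }
        return lastRunner[0];
    }

    private static void sleep(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException ignored) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        // The rejected task runs synchronously on the caller, i.e. "main" here
        System.out.println("rejected task ran on: " + submitUntilRejected());
    }
}
```

When the caller is a request-handling thread, that synchronous execution is exactly the "fake death" observed in production.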
What follows is a simulated example, reconstructed from the real incident, as a post-mortem walkthrough.
First, let's lay out the mapping between the metrics `ThreadPoolExecutor` exposes and the corresponding micrometer names and Tags:
- `thread.pool.name`: this one matters, because it distinguishes the data of the individual thread pools; if the pools are managed by an IOC container, the BeanName can serve as the value.
- `int getCorePoolSize()`: core thread count, Tag: `thread.pool.core.size`.
- `int getLargestPoolSize()`: historical peak thread count, Tag: `thread.pool.largest.size`.
- `int getMaximumPoolSize()`: maximum thread count (the pool's thread capacity), Tag: `thread.pool.max.size`.
- `int getActiveCount()`: currently active thread count, Tag: `thread.pool.active.size`.
- `int getPoolSize()`: total number of threads currently running in the pool (core and non-core), Tag: `thread.pool.thread.count`.
- `thread.pool.queue.size`: the task queue length, which has to be computed dynamically.

Next comes the concrete code, which implements the following:

- A `ThreadPoolExecutor` instance with core and maximum thread counts of 10, a task queue of length 10, and the `AbortPolicy` rejection policy.
- Collection of the metrics listed above from that `ThreadPoolExecutor` instance into micrometer's in-memory collectors. Since all of these values fluctuate over time, `Gauge`-type Meters are a good fit for recording them.
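Before wiring in micrometer, the getters listed above can be exercised on a throwaway pool (a standalone sketch; the 10/10/10 sizing mirrors the example that follows):

```java
import java.util.concurrent.*;

public class PoolMetricsProbe {

    public static void main(String[] args) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                10, 10, 0, TimeUnit.SECONDS, new ArrayBlockingQueue<>(10));
        // The five getters that back the Gauges, plus the derived queue size.
        // On a fresh pool with no submitted tasks, only the configured
        // capacities are non-zero.
        System.out.println("core    = " + pool.getCorePoolSize());    // 10
        System.out.println("max     = " + pool.getMaximumPoolSize()); // 10
        System.out.println("largest = " + pool.getLargestPoolSize()); // 0, no task yet
        System.out.println("active  = " + pool.getActiveCount());     // 0
        System.out.println("threads = " + pool.getPoolSize());        // 0
        System.out.println("queued  = " + pool.getQueue().size());    // 0
        pool.shutdown();
    }
}
```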
```java
// ThreadPoolMonitor
import io.micrometer.core.instrument.Metrics;
import io.micrometer.core.instrument.Tag;
import org.springframework.beans.factory.InitializingBean;
import org.springframework.stereotype.Service;

import java.util.Collections;
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * @author throwable
 * @version v1.0
 * @description
 * @since 2019/4/7 21:02
 */
@Service
public class ThreadPoolMonitor implements InitializingBean {

    private static final String EXECUTOR_NAME = "ThreadPoolMonitorSample";
    private static final Iterable<Tag> TAG = Collections.singletonList(Tag.of("thread.pool.name", EXECUTOR_NAME));
    private final ScheduledExecutorService scheduledExecutor = Executors.newSingleThreadScheduledExecutor();

    private final ThreadPoolExecutor executor = new ThreadPoolExecutor(10, 10, 0, TimeUnit.SECONDS,
            new ArrayBlockingQueue<>(10), new ThreadFactory() {

        private final AtomicInteger counter = new AtomicInteger();

        @Override
        public Thread newThread(Runnable r) {
            Thread thread = new Thread(r);
            thread.setDaemon(true);
            thread.setName("thread-pool-" + counter.getAndIncrement());
            return thread;
        }
    }, new ThreadPoolExecutor.AbortPolicy());

    private Runnable monitor = () -> {
        // Exceptions must be caught here: none are expected in practice, but an
        // uncaught one would silently kill the scheduled monitoring task
        try {
            Metrics.gauge("thread.pool.core.size", TAG, executor, ThreadPoolExecutor::getCorePoolSize);
            Metrics.gauge("thread.pool.largest.size", TAG, executor, ThreadPoolExecutor::getLargestPoolSize);
            Metrics.gauge("thread.pool.max.size", TAG, executor, ThreadPoolExecutor::getMaximumPoolSize);
            Metrics.gauge("thread.pool.active.size", TAG, executor, ThreadPoolExecutor::getActiveCount);
            Metrics.gauge("thread.pool.thread.count", TAG, executor, ThreadPoolExecutor::getPoolSize);
            // Note: if the work queue is unbounded, do not call size() on it directly like this
            Metrics.gauge("thread.pool.queue.size", TAG, executor, e -> e.getQueue().size());
        } catch (Exception e) {
            // ignore
        }
    };

    @Override
    public void afterPropertiesSet() throws Exception {
        // run every 5 seconds
        scheduledExecutor.scheduleWithFixedDelay(monitor, 0, 5, TimeUnit.SECONDS);
    }

    public void shortTimeWork() {
        executor.execute(() -> {
            try {
                // 5 seconds
                Thread.sleep(5000);
            } catch (InterruptedException e) {
                // ignore
            }
        });
    }

    public void longTimeWork() {
        executor.execute(() -> {
            try {
                // 500 seconds
                Thread.sleep(5000 * 100);
            } catch (InterruptedException e) {
                // ignore
            }
        });
    }

    public void clearTaskQueue() {
        executor.getQueue().clear();
    }
}
```

```java
// ThreadPoolMonitorController
import club.throwable.smp.service.ThreadPoolMonitor;
import lombok.RequiredArgsConstructor;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

/**
 * @author throwable
 * @version v1.0
 * @description
 * @since 2019/4/7 21:20
 */
@RequiredArgsConstructor
@RestController
public class ThreadPoolMonitorController {

    private final ThreadPoolMonitor threadPoolMonitor;

    @GetMapping(value = "/shortTimeWork")
    public ResponseEntity<String> shortTimeWork() {
        threadPoolMonitor.shortTimeWork();
        return ResponseEntity.ok("success");
    }

    @GetMapping(value = "/longTimeWork")
    public ResponseEntity<String> longTimeWork() {
        threadPoolMonitor.longTimeWork();
        return ResponseEntity.ok("success");
    }

    @GetMapping(value = "/clearTaskQueue")
    public ResponseEntity<String> clearTaskQueue() {
        threadPoolMonitor.clearTaskQueue();
        return ResponseEntity.ok("success");
    }
}
```
The configuration is as follows:
```yaml
server:
  port: 9091
management:
  server:
    port: 9091
  endpoints:
    web:
      exposure:
        include: '*'
      base-path: /management
```
The prometheus scrape Job's frequency can also be raised a little; the default here is to pull the /prometheus endpoint every 15 seconds, so each scrape covers 3 collection cycles' worth of data. Once the project is up, try calling /management/prometheus to inspect the data exposed by the endpoint:
Since ThreadPoolMonitorSample is the value of our custom-named Tag, seeing it in the output confirms that data collection is working. If the prometheus Job is configured correctly, once the local spring-boot project is running you can check the prometheus console:
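For reference, a matching scrape job might look like the following (the job name is made up for illustration; the port and metrics path follow the configuration above, and `scrape_interval` is lowered to match the 5-second collection cycle):

```yaml
scrape_configs:
  - job_name: 'threadpool-monitor'          # hypothetical job name
    metrics_path: '/management/prometheus'  # actuator base-path + prometheus endpoint
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9091']
```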
OK, perfect. On to the next step.
With the JVM application and the prometheus scrape Job both confirmed healthy, the next important step is configuring the grafana dashboard. If you don't want to seriously learn PromQL just yet, you can search for the relevant sample expressions in the prometheus console's /graph panel and copy them straight into the grafana configuration; ideally, though, read the prometheus documentation and learn systematically how to write PromQL.
The query configuration is as follows:
- `{{instance}}-{{thread_pool_name}}` thread pool active thread count.
- `{{instance}}-{{thread_pool_name}}` thread pool historical peak thread count.
- `{{instance}}-{{thread_pool_name}}` thread pool capacity.
- `{{instance}}-{{thread_pool_name}}` thread pool core thread count.
- `{{instance}}-{{thread_pool_name}}` thread pool running thread count.
- `{{instance}}-{{thread_pool_name}}` thread pool backlogged task count.

Call the example endpoints a few times, and you get a chart of the monitored thread pool:
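The exact expressions depend on your setup, but micrometer's Prometheus registry renders the dotted meter names in underscore form, so (assuming the default naming convention) the panels can be fed with queries along these lines:

```
# active threads; legend format: {{instance}}-{{thread_pool_name}}
thread_pool_active_size
# backlogged tasks sitting in the work queue
thread_pool_queue_size
# historical peak thread count
thread_pool_largest_size
```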
Monitoring the various metrics of a ThreadPoolExecutor helps you notice in time when an interface backed by a thread pool misbehaves. For fast recovery, the most effective remedy is to clear the backlog of tasks in the pool's task queue. Concretely: delegate the ThreadPoolExecutor to the IOC container and expose its queue-clearing method as a REST endpoint. Monitoring for HTTP client connection pools such as Apache-Http-Client or OkHttp can be implemented in a similar way. Data collection may incur a small performance cost from locking and the like, but it is negligible; if you are genuinely worried about the impact, you can use the reflection API to read the ThreadPoolExecutor instance's internal fields directly and avoid the locking overhead.
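That reflection idea can be sketched as follows. Note the caveats: the field name `largestPoolSize` is an OpenJDK implementation detail and may differ across JDK vendors and versions, and on JDK 9+ deep reflection into `java.base` requires `--add-opens java.base/java.util.concurrent=ALL-UNNAMED`; the sketch therefore falls back to the public (locking) getter when reflection is not permitted:

```java
import java.lang.reflect.Field;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class UnlockedPoolStats {

    // Reads ThreadPoolExecutor's private "largestPoolSize" field directly,
    // skipping the mainLock that the public getLargestPoolSize() acquires.
    // The field name is OpenJDK-internal; if reflection fails for any reason
    // (different JDK internals, module access denied), fall back to the getter.
    public static int largestPoolSize(ThreadPoolExecutor executor) {
        try {
            Field f = ThreadPoolExecutor.class.getDeclaredField("largestPoolSize");
            f.setAccessible(true);
            return f.getInt(executor);
        } catch (Exception e) {
            return executor.getLargestPoolSize(); // locking fallback
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2, 2, 0, TimeUnit.SECONDS, new ArrayBlockingQueue<>(4));
        pool.execute(() -> { }); // spins up one worker, so the peak is 1
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        System.out.println("largestPoolSize = " + largestPoolSize(pool));
    }
}
```

Whether the savings justify the fragility is debatable; for a 5-second collection interval the locking cost of the public getters is almost certainly irrelevant.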
Original post on my personal blog: http://www.throwable.club/2019/04/14/jvm-micrometer-thread-pool-monitor

(End of post, c-2-d 20190414)