一個JAVA應用啓動緩慢問題排查 --來自jdk securerandom 的問候

  開發某個項目過程當中,就需求,搭建了一套測試環境。很快完成!html

  後來代碼中加入了許多新功能,會涉及到反覆重啓,而後就發現了啓動特別慢。這給本來功能就很少的應用增添了許多的負擔。java

  我決定改變這一切!找到啓動緩慢的根源,加快啓動速度!git

思路展開:spring

  1. 啓動慢,先看日誌,有沒有什麼異常?
  2. 看下是否是gc 有問題,是否佔用很長時間?
  3. 日誌卡住,是不是存在網絡請求不通狀況?
  4. 抓包查看卡住的時候,應用都作什麼?
  5. 把線程堆棧打印出來,線程都在作什麼?把內存dump 出來瞅瞅吧?
  6. 實在不行,本地debug不行,換一臺測試機試試?
  7. 實在不行,遠程debug吧?
  8. 定位到點後,再來思考解決好的解決方案?
  9. 除了解決這個問題,咱們還能思考點什麼?網絡

讓我按思路一個個來看解決過程!app

1. 啓動慢,先看日誌,有沒有什麼異常?

  這個基本上很快就有答案了,看日誌,等待期間沒有問題!less

  可是,咱們每每會停留在這個階段較長一段時間,由於咱們總想經過其餘方式,打印點什麼出來!dom

  因此,咱們能夠會改日誌級別,try... catch... 等等一頓操做,而後,才說算了吧,咱們想一想別的招!我我的仍是以爲這種思路是ok的!ssh

 

2. 看下是否是gc 有問題,是否佔用很長時間?

  這裏可能就會涉及,以前並無打印gc日誌,而後還得回去在啓動參數里加上gc輸出!姿式以下:tcp

-Xloggc:/opt/logs/a_gc.log -XX:+PrintGCDetails -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/opt/logs/ -XX:+UseG1GC -XX:MaxGCPauseMillis=100 -XX:+ParallelRefProcEnabled -XX:ErrorFile=/run/dump/hs_err_pid%p.lo3g -XX:HeapDumpPath=/run/dump

  打開後,你看到的gc 日誌多是這樣的。 你想從裏面發現點什麼,好像頻繁沒有 FullGC 之類的操做,而後又悻悻而歸!

10.224: [GC concurrent-root-region-scan-start]
10.249: [GC concurrent-root-region-scan-end, 0.0252691 secs]
10.249: [GC concurrent-mark-start]
10.250: [GC concurrent-mark-end, 0.0005479 secs]
10.250: [GC remark 10.251: [Finalize Marking, 0.0001862 secs] 10.251: [GC ref-proc, 0.0026452 secs] 10.253: [Unloading, 0.0125200 secs], 0.0156381 secs]
 [Times: user=0.06 sys=0.00, real=0.02 secs]
10.266: [GC cleanup 33M->33M(1024M), 0.0012152 secs]
 [Times: user=0.00 sys=0.00, real=0.00 secs]
13.768: [GC pause (G1 Evacuation Pause) (young), 0.0411037 secs]
   [Parallel Time: 34.5 ms, GC Workers: 4]
      [GC Worker Start (ms): Min: 13768.7, Avg: 13769.8, Max: 13773.1, Diff: 4.4]
      [Ext Root Scanning (ms): Min: 0.0, Avg: 1.8, Max: 4.1, Diff: 4.1, Sum: 7.3]
      [Update RS (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
         [Processed Buffers: Min: 0, Avg: 0.0, Max: 0, Diff: 0, Sum: 0]
      [Scan RS (ms): Min: 0.0, Avg: 0.1, Max: 0.2, Diff: 0.2, Sum: 0.5]
      [Code Root Scanning (ms): Min: 0.0, Avg: 3.1, Max: 8.8, Diff: 8.8, Sum: 12.5]
      [Object Copy (ms): Min: 21.1, Avg: 28.0, Max: 30.5, Diff: 9.4, Sum: 111.9]
      [Termination (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: 0.2]
         [Termination Attempts: Min: 1, Avg: 12.2, Max: 22, Diff: 21, Sum: 49]
      [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
      [GC Worker Total (ms): Min: 29.9, Avg: 33.2, Max: 34.3, Diff: 4.4, Sum: 132.6]
      [GC Worker End (ms): Min: 13803.0, Avg: 13803.0, Max: 13803.0, Diff: 0.0]
   [Code Root Fixup: 0.4 ms]
   [Code Root Purge: 0.0 ms]
   [Clear CT: 0.2 ms]
   [Other: 5.9 ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 4.6 ms]
      [Ref Enq: 0.1 ms]
      [Redirty Cards: 0.1 ms]
      [Humongous Register: 0.1 ms]
      [Humongous Reclaim: 0.0 ms]
      [Free CSet: 0.9 ms]
   [Eden: 544.0M(544.0M)->0.0B(573.0M) Survivors: 31.0M->41.0M Heap: 575.0M(1024.0M)->41.0M(1024.0M)]
 [Times: user=0.13 sys=0.02, real=0.04 secs]
17.178: [GC pause (G1 Evacuation Pause) (young), 0.0648366 secs]
   [Parallel Time: 59.7 ms, GC Workers: 4]
      [GC Worker Start (ms): Min: 17177.7, Avg: 17181.1, Max: 17184.4, Diff: 6.6]
      [Ext Root Scanning (ms): Min: 0.0, Avg: 3.9, Max: 12.7, Diff: 12.7, Sum: 15.6]
      [Update RS (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
         [Processed Buffers: Min: 0, Avg: 0.0, Max: 0, Diff: 0, Sum: 0]
      [Scan RS (ms): Min: 0.0, Avg: 0.1, Max: 0.3, Diff: 0.2, Sum: 0.6]
      [Code Root Scanning (ms): Min: 0.0, Avg: 6.5, Max: 19.4, Diff: 19.4, Sum: 26.0]
      [Object Copy (ms): Min: 34.9, Avg: 45.5, Max: 51.2, Diff: 16.3, Sum: 182.1]
      [Termination (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: 0.3]
         [Termination Attempts: Min: 1, Avg: 16.5, Max: 24, Diff: 23, Sum: 66]
      [GC Worker Other (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: 0.2]
      [GC Worker Total (ms): Min: 52.9, Avg: 56.2, Max: 59.5, Diff: 6.6, Sum: 224.8]
      [GC Worker End (ms): Min: 17237.3, Avg: 17237.3, Max: 17237.3, Diff: 0.0]
   [Code Root Fixup: 0.5 ms]
20.648: [GC pause (G1 Evacuation Pause) (young), 0.0653569 secs]
   [Parallel Time: 55.9 ms, GC Workers: 4]
      [GC Worker Start (ms): Min: 20648.5, Avg: 20650.4, Max: 20652.3, Diff: 3.7]
      [Ext Root Scanning (ms): Min: 0.1, Avg: 2.9, Max: 9.4, Diff: 9.4, Sum: 11.7]
      [Update RS (ms): Min: 0.0, Avg: 0.2, Max: 0.5, Diff: 0.5, Sum: 0.7]
         [Processed Buffers: Min: 0, Avg: 2.5, Max: 7, Diff: 7, Sum: 10]
      [Scan RS (ms): Min: 0.0, Avg: 0.3, Max: 0.6, Diff: 0.6, Sum: 1.3]
      [Code Root Scanning (ms): Min: 0.0, Avg: 3.2, Max: 10.8, Diff: 10.8, Sum: 12.9]
      [Object Copy (ms): Min: 39.4, Avg: 46.4, Max: 52.1, Diff: 12.8, Sum: 185.5]
      [Termination (ms): Min: 0.0, Avg: 0.9, Max: 1.5, Diff: 1.5, Sum: 3.7]
         [Termination Attempts: Min: 1, Avg: 1.0, Max: 1, Diff: 0, Sum: 4]
      [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
      [GC Worker Total (ms): Min: 52.1, Avg: 54.0, Max: 55.8, Diff: 3.7, Sum: 215.8]
      [GC Worker End (ms): Min: 20704.4, Avg: 20704.4, Max: 20704.4, Diff: 0.0]
   [Code Root Fixup: 0.2 ms]
   [Code Root Purge: 0.0 ms]
   [Clear CT: 0.1 ms]
   [Other: 9.2 ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 8.5 ms]
      [Ref Enq: 0.0 ms]
      [Redirty Cards: 0.2 ms]
      [Humongous Register: 0.1 ms]
      [Humongous Reclaim: 0.0 ms]
      [Free CSet: 0.2 ms]
   [Eden: 38.0M(8192.0K)->0.0B(610.0M) Survivors: 43.0M->4096.0K Heap: 83.3M(1024.0M)->54.4M(1024.0M)]
 [Times: user=0.24 sys=0.02, real=0.06 secs]
2826.037: [GC pause (Metadata GC Threshold) (young) (initial-mark), 0.0101504 secs]
   [Parallel Time: 7.1 ms, GC Workers: 4]
      [GC Worker Start (ms): Min: 2826037.4, Avg: 2826037.5, Max: 2826037.5, Diff: 0.1]
      [Ext Root Scanning (ms): Min: 1.7, Avg: 2.0, Max: 2.2, Diff: 0.5, Sum: 7.8]
      [Update RS (ms): Min: 0.9, Avg: 1.0, Max: 1.0, Diff: 0.1, Sum: 3.8]
         [Processed Buffers: Min: 7, Avg: 14.2, Max: 25, Diff: 18, Sum: 57]
      [Scan RS (ms): Min: 0.3, Avg: 0.7, Max: 0.9, Diff: 0.6, Sum: 2.8]
      [Code Root Scanning (ms): Min: 0.2, Avg: 0.4, Max: 0.8, Diff: 0.6, Sum: 1.5]
      [Object Copy (ms): Min: 2.7, Avg: 2.9, Max: 3.2, Diff: 0.5, Sum: 11.7]
      [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
         [Termination Attempts: Min: 1, Avg: 1.0, Max: 1, Diff: 0, Sum: 4]
      [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
      [GC Worker Total (ms): Min: 6.9, Avg: 7.0, Max: 7.0, Diff: 0.1, Sum: 27.8]
      [GC Worker End (ms): Min: 2826044.4, Avg: 2826044.4, Max: 2826044.4, Diff: 0.0]
   [Code Root Fixup: 0.0 ms]
   [Code Root Purge: 0.0 ms]
   [Clear CT: 0.1 ms]
   [Other: 2.9 ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 1.6 ms]
      [Ref Enq: 0.1 ms]
      [Redirty Cards: 0.0 ms]
      [Humongous Register: 0.0 ms]
      [Humongous Reclaim: 0.0 ms]
      [Free CSet: 0.5 ms]
   [Eden: 354.0M(438.0M)->0.0B(603.0M) Survivors: 4096.0K->11.0M Heap: 408.3M(1024.0M)->61.4M(1024.0M)]
 [Times: user=0.03 sys=0.00, real=0.01 secs]
2826.047: [GC concurrent-root-region-scan-start]
2826.054: [GC concurrent-root-region-scan-end, 0.0066742 secs]
2826.054: [GC concurrent-mark-start]
2826.116: [GC concurrent-mark-end, 0.0616457 secs]
2826.116: [GC remark 2826.116: [Finalize Marking, 0.0002202 secs] 2826.116: [GC ref-proc, 0.0031923 secs] 2826.120: [Unloading, 0.0131458 secs], 0.0169584 secs]
 [Times: user=0.05 sys=0.00, real=0.01 secs]
2826.134: [GC cleanup 62M->62M(1024M), 0.0010641 secs]
 [Times: user=0.00 sys=0.00, real=0.00 secs]

  無論怎麼樣,你知道內存不是問題了。不過通常地,在啓動時就存在內存問題的應用可能確實很少!

 

3. 日誌卡住,是不是存在網絡請求不通狀況?

  通常來講,咱們的應用,每作一些關鍵操做時,都會有相應的日誌輸出。因此,當你日誌卡住的時候,應該就是哪裏出現了問題了!

  而在排除了中間出現full gc 卡頓的問題後,咱們能夠認爲多是網絡出現問題了。

  查詢網絡問題必備的工具: tcpdump, lsof ...

  用法也很簡單,咱們能夠在徹底沒有任何信息的前提下,進行網絡抓包,從而給排查問題一點提示:

tcpdump -iany -XX     # 這樣就能夠看到應用的網絡io狀況了, 若是量太大能夠先把 -XX 選項去除,只看來往狀況:

  大概結果以下:

16:54:25.770463 IP 10.5.3.215.63773 > zt-d-xx.ssh: Flags [.], ack 125340, win 552, length 0
16:54:25.770478 IP zt-d-xx.ssh > 10.5.3.215.63773: Flags [P.], seq 127152:127332, ack 1, win 561, length 180
16:54:25.770482 IP 10.1.2.215.63773 > zt-d-xx.ssh: Flags [.], ack 125340, win 552, options [nop,nop,sack 1 {125504:125684}], length 0
16:54:25.770487 IP 10.1.2.215.63773 > zt-d-xx.ssh: Flags [.], ack 125684, win 556, length 0
16:54:25.770554 IP zt-d-xx.ssh > 10.1.2.215.63773: Flags [P.], seq 127332:127864, ack 1, win 561, length 532
16:54:25.770579 IP zt-d-xx.ssh > 10.1.2.215.63773: Flags [P.], seq 127864:127916, ack 1, win 561, length 52
16:54:25.770612 IP zt-d-xx.ssh > 10.1.2.215.63773: Flags [P.], seq 127916:128208, ack 1, win 561, length 292
16:54:25.771813 IP 10.1.2.215.63773 > zt-d-xx.ssh: Flags [.], ack 125684, win 556, options [nop,nop,sack 1 {125864:126044}], length 0
16:54:25.771822 IP zt-d-xx.ssh > 10.1.2.215.63773: Flags [P.], seq 128208:128388, ack 1, win 561, length 180
16:54:25.771837 IP 10.1.2.215.63773 > zt-d-xx.ssh: Flags [.], ack 126044, win 555, length 0
16:54:25.771840 IP 10.1.2.215.63773 > zt-d-xx.ssh: Flags [.], ack 126792, win 552, length 0
16:54:25.771868 IP zt-d-xx.ssh > 10.1.2.215.63773: Flags [P.], seq 128388:128808, ack 1, win 561, length 420
16:54:25.771884 IP zt-d-xx.ssh > 10.1.2.215.63773: Flags [P.], seq 128808:128972, ack 1, win 561, length 164
16:54:25.771891 IP 10.1.2.215.63773 > zt-d-xx.ssh: Flags [.], ack 127152, win 556, length 0
16:54:25.771934 IP zt-d-xx.ssh > 10.1.2.215.63773: Flags [P.], seq 128972:129376, ack 1, win 561, length 404
16:54:25.771959 IP zt-d-xx.ssh > 10.5.2.215.63773: Flags [P.], seq 129376:129556, ack 1, win 561, length 180
16:54:25.771982 IP zt-d-xx.ssh > 10.5.2.215.63773: Flags [P.], seq 129556:129736, ack 1, win 561, length 180
16:54:25.779454 IP 10.5.2.215.63773 > zt-d-xx.ssh: Flags [.], ack 127864, win 553, length 0
16:54:25.779469 IP zt-d-xx.ssh > 10.5.2.215.63773: Flags [P.], seq 129736:129916, ack 1, win 561, length 180
16:54:25.779474 IP 10.5.2.215.63773 > zt-d-xx.ssh: Flags [.], ack 128208, win 552, length 0
16:54:25.779476 IP 10.5.2.215.63773 > zt-d-xx.ssh: Flags [.], ack 128808, win 556, length 0
16:54:25.779478 IP 10.5.2.215.63773 > zt-d-xx.ssh: Flags [.], ack 129376, win 554, length 0
16:54:25.779480 IP 10.5.2.215.63773 > zt-d-xx.ssh: Flags [.], ack 129736, win 553, length 0
16:54:25.779529 IP zt-d-xx.ssh > 10.5.3.215.63773: Flags [P.], seq 129916:130496, ack 1, win 561, length 580
16:54:25.779567 IP zt-d-xx.ssh > 10.5.3.215.63773: Flags [P.], seq 130496:130660, ack 1, win 561, length 164

  其中,咱們主要關注 Flags 參數. man tcpdump 後能夠看到解釋:

  Tcpflags are some combination of S (SYN), F (FIN), P (PUSH), R (RST), U (URG), W (ECN CWR), E (ECN-Echo) or '.' (ACK)

  詳細說明就是:

    # SYN 表示創建鏈接;
    # FIN 表示關閉鏈接;
    # ACK 表示響應;
    # PSH 表示有 DATA數據傳輸;
    # RST 表示鏈接重置;
    # URG 表示緊急中止位;

  因此,咱們在這裏須要關注這麼多嗎? 其實不須要的,我以爲我們能夠先看下是否有 RST 鏈接重置的狀況,若是存在,則說明這個網絡是不通的,基本定位網絡問題!~ 以下抓包:

17:30:41.635937 IP zt-d-xx.43062 > 172.1.0.17.http: Flags [S], seq 3503889751, win 29200, options [mss 1460,sackOK,TS val 1904549734 ecr 0,nop,wscale 7], length 0
17:30:41.636084 IP 172.1.0.17.http > zt-d-xx.43062: Flags [R.], seq 0, ack 3503889752, win 0, length 0

  另外,要查看下是否是某個網絡請求 只有 [S] 信號,沒有 [P] 的信號, 則說明只有一端在創建鏈接,另外一邊則無響應,網絡問題定位!以下請求一個不存在的地址:

17:13:57.744349 IP zt-d-xx.37604 > 172-1-1-2.lightspeed.hstntx.sbcglobal.net.http: Flags [S], seq 3312915935, win 29200, options [mss 1460,sackOK,TS val 1903545842 ecr 0,nop,wscale 7], length 0
17:13:58.745584 IP zt-d-xx.37604 > 172-1-1-2.lightspeed.hstntx.sbcglobal.net.http: Flags [S], seq 3312915935, win 29200, options [mss 1460,sackOK,TS val 1903546844 ecr 0,nop,wscale 7], length 0
17:14:00.749570 IP zt-d-xx.37604 > 172-1-1-2.lightspeed.hstntx.sbcglobal.net.http: Flags [S], seq 3312915935, win 29200, options [mss 1460,sackOK,TS val 1903548848 ecr 0,nop,wscale 7], length 0
17:14:04.757571 IP zt-d-xx.37604 > 172-1-1-2.lightspeed.hstntx.sbcglobal.net.http: Flags [S], seq 3312915935, win 29200, options [mss 1460,sackOK,TS val 1903552856 ecr 0,nop,wscale 7], length 0
17:14:12.765568 IP zt-d-xx.37604 > 172-1-1-2.lightspeed.hstntx.sbcglobal.net.http: Flags [S], seq 3312915935, win 29200, options [mss 1460,sackOK,TS val 1903560864 ecr 0,nop,wscale 7], length 0
17:14:28.797584 IP zt-d-xx.37604 > 172-1-1-2.lightspeed.hstntx.sbcglobal.net.http: Flags [S], seq 3312915935, win 29200, options [mss 1460,sackOK,TS val 1903576896 ecr 0,nop,wscale 7], length 0
17:15:00.861591 IP zt-d-xx.37604 > 172-1-1-2.lightspeed.hstntx.sbcglobal.net.http: Flags [S], seq 3312915935, win 29200, options [mss 1460,sackOK,TS val 1903608960 ecr 0,nop,wscale 7], length 0

   發了n個 [S] 包,都沒有響應; 說明整個網絡都是不通的;

  經過這個排查,咱們通常能夠排除網絡問題!

 

4. 抓包查看卡住的時候,應用都作什麼?

  這個點其實能夠早點作,使用 top 查看內存,cpu 佔用狀況!

  使用 vmstat 查看系統虛擬內存狀態,也能夠報告關於進程、內存、I/O等系統總體運行狀態!

  使用 lsof -p <pid> 查看打開的文件狀況!應該能夠看到一些不同的東西,好比你會看到一些你不知道被打開的文件,你會看到一些你不知道的鏈接請求!

lsof -i   # 列出全部的網絡鏈接
lsof -p <pid>     # 列出進程對應的文件信息
lsof -i4:8080 # 列表8080 端口對應的 ipv4 的鏈接

 

5. 把線程堆棧打印出來,線程都在作什麼?把內存dump 出來瞅瞅吧?

  若是仍是沒有發現什麼,那咱們還得回來。看看線程卡在了哪裏!

  使用 jstack 把線程堆棧打印出來,使用 jmap 把內存dump 出來,分析!

jstack <pid> | less        # 查看線程信息,是否存在死鎖
jmap -dump:format=b,file=/tmp/dumpid.dump <pid>        # 查看堆信息,是否須要特別的對象

 

6. 實在不行,本地debug不行,換一臺測試機試試?

  對於本地debug, 我只能說,通常環境都不通的,並且本地通常也很復現場景!

  換臺機器測試的目的,實際上是想確認問題是否在全部機器上覆現,由於若是是和機器自己相關的問題,每每是很難排查的。(我說的沒有相關經驗的同窗)

  因此,找一臺另外機器,代碼跑起來,若是其餘地方正常,則該問題是機器相關的。

  那麼機器相關的bug又該怎麼辦呢?其實,還得具體問題,具體分析!

  不過幸虧,咱們還有 remote debug 功能,直接鏈接上去就能夠調試了。問題天然也就清晰起來!

 

7. 實在不行,遠程debug吧?

  remote debug 請查看上一篇文章: 排障利器之遠程調試與監控 --jmx & remote debug 

  鏈接上以後,先大概猜想可能會出問題的地方,進行斷點。判斷依據,應該是以前的排查結果,線程狀況等等!

  進行大概估計後,在可能的地方進行單步,卡住的地方,其中必然出現了問題。只需再進入內部重複剛纔的作法便可!

  優勢是這樣調試下來,一定能找到問題出現的地方。缺點是:可能須要反覆重現問題(可能就是反覆重啓)。可是這樣顯得沒有技術含量,還有就是在問題難以復現的場景,很難抓住機會解決問題。

  因此,經驗真的很重要!

  卡住時的堆棧以下:

"main@1" prio=5 tid=0x1 nid=NA runnable
  java.lang.Thread.State: RUNNABLE
      at java.io.FileInputStream.readBytes(FileInputStream.java:-1)
      at java.io.FileInputStream.read(FileInputStream.java:255)
      at sun.security.provider.SeedGenerator$URLSeedGenerator.getSeedBytes(SeedGenerator.java:539)
      at sun.security.provider.SeedGenerator.generateSeed(SeedGenerator.java:144)
      at sun.security.provider.SecureRandom$SeederHolder.<clinit>(SecureRandom.java:203)
      at sun.security.provider.SecureRandom.engineNextBytes(SecureRandom.java:221)
      - locked <0x1af3> (a sun.security.provider.SecureRandom)
      at java.security.SecureRandom.nextBytes(SecureRandom.java:468)
      at java.security.SecureRandom.next(SecureRandom.java:491)
      at java.util.Random.nextLong(Random.java:424)
      at com.taobao.notify.utils.StrongRandom.<init>(StrongRandom.java:45)
      at com.taobao.notify.utils.UniqId.<init>(UniqId.java:38)
      at com.taobao.notify.utils.UniqId.<clinit>(UniqId.java:36)
      at com.taobao.notify.client.impl.DefaultNotifyClient.<init>(DefaultNotifyClient.java:91)
      at com.taobao.notify.client.impl.DefaultNotifyClient.<init>(DefaultNotifyClient.java:97)
      at com.taobao.notify.client.NotifyClientFactory.<clinit>(NotifyClientFactory.java:13)
      at com.taobao.notify.remotingclient.DefaultNotifyManager.<init>(DefaultNotifyManager.java:38)
      at com.taobao.notify.remotingclient.NotifyManagerBean.init(NotifyManagerBean.java:90)
      - locked <0x1a78> (a com.alipay.common.event.tbnotify.adapter.UniformEventPublisherAdapter)
      at com.alipay.common.event.tbnotify.client.NotifyManagerBeanEx.init(NotifyManagerBeanEx.java:64)
      at com.alipay.common.event.tbnotify.adapter.EventSendFacadeTBNotifyImpl.init(EventSendFacadeTBNotifyImpl.java:99)
      at com.alipay.common.event.tbnotify.client.NotifyManagerBeanEx.init(NotifyManagerBeanEx.java:84)
      at com.alipay.common.event.tbnotify.adapter.UniformEventPublisherAdapter.init(UniformEventPublisherAdapter.java:80)
      at com.alipay.boot.dms.client.util.InitializeUtil.initializePublisher(InitializeUtil.java:14)
      at com.alipay.boot.dms.spring.factory.DmsPublisherFactoryBean.afterPropertiesSet(DmsPublisherFactoryBean.java:59)
      at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.invokeInitMethods(AbstractAutowireCapableBeanFactory.java:1642)
      at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1579)
      at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:553)
      at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:482)
      at org.springframework.beans.factory.support.AbstractBeanFactory$1.getObject(AbstractBeanFactory.java:306)
      at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:230)
      - locked <0x1af4> (a java.util.concurrent.ConcurrentHashMap)
      at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:302)
      at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:197)
      at org.springframework.beans.factory.support.BeanDefinitionValueResolver.resolveReference(BeanDefinitionValueResolver.java:351)
      at org.springframework.beans.factory.support.BeanDefinitionValueResolver.resolveValueIfNecessary(BeanDefinitionValueResolver.java:108)
      at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.applyPropertyValues(AbstractAutowireCapableBeanFactory.java:1486)
      at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.populateBean(AbstractAutowireCapableBeanFactory.java:1231)
      at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:551)
      at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:482)
      at org.springframework.beans.factory.support.AbstractBeanFactory$1.getObject(AbstractBeanFactory.java:306)
      at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:230)
      at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:302)
      at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:202)
      at org.springframework.context.annotation.CommonAnnotationBeanPostProcessor.autowireResource(CommonAnnotationBeanPostProcessor.java:522)
      at org.springframework.context.annotation.CommonAnnotationBeanPostProcessor.getResource(CommonAnnotationBeanPostProcessor.java:496)
      at org.springframework.context.annotation.CommonAnnotationBeanPostProcessor$ResourceElement.getResourceToInject(CommonAnnotationBeanPostProcessor.java:627)
      at org.springframework.beans.factory.annotation.InjectionMetadata$InjectedElement.inject(InjectionMetadata.java:169)
      at org.springframework.beans.factory.annotation.InjectionMetadata.inject(InjectionMetadata.java:88)
      at org.springframework.context.annotation.CommonAnnotationBeanPostProcessor.postProcessPropertyValues(CommonAnnotationBeanPostProcessor.java:318)
      at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.populateBean(AbstractAutowireCapableBeanFactory.java:1219)
      at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:551)
      at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:482)
      at org.springframework.beans.factory.support.AbstractBeanFactory$1.getObject(AbstractBeanFactory.java:306)
      at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:230)
      at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:302)
      at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:202)
      at org.springframework.beans.factory.config.DependencyDescriptor.resolveCandidate(DependencyDescriptor.java:207)
      at org.springframework.beans.factory.support.DefaultListableBeanFactory.doResolveDependency(DefaultListableBeanFactory.java:1131)
      at org.springframework.beans.factory.support.DefaultListableBeanFactory.resolveDependency(DefaultListableBeanFactory.java:1059)
      at org.springframework.context.annotation.CommonAnnotationBeanPostProcessor.autowireResource(CommonAnnotationBeanPostProcessor.java:518)
      at org.springframework.context.annotation.CommonAnnotationBeanPostProcessor.getResource(CommonAnnotationBeanPostProcessor.java:496)
      at org.springframework.context.annotation.CommonAnnotationBeanPostProcessor$ResourceElement.getResourceToInject(CommonAnnotationBeanPostProcessor.java:627)
      at org.springframework.beans.factory.annotation.InjectionMetadata$InjectedElement.inject(InjectionMetadata.java:169)
      at org.springframework.beans.factory.annotation.InjectionMetadata.inject(InjectionMetadata.java:88)
      at org.springframework.context.annotation.CommonAnnotationBeanPostProcessor.postProcessPropertyValues(CommonAnnotationBeanPostProcessor.java:318)
      at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.populateBean(AbstractAutowireCapableBeanFactory.java:1219)
      at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:551)
      at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:482)
      at org.springframework.beans.factory.support.AbstractBeanFactory$1.getObject(AbstractBeanFactory.java:306)
      at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:230)
      at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:302)
      at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:197)
      at org.springframework.beans.factory.support.DefaultListableBeanFactory.preInstantiateSingletons(DefaultListableBeanFactory.java:754)
      at org.springframework.context.support.AbstractApplicationContext.finishBeanFactoryInitialization(AbstractApplicationContext.java:866)
      at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:542)
      - locked <0x1af5> (a java.lang.Object)
      at org.springframework.boot.context.embedded.EmbeddedWebApplicationContext.refresh(EmbeddedWebApplicationContext.java:122)
      at org.springframework.boot.SpringApplication.refresh(SpringApplication.java:761)
      at org.springframework.boot.SpringApplication.refreshContext(SpringApplication.java:371)
      at org.springframework.boot.SpringApplication.run(SpringApplication.java:315)
      at org.springframework.boot.SpringApplication.run(SpringApplication.java:1186)
      at org.springframework.boot.SpringApplication.run(SpringApplication.java:1175)
      at com.xx.SOFABootSpringApplication.main(SOFABootSpringApplication.java:15)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(NativeMethodAccessorImpl.java:-1)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      at java.lang.reflect.Method.invoke(Method.java:498)
      at org.springframework.boot.loader.MainMethodRunner.run(MainMethodRunner.java:48)
      at org.springframework.boot.loader.Launcher.launch(Launcher.java:87)
      at org.springframework.boot.loader.Launcher.launch(Launcher.java:50)
      at org.springframework.boot.loader.JarLauncher.main(JarLauncher.java:58)

  而根據堆棧進行的大概猜測,開始排查,過程以下:

    // 初始代碼位置:
    // com.taobao.notify.remotingclient.NotifyManagerBean
    public synchronized void init() {
        if (inited.get()) {
            return;
        }

        if (StringUtil.isEmpty(this.name) || StringUtil.isEmpty(this.description)) {
            throw new RuntimeException("The name and description property is required.");
        }
        if (null == notifyManager) {

            AllNotifyClientProperties properties = new AllNotifyClientProperties();
            properties.setCheckMessageTPConfig(checkMessageTPConfig);
            properties.setMessageTPConfig(messageTPConfig);
            properties.setMessageProperties(messageProperties);
            properties.setAntVipConfig(antVipConfig);
            
            // 問題出在這裏,new 這個對象時,會觸發不少的其餘類加載,而這裏面就包含了有問題的類
            notifyManager = new DefaultNotifyManager(groupId, appName, name, description,
                messageListener, checkMessageListener, 1, enableCrc, properties);

            notifyManager.subscribeAllMessages(this.subscribeMessages, this.bindingList);

            /** ������ظ���groupId������ɾ��֮ǰ��publish topic*/
            //notifyManager.resetPublishTopics(publishTopics);
            for (String publishTopic : publishTopics) {
                notifyManager.addPublishTopic(publishTopic);
            }

            // ��ʼ��ʱ����servertag��url�Ķ�Ӧ�б�
            notifyManager.resetServerTagUrls();
        }

        inited.set(true);
    }
    
    // com.taobao.notify.remotingclient.DefaultNotifyManager  這個構造方法是 ok 的
    public DefaultNotifyManager(String groupId, String appName, String name, String description,
                                MessageListener msgListener, CheckMessageListener checkMsgListener,
                                int connCount, boolean enableCrc,
                                AllNotifyClientProperties properties) {
        initNotifyClient(properties);

        if (StringUtil.isBlank(groupId)) {
            throw new IllegalArgumentException("GroupID cannot be blank.");
        }
        if (StringTools.containsWhitespace(groupId)) {
            throw new IllegalArgumentException("GroupID cannot contain the blank character. ["
                                               + groupId + "]");
        }
        this.groupId = groupId.trim();
        this.appName = StringUtil.trim(appName);

        if (null == properties) {
            properties = new AllNotifyClientProperties();
        }
        this.notifyclient.addGroup(this.groupId, this.appName, name, description, msgListener,
            checkMsgListener, properties.getMessageProperties(), properties.getWaitForConnTime(),
            enableCrc, properties.getAntVipConfig());
        this.notifyclient.setConnectionCount(connCount);
    }
    // 可是問題在靜態字段的初始化問題 
    // com.taobao.notify.remotingclient.DefaultNotifyManager
    public class DefaultNotifyManager implements NotifyManager, DefaultNotifyManagerMBean {
        //��־
        private static final Logger logger                   = MsgBrokerClientLoggerFactory
                                                                .getLogger(DefaultNotifyManager.class);

        // 這看起來像是個單例的 client
        private NotifyClient        notifyclient             = NotifyClientFactory
                                                               .getSingletonNotifyClient();
    }
    // com.taobao.notify.client.NotifyClientFactory
    public class NotifyClientFactory implements FactoryBean {
    
        // 初始化類時,會先初始化一個單例
        static NotifyClient notifyClient = new DefaultNotifyClient();

        /**
         * 
         * @return
         */
        public static NotifyClient getSingletonNotifyClient() {
            return notifyClient;
        }
    }
    // 到此,一切看起來都很美好,讓我繼續往下看
    // com.taobao.notify.client.impl.DefaultNotifyClient
    public class DefaultNotifyClient implements NotifyClient {
        private static final String               LogPrefix                 = LoggerPrefix
                                                                                .makeLogPrefix(DefaultNotifyClient.class);
        static final Logger                       logger                    = MsgBrokerClientLoggerFactory
                                                                                .getLogger(DefaultNotifyClient.class);

        private volatile NotifyClientConfig       notifyClientConfig        = null;

        private static final int                  MAX_TIMES                 = 3;

        protected final NotifyGroupManager        notifyGroupManager        = new NotifyGroupManager();

        private final RemotingService             remotingService;

        protected final ClientSubscriptionManager clientSubscriptionManager = new ClientSubscriptionManager();

        protected final PublishTopicsManager      publishTopicsManager      = new PublishTopicsManager();

        private final ThreadPoolExecutor          asynSendMessageWorkTP;

        private final LoggingService              loggingService            = LoggingService
                                                                                .getInstance();

        private final ReliableAsynSendManager     reliableAsynSendManager;

        private final MessageProperties           reliableMessageProperties = new MessageProperties();

        // 問題在於 UniqId 的生成
        private final UniqId                      uniqIdInstance            = UniqId.getInstance();

        /**
         * 
         */
        public DefaultNotifyClient() {
            this(new NotifyClientConfig());
        }

        /**
         * @param notifyClientConfig
         */
        public DefaultNotifyClient(final NotifyClientConfig notifyClientConfig) {
            if (null == this.notifyClientConfig) {
                this.notifyClientConfig = notifyClientConfig;
            }
            if (notifyClientConfig.isDebug()) {
                this.remotingService = new IntegratedMockRemotingService(
                    this.clientSubscriptionManager, notifyClientConfig.getDebugRemoteSubscribers(),
                    this.notifyGroupManager, notifyClientConfig.getDebugLocalPort());
                logger.warn(LogPrefix + "Note that, this is a MsgbrokerClient in the DEBUG mode.");
                logger.warn(LogPrefix + "In the DEBUG mode, local port ["
                            + notifyClientConfig.getDebugLocalPort() + "]");
                logger.warn(LogPrefix + "In the DEBUG mode, connections ["
                            + notifyClientConfig.getDebugRemoteSubscribers() + "]");
            } else {
                this.remotingService = new DefaultRemotingService(
                    this.notifyClientConfig.getRemotingType(), this.notifyGroupManager,
                    notifyClientConfig);
            }

            // ��ֹѹ��
            this.reliableMessageProperties.setCompressSize(Integer.MAX_VALUE);
            this.reliableMessageProperties.setMaxStringMessageSize(Integer.MAX_VALUE);

            this.asynSendMessageWorkTP = new ManagedThreadPoolExecutor(notifyClientConfig
                .getAsynSendMessageTPConfig().getCorePoolSize(), notifyClientConfig
                .getAsynSendMessageTPConfig().getMaxPoolSize(), notifyClientConfig
                .getAsynSendMessageTPConfig().getKeepAliveTime(), notifyClientConfig
                .getAsynSendMessageTPConfig().getMaxQueueSize(), "asynSendMsgTP-" + this.hashCode(),
                new ThreadPoolExecutor.AbortPolicy());

            this.reliableAsynSendManager = new ReliableAsynSendManager(this, notifyGroupManager,
                notifyClientConfig);
            if (notifyClientConfig.isPreInitializeReliableAsynSendManager()) {
                this.reliableAsynSendManager.init();
            }

            // DefaultNotifyClientʵ�������ж������Ҫ���䲻ͬ��ID
            try {
                ThreadPoolConfig messageTPConfig = notifyClientConfig.getMessageTPConfig();
                Lookout.registry().info(
                    Lookout.registry().createId("msgbroker.client.group.infos")
                        .withTag("hashcode", String.valueOf(this.hashCode()))
                        .withTag("corePoolSize", String.valueOf(messageTPConfig.getCorePoolSize()))
                        .withTag("maxPoolSize", String.valueOf(messageTPConfig.getMaxPoolSize()))
                        .withTag("queueSize", String.valueOf(messageTPConfig.getMaxQueueSize()))
                        .withTag("channelSize", String.valueOf(messageTPConfig.getChannelSize())),
                    new Info<Set<String>>() {
                        public Set<String> value() {
                            return notifyGroupManager.getGroupIds();
                        }
                    });
            } catch (Exception e) {
                logger.warn("register msgbroker.client.group.infos failed, " + e.getMessage());
            }
        }
    }
    // UniqId 生成以下
    // com.taobao.notify.utils.UniqId
    public class UniqId {
        private static final Logger            log       = MsgBrokerClientLoggerFactory.getLogger(UniqId.class);
        private static final String            LogPrefix = LoggerPrefix.makeLogPrefix(UniqId.class);

        private static char[]                  digits    = { '0', '1', '2', '3', '4', '5', '6', '7',
                '8', '9', 'a', 'b', 'c', 'd', 'e', 'f'  };

        private static Map<Character, Integer> rDigits   = new HashMap<Character, Integer>(16);
        static {
            for (int i = 0; i < digits.length; ++i) {
                rDigits.put(digits[i], i);
            }
        }

        private static UniqId                  me        = new UniqId();
        private String                         hostAddr;
        // 實際問題在於這裏了, StrongRandom 的問題
        private Random                         random    = new StrongRandom();
        private MessageDigest                  mHasher;
        private UniqTimer                      timer     = new UniqTimer();

        private ReentrantLock                  opLock    = new ReentrantLock();

        private UniqId() {
            try {
                InetAddress addr = InetAddress.getLocalHost();

                hostAddr = addr.getHostAddress();
            } catch (IOException e) {
                log.error(LogPrefix + "Get HostAddr Error", e);
                hostAddr = String.valueOf(System.currentTimeMillis());
            }

            if (Util.isBlank(hostAddr) || "127.0.0.1".equals(hostAddr)) {
                hostAddr = String.valueOf(System.currentTimeMillis());
            }

            if (log.isDebugEnabled()) {
                log.debug(LogPrefix + "hostAddr is:" + hostAddr);
            }

            try {
                mHasher = MessageDigest.getInstance("MD5");
            } catch (NoSuchAlgorithmException nex) {
                mHasher = null;
                log.error(LogPrefix + "new MD5 Hasher error", nex);
            }
        }

        /**
         * ��ȡUniqIDʵ��
         * 
         * @return UniqId
         */
        public static UniqId getInstance() {
            return me;
        }
    }
    
    // StrongRandom 生成緩慢
    // com.taobao.notify.utils.StrongRandom
    public StrongRandom() {
        try {
            // This operation may block on some systems with low entropy. See
            // this page
            // for workaround suggestions:
            // http://docs.codehaus.org.display.JETTY.Connectors+slow+to+startup
            random = SecureRandom.getInstance(SESSION_ID_RANDOM_ALGORITHM);
        } catch (NoSuchAlgorithmException e) {
            try {
                random = SecureRandom.getInstance(SESSION_ID_RANDOM_ALGORITHM_ALT);
                weakRandom = false;
            } catch (NoSuchAlgorithmException e_alt) {
                random = new Random();
                weakRandom = true;
            }
        }
        // 就是這裏緩慢了,卡住大約一分鐘,且沒法調試
        random.setSeed(random.nextLong() ^ System.currentTimeMillis() ^ hashCode()
                       ^ Runtime.getRuntime().freeMemory());
    }
    

  最後,定位到是 隨機數生成有問題了!

 

8. 定位到點後,再來思考解決好的解決方案?

  如上,咱們知道了是由於 SecureRandom 生成隨機數時出現了問題! 

  那麼,如何解決?我能想到的,就是網上搜索答案了。由於這時候已經沒有什麼能夠參考的了。

  搜索內容就是, SecureRandom 生成隨機緩慢解決方案? 而後就有答案了!

  固然,你能夠諮詢遇到過這些問題的前輩們,那樣更快速!

  緣由以下:

java生成隨機碼時,會使用兩個文件:
1. /dev/random , 隨機性高,和真實的物理環境有關,阻塞模式。(本文出現的啓動緩慢問題,就出在這個上面)
2. /dev/urandom ,僞隨機模式,非阻塞,隨機性不如 random。
因此,既然是生成random模式有問題,那麼想辦法換掉這個模式就ok了。

  jdk中,提供相應的替換方法,咱們先看 $JAVA_HOME/jre/lib/security/java.security 這個裏面有個隨機數的配置,其說明以下:

#
# Sun Provider SecureRandom seed source.
#
# Select the primary source of seed data for the "SHA1PRNG" and
# "NativePRNG" SecureRandom implementations in the "Sun" provider.
# (Other SecureRandom implementations might also use this property.)
#
# On Unix-like systems (for example, Solaris/Linux/MacOS), the
# "NativePRNG" and "SHA1PRNG" implementations obtains seed data from
# special device files such as file:/dev/random.
#
# On Windows systems, specifying the URLs "file:/dev/random" or
# "file:/dev/urandom" will enable the native Microsoft CryptoAPI seeding
# mechanism for SHA1PRNG.
#
# By default, an attempt is made to use the entropy gathering device
# specified by the "securerandom.source" Security property.  If an
# exception occurs while accessing the specified URL:
#
#     SHA1PRNG:
#         the traditional system/thread activity algorithm will be used.
#
#     NativePRNG:
#         a default value of /dev/random will be used.  If neither
#         are available, the implementation will be disabled.
#         "file" is the only currently supported protocol type.
#
# The entropy gathering device can also be specified with the System
# property "java.security.egd". For example:
#
#   % java -Djava.security.egd=file:/dev/random MainClass
#
# Specifying this System property will override the
# "securerandom.source" Security property.
#
# In addition, if "file:/dev/random" or "file:/dev/urandom" is
# specified, the "NativePRNG" implementation will be more preferred than
# SHA1PRNG in the Sun provider.
#
securerandom.source=file:/dev/random

  即 能夠直接改這個文件,也能夠經過命令行加參數修改,且命令行優先級更高,因此咱們使用命令行修改便可:

    -Djava.security.egd=file:/dev/./urandom # 在項目中添加java_opts,裏面添加啓動參數 urandom 

  如此,再次驗證後,啓動正常了。問題done。

 

9. 除了解決這個問題,咱們還能思考點什麼?

  關於 SecureRandom 爲何慢的緣由,能夠看下這篇文章:  https://blog.csdn.net.zengdeqing2012/article.details/78133370

 

老話:說了不必定有機會,但不說必定沒機會。

相關文章
相關標籤/搜索