Follow the WeChat official account 石杉的架構筆記 (Shishan's Architecture Notes; id: shishan100)

This article recounts how we resolved an incident in a production system. Behind it lies a serious class of failure that a JVM Full GC can cause in production.
1. Business Scenario
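First, a little background on the production system. It serves only as a case study here, so most of the business details are left out.
In short, this is a distributed system. System A has to send a piece of core, business-critical data over the network to another system, System B.
That raises an obvious question: if System A has just handed the data to System B and System B then crashes unexpectedly, won't the data be lost?
To address this, the architecture adopts a classic technique: the Quorum algorithm.
Put simply, System B must be deployed on an odd number of nodes: at least 3 machines, or 5, or 7, and so on.
Every time System A sends a piece of data to System B, it must send the request to every machine System B is deployed on, so that each of them receives a copy.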
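For System A to treat a write to System B as successful, it must deliver the data to at least a quorum of System B's machines within a specified time window.
For example, if System B runs on 3 machines, its quorum size is 3 / 2 + 1 = 2 (integer division, so 3 / 2 = 1). In general, System B's quorum size is: number of machines / 2 + 1.
So if System B has 3 machines in total, System A must receive write-success responses from 2 of them within the specified time before it can conclude that a piece of core data was written successfully.
Only then does System A consider the write to System B successful. This is the so-called Quorum mechanism.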
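To make the arithmetic concrete, here is a small Java sketch. `QuorumMath` and its methods are hypothetical names chosen for illustration, not code from the original system. Java's integer division rounds down for positive operands, and the `n / 2 + 1` formula relies on exactly that.

```java
/** Hypothetical helper illustrating the majority-quorum arithmetic described above. */
public final class QuorumMath {

    private QuorumMath() {}

    /** Quorum size for a cluster of n machines: n / 2 + 1 (int division rounds down). */
    public static int quorumSize(int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("cluster size must be positive: " + n);
        }
        return n / 2 + 1;
    }

    /** A write counts as successful once at least a quorum of machines has acknowledged it. */
    public static boolean isWriteSuccessful(int acks, int clusterSize) {
        return acks >= quorumSize(clusterSize);
    }

    public static void main(String[] args) {
        for (int n : new int[] {3, 5, 7}) {
            System.out.println(n + " machines -> quorum " + quorumSize(n));
        }
        System.out.println(isWriteSuccessful(2, 3)); // true
        System.out.println(isWriteSuccessful(1, 3)); // false
    }
}
```

With 3, 5 and 7 machines this gives quorums of 2, 3 and 4. That is why an odd cluster size is used: adding a single machine to make the count even raises the quorum without letting the cluster tolerate any more failures.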
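In other words, when two systems in a distributed architecture exchange data, the sender must receive write-success responses from a quorum (a majority) of the receiver's machines within the specified time. Only then can it be sure the data it sent will not be lost.
This mechanism is widely used in distributed systems and middleware. Our production distributed system also uses it to transfer data between these two systems.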
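Here is a diagram of what this architecture looks like.
As shown in the figure above, it illustrates the Quorum mechanism used when System A transfers a piece of data to System B.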
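Next, let's see roughly what the Quorum write mechanism described above looks like in code.
PS: The real mechanism involves a great deal of low-level network transport, communication, fault tolerance, and optimization. The code below has therefore been heavily simplified to convey only the core idea.
That is the heavily trimmed-down code, but the core idea comes through clearly. Read it through a couple of times; it is quite easy to follow.
What the code does is simple. It asynchronously starts threads that send the data to all of System B's machines. Meanwhile it enters a while loop that waits for a quorum of System B's machines to return responses.
Suppose responses from the expected number of machines have not arrived within the specified timeout. The code then concludes that System B's cluster has failed and makes System A exit immediately, which amounts to System A going down.
That is all the code does!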
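Since the original listing is not preserved, here is a minimal Java sketch of the flow just described, under stated assumptions. `QuorumWriter` and the `Replica` interface are hypothetical stand-ins for System B's real network client. The sketch returns `false` on timeout so it can be exercised, whereas the article's version makes System A exit at that point.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

/** Sketch of the quorum write described above; Replica is a hypothetical client for one System B machine. */
public final class QuorumWriter {

    /** Hypothetical view of one System B machine: returns true if it stored the data. */
    public interface Replica {
        boolean write(byte[] data) throws Exception;
    }

    private final List<Replica> replicas;
    private final ExecutorService pool;

    public QuorumWriter(List<Replica> replicas) {
        this.replicas = replicas;
        // Daemon threads, so a hung replica call cannot keep the JVM alive.
        this.pool = Executors.newFixedThreadPool(replicas.size(), r -> {
            Thread t = new Thread(r, "quorum-sender");
            t.setDaemon(true);
            return t;
        });
    }

    /**
     * Sends data to every replica asynchronously, then polls until a quorum (n / 2 + 1) has acknowledged.
     * Returns false on timeout or interrupt; in the article, the timeout branch makes System A exit.
     */
    public boolean write(byte[] data, long timeoutMillis, long pollMillis) {
        int quorum = replicas.size() / 2 + 1;
        AtomicInteger acks = new AtomicInteger();
        for (Replica replica : replicas) {
            pool.execute(() -> {
                try {
                    if (replica.write(data)) {
                        acks.incrementAndGet();
                    }
                } catch (Exception e) {
                    // A failed or unreachable replica simply does not count toward the quorum.
                }
            });
        }
        long expireTime = System.currentTimeMillis() + timeoutMillis;
        while (acks.get() < quorum) {
            if (System.currentTimeMillis() > expireTime) {
                return false; // the article's code shuts System A down at this point
            }
            try {
                Thread.sleep(pollMillis); // the article's loop sleeps 1000 ms per round
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }

    public void shutdown() {
        pool.shutdownNow();
    }
}
```

A `CountDownLatch` initialised to the quorum size and awaited with `await(timeout, TimeUnit.MILLISECONDS)` would avoid polling. The sketch keeps the sleep-and-check loop because that is the shape the article describes.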
2. The Problem Surfaces
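The code itself is not hard to understand. The trouble is that running in production is never as simple as it looks when you are writing the code.
Once, the production system was running under a steady overall load, with nothing that should have caused trouble. Then we suddenly received an alert that System A had gone down.
We started investigating and dug in every direction, but System B's cluster was perfectly healthy, with no sign of a problem.
Then we examined System A and found nothing wrong anywhere else in it either.
Finally, by correlating System A's own logs with its JVM Full GC garbage-collection logs, we worked out the actual cause of the failure.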
3. Pinpointing the Problem
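The cause was actually quite simple. After running in production for a while, System A would occasionally perform a long stop-the-world JVM Full GC, that is, a large-scale garbage collection.
During such a pause, System A's worker threads are all frozen and do no work. They resume only after the JVM Full GC finishes.
Let's look at the following code snippet: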
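The snippet referred to here did not survive. The description that follows mentions a 1-second sleep per round and an `if` that compares the current time with `expireTime`. Based on that, the Java sketch below reproduces the failure deterministically on a virtual clock instead of real time.

The following parts are hypothetical modelling choices, not the original code:

* the class name and the `simulate` method;
* the timing model;
* the assumption that a response arriving during the pause is recorded only after the waiting thread's next check.

```java
/**
 * Deterministic sketch of the failure: runs the wait loop on a virtual clock (milliseconds from 0).
 * Hypothetical model, not the original code:
 *  - the quorum's responses reach System A at responseAt;
 *  - a stop-the-world pause freezes every thread from pauseStart for pauseLength ms;
 *  - a response that arrives during the pause is recorded only after the waiting thread's next check.
 */
public final class GcPauseSimulation {

    private static final long SLEEP_MILLIS = 1000; // the article's loop sleeps 1 second per round

    /** Returns true if the loop sees a quorum, false if it takes the timeout branch. */
    public static boolean simulate(long timeoutMillis, long responseAt, long pauseStart, long pauseLength) {
        long pauseEnd = pauseStart + pauseLength;
        boolean arrivesDuringPause = pauseLength > 0 && responseAt >= pauseStart && responseAt < pauseEnd;
        // The +1 encodes the assumption that the waiting thread checks before the ack is recorded.
        long ackRecordedAt = arrivesDuringPause ? pauseEnd + 1 : responseAt;
        long now = 0;
        long expireTime = now + timeoutMillis;
        while (true) {
            if (now >= ackRecordedAt) {
                return true;                 // quorum reached: leave the while loop
            }
            if (now > expireTime) {
                return false;                // the article's code makes System A exit here
            }
            long wakeUp = now + SLEEP_MILLIS; // Thread.sleep(1000)
            if (pauseLength > 0 && wakeUp > pauseStart && now < pauseEnd) {
                wakeUp = Math.max(wakeUp, pauseEnd); // a frozen thread cannot wake before the pause ends
            }
            now = wakeUp;
        }
    }

    public static void main(String[] args) {
        System.out.println("no pause:      " + simulate(5_000, 1_500, 0, 0));           // true
        System.out.println("30 s Full GC:  " + simulate(5_000, 1_500, 1_200, 30_000)); // false
    }
}
```

With a 5-second timeout and the quorum answering at 1.5 s, the loop succeeds on its third check at 2 s. A 30-second pause starting at 1.2 s wakes the thread at 31.2 s, long past `expireTime`, so the timeout branch fires even though System B was healthy.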

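Here, though, System A's sudden exit is wrong. Without the JVM Full GC, the if statement that checks whether the deadline has passed would never have been true.
The loop would have slept for 1 second and entered the next iteration. By then it would have received results from a quorum of System B's machines, so it could have broken out of the while loop and continued.
Instead, a JVM Full GC froze the process for tens of seconds. By the time the thread resumed, the deadline had long passed, so the if branch fired and System A exited for no apparent reason.
So long pauses caused by JVM Full GC in production really are one of the hidden killers of system stability!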
4. Solving the Problem
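Making the code above more robust is also simple. We only need to add logic that detects whether a JVM Full GC occurred while the code was waiting.
If one did, we automatically extend expireTime.
For example, the code can be improved as follows: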
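The improved code is not preserved. Below is a minimal Java sketch of the idea, assuming the threads that handle System B's responses count acknowledgements in a shared `AtomicInteger`. It uses the JDK's `ManagementFactory.getGarbageCollectorMXBeans()` and `GarbageCollectorMXBean.getCollectionTime()` to see how much time the collectors have spent since the last round. It then pushes `expireTime` back by that amount. The class and method names are hypothetical.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicInteger;

/** Sketch of a quorum wait that pushes its deadline back by the time the JVM spent in GC. */
public final class GcAwareQuorumWait {

    /** Total accumulated collection time (ms) across all collectors; beans reporting -1 are skipped. */
    public static long totalGcTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /**
     * Waits until acks reaches quorum; acks is assumed to be incremented by the response threads.
     * Returns false on a timeout not explained by GC time, or if the thread is interrupted.
     */
    public static boolean awaitQuorum(AtomicInteger acks, int quorum, long timeoutMillis, long pollMillis) {
        long expireTime = System.currentTimeMillis() + timeoutMillis;
        long lastGcTime = totalGcTimeMillis();
        while (acks.get() < quorum) {
            // Read the clock before the GC counters, so a pause between the two reads is still credited.
            long now = System.currentTimeMillis();
            long gcTime = totalGcTimeMillis();
            if (gcTime > lastGcTime) {
                expireTime += gcTime - lastGcTime; // extend the deadline by the time spent collecting
                lastGcTime = gcTime;
            }
            if (now > expireTime) {
                return false; // a genuine timeout: the caller decides whether System A should exit
            }
            try {
                Thread.sleep(pollMillis); // the article's loop sleeps 1000 ms per round
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }
}
```

This extends the deadline for every collection, not only Full GCs, which errs on the side of waiting longer. Filtering by collector name (for example `G1 Old Generation` or `PS MarkSweep`) is possible, but the names depend on the collector in use. Accumulated collection time can also include concurrent phases that did not stop application threads.

A different approach catches any kind of JVM pause. It compares how long each sleep actually took with how long it was supposed to take; Hadoop's `JvmPauseMonitor` is one example of this technique.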
With this improvement, the production system becomes noticeably more stable. Even when a JVM Full GC occurs, it no longer exits abnormally at random.
END