Oracle JDK 7 Bug: Discovery, Analysis, and Resolution in Practice

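This article was first published on the vivo Internet Technology (vivo互聯網技術) WeChat official account.
Link: https://mp.weixin.qq.com/s/8f34CaTp--Wz5pTHKA0Xeg
Author: vivo official website mall development team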

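As everyone knows, Oracle JDK is the definitive authority for the Java language, and "the JDK" and "the Java language" are often treated as nearly the same concept. Still, we should stay grounded in facts and be willing to question. This article records a production troubleshooting exercise, covering the core process of analyzing the problem, resolving it, and submitting an Oracle JDK bug.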

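1. Background and symptoms

In short, after a certain system went live, the number of connections in CLOSE_WAIT kept growing over time and continuously triggered multiple alerts.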

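2. Analysis and root-cause process

We deployed a dedicated node to reproduce the problem seen earlier.

Step 1: Narrowing down the problem

First, find out which pairs of IPs own the connections that pile up in CLOSE_WAIT. The system also calls third parties, so the goal is to identify both ends of each connection.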

Command executed:

netstat -np | grep tcp|grep "CLOSE_WAIT"

Result:

(Note: xxx, yyy, and zzz are meaningless placeholders; the real IPs are masked for information security reasons.)

tcp     3547      0 10.107.17.xxx:34602         yyy.12.230.115:443          CLOSE_WAIT  19819/java         
tcp       38      0 10.107.17.xxx:59088         yyy.12.230.115:443          CLOSE_WAIT  19819/java         
tcp       38      0 10.107.17.xxx:58028         yyy.12.230.115:443          CLOSE_WAIT  19819/java         
tcp       38      0 10.107.17.xxx:51962         yyy.12.230.115:443          CLOSE_WAIT  19819/java         
tcp     3563      0 10.107.17.xxx:46962         yyy.12.230.115:443          CLOSE_WAIT  19819/java         
tcp       38      0 10.107.17.xxx:34608         yyy.12.230.115:443          CLOSE_WAIT  19819/java         
tcp       38      0 10.107.17.xxx:46496         yyy.12.230.115:443          CLOSE_WAIT  19819/java         
           
tcp       38      0 10.107.17.xxx:50774         yyy.12.230.115:443          CLOSE_WAIT  19819/java         
tcp       38      0 10.107.17.xxx:59904         yyy.12.230.115:443          CLOSE_WAIT  19819/java         
tcp       38      0 10.107.17.xxx:40208         yyy.12.230.115:443          CLOSE_WAIT  19819/java         
tcp       38      0 10.107.17.xxx:41064         yyy.12.230.115:443          CLOSE_WAIT  19819/java         
tcp       38      0 10.107.17.xxx:36994         yyy.12.230.115:443          CLOSE_WAIT  19819/java         
  
tcp     3547      0 10.107.17.xxx:45080         yyy.12.230.115:443          CLOSE_WAIT  19819/java         
tcp     6235      0 10.107.17.xxx:60966         yyy.12.230.115:443          CLOSE_WAIT  19819/java         
tcp       38      0 10.107.17.xxx:56178         yyy.12.230.115:443          CLOSE_WAIT  19819/java         
tcp     3547      0 10.107.17.xxx:39922         yyy.12.230.115:443          CLOSE_WAIT  19819/java         
tcp       38      0 10.107.17.xxx:43270         yyy.12.230.115:443          CLOSE_WAIT  19819/java         
tcp       38      0 10.107.17.xxx:40926         zzz.202.32.242:443          CLOSE_WAIT  19819/java         
tcp       38      0 10.107.17.xxx:44472         yyy.12.230.115:443          CLOSE_WAIT  19819/java         
tcp     2891      0 10.107.17.xxx:43036         zzz.202.32.241:443          CLOSE_WAIT  19819/java         
........
........
 
tcp       38      0 10.107.17.xxx:33472         yyy.12.230.115:443          CLOSE_WAIT  19819/java         
tcp       38      0 10.107.17.xxx:51976         yyy.12.230.115:443          CLOSE_WAIT  19819/java         
tcp       38      0 10.107.17.xxx:57788         yyy.12.230.115:443          CLOSE_WAIT  19819/java         
tcp       38      0 10.107.17.xxx:35638         yyy.12.230.115:443          CLOSE_WAIT  19819/java         
tcp       38      0 10.107.17.xxx:43778         yyy.12.230.115:443          CLOSE_WAIT  19819/java         
tcp       38      0 10.107.17.xxx:46418         yyy.12.230.115:443          CLOSE_WAIT  19819/java         
tcp       38      0 10.107.17.xxx:49914         yyy.12.230.115:443          CLOSE_WAIT  19819/java         
tcp       38      0 10.107.17.xxx:49258         yyy.12.230.115:443          CLOSE_WAIT  19819/java         
tcp       38      0 10.107.17.xxx:48718         yyy.12.230.115:443          CLOSE_WAIT  19819/java         
tcp       38      0 10.107.17.xxx:51480         yyy.12.230.115:443          CLOSE_WAIT  19819/java         
tcp       38      0 10.107.17.xxx:59816         yyy.12.230.115:443          CLOSE_WAIT  19819/java         
tcp       38      0 10.107.17.xxx:49266         yyy.12.230.115:443          CLOSE_WAIT  19819/java         
tcp       38      0 10.107.17.xxx:50246         yyy.12.230.115:443          CLOSE_WAIT  19819/java         
tcp       38      0 10.107.17.xxx:39324         yyy.12.230.115:443          CLOSE_WAIT  19819/java

In summary:

yyy.12.230.115
zzz.202.32.241
zzz.202.32.242

These three IPs are the trigger.
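The grouping done by eye above can also be sketched in code. The following is a minimal illustration, not part of the original investigation: the class name `CloseWaitSummary` and the method `countByRemoteIp` are hypothetical, the sample lines are copied from the masked output above, and Java 8 or later is assumed (for `Map.merge`).

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CloseWaitSummary {

    /** Counts CLOSE_WAIT lines per remote IP in `netstat -np` style output. */
    static Map<String, Integer> countByRemoteIp(List<String> netstatLines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : netstatLines) {
            // Columns: proto, recv-q, send-q, local address, foreign address, state, pid/program
            String[] cols = line.trim().split("\\s+");
            if (cols.length < 6 || !"CLOSE_WAIT".equals(cols[5])) {
                continue; // skip blank lines, ellipsis lines, and other states
            }
            String foreign = cols[4];
            int colon = foreign.lastIndexOf(':');
            String ip = colon >= 0 ? foreign.substring(0, colon) : foreign;
            counts.merge(ip, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Sample lines taken from the masked netstat output shown earlier.
        List<String> sample = Arrays.asList(
            "tcp     3547      0 10.107.17.xxx:34602         yyy.12.230.115:443          CLOSE_WAIT  19819/java",
            "tcp       38      0 10.107.17.xxx:40926         zzz.202.32.242:443          CLOSE_WAIT  19819/java",
            "tcp     2891      0 10.107.17.xxx:43036         zzz.202.32.241:443          CLOSE_WAIT  19819/java",
            "tcp       38      0 10.107.17.xxx:44472         yyy.12.230.115:443          CLOSE_WAIT  19819/java",
            "........");
        System.out.println(countByRemoteIp(sample));
    }
}
```

Sorting by remote IP makes it easy to see at a glance which peers account for most of the stuck connections.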

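Step 2: Problem analysis

Who exactly are these three IPs? Which endpoint is being requested?

For the moment there was no direct way to tell, so the most direct lead went cold. We then looked for more information from other angles:

  • JVM information

    We checked external resources, threads, and so on, and found no obvious anomaly.

  • Packet capture

    We had to capture packets to get more clues. Not having touched the TCP layer for a long time, this was a bit of a struggle.

Clue obtained: a large number of RST packets.

So what kind of operation leads to CLOSE_WAIT? And what kind of connection produces a large number of RSTs (see the common causes of RST)?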
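As background for the first question: a socket enters CLOSE_WAIT when the peer has closed its side (sent a FIN) and the local application has not yet closed its own socket. A minimal loopback sketch of that situation follows; the class name `CloseWaitDemo` and its method are hypothetical and only illustrate the general TCP behavior, not the system described in this article.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class CloseWaitDemo {

    /**
     * The peer closes first; the local side sees end-of-stream but has not
     * closed its own socket yet. Returns what the local read() returned.
     */
    static int readAfterPeerClose() {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket local = new Socket(server.getInetAddress(), server.getLocalPort())) {

            // The peer accepts the connection and closes it at once, sending a FIN.
            server.accept().close();

            // read() returns -1 once the FIN has arrived. From this point until
            // local.close() runs, the OS reports the local socket as CLOSE_WAIT.
            int result = local.getInputStream().read();

            // If the application never reached close(), for example because it lost
            // its reference to the socket after an exception, the socket would stay
            // in CLOSE_WAIT. Here try-with-resources closes it on exit.
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("read() after peer close returned " + readAfterPeerClose());
    }
}
```

In other words, a growing CLOSE_WAIT count points at the local process: the remote side has finished, and some local code path is not closing its sockets.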

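Step 3: Locating the cause in code

With help from our operations colleagues, we learned that these three IPs belong to an image CDN service.

This let us narrow the search down to specific code logic: the code paths that issue image CDN requests.

After carefully analyzing this part of the source, we suspected the following: our server issues a URL request, the requested resource does not exist, an exception is thrown, and yet nothing in the JDK closes the socket.

javax.imageio.ImageIO.read(URL)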

/**
   * Returns a <code>BufferedImage</code> as the result of decoding
   * a supplied <code>URL</code> with an <code>ImageReader</code>
   * chosen automatically from among those currently registered.  An
   * <code>InputStream</code> is obtained from the <code>URL</code>,
   * which is wrapped in an <code>ImageInputStream</code>.  If no
   * registered <code>ImageReader</code> claims to be able to read
   * the resulting stream, <code>null</code> is returned.
   *
   * <p> The current cache settings from <code>getUseCache</code>and
   * <code>getCacheDirectory</code> will be used to control caching in the
   * <code>ImageInputStream</code> that is created.
   *
   * <p> This method does not attempt to locate
   * <code>ImageReader</code>s that can read directly from a
   * <code>URL</code>; that may be accomplished using
   * <code>IIORegistry</code> and <code>ImageReaderSpi</code>.
   *
   * @param input a <code>URL</code> to read from.
   *
   * @return a <code>BufferedImage</code> containing the decoded
   * contents of the input, or <code>null</code>.
   *
   * @exception IllegalArgumentException if <code>input</code> is
   * <code>null</code>.
   * @exception IOException if an error occurs during reading.
   */ 
public static BufferedImage read(URL input) throws IOException {
      if (input == null) {
          throw new IllegalArgumentException("input == null!");
      }
 
      InputStream istream = null;
      try {
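          // The TCP connection is established here and the stream is fetched directly.
          // Because the resource does not exist, control enters the catch block and the exception is rethrown.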
          istream = input.openStream();
      } catch (IOException e) {
          throw new IIOException("Can't get input stream from URL!", e);
      }
      ImageInputStream stream = createImageInputStream(istream);
      BufferedImage bi;
      try {
          bi = read(stream);
          if (bi == null) {
              stream.close();
          }
      } finally {
          istream.close();
      }
      return bi;
  }

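As the code shows, the JDK does not close the socket connection wrapped inside ImageIO.read(url) on this path: when openStream() throws, istream is still null and the caller has no handle on the underlying connection. Does the CDN then time out the request and close its side, leaving our server in CLOSE_WAIT? With limited networking experience, I could not be 100% sure of this idea, so let's simulate it.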

Step 4: Reproduction and simulation

Based on the system's business source code, a quick simulation:

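import java.awt.image.BufferedImage;
import java.io.File;
import java.net.URL;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import javax.imageio.ImageIO;

// The class name is arbitrary; it only wraps the two methods so they compile.
public class CloseWaitRepro {

    public static void main(String[] args) {
        // 100 worker threads issue 5000 requests in total. The pool is never shut
        // down, so the JVM stays alive and the socket states can be inspected.
        ExecutorService ex = Executors.newFixedThreadPool(100);
        for (int i = 0; i < 5000; i++) {
            ex.execute(task());
        }
    }

    private static Runnable task() {
        return new Runnable() {
            @Override
            public void run() {
                // The domain must exist, but the file must not.
                String vivofsUrl = "https://vivobbs.xx.yy.zz/wiwNWYCFW9ieGbWq/20181129/3a2adfde12cd328d81f965088890eeffff.jpg";
                File file = null;
                BufferedImage image = null;
                try {
                    // Mirrors the business code, which also creates a temp file.
                    file = File.createTempFile("abc", "jpg");
                    URL url1 = new URL(vivofsUrl);
                    // Expected to throw because the file does not exist.
                    image = ImageIO.read(url1);
                } catch (Throwable e) {
                    e.printStackTrace();
                } finally {
                    if (null != file) {
                        file.delete();
                    }
                    if (null != image) {
                        image.flush();
                        image = null;
                    }
                }
            }
        };
    }
}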

Packet capture

TCP state view

The problem is reproduced!
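One possible application-side way to avoid depending on ImageIO.read(URL) is sketched below. It is not from the original article and is not Oracle's fix. The names `SafeImageFetch`, `readImage`, and `missingImageYieldsNull`, as well as the timeout values, are hypothetical. The sketch assumes an http or https URL and Java 8 or later. The idea is to open the connection explicitly so the caller holds a handle it can always release, and to pass only the stream to ImageIO.read(InputStream), which by its documentation leaves closing the stream to the caller.

```java
import com.sun.net.httpserver.HttpServer;
import java.awt.image.BufferedImage;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;
import javax.imageio.ImageIO;

public class SafeImageFetch {

    /**
     * Hypothetical helper: returns the decoded image, or null when the server
     * does not answer 200 or no registered reader can decode the body.
     */
    static BufferedImage readImage(URL url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setConnectTimeout(3000); // assumed values, tune for the real service
        conn.setReadTimeout(5000);
        try {
            if (conn.getResponseCode() != HttpURLConnection.HTTP_OK) {
                // Close the error body if there is one (a null resource is allowed).
                try (InputStream err = conn.getErrorStream()) {
                    return null;
                }
            }
            try (InputStream in = conn.getInputStream()) {
                return ImageIO.read(in);
            }
        } finally {
            // Always release the connection, including on exceptions.
            conn.disconnect();
        }
    }

    /** Demo against a throwaway local server that answers 404 to every request. */
    static boolean missingImageYieldsNull() {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/", exchange -> {
                exchange.sendResponseHeaders(404, -1); // -1 means no response body
                exchange.close();
            });
            server.start();
            try {
                int port = server.getAddress().getPort();
                URL url = new URL("http://127.0.0.1:" + port + "/missing.jpg");
                return readImage(url) == null;
            } finally {
                server.stop(0);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("missing image handled: " + missingImageYieldsNull());
    }
}
```

Whether this removes the CLOSE_WAIT build-up in a given deployment would still need to be confirmed with netstat, as in the steps above.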

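Step 5: Reporting the bug after discussion

We reported it to Oracle.

3. Communication with Oracle

After the report was filed, Oracle contacted me to discuss it. Excerpts from the emails are included for reference only.

The report has been accepted.

4. Open questions and shortcomings

I am not thoroughly familiar with TCP state machine transitions, so some of the questions above could not be reasoned out from the state machine. Mastering this area fully will take continued work.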

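For more content, follow the vivo Internet Technology (vivo互聯網技術) WeChat official account.

Note: to repost this article, please first contact the WeChat ID Labs2020.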
