BigPipe

Date: 2019-11-30

In the traditional model, the life cycle of a user request is the following:

Browser sends an HTTP request to web server.
Web server parses the request, pulls data from storage tier then formulates an HTML document and sends it to the client in an HTTP response.
HTTP response is transferred over the Internet to browser.
Browser parses the response from web server, constructs a DOM tree representation of the HTML document, and downloads CSS and JavaScript resources referenced by the document.
After downloading CSS resources, browser parses them and applies them to the DOM tree.
After downloading JavaScript resources, browser parses and executes them.

BigPipe is a fundamental redesign of the dynamic web page serving system. The general idea is to decompose web pages into small chunks called pagelets, and pipeline them through several execution stages inside web servers and browsers. This is similar to the pipelining performed by most modern microprocessors: multiple instructions are pipelined through different execution units of the processor to achieve the best performance. Although BigPipe is a fundamental redesign of the existing web serving process, it does not require changing existing web browsers or servers; it is implemented entirely in PHP and JavaScript. BigPipe breaks the page generation process into several stages:

Request parsing: web server parses and sanity checks the HTTP request.
Data fetching: web server fetches data from storage tier.
Markup generation: web server generates HTML markup for the response.
Network transport: the response is transferred from web server to browser.
CSS downloading: browser downloads CSS required by the page.
DOM tree construction and CSS styling: browser constructs DOM tree of the document, and then applies CSS rules on it.
JavaScript downloading: browser downloads JavaScript resources referenced by the page.
JavaScript execution: browser executes JavaScript code of the page.

The first three stages are executed by the web server, and the last four stages are executed by the browser. Each pagelet must go through all these stages sequentially, but BigPipe enables several pagelets to be executed simultaneously in different stages.

In BigPipe, the life cycle of a user request is the following: The browser sends an HTTP request to web server. After receiving the HTTP request and performing some sanity check on it, web server immediately sends back an unclosed HTML document that includes an HTML <head> tag and the first part of the <body> tag. The <head> tag includes BigPipe’s JavaScript library to interpret pagelet responses to be received later. In the <body> tag, there is a template that specifies the logical structure of page and the placeholders for pagelets.

After flushing the first response to the client, web server continues to generate pagelets one by one. As soon as a pagelet is generated, its response is flushed to the client immediately in a JSON-encoded object that includes all the CSS and JavaScript resources needed for the pagelet, and its HTML content, as well as some metadata. For example:

<script type="text/javascript">
big_pipe.onPageletArrive({id: "pagelet_composer", content: <HTML>, css: [..], js: [..], …})
</script>
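The server side of this exchange can be sketched roughly as follows. This is a minimal illustration in plain JavaScript, not Facebook's implementation (which the text says is written in PHP). The function names `pageSkeleton`, `pageletChunk`, and `streamPage`, the `/bigpipe.js` path, and the placeholder markup are all assumptions made for the example.

```javascript
// Hypothetical sketch: stream a page skeleton first, then one <script> chunk per pagelet.
// `write` is any function that sends a string to the client (for example res.write
// of Node's http module).

// First flush: an unclosed document with the BigPipe library and empty placeholders.
function pageSkeleton(pageletIds) {
  const placeholders = pageletIds.map((id) => `<div id="${id}"></div>`).join("");
  return `<html><head><script src="/bigpipe.js"></script></head><body>${placeholders}`;
}

// One pagelet response: a JSON-encoded object passed to onPageletArrive.
function pageletChunk(pagelet) {
  // Escape "<" so that markup inside the JSON cannot close the surrounding <script> tag.
  const json = JSON.stringify(pagelet).replace(/</g, "\\u003c");
  return `<script type="text/javascript">big_pipe.onPageletArrive(${json})</script>`;
}

function streamPage(write, pagelets) {
  write(pageSkeleton(pagelets.map((p) => p.id)));   // sent before any pagelet exists
  for (const p of pagelets) write(pageletChunk(p)); // one chunk per generated pagelet
  write("</body></html>");                          // the document is closed last
}
```

In a real server each `write` call would be followed by a flush, and each pagelet would be produced by data fetching and markup generation rather than being passed in ready-made.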

At the client side, upon receiving a pagelet response via "onPageletArrive" call, BigPipe’s JavaScript library first downloads its CSS resources; after the CSS resources are downloaded, BigPipe displays the pagelet by setting its corresponding placeholder div’s innerHTML to the pagelet’s HTML markup. Multiple pagelets’ CSS can be downloaded at the same time, and they can be displayed out-of-order depending on whose CSS download finishes earlier. In BigPipe, JavaScript resource is given lower priority than CSS and page content. Therefore, BigPipe won’t start downloading JavaScript for any pagelet until all pagelets in the page have been displayed. After that all pagelets’ JavaScript are downloaded asynchronously. Pagelet’s JavaScript initialization code would then be executed out-of-order depending on whose JavaScript download finishes earlier.
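The ordering rules in this paragraph (CSS first, then display, JavaScript only after every pagelet is displayed) can be sketched as below. This is not the real BigPipe library: the class name `BigPipeSketch`, the injected `loadCss`, `loadJs`, and `display` functions, and the assumption that the total number of pagelets is known up front are all hypothetical, chosen so the logic can run outside a browser.

```javascript
// Hypothetical sketch of the client-side ordering rules described above.
// loadCss and loadJs must return Promises; display(id, html) fills the placeholder.
class BigPipeSketch {
  constructor({ loadCss, loadJs, display, totalPagelets }) {
    this.loadCss = loadCss;
    this.loadJs = loadJs;
    this.display = display;
    this.remaining = totalPagelets; // pagelets not yet displayed (assumed known)
    this.pendingJs = [];            // JS lists held back until the barrier opens
    this.done = new Promise((resolve) => { this.resolveDone = resolve; });
  }

  async onPageletArrive({ id, content, css = [], js = [] }) {
    await Promise.all(css.map((url) => this.loadCss(url))); // CSS first
    this.display(id, content);                               // then show the markup
    this.pendingJs.push(js);
    if (--this.remaining === 0) await this.releaseBarrier();
  }

  // Barrier: runs only after every pagelet has been displayed.
  async releaseBarrier() {
    await Promise.all(this.pendingJs.flat().map((url) => this.loadJs(url)));
    this.resolveDone();
  }
}
```

In a browser, `display` could be something like `(id, html) => { document.getElementById(id).innerHTML = html; }`, and the two loaders would insert `<link>` and `<script>` elements and resolve on their load events.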

It is worth noting that BigPipe was inspired by pipelining microprocessors. However, there are some differences between the two kinds of pipelining. For example, although most stages in BigPipe can only operate on one pagelet at a time, some stages such as CSS downloading and JavaScript downloading can operate on multiple pagelets simultaneously, which is similar to superscalar microprocessors. Another important difference is that in BigPipe, we have implemented a ‘barrier’ concept borrowed from parallel programming, where all pagelets have to finish a particular stage, e.g. pagelet displaying stage, before any one of them can proceed further to download JavaScript and execute them.

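Taobao has also published a translation of the article, together with some discussion of implementation issues, which is worth reading: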

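1. Server-side parallelization

Ideally, the server generates the content of different pagelets in parallel, which improves performance. When the server processes several pagelets concurrently, each pagelet is flushed to the browser as soon as its content is ready. PHP, however, does not natively support threads, so the server cannot use multithreading to load several pagelets concurrently. For a small website, generating the pagelets serially is already enough to meet the optimization goal. For a large website that needs more speed, the server can generate the different pagelets concurrently and independently, in one of the following ways:

Java multithreading. If the back-end logic is written in Java, Java's threading facilities can load the content of different pagelets at the same time, and the page content is returned to the browser once loading completes. The references at the end point to an example found online that is implemented with Java multithreading.
PHP. PHP does not support threads, so it cannot process the content of different pagelets concurrently the way Java does. However, the business logic of both Facebook and Taobao's main search is implemented in PHP, so we have to consider how to achieve concurrency in PHP. PHP's curl extension can issue requests in batches through its curl_multi_* functions (for example curl_multi_exec()), so that requests that would otherwise run serially are executed concurrently.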
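The idea of flushing each pagelet as soon as it is ready, regardless of the order in which the pagelets were declared, can be sketched as follows. The sketch uses JavaScript Promises rather than the Java threads or PHP curl batching mentioned above; `flushAsCompleted`, `generators`, and `flush` are hypothetical names introduced only for illustration.

```javascript
// Hypothetical sketch: generate pagelets concurrently and flush each one when it is ready.
// `generators` maps a pagelet id to an async function returning that pagelet's HTML;
// `flush` sends one finished pagelet to the browser.
async function flushAsCompleted(generators, flush) {
  const tasks = Object.entries(generators).map(async ([id, generate]) => {
    const content = await generate(); // every generator is started before any finishes
    flush({ id, content });           // completion order, not declaration order
  });
  await Promise.all(tasks);           // resolves once every pagelet has been flushed
}
```

With this shape, a slow pagelet delays only itself: pagelets whose data arrives earlier reach the browser earlier.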

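2. Calling flush() to output directly

At this point you may wonder: why does the server not simply flush() the generated HTML back to the client piece by piece, instead of sending it as JSON and parsing it with JS? Is that not superfluous? In fact, direct flushing is the method the main search front end currently uses. Let us look at the two main benefits of the BigPipe approach:

(1) If flush() outputs the HTML source directly and the page has many modules, the modules must load in order: a module that comes earlier in the HTML has to finish loading before a later one can load, so the modules cannot all show some content at the same time. With JS, the front end can show several loading indicators at once and does not need to care which module finishes first, which also takes advantage of multithreaded data processing on the back end.

(2) Using JS makes the page structure clearer and easier to manage. It also decouples the logical structure of the page from its data: the page structure is returned first, followed by a stream of JS scripts that add the page content dynamically, rather than outputting the complete HTML source all at once. This improves maintainability.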

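3. When the visitor is a crawler or has JS disabled in the browser

BigPipe relies on JS scripts to load the page, so if a user has disabled JS in the browser (few users do), the page fails to load, which is a very poor user experience. Search engine crawlers run into a similar problem. The solution is for the server to check the user-agent and whether the client supports JS when a request arrives. If the user-agent indicates a search engine crawler, or the client does not support JS, the server does not use BigPipe and serves the page in the traditional way instead.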
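A minimal sketch of that decision is shown below. The function name `shouldUseBigPipe` and the crawler pattern are assumptions for illustration; the pattern is not a complete list of crawlers, and the text does not say how the server learns whether the client supports JS, so that is passed in as a plain boolean here.

```javascript
// Hypothetical sketch: decide per request whether to serve the BigPipe version of a page.
// The crawler pattern is illustrative only, not a complete or authoritative list.
const CRAWLER_PATTERN = /bot|crawler|spider|slurp/i;

function shouldUseBigPipe(userAgent, clientSupportsJs) {
  // Treat a missing user-agent like a crawler and fall back to the traditional page.
  if (!userAgent || CRAWLER_PATTERN.test(userAgent)) return false;
  // Use BigPipe only when the client is known to run JS.
  return clientSupportsJs === true;
}
```

A request handler would call this once per request and choose between the streamed BigPipe response and the traditional single HTML document.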

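4. Impact on SEO

This has to be considered. In the age of search engines, a page that is unfriendly to search engines, or whose content is hard for them to recognize, ranks lower in search results, which directly reduces visits to the site. In BigPipe, page content is added dynamically, so search engines may fail to recognize it. As described above, however, the server first uses the user-agent to determine whether the client is a search engine crawler, and if so it switches to the traditional mode instead of adding content dynamically. This removes the unfriendliness to search engines.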

5. Combining other techniques

Besides BigPipe, Facebook's page loading also combines other page optimization techniques, as follows:

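5.1 Gzip compression of resource files

This is a very important technique: compressing CSS and JS files with gzip can reduce their size by 70%, an attractive figure. Style sheets and script files make up most of the files transferred over the network, so compression greatly reduces the amount of data transferred and makes pages load faster. The web server can take care of this. In Apache, for example, the mod_deflate module does it with the following configuration:
AddOutputFilterByType DEFLATE text/html text/css application/x-javascript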
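To get a feel for the effect, the savings can be measured with Node's built-in zlib module. This is only an illustrative sketch: `gzipSavings` and the sample style sheet are made up for the example, and highly repetitive input like this compresses far better than typical real files.

```javascript
// Hypothetical sketch: measure how much gzip shrinks a text resource.
const zlib = require("zlib");

function gzipSavings(text) {
  const original = Buffer.byteLength(text, "utf8"); // size before compression, in bytes
  const compressed = zlib.gzipSync(text).length;    // size after gzip, in bytes
  return { original, compressed, saved: 1 - compressed / original };
}

// Repetitive CSS compresses very well; real files will show different ratios.
const sampleCss = ".pagelet { margin: 0; padding: 0; border: 0; }\n".repeat(200);
console.log(gzipSavings(sampleCss));
```

In production the compression is normally left to the web server, as in the mod_deflate line above, rather than done in application code.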

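5.2 Minifying JS files

Minifying JS files removes unnecessary characters, comments, and blank lines from the code to reduce file size and thus improve page load time. JSMin is one tool for minifying JS scripts; minified scripts are about 20% smaller, which is also a substantial gain.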

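5.3 Combining CSS and JS files

This is a principle of front-end optimization: combining multiple style sheets and JS files reduces the number of HTTP requests. For a website with hundreds of millions of users this also improves performance, cutting load time by roughly 5%.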

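5.4 Using external JS and CSS

This is another principle of front-end optimization. In terms of raw speed, inline JS and CSS are faster because they save HTTP requests. External files, however, are easier to reuse, much like the idea of reuse in object-oriented programming. More importantly, although the first load is a little slower, the browser can cache CSS files and JS scripts, so on the user's subsequent visits external JS and CSS give better speed.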

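5.5 Putting style sheets at the top

Like the previous item, this is a convention: it is very important to load the CSS files that the HTML content needs in the head of the page. Putting them at the bottom does make the page content load sooner (the time spent loading CSS is moved to the end, so the content appears first), but that content is displayed without its style sheet. Once the CSS file has loaded, the browser applies it, changing the content and style of the page a second time. This is known as the "flash of unstyled content", and it is unfriendly to users. To implement this, simply put the CSS files in the <head> tag.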

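5.6 Putting scripts at the bottom to implement the "barrier"

The JS scripts that drive a page's dynamic content contribute nothing to the initial loading of the page, and loading them at the top only makes the page load more slowly. This is the opposite of the CSS files discussed above, so scripts can be loaded at the end of the page. The content the user can see loads first and the JS files load last, which makes the page feel faster. BigPipe implements a "barrier" concept: only after the content of all pagelets has loaded does the browser send the HTTP requests for JS to the server. BigPipe.js can store the paths of the JS files required by all pagelets and, once it has determined that all content has finished loading, send the requests to the server together.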
