
Sources:

https://mp.weixin.qq.com/s/tGc9rWRTszbL_eCNYXxXPg

https://qianduan.group/posts/5a0af34941a4410ebdd6df2f

https://www.igvita.com/posa/high-performance-networking-in-google-chrome/

 

High Performance Networking in Google Chrome

Translated by Horky, 架構文摘, 2018-02-08

The following is a draft chapter for the upcoming "The Performance of Open Source Applications" (POSA), a sequel to The Architecture of Open Source Applications. POSA is a collection of essays about performance optimization, designing for performance, managing performance as part of a development process, and more. The book will be published in Spring 2013 under a Creative Commons license with royalties going to Amnesty International.


History and guiding principles of Google Chrome

Google Chrome was first released in the second half of 2008, as a beta version for the Windows platform. The Google-authored code powering Chrome was also made available under a permissive BSD license - aka, the Chromium project. To many observers, this turn of events came as a surprise: the return of the browser wars? Could Google really do much better?

"It was so good that it essentially forced me to change my mind..." - Eric Schmidt, on his  initial reluctanceto the idea of developing Google Chrome.

Turns out, they could. Today Chrome is one of the most widely used browsers on the web (35%+ of the market share according to StatCounter) and is now available on Windows, Linux, OS X, Chrome OS, as well as Android and iOS platforms. Clearly, the features and the functionality resonated with the users, and many innovations of Chrome have also found their way into other popular browsers.

The original 38-page comic book explanation of the ideas and innovations of Google Chrome offers a great overview of the thinking and design process behind the popular browser. However, this was only the beginning. The core principles that motivated the original development of the browser continue to be the guiding principles for ongoing improvements in Chrome:

  • Speed: the objective is to make the fastest browser
  • Security: provide the most secure environment to the user
  • Stability: provide a resilient and stable web application platform
  • Simplicity: sophisticated technology, wrapped in a simple user experience

As the team observed, many of the sites we use today aren't just web pages, they are applications. In turn, the ever more ambitious applications require speed, security, and stability. Each of these deserves its own dedicated chapter, but since our subject is performance, our focus will be primarily on speed.

The many facets of performance

A modern browser is a platform, just like your operating system, and Google Chrome is designed as such. Prior to Google Chrome, all major browsers were built as monolithic, single-process applications. All open pages shared the same address space and contended for the same resources. A bug in any page, or in the browser, ran the risk of compromising the entire experience.

By contrast, Chrome works on a multi-process model, which provides process and memory isolation, and a tight security sandbox for each tab. In an increasingly multi-core world, the ability to isolate the processes as well as shield each open tab from other misbehaving pages has by itself proven to give Chrome a significant performance edge over the competition. In fact, most other browsers have followed suit, or are in the process of migrating to a similar architecture.

With an allocated process in place, the execution of a web program primarily involves three tasks: fetching resources, page layout and rendering, and JavaScript execution. The rendering and script steps follow a single-threaded, interleaved model of execution - it is not possible to perform concurrent modifications of the resulting Document Object Model (DOM). This is in part due to the fact that JavaScript itself is a single threaded language. Hence, optimizing how the rendering and script execution runtimes work together is of critical importance, both to the web developers building the applications as well as the developers working on the browser.

For rendering, Chrome uses Blink, which is a fast, open-source, and standards compliant layout engine. For JavaScript, Chrome ships with its own, heavily optimized "V8" JavaScript runtime, which was also released as a standalone open-source project and has found its way into many other popular projects - e.g., runtime for node.js. However, optimizing V8 JavaScript execution, or the Blink parsing and rendering pipelines won't do much good if the browser is blocked on the network, waiting for the resources to arrive!

The ability of the browser to optimize the order, priority, and latency of each network resource is one of the most critical contributors to the overall user experience. You may not be aware of it, but Chrome's network stack is, quite literally, getting smarter every day, trying to hide or decrease the latency cost of each resource: it learns likely DNS lookups, it remembers the topology of the web, it preconnects to likely destination targets, and more. From the outside, it presents itself as a simple resource fetching mechanism, but from the inside it is an elaborate and a fascinating case study for how to optimize web performance and deliver the best experience to the user.

Let's dive in...

What is a modern web application?

Before we get to the tactical details of how to optimize our interaction with the network, it helps to understand the trends and the landscape of the problem we are up against. In other words, what does a modern web page, or application look like?

The HTTP Archive project tracks how the web is built, and it can help us answer this question. Instead of crawling the web for the content, it periodically crawls the most popular sites to record and aggregate analytics on the number of used resources, content types, headers, and other metadata for each individual destination. The stats, as of January 2013, may surprise you. An average page, amongst the top 300,000 destinations on the web is:

  • 1280 KB in size
  • composed of 88 resources
  • connects to 15+ distinct hosts

Let that sink in. Over 1 MB in size on average, composed of 88 resources such as images, JavaScript, and CSS, and delivered from 15 different own and third-party hosts! Further, each of these numbers has been steadily increasing over the past few years, and there are no signs of stopping. We are increasingly building larger and more ambitious web applications.

Applying basic math to the HTTP Archive numbers reveals that an average resource is about 15 KB in size (1280 KB / 88 resources), which means that most network transfers in the browser are short and bursty. This presents its own set of complications because the underlying transport (TCP) is optimized for large, streaming downloads. Let's peel back the onion and inspect one of these network requests...

The life of a resource request on the wire

The W3C Navigation Timing specification provides a browser API and visibility into the timing and performance data behind the life of every request in the browser. Let's inspect the components, as each is a critical piece of delivering the optimal user experience:
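
As a quick, hedged sketch of what this API exposes (using the legacy performance.timing interface for brevity; newer code would use PerformanceNavigationTiming), the phases below map directly onto the walkthrough that follows:

// Break the navigation into the phases discussed below, in milliseconds.
const t = performance.timing;

const phases = {
  dns: t.domainLookupEnd - t.domainLookupStart,
  tcp: t.connectEnd - t.connectStart, // includes SSL time on HTTPS pages
  // secureConnectionStart is 0 for plain HTTP navigations.
  ssl: t.secureConnectionStart > 0 ? t.connectEnd - t.secureConnectionStart : 0,
  ttfb: t.responseStart - t.requestStart, // request sent -> first byte back
  download: t.responseEnd - t.responseStart,
};

console.table(phases);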

Given the URL of a resource on the web, the browser starts by checking its local and application caches. If you have previously fetched the resource and the appropriate cache headers were provided (Expires, Cache-Control, etc.), then it is possible that we may be allowed to use the local copy to fulfill the request - the fastest request is a request not made. Alternatively, if we have to revalidate the resource, if it expired, or if we simply haven't seen it before, then a costly network request must be dispatched.

Given a hostname and resource path, Chrome first checks for existing open connections it is allowed to reuse - sockets are pooled by {scheme, host, port}. Alternatively, if you have configured a proxy, or specified a proxy auto-config (PAC) script, then Chrome checks for connections through the appropriate proxy. PAC scripts allow for different proxies based on URL, or other specified rules, each of which can have its own socket pool. Finally, if neither of the above conditions is matched, then the request must begin by resolving the hostname to its IP address - aka, a DNS lookup.
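
To make the proxy case concrete, here is a minimal, hypothetical PAC script (the hostnames and proxy address are invented; a deployable PAC file is plain JavaScript, so drop the type annotations):

// Route example.com traffic through a proxy, everything else direct.
// In Chrome, each distinct proxy gets its own socket pool.
function FindProxyForURL(url: string, host: string): string {
  if (host === 'example.com' || host.endsWith('.example.com')) {
    return 'PROXY proxy.example.com:8080';
  }
  return 'DIRECT';
}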

If we are lucky, the hostname may already be cached in which case the response is usually just one quick system call away. If not, then a DNS query must be dispatched before any other work can happen. The time taken to do the DNS lookup will vary based on your internet provider, the popularity of the site and the likelihood of the hostname to be in intermediate caches, as well as the response time of the authoritative servers for that domain. In other words, there are a lot of variables at play, but it's not unusual for a DNS lookup to take up to several hundred milliseconds - ouch.

With the resolved IP address in hand, Chrome can now open a new TCP connection to the destination, which means that we must perform the "three-way handshake": SYN > SYN-ACK > ACK. This exchange adds a full roundtrip of latency delay to each and every new TCP connection - no shortcuts. Depending on the distance between the client and the server, as well as the chosen routing path, this can yield from tens to hundreds, or even thousands, of milliseconds of delay. All of this work and latency is before even a single byte of application data has hit the wire!

Once the TCP handshake is complete, and if we're connecting to a secure destination (HTTPS), then the SSL handshake must take place. This can add up to two additional roundtrips of latency delay between client and server. If the SSL session is cached, then we can "escape" with just one additional roundtrip.

Finally, Chrome is able to dispatch the HTTP request (requestStart in the Nav Timing figure above). Once received, the server can process the request and then stream the response data back to the client. This incurs a minimum of a network roundtrip, plus the processing time on the server. Following that, we're done. Well, that is unless the actual response is an HTTP redirect! In which case, we may have to repeat the entire cycle once over. Have a few gratuitous redirects on your pages? You may want to revisit that decision!

Have you been counting all the delays? To illustrate the problem, let's assume the worst case scenario for a typical broadband connection: local cache miss, followed by a relatively fast DNS lookup (50 ms), TCP handshake, SSL negotiation, and a relatively fast (100 ms) server response time, with a round-trip time of 80 ms (an average round-trip across continental USA):

  • 50ms for DNS
  • 80ms for TCP handshake (one RTT)
  • 160ms for SSL handshake (two RTTs)
  • 40ms for request to server
  • 100ms for server processing
  • 40ms for response from the server

That's 470 milliseconds for a single request, of which nearly 80% is network latency overhead as compared to the actual server processing time to fulfill the request - we have some work to do here! In fact, even 470 milliseconds may be an optimistic estimate:

  • If the server response does not fit into the initial TCP congestion window (4-15 KB), then one or more additional roundtrips of latency are introduced
  • SSL delays could get even worse if we need to fetch a missing certificate or perform an online certificate status check (OCSP), both of which will require an entirely new TCP connection, which can add hundreds and even thousands of milliseconds of additional latency
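
To make the accounting explicit, here is a small sketch that totals the phases from the example above and computes the share of pure network overhead:

// Worst-case phase latencies from the example above, in milliseconds.
const phases = {
  dns: 50,
  tcpHandshake: 80,  // one RTT
  sslHandshake: 160, // two RTTs
  request: 40,
  serverProcessing: 100,
  response: 40,
};

const total = Object.values(phases).reduce((sum, ms) => sum + ms, 0);
const overhead = total - phases.serverProcessing;

console.log(`total: ${total} ms`); // 470 ms
console.log(`network overhead: ${Math.round((100 * overhead) / total)}%`); // ~79%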

What is "fast enough"? #

The network overhead of DNS, handshakes, and the roundtrip times is what dominates the total time in our earlier case - the server response time accounts for only 20% of the total latency! But, in the grand scheme of things, do these delays even matter? If you are reading this, then you probably already know the answer: yes, very much so.

Past user experience research paints a consistent picture of what we, as users, expect in terms of responsiveness from any application, both offline and online:

  • 0 - 100ms: Instant
  • 100 - 300ms: Small perceptible delay
  • 300 - 1000ms: Machine is working
  • 1s+: Mental context switch
  • 10s+: I'll come back later...

The table above also explains the unofficial rule of thumb in the web performance community: render your pages, or at the very least, provide visual feedback in under 250 ms to keep the user engaged. This is not speed simply for speed's sake. Studies at Google, Amazon, Microsoft, as well as thousands of other sites show that additional latency has a direct impact on the bottom line of your site: faster sites yield more pageviews, higher engagement from the users, and see higher conversion rates.

So, there you have it, our optimal latency budget is 250 ms, and yet as we saw in the example above, the combination of a DNS lookup, the TCP and SSL handshakes, and propagation times for the request add up to 370 ms. We're 50% over budget, and we still haven't factored in the server processing time!

To most users and even web-developers, the DNS, TCP, and SSL delays are entirely transparent and are negotiated at network layers to which few of us descend or think about. And yet, each of these steps is critical to the overall user experience, since each extra network request can add tens or hundreds of milliseconds of latency. This is the reason why Chrome's network stack is much, much more than a simple socket handler.

Now that we've identified the problem, let's dive into the implementation details...

Chrome's network stack from 10,000 feet

Multi-process architecture

Chrome's multi-process architecture carries important implications for how each network request is handled within the browser. Under the hood, Chrome actually supports four different execution models that determine how the process allocation is performed.

By default, desktop Chrome browsers use the process-per-site model, which isolates different sites from each other, but groups all instances of the same site into the same process. However, to keep things simple, let's assume one of the simplest cases: one distinct process for each open tab. From the network performance perspective, the differences here are not substantial, but the process-per-tab model is much easier to understand.

The architecture dedicates one render process to each tab, which itself contains an instance of the Blink open-source layout engine for interpreting and laying out the HTML (aka, "HTML Renderer" in the diagram), an instance of the V8 JavaScript engine, and the glue code to bridge these and a few other components. If you are curious, the Chromium wiki contains a great introduction to the plumbing.

Each of these "render" processes is executed within a sandboxed environment that has limited access to the user's computer - including the network. To gain access to these resources, each render process communicates with the browser (kernel) process, which is able to impose security and access policies on each renderer.

Inter-process communication (IPC) and multi-process resource loading

All communication between the renderer and the kernel process in Chrome is done via IPC. On Linux and OS X, a socketpair() is used, which provides a named pipe transport for asynchronous communication. Each message from the renderer is serialized and passed to a dedicated I/O thread, which dispatches it to the kernel process. On the receiving end, the kernel process provides a filter interface, which allows Chrome to intercept resource IPC requests (see ResourceMessageFilter) that should be handled by the network stack.

One of the advantages of this architecture is that all resource requests are handled entirely on I/O threads, so neither UI-generated activity nor network events interfere with each other. The resource filter runs in the I/O thread of the browser process, intercepts the resource request messages, and forwards them to a ResourceDispatcherHost singleton in the browser process.

The singleton interface allows the browser to control each renderer's access to the network, but it also enables efficient, and consistent resource sharing:

  • Socket pool and connection limits: the browser is able to enforce limits on the number of open sockets per profile (256), proxy (32), and {scheme, host, port} (6) groups. Note that this allows up to six HTTP and six HTTPS connections to the same {host, port}! (A minimal sketch of this grouping follows the list.)
  • Socket reuse: persistent TCP connections are retained in the socket pool for some time after servicing the request to enable connection reuse, which avoids the extra DNS, TCP, and SSL (if required) setup overhead imposed on each new connection.
  • Socket late-binding: requests are associated with an underlying TCP connection only once the socket is ready to dispatch the application request, allowing better request prioritization (e.g., arrival of a higher priority request while the socket was connecting), better throughput (e.g., re-use of a "warm" TCP connection in cases where an existing socket becomes available while a new connection is being opened), as well as a general-purpose mechanism for TCP pre-connect, and a number of other optimizations.
  • Consistent session state: authentication, cookies, and cached data are shared between all render processes.
  • Global resource and network optimizations: the browser is able to make decisions across all render processes and outstanding requests. For example, giving network priority to the requests initiated by the foreground tab.
  • Predictive optimizations: by observing all network traffic, Chrome is able to build and refine predictive models to improve performance.
  • ... and the list goes on.
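
As promised above, here is a minimal sketch of the socket pool grouping and per-group limit; this is an illustration of the bookkeeping only, not Chrome's actual implementation, and the PooledSocket class is a stand-in:

// Stand-in for a pooled connection.
class PooledSocket {
  idle = false;
  constructor(public host: string, public port: number) {}
}

// Sockets are pooled and limited per {scheme, host, port} group.
class SocketPool {
  private static readonly MAX_PER_GROUP = 6;
  private groups = new Map<string, PooledSocket[]>();

  // Reuse an idle socket if one exists; otherwise open a new one,
  // respecting the per-group connection limit.
  acquire(scheme: string, host: string, port: number): PooledSocket | null {
    const key = `${scheme}://${host}:${port}`;
    const sockets = this.groups.get(key) ?? [];
    this.groups.set(key, sockets);

    const idle = sockets.find((s) => s.idle);
    if (idle) {
      idle.idle = false; // socket reuse: skips DNS, TCP, and SSL setup
      return idle;
    }
    if (sockets.length >= SocketPool.MAX_PER_GROUP) {
      return null; // request must queue until a socket frees up
    }
    const fresh = new PooledSocket(host, port);
    sockets.push(fresh);
    return fresh;
  }
}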

As far as the render process is concerned, it simply sends a resource request message, tagged with a unique request ID, over IPC to the browser process, and the browser kernel process handles the rest.

Cross-platform resource fetching

One of the chief concerns in the implementation of Chrome's network stack is portability across many different platforms: Linux, Windows, OS X, Chrome OS, Android, and iOS. To address this challenge, the network stack is implemented as a mostly single-threaded (there are separate cache and proxy threads), cross-platform library, which allows Chrome to reuse the same infrastructure and provide the same performance optimizations, as well as a greater opportunity for optimization across all platforms.

All of the network code is, of course, open source and can be found in the "src/net" subdirectory. We won't examine each component in detail, but the layout of the code itself tells you a lot about its capabilities and structure. A few examples:

  • net/android: bindings to the Android runtime
  • net/base: common net utilities, such as host resolution, cookies, network change detection, and SSL certificate management
  • net/cookies: implementation of storage, management, and retrieval of HTTP cookies
  • net/disk_cache: disk and memory cache implementation for web resources
  • net/dns: implementation of an asynchronous DNS resolver
  • net/http: HTTP protocol implementation
  • net/proxy: proxy (SOCKS and HTTP) configuration, resolution, script fetching, ...
  • net/socket: cross-platform implementations of TCP sockets, SSL streams, and socket pools
  • net/spdy: SPDY protocol implementation
  • net/url_request: URLRequest, URLRequestContext, and URLRequestJob implementations
  • net/websockets: WebSockets protocol implementation

Each of the above makes for a great read for the curious - the code is well documented, and you'll find plenty of unit tests for every component.

Architecture and performance on mobile platforms

Mobile browser usage is growing at an exponential rate and even by modest projections, it will eclipse desktop browsing in the not so distant future. Needless to say, delivering an optimized mobile experience has been a top priority for the Chrome team. In early 2012, Chrome for Android was announced, and a few months later, Chrome for iOS followed.

The first thing to note about the mobile version of Chrome is that it's not simply a direct adaptation of the desktop browser - that would not deliver the best user experience. By its very nature, the mobile environment is both much more resource constrained and operates under many fundamentally different parameters:

  • Desktop users navigate with the mouse, may have overlapping windows, have a large screen, are mostly not power constrained, usually have a much more stable network connection, and have access to much larger pools of storage and memory.
  • Mobile users use touch and gesture navigation, have a much smaller screen, are battery and power constrained, are often on metered connections, and have limited local storage and memory.

Further, there is no such thing as a "typical mobile device". Instead there is a wide range of devices with varying hardware capabilities, and to deliver the best performance, Chrome must adapt to the operating constraints of each and every device. Thankfully, the various execution models allow Chrome to do exactly that!

On Android devices, Chrome leverages the same multi-process architecture as the desktop version - there is a browser process, and one or more renderer processes. The one difference is that due to the memory constraints of the mobile device, Chrome may not be able to run a dedicated renderer for each open tab. Instead, Chrome determines the optimal number of renderer processes based on available memory and other constraints of the device, and shares renderer processes between multiple tabs.

In cases where only minimal resources are available, or if Chrome is unable to run multiple processes, it can also switch to use a single-process, multi-threaded processing model. In fact, on iOS devices, due to sandboxing restrictions of the underlying platform, it does exactly that - it runs a single, but multi-threaded process.

What about network performance? First off, Chrome uses the same network stack on Android and iOS, as it does on all other versions. This enables all of the same network optimizations across all platforms, which gives Chrome a significant performance advantage. However, what is different, and is often adjusted based on the capabilities of the device and the network in use, are variables such as priority of speculative optimization techniques, socket timeouts and management logic, cache sizes, and more.

For example, to preserve battery, mobile Chrome may opt in to lazy closing of idle sockets - sockets are closed only when opening new ones, to minimize radio use. Similarly, since prerendering (which we will discuss below) may require significant network and processing resources, it is often only enabled when the user is on Wi-Fi.

Optimizing the mobile browsing experience is one of the highest priority items for the Chrome development team, and we can expect to see a lot of new improvements in the months and years to come. In fact, it is a topic that deserves its own separate chapter - perhaps in the next installment of the POSA series!

Speculative optimization with Chrome's Predictor

Chrome gets faster as you use it. This feat is accomplished with the help of a singleton Predictor object, which is instantiated within the browser kernel process, and whose sole responsibility is to observe network patterns and to learn and anticipate likely user actions in the future. A few example signals processed by the Predictor include:

  • Users hovering their mouse over a link is a good indicator of a likely, upcoming navigation event, which Chrome can help accelerate by dispatching a speculative DNS lookup of the target hostname, as well as potentially starting the TCP handshake. By the time the user clicks, which takes ~200 ms on average, there is a good chance that we have already completed the DNS and TCP steps, allowing us to eliminate hundreds of milliseconds of extra latency for the navigation event. (A page-side sketch of this idea follows the list.)
  • Typing in the Omnibox (URL) bar triggers high-likelihood suggestions, which may similarly kick off a DNS lookup, TCP pre-connect, and can even pre-render the page in a hidden tab!
  • Each one of us has a list of favorite sites that we visit every day. Chrome can learn the subresources on these sites and speculatively pre-resolve and perhaps even pre-fetch them to accelerate the browsing experience. And the list goes on...
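
Page authors cannot call into the Predictor directly, but as a hedged, page-side approximation of the first signal above, a script can inject a dns-prefetch hint the moment a link is hovered:

// Illustrative only: emulate "hover predicts navigation" from page code by
// adding a <link rel="dns-prefetch"> hint on the first hover of each link.
document.addEventListener('mouseover', (event) => {
  const target = event.target as Element | null;
  const link = target?.closest('a[href^="http"]') as HTMLAnchorElement | null;
  if (!link || link.dataset.prefetched) return;
  link.dataset.prefetched = 'true';

  const hint = document.createElement('link');
  hint.rel = 'dns-prefetch';
  hint.href = `//${new URL(link.href).hostname}`;
  document.head.appendChild(hint);
});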

Chrome learns the topology of the web, as well as your own browsing patterns, as you use it. If it does the job well, it can eliminate hundreds of milliseconds of latency from each navigation and get the user closer to the holy grail of the "instant page load". To achieve this goal, Chrome leverages four core optimization techniques:

  • DNS pre-resolve: resolve hostnames ahead of time, to avoid DNS latency
  • TCP pre-connect: connect to the destination server ahead of time, to avoid TCP handshake latency
  • Resource prefetching: fetch critical resources on the page ahead of time, to accelerate rendering of the page
  • Page prerendering: fetch the entire page with all of its resources ahead of time, to enable instant navigation when triggered by the user

Each decision to invoke one or several of these techniques is optimized against a large number of constraints. After all, each is a speculative optimization, which means that if done poorly, it might trigger unnecessary work and network traffic, or even worse, have a negative effect on the loading time for an actual navigation triggered by the user.

How does Chrome address this problem? The predictor consumes as many signals as it can, which include user generated actions, historical browsing data, as well as signals from the renderer and the network stack itself.

Not unlike the ResourceDispatcherHost, which is responsible for coordinating all of the network activity within Chrome, the Predictor object creates a number of filters on user and network generated activity within Chrome:

  • IPC channel filter to monitor for signals from the render processes
  • ConnectInterceptor object is added to each request, such that it can observe the traffic patterns and record success metrics for each request

As a hands-on example, the render process can trigger a message to the browser process with any of the following hints, which are conveniently defined in ResolutionMotivation (url_info.h):

enum ResolutionMotivation {
  MOUSE_OVER_MOTIVATED,      // Mouse-over initiated by the user.
  OMNIBOX_MOTIVATED,         // Omni-box suggested resolving this.
  STARTUP_LIST_MOTIVATED,    // This resource is on the top 10 startup list.
  EARLY_LOAD_MOTIVATED,      // In some cases we use the prefetcher to warm up
                             // the connection in advance of issuing the real
                             // request.

  // The following involve predictive prefetching, triggered by a navigation.
  // The referring_url_ is also set when these are used.
  STATIC_REFERAL_MOTIVATED,  // External database suggested this resolution.
  LEARNED_REFERAL_MOTIVATED, // Prior navigation taught us this resolution.
  SELF_REFERAL_MOTIVATED,    // Guess about need for a second connection.

  // <snip> ...
};

Given such a signal, the goal of the predictor is to evaluate the likelihood of its success, and then to trigger the activity if resources are available. Every hint may have a likelihood of success, a priority, and an expiration timestamp, the combination of which can be used to maintain an internal priority queue of speculative optimizations. Finally, for every dispatched request from within this queue, the predictor is also able to track its success rate, which allows it to further optimize its future decisions.
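
A hedged sketch of that bookkeeping (the field names are invented for illustration; Chrome's real structures live in the predictor sources) might look like:

// Hypothetical model of a speculative hint: likelihood, priority, expiration.
interface SpeculativeHint {
  hostname: string;
  motivation: string; // e.g. "MOUSE_OVER_MOTIVATED"
  likelihood: number; // estimated probability of success, 0..1
  priority: number;   // higher dispatches sooner
  expiresAt: number;  // epoch ms; stale hints are dropped
}

class HintQueue {
  private hints: SpeculativeHint[] = [];

  add(hint: SpeculativeHint): void {
    this.hints.push(hint);
    // A real implementation would use a heap instead of re-sorting.
    this.hints.sort(
      (a, b) => b.priority - a.priority || b.likelihood - a.likelihood
    );
  }

  // Pop the best non-expired hint, discarding stale ones along the way.
  next(now: number = Date.now()): SpeculativeHint | undefined {
    while (this.hints.length > 0) {
      const hint = this.hints.shift()!;
      if (hint.expiresAt > now) return hint;
    }
    return undefined;
  }
}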

Chrome network architecture in a nutshell

  • Chrome uses a multi-process architecture, which isolates render processes from the browser process
  • Chrome maintains a single instance of the resource dispatcher, which is shared across all render processes, and runs within the browser kernel process
  • The network stack is a cross-platform, (mostly) single-threaded library
  • The network stack uses non-blocking operations to manage all network operations
  • Shared network stack allows efficient resource prioritization, reuse, and provides the browser with ability to perform global optimization across all running processes
  • Each render process communicates with the resource dispatcher via IPC
  • Resource dispatcher intercepts resource requests via a custom IPC filter
  • Predictor intercepts resource request and response traffic to learn and optimize future network requests
  • Predictor may speculatively schedule DNS, TCP, and even resource requests based on learned traffic patterns, saving hundreds of milliseconds when the navigation is triggered by the user

Lifetime of your browser session...

With the 10,000 foot architecture view of the Chrome network stack in mind, let's now take a closer look at the kinds of user-facing optimizations enabled within the browser. Specifically, let's imagine we have just created a new Chrome profile and are ready to start our day.

Optimizing the cold-boot experience

The first time you load your browser, it of course knows little about your favorite sites or navigation patterns. But, as it turns out, many of us follow the same routine after a cold-boot of the browser, where we may navigate to our email inbox, favorite news site, a social site, an internal portal, and so on. The specific sites will, of course, vary, but the similarity of all these sessions allows the Chrome predictor to accelerate your cold-boot experience!

Chrome remembers the top ten likely hostnames accessed by the user following the browser start - note that this is not the top ten global destinations, but specifically the destinations following a fresh browser start. As the browser loads, Chrome can trigger a DNS pre-fetch for the likely destinations! If you are curious, you can inspect your own startup hostname list by opening a new tab and navigating to chrome://dns. At the top of the page, you will find the list of the top ten likely startup candidates for your profile.

The screenshot above is an example from my own Chrome profile. How do I usually begin my browsing? Frequently by navigating to Google Docs if I'm working on an article such as this one. Not surprisingly, we see a lot of Google hostnames in the list!

Optimizing interactions with the Omnibox

One of the innovations of Chrome was the introduction of the Omnibox, which unlike its predecessors handles much more than just destination URLs. Besides remembering the URLs of pages that the user visited in the past, it also offers full text search over your history (tip: instead of the URL, try typing the name of the page you've recently visited), as well as a tight integration with the search engine of your choice.

As the user types, the Omnibox automatically proposes an action, which is either a URL based on your navigation history, or a search query. Under the hood, each proposed action is scored with respect to the query, as well as its past performance. In fact, Chrome allows us to inspect this data by visiting chrome://predictors.

Chrome maintains a history of the user-entered prefixes, the actions it has proposed, as well as the hit rate for each one. For my own profile, you can see that whenever I enter "g" in the Omnibox, there is a 76% chance that I'm heading to Gmail. Once I add an "m" (for "gm"), the confidence rises to 99.8% - in fact, out of the 412 recorded visits, only once did I not end up going to Gmail after entering "gm"!

But, you're thinking, what does this have to do with the network stack? Well, the yellow and green colors for the likely candidates are also important signals for the ResourceDispatcher! If we have a likely candidate (yellow), Chrome may trigger a DNS pre-fetch for the target host. If we have a high confidence candidate (green), then Chrome may also trigger a TCP pre-connect once the hostname has been resolved. And finally, if both complete while the user is still deliberating, then Chrome may even pre-render the entire page in a hidden tab.
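
A hedged sketch of that decision ladder (the thresholds and helper functions are invented for illustration; Chrome's actual cutoffs are internal):

// Stubs standing in for the real speculative machinery.
const dnsPreResolve = (host: string) => console.log(`pre-resolve ${host}`);
const tcpPreConnect = (host: string) => console.log(`pre-connect ${host}`);
const prerenderPage = (host: string) => console.log(`prerender ${host}`);

// Escalate speculative work as confidence in the suggestion grows.
function onOmniboxSuggestion(host: string, confidence: number): void {
  if (confidence >= 0.5) dnsPreResolve(host);  // likely candidate ("yellow")
  if (confidence >= 0.9) tcpPreConnect(host);  // high confidence ("green")
  if (confidence >= 0.99) prerenderPage(host); // near-certain destination
}

onOmniboxSuggestion('mail.google.com', 0.998);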

Alternatively, if there is no good match for the entered prefix based on past navigation history, then Chrome may issue a DNS pre-fetch and TCP pre-connect to your search provider, in anticipation of a likely search request.

An average user takes hundreds of milliseconds to fill in their query and to evaluate the proposed autocomplete suggestions. In the background, Chrome is able to pre-fetch, pre-connect, and in certain cases even pre-render the page, such that by the time the user is ready to hit the "enter" key, much of the network latency has already been eliminated!

Optimizing cache performance

The best, and the fastest, request is a request not made. Whenever we talk about performance, we would be remiss if we didn't talk about the cache - you are providing Expires, ETag, Last-Modified, and Cache-Control response headers for all the resources on your pages, right? If not, stop, go fix it, we'll wait.
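
If you need a starting point, here is a minimal Node.js sketch of "fixing it" (the max-age, validator values, and port are placeholders to adapt for your own resources):

import { createServer } from 'node:http';

// Serve a resource with an explicit caching policy.
createServer((req, res) => {
  res.setHeader('Cache-Control', 'public, max-age=86400'); // cache for a day
  res.setHeader('Last-Modified', new Date(2013, 0, 1).toUTCString());
  res.setHeader('ETag', '"v1"'); // opaque validator for revalidation
  res.end('cacheable response body');
}).listen(8080);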

Chrome has two different implementations of the internal cache: one backed by local disk, and a second which stores everything in memory. The in-memory implementation is used for the Incognito browsing mode and is wiped clean whenever you close the window. Both implement the same internal interface (disk_cache::Backend and disk_cache::Entry), which greatly simplifies the architecture - and, if you are so inclined, allows you to easily experiment with your own cache implementations.

Internally, the disk cache implements its own set of data structures, all of which are stored within a single cache folder for your profile. Inside this folder, there are index files, which are memory-mapped when the browser starts, and data files which store the actual data, alongside the HTTP headers and other bookkeeping information. As an interesting sidenote, resources up to 16KB in size are stored in shared data block-files, while larger files get their own dedicated files on disk. Finally, for eviction, the disk cache maintains an LRU policy that takes ranking metrics such as frequency of access and age of the resource into account.

If you are ever curious about the state of the Chrome cache, open a new tab and navigate to chrome://net-internals/#httpCache. Alternatively, if you want to see the actual HTTP metadata and the cached response, you can also visit chrome://cache, which will enumerate all of the resources currently available in the cache. From that page, search for a resource you're looking for and click on the URL to see the exact, cached headers and response bytes.

Optimizing DNS with prefetching

We have already mentioned DNS pre-resolution on several occasions, so before we dive into the implementation, let's review the cases in which it may be triggered, and why:

  • The Blink document parser, which runs in the render process, may provide a list of hostnames for all the links on the current page, which Chrome may, or may not choose to pre-resolve ahead of time.
  • The renderer process may report a mouse hover or "button down" event as an early signal of the user's intent to perform a navigation.
  • The Omnibox may trigger a resolve request based on a high likelihood suggestion.
  • Chrome predictor may request hostname resolution based on past navigation and resource request data - more on this below.
  • The owner of the page may explicitly indicate to Chrome which hostnames it should pre-resolve.

In all cases, DNS pre-resolution is treated as a hint. Chrome does not guarantee that the pre-resolution will occur, rather it uses each signal in combination with its own predictor to assess the hint and decide on a course of action. In the "worst case", if we weren't able to pre-resolve the hostname in time, the user would have to wait for an explicit DNS resolution, followed by TCP connection time, and finally the actual resource fetch. However, when this occurs, the predictor can take note and adjust its future decisions accordingly - it gets faster, and smarter, as you use it.

One of the optimizations we have not covered previously is the ability of Chrome to learn the topology of each site and then use this information to accelerate future visits. Specifically, recall that an average page consists of 88 resources, which are delivered from 15+ distinct hosts. Well, each time you perform a navigation, Chrome may record the hostnames for the popular resources on the page, and during a future visit, it may choose to trigger a DNS pre-resolve and even a TCP pre-connect for some, or all of them!

To inspect the subresource hostnames stored by Chrome, navigate to chrome://dns and search for any popular destination hostname for your profile. In the example above, you can see the six subresource hostnames that Chrome remembered for Google+, as well as stats for the number of cases when a DNS pre-resolution was triggered or a TCP pre-connect was performed, and the expected number of requests that will be served by each. This internal accounting is what enables the Chrome predictor to perform its optimizations.

In addition to all of the internal signals, the owner of the site is also able to embed additional markup on their pages to request the browser to pre-resolve a hostname:

<link rel="dns-prefetch" href="//host_name_to_prefetch.com">

Why not simply rely on the automated machinery in the browser? In some cases, you may want to pre-resolve a hostname which is not mentioned anywhere on the page. The canonical example is, of course, redirects: a link may point to a host, like an analytics tracking service, which then redirects the user to the actual destination. By itself, Chrome cannot infer this pattern, but you can help it by providing a manual hint and get the browser to resolve the hostname of the actual destination ahead of time.

So, how is this all implemented under the hood? The answer to this question, just like all other optimizations we have covered, depends on the version of Chrome, since the team is always experimenting with new and better ways to improve performance. Broadly speaking, though, the DNS infrastructure within Chrome has two major implementations. Historically, Chrome relied on the platform-independent getaddrinfo() system call and delegated the actual responsibility for the lookups to the operating system; however, this approach is in the process of being replaced with Chrome's own implementation of an asynchronous DNS resolver.

The original implementation, which relied on the operating system, has its benefits: less and simpler code, and the ability to leverage the operating system's DNS cache. However, getaddrinfo() is also a blocking system call, which meant that Chrome had to create and maintain a dedicated worker thread pool to allow it to perform multiple lookups in parallel. This unjoined pool was capped at six worker threads, an empirical number based on the lowest common denominator of hardware - turns out, higher numbers of parallel requests can overload some users' routers!

For pre-resolution with the worker-pool, Chrome simply dispatches the getaddrinfo() call, which blocks the worker thread until the response is ready, at which point it just discards the returned result and begins processing the next prefetch request. Discards it? The result is cached by the OS DNS daemon cache, which returns an immediate response to future, actual getaddrinfo() lookups. Simple, effective, works well enough in practice.
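
The same warm-the-OS-cache trick can be sketched outside of Chrome; Node.js's dns.lookup() also delegates to getaddrinfo() on a worker pool, so a hedged approximation of the prefetch loop looks like this (the hostnames are placeholders):

import { lookup } from 'node:dns';

// Hypothetical prefetch list; in Chrome this comes from the Predictor.
const likelyHosts = ['example.com', 'static.example.com'];

for (const host of likelyHosts) {
  // Fire the lookup and discard the result: the OS resolver cache is now
  // warm, so a later "real" getaddrinfo() for the same host returns fast.
  lookup(host, (err) => {
    if (err) return; // a failed speculative lookup costs nothing
  });
}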

Well, effective, but not good enough! The getaddrinfo() call hides a lot of useful information from Chrome, such as the time-to-live (TTL) timestamps for each record, as well as the state of the DNS cache itself. To improve performance, the Chrome team decided to implement their own, cross-platform, asynchronous DNS resolver.

By moving DNS resolution into Chrome, the new async resolver enables a number of new optimizations:

  • better control of retransmission timers, and ability to execute multiple queries in parallel
  • visibility into record TTLs, which allows Chrome to refresh popular records ahead of time
  • better behavior for dual stack implementations (IPv4 and IPv6)
  • failovers to different servers, based on RTT or other signals

All of the above, and more, are ideas for continuous experimentation and improvement within Chrome. Which brings us to the obvious question: how do we know and measure the impact of these ideas? Simple, Chrome tracks detailed network performance stats and histograms for each individual profile. To inspect the collected DNS metrics, open a new tab, and head to chrome://histograms/DNS.

The above histogram shows the distribution of latencies for DNS prefetch requests: roughly 50% (rightmost column) of the prefetch queries were finished within 20ms (leftmost column). Note that this is data based on a recent browsing session (9869 samples), and is private to the user. If the user has opted in to report their usage stats in Chrome, then the summary of this data is anonymized and periodically beaconed back to the engineering team, which is then able to see the impact of their experiments and adjust accordingly. Rinse, lather, repeat.

Optimizing TCP connection management with pre-connect

We have pre-resolved the hostname and we have a high likelihood navigation event that's about to happen, as estimated by the Omnibox, or the Chrome predictor. Why not go one step further, and also speculatively pre-connect to the destination host and complete the TCP handshake before the user dispatches the request? By doing so, we can eliminate another full round-trip of latency delay, which can easily save hundreds of milliseconds for the user. Well, that's exactly what TCP-preconnect is and how it works!

To see the hosts for which a TCP preconnect has been triggered, open a new tab and visit chrome://dns.

First, Chrome checks its socket pools to see if there is an available socket for the hostname, which it may be able to reuse - keep-alive sockets are kept in the pool for some period of time, to avoid the TCP handshake and slow-start penalties. If no socket is available, then it can initiate the TCP handshake, and place it in the pool. Then, when the user initiates the navigation, the HTTP request can be dispatched immediately.

Curious to see the state of all the open sockets in Chrome? Simple, head to: chrome://net-internals#sockets

Note that you can also drill into each socket and inspect the timeline: connect and proxy times, arrival times for each packet, and more. Last but not least, you can also export this data for further analysis or a bug report. Having good instrumentation is key to any performance optimization, and chrome://net-internals is the nexus of all things networking in Chrome - if you haven't explored it yet, you should!

Optimizing resource loading with prefetch hints

Sometimes, the author of a page is able to provide additional navigation, or page context, based on the structure or the layout of their site, and help the browser optimize the experience for the user. Chrome supports two such hints, which can be embedded in the markup of the page:

<link rel="subresource" href="/javascript/myapp.js"> <link rel="prefetch" href="/images/big.jpeg">

Subresource and prefetch look very similar, but have very different semantics. When a link resource specifies its relationship as "prefetch", it is an indication to the browser that this resource might be needed in a future navigation. In other words, effectively it is a cross-page hint. By contrast, when a resource specifies the relationship as a "subresource", it is an early indication to the browser that the resource will be used on a current page, and that it may want to dispatch the request before it encounters it later in the document.

As you would expect, the different semantics of the hints lead to very different behavior by the resource loader. Resources marked with prefetch are considered low priority and might be downloaded by the browser only once the current page has finished loading, whereas subresource resources are fetched with high priority as soon as they are encountered and will compete with the rest of the resources on the current page.

Both hints, when used well and in the right context, can help significantly with optimizing the user experience on your site. Finally, it is also important to note that prefetch is part of the HTML5 spec, and as of today supported by Firefox and Chrome, whereas subresource is currently only available in Chrome.

Optimizing resource loading with browser prefreshing

Unfortunately, not all site owners are able or willing to provide the browser with subresource hints in their markup. Further, even if they do, we must wait for the HTML document to arrive from the server before we are able to parse the hints and begin fetching the necessary subresources - depending on the server response time, as well as the latency between the client and the server, this could take hundreds and even thousands of milliseconds.

However, as we saw earlier, Chrome is already learning the hostnames of the popular resources to perform DNS pre-fetching. So, why couldn't it do the same, but go one step further and perform the DNS lookup, use TCP preconnect, and then also speculatively prefetch the resource? Well, that's exactly what "prefreshing" could do:

  • User initiates a request to a target URL
  • Chrome queries its Predictor for learned subresources associated with target URL and initiates the sequence of DNS prefetch, TCP preconnect, and resource prefreshing
  • If the learned subresource is in the cache, then it's loaded from disk into memory
  • If the learned subresource is missing, or has expired, then a network request is made

Resource prefreshing is a great example of the workflow of every experimental optimization in Chrome - in theory, it should enable better performance, but there are many tradeoffs as well. There is only one way to reliably determine if it will make the cut and ship in Chrome: implement it and run it as an A/B experiment in some of the pre-release channels with real users, on real networks, with real browsing patterns.

As of early 2013, the Chrome team is in the early stages of discussing the implementation. If it makes the cut based on gathered results, we may see prefreshing in Chrome sometime later in the year. The process of improving Chrome network performance never stops, the team is always experimenting with new approaches, ideas, and techniques.

Optimizing navigation with prerendering

Each and every optimization we have covered up to now helps reduce the latency between the user's direct request for a navigation and the resulting page rendering in their tab. However, what would it take to have a truly instant experience? Based on the UX data we saw earlier, this interaction would have to happen in less than 100 milliseconds, which doesn't leave much room for network latency at all. What could we do to deliver a rendered page in sub 100 milliseconds?

Of course, you already know the answer, since this is a common pattern employed by many users: if you open multiple tabs then switching between tabs is instant and is definitely much faster than waiting for the navigation between the same resources in a single foreground tab. Well, what if the browser provided an API to do this?

<link rel="prerender" href="http://example.org/index.html">

You guessed it, that's prerendering in Chrome! Instead of just downloading a single resource, as the "prefetch" hint would have done, the "prerender" attribute indicates to Chrome that it should, well, prerender the page in a hidden tab, along with all of its subresources. The hidden tab itself is invisible to the user, but when the user triggers the navigation, the tab is swapped in from the background for an "instant experience".

Curious to try it out? You can visit prerender-test.appspot.com for a hands on demo, and see the history and status of the prerendered pages for your profile by visiting: chrome://net-internals/#prerender

As you would expect, rendering an entire page in a hidden tab can consume a lot of resources, both CPU and network, and hence should only be used in cases where we have high confidence that the hidden tab will be used! For example, when you are using the Omnibox, a prerender may be triggered for a high-confidence suggestion. Similarly, Google Search sometimes adds the prerender hint to its markup if it estimates that the first search result is a high-confidence destination (aka, Google Instant Pages).

 

Note that you can also add prerender hints to your own site! However, before you do, note that prerendering has a number of restrictions and limitations, which you should keep in mind:

  • At most one prerender tab is allowed across all processes
  • HTTPS and pages with HTTP authentication are not allowed
  • Prerendering is abandoned if the requested resource, or any of its subresources need to make a non-idempotent request (only GET requests allowed)
  • All resources are fetched with lowest network priority
  • The page is rendered with lowest CPU priority
  • The page is abandoned if memory requirements exceed 100MB
  • Plugin initialization is deferred, and pre-rendering is abandoned if an HTML5 media element is present

In other words, prerendering is not guaranteed to happen and only applies to pages where it is safe. Additionally, since JavaScript and other logic may be executed within the hidden page, it is best practice to leverage the Page Visibility API to detect if the page is visible - which is something you should be doing anyway!
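
As a minimal sketch of that best practice (standard Page Visibility API; nothing Chrome-specific), defer user-facing work until the page, prerendered or not, is actually shown:

// Run the callback once the page is (or becomes) visible to the user.
function onPageVisible(callback: () => void): void {
  if (document.visibilityState === 'visible') {
    callback();
    return;
  }
  document.addEventListener('visibilitychange', function handler() {
    if (document.visibilityState === 'visible') {
      document.removeEventListener('visibilitychange', handler);
      callback();
    }
  });
}

onPageVisible(() => {
  console.log('page is visible - safe to start analytics, media, timers');
});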

Chrome gets faster as you use it

Needless to say, Chrome's network stack is much more than a simple socket manager. Our whirlwind tour covered the many levels of potential optimizations that are performed transparently in the background, as you navigate the web. The more Chrome learns about the topology of the web and your browsing patterns, the better it can do its job. Almost like magic, Chrome gets faster as you use it. Except, it's not magic, because now you know how it works!

Finally, it is important to note that the Chrome team continues to iterate and experiment with new ideas to improve performance - this process never stops. By the time you read this, chances are there will be new experiments and optimizations being developed, tested, or deployed. Perhaps once we reach our target destination of instant page loads (<100 ms), for each and every page, then we can take a break. Until then, there is always more work to do!

Ilya Grigorik is a web performance engineer at Google, co-chair of the W3C Web Performance Working Group, and author of the book High Performance Browser Networking (O'Reilly). Follow him on Twitter and Google+.
 
 
 

Google Chrome的歷史和指導原則

原譯註:這部分再也不詳細翻譯,只列出核心意思。

驅動Chrome繼續前進的核心原則包括:

  • Speed: 作最快的(fastest)的瀏覽器。
  • Security:爲用戶提供最爲安全的(most secure)的上網環境。
  • Stability: 提供一個健壯且穩定的(resilient and stable)的Web應用平臺。
  • Simplicity: 以簡練的用戶體驗(simple user experience)封裝精益求精的技術(sophisticated technology)。

本文關將注於第一點,速度。

關於性能的方方面面

一個現代瀏覽器就是一個和操做系統同樣的平臺。在Chrome以前的瀏覽器都是單進程的應用,全部頁面共享相同的地址空間和資源。引入多進程架構這是Chrome最爲著名的改進。

原譯註:省略一些反覆談論的細節。

一個進程內,Web應用主要須要執行三個任務:獲取資源,頁面 排版及渲染,和運行JavaScript。渲染和腳本都是在運行中交替以單線程的方式運行的,其緣由是爲了保持DOM的一致性,而JavaScript本 身也是一個單線程的語言。因此優化渲染和腳本運行不管對於頁面開發者仍是瀏覽器開發者都是極爲重要的。

Chrome的渲染引擎是WebKit, JavaScript Engine則使用深刻優論的V8 (「V8″ JavaScript runtime)。可是,若是網絡不順暢,不管優化V8的JavaScript執行,仍是優化WebKit的解析和渲染,做用其實頗有限。巧婦難爲無米之炊,數據沒來就得等着!

相對於用戶體驗,做用最爲明顯的就是如何優化網絡資源的加載順 序、優先級及每個資源的延遲時間(latency)。也許你察覺不到,Chrome網絡模塊天天都在進步,逐步下降每一個資源的加載成本:向DNS lookups學習,記住頁面拓撲結構(topology of the web), 預先鏈接可能的目標網址,等等,還有不少。從外面來看就是一個簡單的資源加載的機制,但在內部倒是一個精彩的世界。

關於Web應用

開始正題前,仍是先來了解一下如今網頁或者Web應用在網絡上的需求。

HTTP Archive 項目一直在追蹤網頁構建。除了頁面內容外,它還會分析流行頁面使用的資源數量,類型,頭信息以及不一樣目標地址的元數據(metadata)。下面是2013年1月的統計資料,由300,000目標頁面得出的平均數據:

  • 1280 KB
  • 包含88個資源(Images,JavaScript,CSS …)
  • 鏈接15個以上的不一樣主機(distinct hosts)

這些數字在過去幾年中一直持續增加(steadily increasing),沒有停下的跡象。這說明咱們正不斷地建構一個更加龐大的、野心勃勃的網絡應用。還要注意,平均來看每一個資源不過12KB, 代表絕大多數的網絡傳輸都是短促(short and bursty)的。這和TCP針對大數據、流式(streaming)下載的方向不一致,正由於如此,而引入了一些併發症。下面就用一個例子來抽絲剝繭,一窺究竟……

一個Resource Request的一輩子

W3C的Navigation Timing specification定義了一組API,能夠觀察到瀏覽器的每個請求(request)的時序和性能數據。下面瞭解一些細節:

給定一個網頁資源地址後,瀏覽器就會檢查本地緩存和應用緩存。若是以前獲取過而且有相應的緩存信息(appropriate cache headers)(如Expires, Cache-Control, etc.), 就會用緩存數據填充這個請求,畢竟最快的請求就是沒有請求(the fastest request is a request not made)。不然,咱們從新驗證資源,若是已經失效(expired),或者根本就沒見過,一個耗費網絡的請求就沒法避免地發送了。

給定了一個主機名和資源路徑後,Chrome先是檢查現有已創建的鏈接(existing open connections)是否能夠複用, 即sockets指定了以(scheme、host和port)定義的鏈接池(pool)。但若是配置了一個代理,或者指定了proxy auto-config(PAC)腳本,Chrome就會檢查與proxy的鏈接。PAC腳本基於URL提供不一樣的代理,或者爲此指定了特定 的規則。與每個代理間均可以有本身的socket pool。最後,上述狀況都不存在,這個請求就會從DNS查詢(DNS lookup)開始了,以便得到它的IP地址。

幸運的話,這個主機名已經被緩存過。不然,必須先發起一個 DNS Query。這個過程所需的時間和ISP,頁面的知名度,主機名在中間緩存(intermediate caches)的可能性,以及authoritative servers的響應時間這些因素有關。也就是說這裏變量不少,不過通常還不致於到幾百毫秒那麼誇張。

拿到解析出的IP後,Chrome就會在目標地址間打開一個新TCP鏈接,咱們就要執行一個3度握手(「three-way handshake」): SYN > SYN-ACK > ACK。這個操做每一個新的TCP鏈接都必須完成,沒有捷徑。根據遠近,路由路徑的選擇,這個過程可能要耗時幾百毫秒,甚至幾秒。而到如今,咱們連一個有效的字節都還沒收到。

當TCP握手完成了,若是咱們鏈接的是一個HTTPS地址,還有一個SSL握手過程,同時又要增長最多兩輪的延遲等待。若是SSL會話被緩存了,就只需一次。

最後,Chrome終於要發送HTTP請求了 (如上面圖示中的requestStart)。 服務器收到請求後,就會傳送響應數據(response data)回到客戶端。這裏包含最少的往返延遲和服務的處理時間。而後一個請求就完成了。可是,若是是一個HTTP重定向(redirect)的話?咱們 又要從頭開始這個過程。若是你的頁面裏有些冗餘的重定向,最好三思一下!

你得出全部的延遲時間了嗎? 咱們假設一個典型的寬帶環境:沒有本地緩存,相對較快的DNS lookup(50ms), TCP握手,SSL協商,以及一個較快服務器響應時間(100ms)和一次延遲(80ms,在美國國內的平均值):

  • 50ms for DNS
  • 80ms for TCP handshake (one RTT)
  • 160ms for SSL handshake (two RTT’s)
  • 40ms (發送請求到服務器)
  • 100ms (服務器處理)
  • 40ms (服務器回傳響應數據)

一個請求花了470毫秒, 其中80%的時間被網絡延遲佔去了。看到了吧,咱們真得有不少事情要作!事實上,470毫秒已經很樂觀了:

  • 若是服務器沒有達到到初始TCP的擁塞窗口(congestion window),即4-15KB,就會引入更多的往返延遲。
  • SSL延遲也可能變得更糟。若是須要獲取一個沒有的認證(certificate)或者執行online certificate status check(OCSP), 都會讓咱們須要一個新的TCP鏈接,又增長了數百至上千毫秒的延遲。

怎樣纔算」夠快」?

前面能夠看到服務器響應時間僅是總延遲時間的20%,其它都被DNS,握手等操做佔用了。過去用戶體驗研究(user experience research)代表用戶對延遲時間的不一樣反應:

  • 0 – 100ms 迅速
  • 100 – 300ms 有點慢
  • 300 – 1000ms 機器還在運行
  • 1s+ 想一想別的事……
  • 10s+ 我一會再來看看吧……

上表一樣適用於頁面的性能表現: 渲染頁面,至少要在250ms內給個迴應來吸引住用戶。這就是簡單地針對速度。從Google, Amazon, Microsoft,以及其它數千個站點來看,額外的延遲直接影響頁面表現:流暢的頁面會吸引更多的瀏覽、以及更強的用戶吸引力(engagement) 和頁面轉換率(conversion rates).

如今咱們知道了理想的延遲時間是250ms,而前面的示例告訴咱們,DNS Lookup, TCP和SSL握手,以及request的準備時間花去了370ms, 即使不考慮服務器處理時間,咱們也超出了50%。

對於絕大多數的用戶和網頁開發者來講,DNS, TCP,以及SSL延遲都是透明,不多有人會想到它。這也就是爲何Chrome的網絡模塊那麼的複雜。

咱們已經識別出了問題,下面讓咱們深刻一下實現的細節

深刻Chrome的網絡模塊

多進程架構

Chrome的多進程架構爲瀏覽器的網絡請求處理帶來了重要意義,它目前支持四種不一樣的執行模式(four different execution models)。

默認狀況下,桌面的Chrome瀏覽器使用process-per-site模式, 將不一樣的網站頁面隔離起來, 相同網站的頁面組織在一塊兒。舉個簡單的例子: 每一個tab獨立一個進程。從網絡性能的角度上說,並沒什麼本質上的不一樣,只是process-per- tabl模式更易於理解。

每個tab有一個渲染進程(render process),其中包括了用於解析頁面(interpreting)和排版(layout out)的WebKit的排版引擎(layout engine), 即上圖中的HTML Render。還有V8引擎和二者之間的DOM Bindings,若是你對這部分很好奇,能夠看這裏(great introduction to the plumbing)。

每個這樣的渲染進程被運行在一個沙箱環境中,只會對用戶的電 腦環境作極有限的訪問–包括網絡。而使用這些資源,每個渲染進程必須和瀏覽內核進程(browser[kernel] process)溝通,以管理每一個渲染進程的安全性和訪問策略(access policies)。

進程間通信(IPC)和多進程資源加載

渲染進程和內核進程之間的通信是經過IPC完成的。在Linux和 Mac OS上,使用了一個提供異步命名管道通信方式的socketpair()。每個渲染進程的消息會被序列化地到一個專用的I/O線程中,而後再由它發到內 核進程。在接收端,內核進程提供一個過濾接口(filter interface)用於解析資源相關的IPC請求(ResourceMessageFilter), 這部分就是網絡模塊負責的。

這樣作其中一個好處是全部的資源請求都由I/O進程處理,不管是UI產生的活動,或者網絡事件觸發的交互。在內核進程(browser/kernel process)的I/O線程解析資源請求消息,將轉發到一個ResourceDispatcherHost的單例(singleton)對象中處理。

這個單例接口容許瀏覽器控制每一個渲染進程對網絡的訪問,也能達到有效和一致的資源共享:

  • Socket pool and connection limits: the browser can enforce limits on open sockets — 256 per profile, 32 per proxy, and 6 per {scheme, host, port} group. Note that this permits up to six HTTP and six HTTPS connections to the same {host, port} at once. (A toy sketch of this bookkeeping follows the list.)
  • Socket reuse: persistent TCP connections are kept in the socket pool for some time after servicing a request, available for reuse, which avoids the extra DNS, TCP, and SSL (when needed) setup costs of a new connection.
  • Socket late-binding: a request is associated with a TCP connection only once the socket is ready to dispatch data. This allows for better request prioritization — a higher-priority request may arrive while a socket is still connecting — and better throughput, since a request can grab a socket that happens to free up while a new connection is being opened, thereby using a fully warmed-up TCP connection. In effect, this also covers traditional TCP pre-connect and a host of other optimizations.
  • Consistent session state: authentication, cookies, and cached data are shared between all render processes.
  • Global resource and network optimizations: the browser can make better decisions across all render processes and outstanding requests — for example, giving priority to requests from the foreground tab.
  • Predictive optimizations: by observing network activity, Chrome builds and continually refines predictive models to improve performance.
  • … and the list keeps growing.
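Here is a toy sketch of the connection accounting from the first bullet, using the documented limits as constants. It illustrates the bookkeeping only and is not Chromium's implementation:

#include <map>
#include <string>
#include <tuple>

// A "group" is a {scheme, host, port} triple, so HTTP and HTTPS to the
// same host:port form separate groups — hence 6 + 6 connections.
using Group = std::tuple<std::string, std::string, int>;

class ConnectionBudget {
 public:
  bool CanOpen(const Group& group) const {
    auto it = per_group_.find(group);
    int in_group = (it == per_group_.end()) ? 0 : it->second;
    return total_ < kMaxPerProfile && in_group < kMaxPerGroup;
  }
  void Opened(const Group& group) { ++total_; ++per_group_[group]; }
  void Closed(const Group& group) { --total_; --per_group_[group]; }

 private:
  static constexpr int kMaxPerProfile = 256;  // sockets per profile
  static constexpr int kMaxPerGroup = 6;      // per {scheme, host, port}
  // (the per-proxy limit of 32 would be tracked the same way)
  int total_ = 0;
  std::map<Group, int> per_group_;
};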

From the standpoint of a single render process, issuing a resource request over IPC is simple: it tags the request with a unique ID, hands it to the browser kernel process, and the kernel process takes care of the rest.

Cross-platform Resource Fetching

Cross-platform support is another major consideration for Chrome's network module, which has to run on Linux, Windows, OS X, Chrome OS, Android, and iOS. To that end, the network module is implemented as a mostly single-threaded (with separate cache and proxy threads) cross-platform library, which lets the platforms share the same infrastructure and performance optimizations — and makes it possible to optimize for all platforms at once.

The code is available in the open (the "src/net" subdirectory). We will not examine each component in detail, but the layout of the code itself says a lot about its capabilities. For example:

  • net/android: bindings to the Android runtime
  • net/base: common networking utilities, such as host resolution, cookies, network change detection, and SSL certificate management
  • net/cookies: implementation of storage, management, and retrieval of cookies
  • net/disk_cache: the disk and memory cache implementations
  • net/dns: implementation of an asynchronous DNS resolver
  • net/http: the HTTP protocol implementation
  • net/proxy: proxy (SOCKS and HTTP) configuration, resolution, script fetching, …
  • net/socket: cross-platform implementations of TCP sockets, SSL streams, and socket pools
  • net/spdy: the SPDY protocol implementation
  • net/url_request: URLRequest, URLRequestContext, and URLRequestJob implementations
  • net/websockets: the WebSockets protocol implementation

Each of the above is worth reading; the code is well organized, and you will find plenty of unit tests throughout.

Architecture and Performance on Mobile

Mobile browsing is growing fast, and the Chrome team treats optimizing the mobile experience as a top priority. It is worth stating up front that mobile Chrome is not a direct port of the desktop version — that would not deliver a good user experience at all. The mobile environment is by nature severely resource-constrained, with some fundamentally different operating parameters:

  • Desktop users navigate with a mouse, may have overlapping windows and a large screen, need not worry about battery life, usually enjoy a very stable network connection, and have access to large pools of storage and memory.
  • Mobile users interact through touch and gestures on a small screen, run on a limited battery, connect through networks that are often slow and expensive, and have far tighter limits on storage and memory.

Further, there is no such thing as a typical, representative mobile device — only a wide range of hardware with wildly varying capabilities. Chrome has little choice but to accommodate them all. Conveniently, Chrome's different execution models are well suited to exactly this problem!

On Android, Chrome uses the same multi-process architecture as the desktop version: one browser kernel process and one or more render processes. Because of memory constraints, however, mobile Chrome cannot run a dedicated render process per tab; instead it determines the optimal number of render processes based on available memory and other conditions, and shares them among the open tabs.

When memory is truly scarce, or when Chrome cannot run multiple processes for other reasons, it switches to a single-process, multi-threaded model. On iOS devices, for example, sandboxing restrictions mean Chrome can only run in this mode.

As for network performance, Chrome on Android and iOS uses the same network module as every other platform. This enables the same cross-platform network optimizations, which is one of Chrome's clear advantages. What does differ is that a number of knobs are tuned to the network conditions and capabilities of the device: the priority of speculative optimizations, socket timeout and management logic, cache sizes, and so on.

For example, to preserve battery, mobile Chrome tends toward lazy closing of idle sockets — typically closing an old socket only when opening a new one, to minimize radio use. And because prerendering (covered below) consumes significant network and processing resources, it is usually enabled only on WiFi.

The mobile browsing experience deserves a dedicated chapter of its own — perhaps in the next installment of the POSA series.

Speculative Optimization with the Chrome Predictor

Chrome gets faster the more you use it. This is achieved with the help of a singleton Predictor object, instantiated in the browser kernel process, whose sole responsibility is to observe current network activity, learn from it, and anticipate the user's next moves. A few examples:

  • Hovering the mouse over a link signals user intent and a likely navigation. Chrome can get a head start by kicking off the DNS lookup and TCP handshake right away. The average click follows the hover by almost 200ms — enough time to finish the DNS and TCP work, shaving hundreds of milliseconds of latency off the navigation.
  • When the Omnibox (URL bar) produces a high-likelihood suggestion, it similarly triggers a DNS lookup and a TCP pre-connect, and may even prerender the page in a hidden tab!
  • Each of us has a set of sites we visit every day. Chrome studies the subresources on those pages and tries to pre-resolve them — and may even prefetch them — to speed up the browsing experience.

And there is plenty more beyond these three.

Chrome learns the topology of the web, as well as your browsing patterns, as you use it. Done well, this can shave hundreds of milliseconds of latency off each navigation and get the user closer to instant page loads. Toward that goal, Chrome invests in the following core optimization techniques:

  • DNS pre-resolution: resolve hostnames ahead of time, to avoid DNS latency
  • TCP pre-connect: connect to the destination server ahead of time, to avoid TCP handshake latency
  • Resource prefetching: fetch a page's critical resources ahead of time, to speed up its display
  • Page prerendering: fetch the entire page with all of its subresources ahead of time, to enable instant navigation

Each decision involves one or more of these optimizations, chosen under a large number of constraints. They are all speculative, after all: if a prediction misses, it introduces unnecessary processing and network traffic, and may even hurt the load time of an actual navigation.

How does Chrome handle this? The Predictor gathers as many signals as it can — user actions, historical browsing data, and hints from the rendering engine and the network module itself.

Much like the ResourceDispatcherHost, which coordinates all of Chrome's network transactions, the Predictor object installs a set of filters on user and network activity:

  • An IPC channel filter monitors activity coming from the render processes.
  • A ConnectInterceptor object is added to each request, so that the Predictor can track traffic patterns and gather metrics for every request.

The render process sends messages to the browser process for a range of events, which are defined in a ResolutionMotivation enum for convenient handling (url_info.h):

enum ResolutionMotivation {
  MOUSE_OVER_MOTIVATED,       // Mouse-over by the user.
  OMNIBOX_MOTIVATED,          // The Omnibox suggested this resolution.
  STARTUP_LIST_MOTIVATED,     // This resource is on the top-10 startup list.
  EARLY_LOAD_MOTIVATED,       // A connection is sometimes warmed up early for prefetching.

  // The following involve predictive prefetching, triggered by a navigation.
  // The referring_url_ must also be set when these are used.
  STATIC_REFERAL_MOTIVATED,   // An external database suggested this resolution.
  LEARNED_REFERAL_MOTIVATED,  // A prior navigation suggested this resolution.
  SELF_REFERAL_MOTIVATED,     // A guess that a second connection will be needed.

  // <snip> ...
};

Given any of these events, the Predictor's goal is to evaluate the probability of success and then trigger the action if resources allow. Every hint carries a success probability, a priority, and a timestamp, which can be used to maintain an internal priority queue of speculative actions — itself an optimization lever. Finally, for every action dispatched from this queue, the Predictor also tracks its success rate, which in turn lets it refine future decisions.
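The queue just described can be approximated in a handful of lines. In this hedged sketch, each speculative action carries a probability, a priority, and a timestamp, and the most promising candidate is attempted first; the scoring rule is invented, only the queue's ingredients come from the text above:

#include <cstdint>
#include <queue>

struct SpeculativeAction {
  double success_probability;  // learned from past outcomes
  int priority;                // derived from the ResolutionMotivation
  int64_t timestamp_ms;        // when the hint was recorded

  double Score() const { return success_probability * priority; }
  bool operator<(const SpeculativeAction& other) const {
    return Score() < other.Score();  // max-heap: highest score on top
  }
};

int main() {
  std::priority_queue<SpeculativeAction> queue;
  queue.push({0.9, 3, 0});  // hypothetical hint: very likely, high priority
  queue.push({0.4, 2, 0});  // hypothetical hint: weaker candidate
  // queue.top() would be resolved / pre-connected first.
  return 0;
}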

Chrome's Network Architecture in a Nutshell

  • Chrome uses a multi-process architecture that isolates render processes from the browser process.
  • Chrome maintains a single instance of the resource dispatcher, which lives in the browser kernel process and is shared across all render processes.
  • The network layer is cross-platform and, for the most part, a single-threaded library.
  • The network layer uses non-blocking operations to manage all network tasks.
  • The shared network layer allows efficient resource prioritization and reuse, and gives the browser the ability to perform global optimizations across all processes.
  • Each render process communicates with the resource dispatcher via IPC.
  • The resource dispatcher intercepts resource requests via a custom IPC filter.
  • The Predictor learns by observing resource requests and network transactions in order to optimize future requests.
  • The Predictor speculatively performs DNS resolutions, TCP handshakes, and even resource requests based on learned patterns, saving hundreds of milliseconds by the time the user actually acts.

With the somewhat obscure internals covered, let's look at the optimizations a user can actually feel. Everything starts with a brand-new Chrome.

Optimizing the Cold-Boot Experience

The first time you launch the browser, it naturally knows nothing about your habits or favorite pages. Yet most of us do similar things after a cold boot: check email, load a few news, social, or internal pages from our favorites, and so on. The pages differ, but the sessions are similar enough that the Predictor can speed this up too.

Chrome remembers the ten domains a user accessed most often right after a fresh browser launch, and on startup it pre-resolves their DNS ahead of time. You can see this list in Chrome via chrome://dns; the table at the top of that page lists the candidate startup domains.

Optimizing Interactions with the Omnibox

The Omnibox is one of Chrome's innovations: it does much more than accept a destination URL. Besides remembering the URLs of previously visited pages, it integrates with your search engine and supports full-text search over your history (typing part of a page's name, for instance).

As you type, the Omnibox automatically proposes an action — either a URL from your browsing history or a search. Every proposed action is scored against its past performance. You can examine this data by typing chrome://predictors into Chrome.

Chrome keeps a history of the text prefixes you have entered, the actions it proposed, and the hit rate of each. In that list you can see, for instance, that typing g yields a 76% chance of heading to Gmail; add an m (gm), and the likelihood of Gmail rises to 99.8%.

So what does the network module do with this? The yellow and green rows in the table above matter a great deal to the ResourceDispatcher. For a reasonably likely page (yellow), Chrome issues a DNS pre-resolution. For a high-likelihood page (green), it follows the DNS resolution with a TCP pre-connect. And if both complete while the user is still typing, Chrome goes on to prerender the page in a hidden tab.
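That escalation can be sketched as a simple mapping from confidence to action. The thresholds below are invented — Chrome's actual cutoffs are internal — and only the ordering (pre-resolve, then pre-connect, then prerender) reflects the behavior described above:

enum class Action { kNone, kDnsPrefetch, kPreconnect, kPrerender };

// Invented thresholds for illustration; the escalation order matches
// the yellow/green behavior described in the text.
Action ForConfidence(double hit_probability, bool user_still_typing) {
  if (hit_probability > 0.9 && user_still_typing) return Action::kPrerender;
  if (hit_probability > 0.75) return Action::kPreconnect;  // "green"
  if (hit_probability > 0.5) return Action::kDnsPrefetch;  // "yellow"
  return Action::kNone;
}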

Conversely, if the typed prefix has no good match, Chrome issues a DNS pre-resolution and TCP pre-connect to the search provider, in anticipation of a search for similar results.

On average, a user takes hundreds of milliseconds between typing a query and evaluating the offered suggestions. In that window Chrome can pre-resolve, pre-connect, and in some cases even prerender in the background, so that by the time the user is ready to hit Enter, much of the network latency has already been paid off.

Optimizing Cache Performance

The fastest request is a request not made. Wherever performance is discussed, the cache cannot be left out. Surely you are already providing Expires, ETag, Last-Modified, and Cache-Control response headers for every resource on your pages? What, you're not? Go fix that first, then come back!

Chrome has two different implementations of its internal cache: one backed by local disk, and one stored entirely in memory. The in-memory implementation is used mainly for Incognito browsing mode and is wiped when the window is closed. Both use the same internal interfaces (disk_cache::Backend and disk_cache::Entry), which greatly simplifies the architecture — and if you ever want to experiment with a caching algorithm of your own, makes it easy to plug one in.

Internally, the disk cache implements its own set of data structures, all stored in a dedicated cache directory. It contains index files (loaded into memory at browser startup) and data files holding the actual data along with the HTTP headers and other bookkeeping. Interestingly, files under 16KB are stored in shared data block-files (that is, many small files packed into one large file), while larger files get dedicated files of their own. Finally, the disk cache's eviction policy maintains an LRU ranking governed by metrics such as access frequency and resource age.
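For reference, here is a minimal LRU index in C++. It models plain recency only; Chrome's actual ranking also weighs access frequency and resource age, which this toy version omits:

#include <cstddef>
#include <list>
#include <string>
#include <unordered_map>

// Minimal LRU index over cache keys: touching a key moves it to the front,
// and going over capacity evicts from the back (least recently used).
class LruIndex {
 public:
  explicit LruIndex(std::size_t capacity) : capacity_(capacity) {}

  void Touch(const std::string& key) {
    auto it = pos_.find(key);
    if (it != pos_.end()) order_.erase(it->second);  // forget old position
    order_.push_front(key);
    pos_[key] = order_.begin();
    if (order_.size() > capacity_) {                 // over budget: evict oldest
      pos_.erase(order_.back());
      order_.pop_back();
    }
  }

 private:
  std::size_t capacity_;
  std::list<std::string> order_;  // front = most recently used
  std::unordered_map<std::string, std::list<std::string>::iterator> pos_;
};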

Open a tab and type chrome://net-internals/#httpCache to view the cache state. To see the actual HTTP data and cached responses, open chrome://cache, which lists every resource available in the cache; open any entry to inspect its headers and other details.

Optimizing DNS with Pre-resolution

DNS pre-resolution has come up several times already. Before diving into the implementation, let's recap the cases in which it happens, and why:

  • The WebKit document parser, running in the render process, collects the hostnames of all the links on the current page, which Chrome may choose to resolve ahead of time.
  • As the user prepares to navigate, the render process fires a mouse hover or button-down event as an early signal.
  • The Omnibox may request a resolution for a high-likelihood suggestion.
  • The Chrome Predictor may request hostname resolutions based on past navigations and resource request data. (Explained further below.)
  • The page itself may explicitly ask Chrome to pre-resolve certain hostnames.

In every case, the trigger is just a hint to Chrome. Chrome does not guarantee a pre-resolution will happen; all hints are weighed by the Predictor, which decides what to do next. In the worst case, the hostname cannot be resolved in time, and the user must wait for the full DNS resolution, then the TCP connection time, and finally the resource load. When this happens, the Predictor takes note and adjusts its future decisions accordingly. In short, it really does get faster with use.

We mentioned earlier that Chrome can remember the topology of each page and use it to speed things up. Recall that an average page pulls in 88 resources from more than 30 distinct hosts. On each visit, Chrome records the hostnames that come up most often among a page's resources, and on subsequent visits it can launch DNS resolutions for some or all of those hosts — and even TCP pre-connects!
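A hedged sketch of the bookkeeping this implies: for each page we count which subresource hosts were needed, and on a revisit the frequently seen ones are warmed up. The structure and names are invented for illustration, not taken from Chrome:

#include <map>
#include <string>
#include <vector>

// Toy model of learned page topology: page URL -> (subresource host -> hits).
class TopologyModel {
 public:
  void RecordSubresource(const std::string& page, const std::string& host) {
    ++hits_[page][host];
  }
  // Hosts worth pre-resolving (or pre-connecting) when `page` is revisited.
  std::vector<std::string> HostsToWarmUp(const std::string& page,
                                         int min_hits = 2) const {
    std::vector<std::string> hosts;
    auto it = hits_.find(page);
    if (it == hits_.end()) return hosts;
    for (const auto& [host, count] : it->second)
      if (count >= min_hits) hosts.push_back(host);
    return hosts;
  }

 private:
  std::map<std::string, std::map<std::string, int>> hits_;
};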

Use chrome://dns to inspect this data (for the Google+ page in this example): there are six subresource hostnames, with counts of how many DNS pre-resolutions occurred, how many TCP pre-connects were made, and how many requests were served to each host. This is precisely the data that drives the Chrome Predictor's optimization decisions.

Besides these internal signals, a page designer can embed the following hint in a page to ask the browser for a pre-resolution:

<link rel="dns-prefetch" href="//host_name_to_prefetch.com"> 

Why would you need this? A canonical example is redirects: Chrome has no way of discovering such patterns on its own, but the hint lets the browser resolve the eventual destination ahead of time.

The details vary by version, but broadly speaking there have been two major implementations of DNS handling in Chrome: historically, it relied on the platform-independent getaddrinfo() system call, delegating the work to the operating system; that approach is now being replaced by Chrome's own cross-platform asynchronous DNS resolver.

The OS-dependent route requires far less code and is simpler, but getaddrinfo() is a blocking system call, so it cannot efficiently parallelize multiple lookups — and empirical data showed that too many parallel requests can even overload some home routers. Chrome designed a workaround: for pre-resolutions handled by its worker pool, it simply issues getaddrinfo() calls, each of which blocks a worker thread until the response arrives. Because the OS caches DNS responses, later resolutions of prefetched hosts return immediately. Simple and effective.
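The worker-pool approach is easy to demonstrate. The sketch below issues a blocking getaddrinfo() on a detached background thread purely for its side effect of warming the OS resolver cache; real code would bound concurrency and report errors (POSIX-only, minimal illustration):

#include <netdb.h>
#include <sys/socket.h>
#include <string>
#include <thread>

// Warm the OS resolver cache for `host` on a worker thread; the result is
// thrown away — a later, real connection will hit the cache.
void PrefetchDns(const std::string& host) {
  std::thread([host] {
    addrinfo hints{};
    hints.ai_family = AF_UNSPEC;      // IPv4 or IPv6
    hints.ai_socktype = SOCK_STREAM;
    addrinfo* result = nullptr;
    if (getaddrinfo(host.c_str(), nullptr, &hints, &result) == 0)
      freeaddrinfo(result);  // we only wanted the side effect of resolving
  }).detach();
}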

But not effective enough! getaddrinfo() hides too much useful information — time-to-live (TTL) timestamps, the state of the DNS cache, and so on. So Chrome decided to build its own cross-platform asynchronous DNS resolver.

The new resolver enables optimizations such as:

  • better control of retransmission timing, the ability to run multiple queries in parallel, and explicit tracking of TTLs
  • better handling of IPv4 and IPv6 coexistence
  • failover to different servers, based on RTT and other signals

And the optimization work never stops. Through chrome://histograms/DNS you can observe the DNS metrics Chrome collects:

The histogram shows the distribution of DNS pre-resolution latencies: for example, nearly 50% (rightmost buckets) of the lookups finished within 20ms. The data is based on recent browsing activity (9869 samples in this case); users can opt in to reporting such usage data, which is then anonymized and analyzed by the engineering teams to understand how features perform in the field and how to tune them next. Round and round it goes.

Optimizing TCP Connection Management with Pre-connect

We have pre-resolved the hostname, and the Omnibox and Chrome Predictor have given us high-confidence signals about the user's next action — so why not go one step further and connect to the destination host, finishing the TCP handshake before the user even dispatches the request? That saves another round trip of delay, easily worth a hundred milliseconds or more to the user. This is exactly what TCP pre-connect does. You can see how it is used by visiting chrome://dns.

First, Chrome checks its socket pool for a reusable socket to the target host; sockets are kept in the pool for some time after serving a request, precisely to avoid the handshake and slow-start penalty of a new connection. If no socket is available, Chrome initiates the TCP handshake and places the socket in the pool, so that when the user finally acts, the HTTP request can be dispatched immediately.

Open chrome://net-internals#sockets to view the state of the current sockets:

You can inspect a timeline for every socket: connect and proxy times, the arrival time of each packet, and more. You can also export this data for further analysis or bug reports. Good measurement data is the foundation of any optimization, and chrome://net-internals is the hub of all things networking in Chrome.

Optimizing Resource Loading with Prefetch Hints

Chrome supports two hints, embedded in a page's HTML markup, to optimize resource loading:

<link rel="subresource" href="/javascript/myapp.js"> <link rel="prefetch" href="/images/big.jpeg"> 

The subresource and prefetch relations look very similar, but their semantics differ. A link with rel (relation) set to prefetch tells the browser that the resource will be needed by a later page. A subresource relation, by contrast, indicates that the resource will be used on the current page and should ideally be fetched before it is needed. These semantics lead the resource loader to behave differently: prefetch has low priority and generally begins only after the current page finishes loading, while subresource requests are dispatched eagerly, with high priority, as soon as they are parsed.

Note also that prefetch is part of HTML5 and supported by both Firefox and Chrome, whereas subresource currently works only in Chrome.

Optimizing Resource Loading with Browser Prefreshing

Not all web developers will add the subresource relation described above, though — and even when they do, the browser must wait until the main document has been received and parsed before it sees those hints. That wait depends on the server's response time and the client-server latency, and can cost up to a thousand milliseconds or more.

Hence, alongside the earlier pre-resolution and pre-connect, there is one more speculative technique: prefreshing:

  • The user initiates a request to the target page.
  • Chrome queries the Predictor for the subresources that page loaded in the past, and initiates a set of DNS pre-resolutions, TCP pre-connects, and resource prefreshes.
  • If a previously recorded subresource is found in the cache, it is loaded from disk into memory.
  • If it is missing or has expired, a network request is made.

As of early 2013, prefreshing was still at an early, exploratory stage. If the measured results hold up and the feature ships, we may get to use it somewhat later.

Optimizing Navigation with Prerendering

Every optimization discussed so far helps reduce the delay between the user initiating a navigation and seeing the page. But how fast must that be to feel instant? Based on user experience data, the answer is 100 milliseconds — which leaves essentially no room for network latency. So how do you render a page in under 100 milliseconds?

We have all had this experience: opening several tabs at once feels noticeably faster than waiting for each navigation in a single tab. The browser exposes this pattern to developers:

<link rel="prerender" href="http://example.org/index.html"> 

This is prerendering in Chrome! Whereas "prefetch" downloads a single resource, "prerender" has Chrome render the page in a hidden tab, along with all of its subresources. When the user navigates to it, the tab is swapped to the foreground for an instant experience.

You can try it at prerender-test.appspot.com, and check the history and status of prerendered pages via chrome://net-internals/#prerender.

Because prerendering consumes both CPU and network resources, use it only when you are confident the prerendered page will be visited. Google Search adds a prerender hint to its results because the first search result is very likely the next page (a feature also known as Google Instant Pages).

You are welcome to use the prerender feature, but keep these restrictions firmly in mind:

  • At most one prerendered page is allowed across all processes.
  • HTTPS pages and pages with HTTP authentication cannot be prerendered.
  • Prerendering is abandoned if any requested resource requires a non-idempotent request (anything other than a GET; an idempotent request is one that can be issued repeatedly without negative side effects on the server).
  • All resources are fetched at the lowest priority.
  • The page is rendered at the lowest CPU priority.
  • The page is abandoned if it requires more than 100MB of memory.
  • HTML5 multimedia elements are not supported.

In other words, prerendering should only be applied to pages you are confident are safe. Additionally, your JavaScript should use the Page Visibility API at runtime to check whether the page is actually visible (which you should be doing anyway)!

In sum: Chrome keeps chipping away at network latency and polishing the user experience — and it gets faster the more you use it!

Translation source: [譯]Google Chrome中的高性能網絡

Original article: High Performance Networking in Google Chrome
