[Translation] Maybe you don't need Rust and WASM to speed up your JS - Part 1

Date: 2019-11-16


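Original article: Maybe you don't need Rust and WASM to speed up your JS - Part 1

Original author: Vyacheslav Egorov

Translation from: Juejin Translation Project (掘金翻譯計劃)

Permanent link to this article: github.com/xitu/gold-m…

Translator: Shery

Proofreader: geniusq1981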

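A few weeks ago I noticed a post on Twitter titled "Oxidizing Source Maps with Rust and WebAssembly". It discussed the performance benefits of replacing the plain JavaScript core of the source-map library with a Rust version compiled to WebAssembly.

The post piqued my interest, not because I am an expert in Rust or WASM, but because I am always curious about which language features and optimizations are missing in pure JavaScript.

So I checked the library out from GitHub and set off on a small performance investigation, which I am documenting here almost verbatim.

Getting the Code

For my investigation I used a nearly default x64 release build of V8, at commit 69abb960c97606df99408e6869d66e014aa0fb51 from January 20th. My only departure from the default configuration is that I enabled the disassembler via GN flags, so that I could dive down into the generated machine code if needed.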

╭─ ~/src/v8/v8 ‹master›
╰─$ gn args out.gn/x64.release --list --short --overrides-only
is_debug = false
target_cpu = "x64"
use_goma = true
v8_enable_disassembler = true

Then I checked out two revisions of source-map:

commit c97d38b, the last commit that updated dist/source-map.js before the Rust/WASM implementation landed;
commit 51cf770, the most recent commit at the time of this investigation.

Profiling the Pure JavaScript Version

Running the benchmark on the pure JavaScript version is simple:

╭─ ~/src/source-map/bench ‹ c97d38b›
╰─$ d8 bench-shell-bindings.js
Parsing source map
console.timeEnd: iteration, 4655.638000
console.timeEnd: iteration, 4751.122000
console.timeEnd: iteration, 4820.566000
console.timeEnd: iteration, 4996.942000
console.timeEnd: iteration, 4644.619000
[Stats samples: 5, total: 23868 ms, mean: 4773.6 ms, stddev: 161.22112144505135 ms]

The first thing I did was disable the serialization part of the benchmark:

diff --git a/bench/bench-shell-bindings.js b/bench/bench-shell-bindings.js
index 811df40..c97d38b 100644
--- a/bench/bench-shell-bindings.js
+++ b/bench/bench-shell-bindings.js
@@ -19,5 +19,5 @@ load("./bench.js");
    print("Parsing source map");
    print(benchmarkParseSourceMap());
    print();
-print("Serializing source map");
-print(benchmarkSerializeSourceMap());
+// print("Serializing source map");
+// print(benchmarkSerializeSourceMap());

Then I ran it under the Linux perf profiler:

╭─ ~/src/source-map/bench ‹perf-work›
╰─$ perf record -g d8 --perf-basic-prof bench-shell-bindings.js
Parsing source map
console.timeEnd: iteration, 4984.464000
^C[ perf record: Woken up 90 times to write data ]
[ perf record: Captured and wrote 24.659 MB perf.data (~1077375 samples) ]

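Note that I am passing the --perf-basic-prof flag to the d8 binary. It instructs V8 to generate an auxiliary mappings file, /tmp/perf-$pid.map, which allows perf report to understand JIT-generated machine code.

Here is what perf report --no-children shows after zooming in on the main execution thread: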

Overhead  Symbol
    17.02%  *doQuickSort ../dist/source-map.js:2752
    11.20%  Builtin:ArgumentsAdaptorTrampoline
    7.17%  *compareByOriginalPositions ../dist/source-map.js:1024
    4.49%  Builtin:CallFunction_ReceiverIsNullOrUndefined
    3.58%  *compareByGeneratedPositionsDeflated ../dist/source-map.js:1063
    2.73%  *SourceMapConsumer_parseMappings ../dist/source-map.js:1894
    2.11%  Builtin:StringEqual
    1.93%  *SourceMapConsumer_parseMappings ../dist/source-map.js:1894
    1.66%  *doQuickSort ../dist/source-map.js:2752
    1.25%  v8::internal::StringTable::LookupStringIfExists_NoAllocate
    1.22%  *SourceMapConsumer_parseMappings ../dist/source-map.js:1894
    1.21%  Builtin:StringCharAt
    1.16%  Builtin:Call_ReceiverIsNullOrUndefined
    1.14%  v8::internal::(anonymous namespace)::StringTableNoAllocateKey::IsMatch
    0.90%  Builtin:StringPrototypeSlice
    0.86%  Builtin:KeyedLoadIC_Megamorphic
    0.82%  v8::internal::(anonymous namespace)::MakeStringThin
    0.80%  v8::internal::(anonymous namespace)::CopyObjectToObjectElements
    0.76%  v8::internal::Scavenger::ScavengeObject
    0.72%  v8::internal::String::VisitFlat<v8::internal::IteratingStringHasher>
    0.68%  *SourceMapConsumer_parseMappings ../dist/source-map.js:1894
    0.64%  *doQuickSort ../dist/source-map.js:2752
    0.56%  v8::internal::IncrementalMarking::RecordWriteSlow

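Indeed, just as the "Oxidizing Source Maps …" post says, the benchmark is fairly heavy on sorting: doQuickSort appears at the top of the profile and also several more times further down the list (which means it was optimized and deoptimized a few times).

Optimizing Sorting - Argument Adaptation

A few suspicious entries stand out in the profile, namely Builtin:ArgumentsAdaptorTrampoline and Builtin:CallFunction_ReceiverIsNullOrUndefined, which appear to be part of the V8 implementation. If we ask perf report to expand the call chains leading to them, we notice that these functions are also mostly invoked from the sorting code: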

- Builtin:ArgumentsAdaptorTrampoline
    + 96.87% *doQuickSort ../dist/source-map.js:2752
    +  1.22% *SourceMapConsumer_parseMappings ../dist/source-map.js:1894
    +  0.68% *SourceMapConsumer_parseMappings ../dist/source-map.js:1894
    +  0.68% Builtin:InterpreterEntryTrampoline
    +  0.55% *doQuickSort ../dist/source-map.js:2752

- Builtin:CallFunction_ReceiverIsNullOrUndefined
    + 93.88% *doQuickSort ../dist/source-map.js:2752
    +  2.24% *SourceMapConsumer_parseMappings ../dist/source-map.js:1894
    +  2.01% Builtin:InterpreterEntryTrampoline
    +  1.49% *SourceMapConsumer_parseMappings ../dist/source-map.js:1894

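It is time to look at the code. The quicksort implementation itself lives in lib/quick-sort.js and is invoked from the parsing code in lib/source-map-consumer.js. The comparison functions used for sorting are compareByGeneratedPositionsDeflated and compareByOriginalPositions.

Looking at how these comparison functions are defined and how they are invoked from quicksort reveals that the call site passes a different number of arguments than the functions declare: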

function compareByOriginalPositions(mappingA, mappingB, onlyCompareOriginal) {
    // ...
}

function compareByGeneratedPositionsDeflated(mappingA, mappingB, onlyCompareGenerated) {
    // ...
}

function doQuickSort(ary, comparator, p, r) {
    // ...
        if (comparator(ary[j], pivot) <= 0) {
        // ...
        }
    // ...
}

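Grepping through the sources reveals that, outside of tests, quickSort is only ever called with these two functions.

What happens if we fix the invocation arity?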

diff --git a/dist/source-map.js b/dist/source-map.js
index ade5bb2..2d39b28 100644
--- a/dist/source-map.js
+++ b/dist/source-map.js
@@ -2779,7 +2779,7 @@ return /******/ (function(modules) { // webpackBootstrap
            //
            //   * Every element in `ary[i+1 .. j-1]` is greater than the pivot.
            for (var j = p; j < r; j++) {
-             if (comparator(ary[j], pivot) <= 0) {
+             if (comparator(ary[j], pivot, false) <= 0) {
                i += 1;
                swap(ary, i, j);
                }

Note: I am editing dist/source-map.js directly because I did not want to spend time figuring out the build process.

╭─ ~/src/source-map/bench ‹perf-work› [Fix comparator invocation arity]
╰─$ d8 bench-shell-bindings.js
Parsing source map
console.timeEnd: iteration, 4037.084000
console.timeEnd: iteration, 4249.258000
console.timeEnd: iteration, 4241.165000
console.timeEnd: iteration, 3936.664000
console.timeEnd: iteration, 4131.844000
console.timeEnd: iteration, 4140.963000
[Stats samples: 6, total: 24737 ms, mean: 4122.833333333333 ms, stddev: 132.18789657150916 ms]

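Just by fixing the arity mismatch we improved the benchmark mean on V8 from 4774 ms to 4123 ms, a 14% improvement. If we profile the benchmark again, we find that ArgumentsAdaptorTrampoline has completely disappeared. Why was it there in the first place?

It turns out that ArgumentsAdaptorTrampoline is V8's mechanism for coping with JavaScript's variadic calling convention: you can call a function that declares 3 parameters with only 2 arguments, in which case the third parameter is filled in with undefined. V8 does this by creating a new frame on the stack, copying the arguments down, and then invoking the target function.

If you have never heard of the execution stack, check out Wikipedia and Franziska Hinkelmann's blog post.

While this overhead is negligible for most real-world code, in this code the comparator function is invoked millions of times during a benchmark run, which magnifies the cost of argument adaptation.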
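The calling convention described above can be observed from plain JavaScript. The following is a minimal sketch: `compare` is a hypothetical stand-in for the library's comparators, and the snippet only shows the language-level behavior (a missing argument reads as undefined), not V8's internal adaptor frame.

```javascript
// Hypothetical comparator declaring three formal parameters,
// mirroring the shape of compareByOriginalPositions.
function compare(a, b, onlyCompareOriginal) {
  // Record what the callee observes, so we can inspect it afterwards.
  compare.lastThird = onlyCompareOriginal;
  compare.lastArgCount = arguments.length;
  return a - b;
}

// The declared arity is exposed as the function's `length` property.
console.log(compare.length); // 3

// Two arguments passed: the third parameter reads as undefined.
compare(1, 2);
console.log(compare.lastArgCount, compare.lastThird); // 2 undefined

// Three arguments passed, as in the patched doQuickSort call site.
compare(1, 2, false);
console.log(compare.lastArgCount, compare.lastThird); // 3 false
```

Whether a mismatch like this costs anything measurable depends on the engine and its version; the numbers in this article apply to the V8 build described earlier.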

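An attentive reader might also notice that we now explicitly pass the boolean value false where an implicit undefined was used before. This seems to contribute a little to the performance improvement. If we replace false with void 0, we get slightly worse numbers: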

diff --git a/dist/source-map.js b/dist/source-map.js
index 2d39b28..243b2ef 100644
--- a/dist/source-map.js
+++ b/dist/source-map.js
@@ -2779,7 +2779,7 @@ return /******/ (function(modules) { // webpackBootstrap
            //
            //   * Every element in `ary[i+1 .. j-1]` is greater than the pivot.
            for (var j = p; j < r; j++) {
-             if (comparator(ary[j], pivot, false) <= 0) {
+             if (comparator(ary[j], pivot, void 0) <= 0) {
                i += 1;
                swap(ary, i, j);
                }

╭─ ~/src/source-map/bench ‹perf-work U› [Fix comparator invocation arity]
╰─$ ~/src/v8/v8/out.gn/x64.release/d8 bench-shell-bindings.js
Parsing source map
console.timeEnd: iteration, 4215.623000
console.timeEnd: iteration, 4247.643000
console.timeEnd: iteration, 4425.871000
console.timeEnd: iteration, 4167.691000
console.timeEnd: iteration, 4343.613000
console.timeEnd: iteration, 4209.427000
[Stats samples: 6, total: 25610 ms, mean: 4268.333333333333 ms, stddev: 106.38947316346669 ms]

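For what it is worth, the argument adaptation overhead seems to be highly V8 specific. When I benchmark the same change on SpiderMonkey, I do not see any significant performance win from matching the arity: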

╭─ ~/src/source-map/bench ‹ d052ea4› [Disabled serialization part of the benchmark]
╰─$ sm bench-shell-bindings.js
Parsing source map
[Stats samples: 8, total: 24751 ms, mean: 3093.875 ms, stddev: 327.27966571700836 ms]
╭─ ~/src/source-map/bench ‹perf-work› [Fix comparator invocation arity]
╰─$ sm bench-shell-bindings.js
Parsing source map
[Stats samples: 8, total: 25397 ms, mean: 3174.625 ms, stddev: 360.4636187025859 ms]

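Thanks to Mathias Bynens's jsvu tool, the SpiderMonkey shell is now very easy to install.

Let us get back to the sorting code. If we profile the benchmark again, we notice that ArgumentsAdaptorTrampoline is gone from the profile, but CallFunction_ReceiverIsNullOrUndefined is still there. This is not surprising, given that we are still calling the comparator.

Optimizing Sorting - Monomorphisation

What usually performs better than calling a function? Not calling it!

The obvious option here is to try to get the comparator inlined into doQuickSort. However, the fact that doQuickSort is called with different comparator functions stands in the way of inlining.

To work around this, we can try to monomorphise doQuickSort by cloning it. Here is how we do it.

We start by wrapping doQuickSort and the other helpers in a SortTemplate function: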

function SortTemplate(comparator) {
    function swap(ary, x, y) {
    // ...
    }

    function randomIntInRange(low, high) {
    // ...
    }

    function doQuickSort(ary, p, r) {
    // ...
    }

    return doQuickSort;
}

Then we can produce clones of our sorting routine by converting SortTemplate into a string and parsing it back into a function via the Function constructor:

function cloneSort(comparator) {
    let template = SortTemplate.toString();
    let templateFn = new Function(`return ${template}`)();
    return templateFn(comparator);  // Invoke template to get doQuickSort
}

Now we can use cloneSort to produce a dedicated sort function for each comparator we use:

let sortCache = new WeakMap();  // Cache for specialized sorts.
exports.quickSort = function (ary, comparator) {
    let doQuickSort = sortCache.get(comparator);
    if (doQuickSort === void 0) {
        doQuickSort = cloneSort(comparator);
        sortCache.set(comparator, doQuickSort);
    }
    doQuickSort(ary, 0, ary.length - 1);
};
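The three fragments above can be combined into one runnable sketch. The quicksort body here is a simplified stand-in (last element as pivot) rather than the library's actual implementation, and `specializedSortFor` is a hypothetical helper name introduced only to make the caching visible.

```javascript
// Template holding the sort and its helpers. It must not reference anything
// from the enclosing scope: each clone is re-evaluated from source text,
// so only globals and the `comparator` parameter are visible to it.
function SortTemplate(comparator) {
  function swap(ary, x, y) {
    const tmp = ary[x];
    ary[x] = ary[y];
    ary[y] = tmp;
  }

  // Simplified quicksort over ary[p..r]; illustrative only.
  function doQuickSort(ary, p, r) {
    if (p >= r) return;
    const pivot = ary[r];
    let i = p - 1;
    for (let j = p; j < r; j++) {
      if (comparator(ary[j], pivot) <= 0) {
        i += 1;
        swap(ary, i, j);
      }
    }
    swap(ary, i + 1, r); // move the pivot into its final position
    doQuickSort(ary, p, i);
    doQuickSort(ary, i + 2, r);
  }

  return doQuickSort;
}

// Re-parse the template's source so that every comparator gets its own
// copy of doQuickSort, with its own call site for the comparator.
function cloneSort(comparator) {
  const templateFn = new Function(`return ${SortTemplate.toString()}`)();
  return templateFn(comparator);
}

const sortCache = new WeakMap(); // comparator -> specialized sort

// Hypothetical helper: fetch or create the clone for a comparator.
function specializedSortFor(comparator) {
  let sort = sortCache.get(comparator);
  if (sort === undefined) {
    sort = cloneSort(comparator);
    sortCache.set(comparator, sort);
  }
  return sort;
}

function quickSort(ary, comparator) {
  specializedSortFor(comparator)(ary, 0, ary.length - 1);
}

const ascending = (a, b) => a - b;
const descending = (a, b) => b - a;
const data = [5, 3, 9, 1, 4];
quickSort(data, ascending);
console.log(data.join(",")); // 1,3,4,5,9
```

Because the clone is built with the Function constructor, this approach will not work in environments that forbid dynamic code evaluation, for example under a strict Content Security Policy.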

Rerunning the benchmark yields:

╭─ ~/src/source-map/bench ‹perf-work› [Clone sorting functions for each comparator]
╰─$ d8 bench-shell-bindings.js
Parsing source map
console.timeEnd: iteration, 2955.199000
console.timeEnd: iteration, 3084.979000
console.timeEnd: iteration, 3193.134000
console.timeEnd: iteration, 3480.459000
console.timeEnd: iteration, 3115.011000
console.timeEnd: iteration, 3216.344000
console.timeEnd: iteration, 3343.459000
console.timeEnd: iteration, 3036.211000
[Stats samples: 8, total: 25423 ms, mean: 3177.875 ms, stddev: 181.87633161024556 ms]

We can see that the mean time went from 4268 ms to 3177 ms (a 25% improvement).

Profiling reveals the following picture:

Overhead Symbol
    14.95% *doQuickSort :44
    11.49% *doQuickSort :44
    3.29% Builtin:StringEqual
    3.13% *SourceMapConsumer_parseMappings ../dist/source-map.js:1894
    1.86% v8::internal::StringTable::LookupStringIfExists_NoAllocate
    1.86% *SourceMapConsumer_parseMappings ../dist/source-map.js:1894
    1.72% Builtin:StringCharAt
    1.67% *SourceMapConsumer_parseMappings ../dist/source-map.js:1894
    1.61% v8::internal::Scavenger::ScavengeObject
    1.45% v8::internal::(anonymous namespace)::StringTableNoAllocateKey::IsMatch
    1.23% Builtin:StringPrototypeSlice
    1.17% v8::internal::(anonymous namespace)::MakeStringThin
    1.08% Builtin:KeyedLoadIC_Megamorphic
    1.05% v8::internal::(anonymous namespace)::CopyObjectToObjectElements
    0.99% v8::internal::String::VisitFlat<v8::internal::IteratingStringHasher>
    0.86% clear_page_c_e
    0.77% v8::internal::IncrementalMarking::RecordWriteSlow
    0.48% Builtin:MathRandom
    0.41% Builtin:RecordWrite
    0.39% Builtin:KeyedLoadIC

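The overhead related to invoking the comparator has now completely disappeared from the profile.

At this point I became interested in how much time we spend parsing mappings versus sorting them. I went into the parsing code and added a few Date.now() calls to record the timings:

I wanted to use performance.now(), but the SpiderMonkey shell apparently does not support it.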

diff --git a/dist/source-map.js b/dist/source-map.js
index 75ebbdf..7312058 100644
--- a/dist/source-map.js
+++ b/dist/source-map.js
@@ -1906,6 +1906,8 @@ return /******/ (function(modules) { // webpackBootstrap
            var generatedMappings = [];
            var mapping, str, segment, end, value;

+
+      var startParsing = Date.now();
            while (index < length) {
                if (aStr.charAt(index) === ';') {
                generatedLine++;
@@ -1986,12 +1988,20 @@ return /******/ (function(modules) { // webpackBootstrap
                }
                }
            }
+      var endParsing = Date.now();

+      var startSortGenerated = Date.now();
            quickSort(generatedMappings, util.compareByGeneratedPositionsDeflated);
            this.__generatedMappings = generatedMappings;
+      var endSortGenerated = Date.now();

+      var startSortOriginal = Date.now();
            quickSort(originalMappings, util.compareByOriginalPositions);
            this.__originalMappings = originalMappings;
+      var endSortOriginal = Date.now();
+
+      console.log(`parse:         ${endParsing - startParsing}`);
+      console.log(`sortGenerated: ${endSortGenerated - startSortGenerated}`);
+      console.log(`sortOriginal:  ${endSortOriginal - startSortOriginal}`);
            };

Here is the output:

╭─ ~/src/source-map/bench ‹perf-work U› [Clone sorting functions for each comparator]
╰─$ d8 bench-shell-bindings.js
Parsing source map
parse:         1911.846
sortGenerated: 619.5990000000002
sortOriginal:  905.8220000000001
parse:         1965.4820000000004
sortGenerated: 602.1939999999995
sortOriginal:  896.3589999999995
^C

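I compared the time spent parsing mappings and sorting them, per benchmark iteration, in V8 and SpiderMonkey.

In V8 we spend nearly as much time parsing mappings as sorting them. In SpiderMonkey parsing is faster, while sorting is slower. This prompted me to start looking at the parsing code.

Optimizing Parsing - Removing the Segment Cache

Let's take another look at the profile: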

Overhead  Symbol
    18.23%  *doQuickSort :44
    12.36%  *doQuickSort :44
    3.84%  *SourceMapConsumer_parseMappings ../dist/source-map.js:1894
    3.07%  Builtin:StringEqual
    1.92%  v8::internal::StringTable::LookupStringIfExists_NoAllocate
    1.85%  *SourceMapConsumer_parseMappings ../dist/source-map.js:1894
    1.59%  *SourceMapConsumer_parseMappings ../dist/source-map.js:1894
    1.54%  Builtin:StringCharAt
    1.52%  v8::internal::(anonymous namespace)::StringTableNoAllocateKey::IsMatch
    1.38%  v8::internal::Scavenger::ScavengeObject
    1.27%  Builtin:KeyedLoadIC_Megamorphic
    1.22%  Builtin:StringPrototypeSlice
    1.10%  v8::internal::(anonymous namespace)::MakeStringThin
    1.05%  v8::internal::(anonymous namespace)::CopyObjectToObjectElements
    1.03%  v8::internal::String::VisitFlat<v8::internal::IteratingStringHasher>
    0.88%  clear_page_c_e
    0.51%  Builtin:MathRandom
    0.48%  Builtin:KeyedLoadIC
    0.46%  v8::internal::IteratingStringHasher::Hash
    0.41%  Builtin:RecordWrite

Here is what remains after we remove the JavaScript code we already know about:

Overhead  Symbol
    3.07%  Builtin:StringEqual
    1.92%  v8::internal::StringTable::LookupStringIfExists_NoAllocate
    1.54%  Builtin:StringCharAt
    1.52%  v8::internal::(anonymous namespace)::StringTableNoAllocateKey::IsMatch
    1.38%  v8::internal::Scavenger::ScavengeObject
    1.27%  Builtin:KeyedLoadIC_Megamorphic
    1.22%  Builtin:StringPrototypeSlice
    1.10%  v8::internal::(anonymous namespace)::MakeStringThin
    1.05%  v8::internal::(anonymous namespace)::CopyObjectToObjectElements
    1.03%  v8::internal::String::VisitFlat<v8::internal::IteratingStringHasher>
    0.88%  clear_page_c_e
    0.51%  Builtin:MathRandom
    0.48%  Builtin:KeyedLoadIC
    0.46%  v8::internal::IteratingStringHasher::Hash
    0.41%  Builtin:RecordWrite

When I started looking at the call chains for individual entries, I discovered that many of them pass through KeyedLoadIC_Megamorphic into SourceMapConsumer_parseMappings.

-    1.92% v8::internal::StringTable::LookupStringIfExists_NoAllocate
    - v8::internal::StringTable::LookupStringIfExists_NoAllocate
        + 99.80% Builtin:KeyedLoadIC_Megamorphic

-    1.52% v8::internal::(anonymous namespace)::StringTableNoAllocateKey::IsMatch
    - v8::internal::(anonymous namespace)::StringTableNoAllocateKey::IsMatch
        - 98.32% v8::internal::StringTable::LookupStringIfExists_NoAllocate
            + Builtin:KeyedLoadIC_Megamorphic
        + 1.68% Builtin:KeyedLoadIC_Megamorphic

-    1.27% Builtin:KeyedLoadIC_Megamorphic
    - Builtin:KeyedLoadIC_Megamorphic
        + 57.65% *SourceMapConsumer_parseMappings ../dist/source-map.js:1894
        + 22.62% *SourceMapConsumer_parseMappings ../dist/source-map.js:1894
        + 15.91% *SourceMapConsumer_parseMappings ../dist/source-map.js:1894
        + 2.46% Builtin:InterpreterEntryTrampoline
        + 0.61% BytecodeHandler:Mul
        + 0.57% *doQuickSort :44

-    1.10% v8::internal::(anonymous namespace)::MakeStringThin
    - v8::internal::(anonymous namespace)::MakeStringThin
        - 94.72% v8::internal::StringTable::LookupStringIfExists_NoAllocate
            + Builtin:KeyedLoadIC_Megamorphic
        + 3.63% Builtin:KeyedLoadIC_Megamorphic
        + 1.66% v8::internal::StringTable::LookupString

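Call stacks like these indicated to me that the code is performing a lot of keyed lookups of the form obj[key], where key is a dynamically built string. When I looked at the parsing code, I found the following: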

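// Because each offset is encoded relative to the previous one,
// many segments often have the same encoding. We can exploit this
// fact by caching the parsed variable-length fields of each segment,
// allowing us to avoid a second parse if we encounter the same
// segment again.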
for (end = index; end < length; end++) {
    if (this._charIsMappingSeparator(aStr, end)) {
    break;
    }
}
str = aStr.slice(index, end);

segment = cachedSegments[str];
if (segment) {
    index += str.length;
} else {
    segment = [];
    while (index < end) {
    base64VLQ.decode(aStr, index, temp);
    value = temp.value;
    index = temp.rest;
    segment.push(value);
    }

    // ...

    cachedSegments[str] = segment;
}

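This code is responsible for decoding Base64 VLQ encoded sequences: for example, the string A is decoded as [0] and UAAAA is decoded as [10,0,0,0,0]. If you want to understand the encoding itself better, I suggest checking out this blog post about source map internals.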
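To make the encoding concrete, here is a minimal decoder sketch. It is not the library's base64VLQ.decode; `decodeSegment` and `BASE64` are hypothetical names. It assumes the usual source map VLQ layout: each base64 digit carries 6 bits, bit 5 is the continuation flag, the low 5 bits are payload, and the lowest bit of the assembled value is the sign.

```javascript
const BASE64 =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Decode every VLQ value in a single segment (no separators expected).
function decodeSegment(str) {
  const values = [];
  let value = 0; // value being assembled from 5-bit groups
  let shift = 0; // bit position of the next 5-bit group
  for (const ch of str) {
    const digit = BASE64.indexOf(ch);
    if (digit < 0) throw new Error(`Invalid base64 digit: ${ch}`);
    value += (digit & 31) << shift; // low 5 bits are payload
    if (digit & 32) {
      shift += 5; // continuation bit set: more groups follow
    } else {
      const negative = value & 1; // lowest bit is the sign
      const magnitude = value >>> 1;
      values.push(negative ? -magnitude : magnitude);
      value = 0;
      shift = 0;
    }
  }
  if (shift !== 0) throw new Error("Segment ends in the middle of a value");
  return values;
}

console.log(decodeSegment("A").join(",")); // 0
console.log(decodeSegment("UAAAA").join(",")); // 10,0,0,0,0
```

The linear BASE64.indexOf lookup keeps the sketch short; a production decoder would typically map character codes to digits with a few comparisons or a lookup table instead.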

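Instead of decoding each sequence independently, this code attempts to cache decoded segments: it scans forward until it finds a separator (, or ;), extracts the substring from the current position up to the separator, and checks whether such a segment has been decoded before by looking the substring up in a cache. On a cache hit it returns the cached segment; otherwise it parses the segment and stores the result in the cache.

Caching (also known as memoization) is a very powerful optimization technique. However, it only makes sense when maintaining the cache and looking up cached results is cheaper than performing the computation again.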

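Abstract Analysis

Let's try to compare these two operations abstractly.

On one side is direct parsing:

Parsing a segment looks at each character of the segment exactly once. For each character it performs a handful of comparisons and arithmetic operations to convert the base64 character into the integer value it represents. It then performs a few bitwise operations to fold this value into a larger integer. Finally it stores the decoded value into an array and moves on to the next field of the segment. A segment has at most 5 fields.

On the other side is caching:

To look up a cached value, we traverse all characters of the segment once to find its end;
we extract the substring, which requires an allocation and potentially copying, depending on how strings are implemented in the JS VM;
we use this string as a key in a dictionary object, which:
1. first requires the VM to compute a hash of the string (traversing it again and performing various bitwise operations on individual characters), and might also require the VM to internalize the string (depending on the implementation);
2. then requires the VM to perform a hash table lookup, which involves probing and comparing the key by value against other keys (which might again require looking at individual characters of the string).

Overall it seems that direct parsing should be faster, assuming the JS VM does a good job with individual arithmetic and bitwise operations, simply because it looks at each character only once, whereas caching has to traverse the segment 2-4 times just to establish whether we hit the cache.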
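The two strategies compared above can be sketched side by side. This is an illustration under assumptions, not the library's code: `parseDirect` and `parseCached` are hypothetical names, the VLQ layout is the usual one (6-bit digits, bit 5 as continuation flag, lowest bit of the value as sign), and the cache is a prototype-less object so that segment strings such as "constructor" cannot collide with inherited properties.

```javascript
const BASE64 =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Direct parsing: one pass over the segment, decoding as we go.
// Returns the decoded fields and the index of the separator (or end).
function parseDirect(str, start) {
  const fields = [];
  let value = 0;
  let shift = 0;
  let i = start;
  for (; i < str.length; i++) {
    const ch = str[i];
    if (ch === "," || ch === ";") break;
    const digit = BASE64.indexOf(ch);
    if (digit < 0) throw new Error(`Invalid base64 digit: ${ch}`);
    value += (digit & 31) << shift;
    if (digit & 32) {
      shift += 5; // continuation bit: value spans another digit
    } else {
      fields.push(value & 1 ? -(value >>> 1) : value >>> 1);
      value = 0;
      shift = 0;
    }
  }
  return { fields, next: i };
}

// Cached parsing: scan to the separator, slice out the key, look it up.
function parseCached(str, start, cache) {
  let end = start;
  while (end < str.length && str[end] !== "," && str[end] !== ";") {
    end++; // pass 1: find the end of the segment
  }
  const key = str.slice(start, end); // allocates a substring
  let fields = cache[key]; // VM hashes the key and probes the table
  if (fields === undefined) {
    fields = parseDirect(key, 0).fields; // miss: decode after all
    cache[key] = fields;
  }
  return { fields, next: end };
}

const cache = Object.create(null);
const direct = parseDirect("UAAAA,A;", 0);
const cached = parseCached("UAAAA,A;", 0, cache);
console.log(direct.fields.join(","), direct.next); // 10,0,0,0,0 5
console.log(cached.fields.join(","), cached.next); // 10,0,0,0,0 5
```

Even on a cache hit, parseCached has already walked the segment once and allocated a key before the lookup starts, which is the extra work the analysis above counts.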

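Profiling seems to confirm this too: KeyedLoadIC_Megamorphic is the stub V8 uses to implement keyed lookups like cachedSegments[str] in the code above.

Based on these observations I set out to do a few experiments. First, I checked how large the cachedSegments cache is at the end of parsing. The smaller it is, the more effective caching would be.

It turns out that it grows quite large: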

Object.keys(cachedSegments).length = 155478

Standalone Microbenchmarks

Next I decided to write a small standalone benchmark:

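// Generate a string with [n] segments. Segments repeat in a cycle of
// length [v], i.e. segments number 0, v, 2 * v, ... are all equal,
// and so are 1, 1 + v, 1 + 2 * v, ...
// Use [base] as the base value in a segment. This parameter makes it
// possible to produce long segments.
//
// Note: the bigger [v] is, the bigger the [cachedSegments] cache gets.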
function makeString(n, v, base) {
    var arr = [];
    for (var i = 0; i < n; i++) {
    arr.push([0, base + (i % v), 0, 0].map(base64VLQ.encode).join(''));
    }
    return arr.join(';') + ';';
}

// Run function [f] against the string [str].
function bench(f, str) {
    for (var i = 0; i < 1000; i++) {
    f(str);
    }
}

// Measure and report the performance of [f] against [str].
// The string has [v] different segments.
function measure(v, str, f) {
    var start = Date.now();
    bench(f, str);
    var end = Date.now();
    report(`${v}, ${f.name}, ${(end - start).toFixed(2)}`);
}

async function measureAll() {
    for (let v = 1; v <= 256; v *= 2) {
    // Make a string with 1000 segments in total, [v] of them distinct,
    // so that [cachedSegments] ends up with [v] cached segments.
    let str = makeString(1000, v, 1024 * 1024);

    let arr = encoder.encode(str);

    // Run 10 iterations for each way of decoding.
    for (var j = 0; j < 10; j++) {
        measure(v, str, decodeCached);
        measure(v, str, decodeNoCaching);
        measure(v, str, decodeNoCachingNoStrings);
        measure(v, arr, decodeNoCachingNoStringsPreEncoded);
        await nextTick();
    }
    }
}

function nextTick() { return new Promise((resolve) => setTimeout(resolve)); }