jQuery 2.0.3 Source Code Analysis: The Sizzle Engine - How Parsing Works

Date: 2019-11-14


Notice: this is an original article. If you repost it, please credit the source and keep the link to the original by Aaron. Thank you!

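First, to answer a reader's question:

How is the following selector parsed?

div > p + div.aaron input[type="checkbox"]

Along the way, let's take a deeper look at how the parsing works:

HTML structure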

<div id="text">
  <p>
     <input type="text" />
  </p>
  <div class="aaron">
     <input type="checkbox" name="readme" value="Submit" />
     <p>Sizzle</p>
  </div>
</div>

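The selector

div > p + div.aaron input[type="checkbox"]

Put together, it means roughly this:

1. Select every <p> element whose parent is a <div> element

2. Select every <div> element that immediately follows such a <p> and has class="aaron"

3. Then select every input element inside div.aaron that has type="checkbox"

Nobody would write a selector like this for such a simple structure in practice, but a simple structure is enough here to illustrate the complex processing behind it.

For a compound selector like this one, jQuery hands the work to querySelectorAll on modern browsers, so everything discussed here concerns the implementation used on older browsers. Pseudo-class selectors and XML are handled last and are not covered in this article.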
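The "native first, engine second" idea can be sketched as follows. This is a minimal illustration, not Sizzle's actual code: `queryAll` and its `fallback` parameter are hypothetical names, and the real engine applies more checks before it trusts querySelectorAll.

```javascript
// Hypothetical helper: prefer the native querySelectorAll when the context
// offers it, and fall back to a hand-written engine otherwise.
function queryAll(selector, context, fallback) {
  if (typeof context.querySelectorAll === "function") {
    try {
      // Copy the live NodeList into a plain array.
      return Array.from(context.querySelectorAll(selector));
    } catch (err) {
      // The native implementation rejected the selector
      // (for example a non-standard pseudo-class): fall through.
    }
  }
  return fallback(selector, context);
}
```

The rest of this article describes what such a fallback has to do on its own.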

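A few things you need to know first:

1: Positional relationships in CSS selectors

2: The basic lookup interfaces that browsers provide

3: CSS selectors are matched by scanning from right to left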

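Positional relationships in CSS selectors

Every node in a document is related to the others in one way or another.

It's easy to see that one node can relate to another in the following ways:

Ancestor and descendant

Parent and child

Adjacent sibling

General sibling

In CSS selectors these are written, respectively, as: a space, >, +, ~

(There is actually one more relationship: div.aaron, with no space in between, selects a div node whose class is aaron.)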

<div id="grandfather">
  <div id="father">
    <div id="child1"></div>
    <div id="child2"></div>
    <div id="child3"></div>
  </div>
</div>

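grandfather and its grandchild child1 are in an ancestor-descendant relationship (expressed with a space)
father and its child child1 are in a parent-child relationship, which also counts as ancestor-descendant (expressed with >)
The older sibling child1 and the younger sibling child2 are adjacent siblings (expressed with +)
The older sibling child1 and the younger siblings child2 and child3 are all general siblings (expressed with ~)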

Sizzle has an object, Expr, that records the properties and operations related to selectors. It has the following property:

relative = {
  ">": { dir: "parentNode", first: true },
  " ": { dir: "parentNode" },
  "+": { dir: "previousSibling", first: true },
  "~": { dir: "previousSibling" }
}

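Expr.relative therefore defines a first property that marks how "tight" the relationship between two nodes is: parent-child and adjacent-sibling relationships are tight. When a positional matcher is created, the first property decides which nodes are eligible to match.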
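To make the role of dir and first concrete, here is a minimal sketch of how such a table could drive a DOM walk. `findRelated` and `el` are hypothetical helpers written for this illustration (Sizzle builds its matchers differently), and the nodes are plain objects that mimic only the parentNode and previousSibling links of the grandfather/father/child example.

```javascript
// Same shape as Expr.relative in the article.
const relative = {
  ">": { dir: "parentNode", first: true },
  " ": { dir: "parentNode" },
  "+": { dir: "previousSibling", first: true },
  "~": { dir: "previousSibling" }
};

// Hypothetical helper: walk from elem along `dir` and return the first
// element node that passes `test`. With `first` set, only the nearest
// element node is considered (the "tight" relationships).
function findRelated(elem, combinator, test) {
  const { dir, first } = relative[combinator];
  let node = elem[dir];
  while (node) {
    if (node.nodeType === 1) {
      if (test(node)) return node;
      if (first) return null;
    }
    node = node[dir];
  }
  return null;
}

// Tiny stand-in for DOM elements (assumption: only these links matter here).
function el(id, parent) {
  const node = { nodeType: 1, id, parentNode: parent || null,
                 previousSibling: null, lastChild: null };
  if (parent) {
    node.previousSibling = parent.lastChild;
    parent.lastChild = node;
  }
  return node;
}

const grandfather = el("grandfather");
const father = el("father", grandfather);
const child1 = el("child1", father);
const child2 = el("child2", father);
const child3 = el("child3", father);
const hasId = (id) => (node) => node.id === id;

// Descendant (" ") reaches the grandfather; child (">") stops at the parent.
console.log(findRelated(child1, " ", hasId("grandfather")) === grandfather);
console.log(findRelated(child1, ">", hasId("grandfather")) === null);
```

The only difference between ">" and " ", or between "+" and "~", is whether the walk gives up after the nearest element.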

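The basic lookup interfaces that browsers provide

Leaving aside querySelector and querySelectorAll,

an HTML document offers these four APIs:

getElementById: the context can only be an HTML document.
getElementsByName: the context can only be an HTML document.
getElementsByTagName: the context can be an HTML document, an XML document, or an element node.
getElementsByClassName: the context can be an HTML document or an element node. IE8 does not support it.

So, for the sake of compatibility, Sizzle ends up relying on only three of them: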

Expr.find = {
      'ID'    : context.getElementById,
      'CLASS' : context.getElementsByClassName,
      'TAG'   : context.getElementsByTagName
}
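The object above is schematic. As a rough sketch of what such a finder table looks like once each entry is a function with a feature check, consider the following. It is a simplified stand-in, not Sizzle's source: the name `find` and the exact return conventions are assumptions made for this example.

```javascript
// Simplified stand-in for Expr.find: each finder takes a value and a
// context, returns an array of elements, or undefined when the context
// does not support that lookup.
const find = {
  ID(id, context) {
    if (typeof context.getElementById !== "function") return undefined;
    const elem = context.getElementById(id);
    return elem ? [elem] : [];
  },
  CLASS(className, context) {
    if (typeof context.getElementsByClassName !== "function") return undefined;
    return Array.from(context.getElementsByClassName(className));
  },
  TAG(tag, context) {
    if (typeof context.getElementsByTagName !== "function") return undefined;
    return Array.from(context.getElementsByTagName(tag));
  }
};
```

In a browser, `find.TAG("input", document)` would return every input in the page. Token types without an entry in the table, such as ATTR, cannot be used to start a search.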

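CSS selectors are matched by scanning from right to left

Now let's walk through the parsing rules.

1. The selector

div > p + div.aaron input[type="checkbox"]

2. The lexer, tokenize, first breaks the selector down into rules (covered in detail in the previous chapter)

Each chunk after the split: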
type: "TAG"
value: "div" 
matches ....

type: ">"
value: " > "

type: "TAG"
value: "p"
matches ....

type: "+"
value: " + "

type: "TAG"
value: "div"
matches ....

type: "CLASS"
value: ".aaron"
matches ....

type: " "
value: " "

type: "TAG"
value: "input"
matches ....

type: "ATTR"
value: "[type="checkbox"]"
matches ....

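Apart from the combinators, every meaningful token also gets a matches array from the analysis.

For example,
the last branch, the attribute selector
"[type="checkbox"]"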

matches = [
   0: "type"
   1: "="
   2: "checkbox"
]
type: "ATTR" 
value: "[type="checkbox"]"

So the selector breaks down into 9 parts.
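The breakdown into 9 tokens can be reproduced with a toy tokenizer. This is only a sketch under stated assumptions: `tokenizeSimple` and its regular expressions are made up for this example, they cover just tags, classes, IDs, double-quoted attribute values and the four combinators, and Sizzle's real tokenize handles far more (groups, pseudo-classes, escapes, caching).

```javascript
// Simplified patterns (assumptions, much looser than Sizzle's matchExpr).
const combinator = /^\s*([>+~]|\s)\s*/;
const filters = {
  TAG: /^([\w-]+)/,
  CLASS: /^\.([\w-]+)/,
  ID: /^#([\w-]+)/,
  // Only the double-quoted form [name="value"] is supported here.
  ATTR: /^\[\s*([\w-]+)\s*([*^$|!~]?=)\s*"([^"]*)"\s*\]/
};

function tokenizeSimple(selector) {
  const tokens = [];
  let rest = selector.trim();
  while (rest) {
    // Combinators first: " > " becomes type ">", a lone space becomes " ".
    let m = combinator.exec(rest);
    if (m) {
      tokens.push({ type: m[1].trim() || " ", value: m[0] });
      rest = rest.slice(m[0].length);
      continue;
    }
    // Otherwise try each filter type in turn.
    const type = Object.keys(filters).find((t) => filters[t].test(rest));
    if (!type) throw new SyntaxError("Unsupported selector part: " + rest);
    m = filters[type].exec(rest);
    tokens.push({ type, value: m[0], matches: m.slice(1) });
    rest = rest.slice(m[0].length);
  }
  return tokens;
}

const tokens = tokenizeSimple('div > p + div.aaron input[type="checkbox"]');
console.log(tokens.length);                    // 9
console.log(tokens[tokens.length - 1].matches); // [ 'type', '=', 'checkbox' ]
```

Combinator tokens carry no matches array, which is the property the right-to-left scan relies on to tell them apart from filters.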

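So what is the most efficient way to match?

3. Matching from right to left

In the end the work is done through the APIs the browser provides, so Expr.find is the final lookup interface.

Matching certainly starts from the right, but the first token on the right is

"[type="checkbox"]"

Expr.find clearly does not recognize this kind of selector, so we have to step back one more token

and land on

type: "TAG"
value: "input"

Expr.find can handle a tag like this, so it is called directly: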

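Expr.find["TAG"] = support.getElementsByTagName ?
    function(tag, context) {
        // strundefined is Sizzle's cached value of `typeof undefined`
        if (typeof context.getElementsByTagName !== strundefined) {
            return context.getElementsByTagName(tag);
        }
    } :
    // ... (the fallback branch is not shown in this excerpt)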

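However, getElementsByTagName returns a collection,

so

this is where seed comes in: the seed set. The elements the finder returns are put into this initial set, seed.

The lookup pauses here. It does not continue leftward in the same way, because matching like that would be slow.

Time to tidy up:

Rebuild the selector, removing the tag token that has already been used: input

So the selector becomes:

selector: "div > p + div.aaron [type="checkbox"]"

There is an optimization here: if the selector is empty after the removal, the match is already complete and the result is returned directly.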
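The seed step described above can be sketched in a few lines. This is a simplified illustration rather than Sizzle's code: `pickSeed`, `finders` and the fake `context` are hypothetical, the token list is written out by hand, and the real select function also deals with leading IDs, needsContext and sibling contexts.

```javascript
const combinators = new Set([">", " ", "+", "~"]);

// Hypothetical finder table with only the TAG finder, enough for this example.
const finders = {
  TAG: (tag, context) => Array.from(context.getElementsByTagName(tag))
};

// Walk the tokens from right to left, stop at a combinator, and use the
// first token that has a finder to fetch the seed set.
function pickSeed(tokens, context) {
  const remaining = tokens.slice();
  let seed = null;
  for (let i = remaining.length - 1; i >= 0; i--) {
    const token = remaining[i];
    if (combinators.has(token.type)) break;   // abort at a combinator
    const find = finders[token.type];
    if (!find) continue;                      // e.g. ATTR: step back one token
    seed = find(token.matches[0], context);
    remaining.splice(i, 1);                   // drop the token that was used
    break;
  }
  return { seed, tokens: remaining,
           selector: remaining.map((t) => t.value).join("") };
}

// Hand-written tokens for: div > p + div.aaron input[type="checkbox"]
const selectorTokens = [
  { type: "TAG", value: "div", matches: ["div"] },
  { type: ">", value: " > " },
  { type: "TAG", value: "p", matches: ["p"] },
  { type: "+", value: " + " },
  { type: "TAG", value: "div", matches: ["div"] },
  { type: "CLASS", value: ".aaron", matches: ["aaron"] },
  { type: " ", value: " " },
  { type: "TAG", value: "input", matches: ["input"] },
  { type: "ATTR", value: '[type="checkbox"]', matches: ["type", "=", "checkbox"] }
];

// Fake context standing in for the document: it knows the two inputs
// from the HTML structure at the top of the article.
const context = {
  getElementsByTagName: (tag) =>
    tag === "input" ? [{ type: "text" }, { type: "checkbox" }] : []
};

const picked = pickSeed(selectorTokens, context);
console.log(picked.seed.length);   // 2
console.log(picked.tokens.length); // 8
console.log(picked.selector);      // div > p + div.aaron [type="checkbox"]
```

If `picked.selector` came back empty, the seed would already be the final result and no further filtering would be needed.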

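At this point,

here is what we have to work with:

1 The seed set

2 The match set of rules produced by tokenize

There were originally 9 rule blocks; because input has been matched, the corresponding block is removed, leaving 8.

3 The selector, with input removed accordingly

"div > p + div.aaron [type="checkbox"]"

The seed set now holds 2 candidate elements.

So how do we pick the target out of these 2 candidates in the simplest and most efficient way?

The relevant source: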

// Main entry point of the engine
    function select(selector, context, results, seed) {
        var i, tokens, token, type, find,
            // Run the lexer to get the token groups
            match = tokenize(selector);

        if (!seed) { // The caller did not supply an initial seed set
            // Try to minimize operations if there is only one group
            // That is, a single selector with no comma (unlike "div, p"),
            // which allows a special optimization
            if (match.length === 1) {

                // Take a shortcut and set the context if the root selector is an ID
                tokens = match[0] = match[0].slice(0); // Copy of the selector's token sequence

                // If the first token is an ID, set context to that element for a fast lookup
                if (tokens.length > 2 && (token = tokens[0]).type === "ID" &&
                    support.getById && context.nodeType === 9 && documentIsHTML &&
                    Expr.relative[tokens[1].type]) {

                    context = (Expr.find["ID"](token.matches[0].replace(runescape, funescape), context) || [])[0];
                    if (!context) {
                        // If the context element (the leading ID selector) does not exist, there is nothing to search
                        return results;
                    }
                    // Drop the leading ID selector
                    selector = selector.slice(tokens.shift().value.length);
                }

                // Fetch a seed set for right-to-left matching
                // where "needsContext" = new RegExp( "^" + whitespace + "*[>+~]|:(even|odd|eq|gt|lt|nth|first|last)(?:\\(" + whitespace + "*((?:-\\d)?\\d*)" + whitespace + "*\\)|)(?=[^-]|$)", "i" )
                // That is, if the selector has no leading combinator and none of these positional pseudo-classes
                // (those need a different kind of filtering, covered in a later article),
                // start from the last rule and find the seed set first
                i = matchExpr["needsContext"].test(selector) ? 0 : tokens.length;

                // Search from right to left
                while (i--) { // Walk from the end toward the front
                    token = tokens[i]; // The current rule

                    // Abort if we hit a combinator
                    //
                    //  > + ~ space
                    //
                    //
                    if (Expr.relative[(type = token.type)]) {
                        break;
                    }

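                    /*
                  First check whether a finder exists. Finders are the browser's native DOM lookup interfaces; put simply, they are the following object
                  Expr.find = {
                    'ID'    : context.getElementById,
                    'CLASS' : context.getElementsByClassName,
                    'NAME'  : context.getElementsByName,
                    'TAG'   : context.getElementsByTagName
                  }
                */
                    // A pseudo-class such as :first-child has no finder, so the loop moves on to the previous token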
                    if ((find = Expr.find[type])) {

                        // Search, expanding context for leading sibling combinators
                        // Try to obtain an initial seed set with this finder
                        if ((seed = find(
                            token.matches[0].replace(runescape, funescape),
                            rsibling.test(tokens[0].type) && context.parentNode || context
                        ))) {

                            // The finder returned a seed set
                            // If seed is empty or no tokens remain, we can return early
                            // Remove the token that produced the seed
                            tokens.splice(i, 1);
                            selector = seed.length && toSelector(tokens);

                            // Is the remaining selector empty?
                            if (!selector) {
                                // If so, return the results early
                                push.apply(results, seed);
                                return results;
                            }

                            // A seed set has been found and other rules remain in front of it, so break out
                            break;
                        }
                    }
                }
            }
        }


        // "div > p + div.aaron [type="checkbox"]"

        // Compile and execute a filtering function
        // Provide `match` to avoid retokenization if we modified the selector above
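        // Hand off to compile, which generates the so-called ultimate matcher
        // That matcher filters seed and puts the matching elements into results
        //
        //    // Build the compiled function
        //  var superMatcher =   compile( selector, match )
        //
        //  // Execute it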
        //    superMatcher(seed,context,!documentIsHTML,results,rsibling.test( selector ))
        //
        compile(selector, match)(
            seed,
            context, !documentIsHTML,
            results,
            rsibling.test(selector)
        );
        return results;
    }

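To summarize the process briefly:

selector: "div > p + div.aaron input[type="checkbox"]"

Parsing rules:
1 Work from right to left
2 Take the last token, for example [type="checkbox"]
                            {
                                matches : Array[3]
                                type    : "ATTR"
                                value   : "[type="checkbox"]"
                            }
3 Check the type: if it is one of the four combinators (> + ~ space), stop looking for a seed; if it has no finder in Expr.find (like ATTR), step back to the previous token
4 Continue until a token of type ID, CLASS or TAG is found, because only those can be fetched through the browser's interfaces
5 The seed set now has values, which narrows the set of candidates considerably
6 If the seed set is not empty and tokens remain, further filtering is needed; the selector is revised to: "div > p + div.aaron [type="checkbox"]"
7 Move on to the next stage, the compile function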

Sizzle does more than simply match from right to left.

Sizzle 1.8 introduced the concept of compiled functions, which is the focus of the next chapter.