Vue.js Source Code Study, Part 8: HTML Parsing in Detail

Date: 2019-11-09


In the previous post we covered the overall logic of template compilation and where the compiled output is used. This post focuses on how the HTML itself is parsed.

The parse method

All parsing starts in the parse method, which ultimately returns the root element of an AST (abstract syntax tree).

// src/compiler/parser/index.js
export function parse (
  template: string,
  options: CompilerOptions
): ASTElement | void {
  warn = options.warn || baseWarn

  platformIsPreTag = options.isPreTag || no
  platformMustUseProp = options.mustUseProp || no
  platformGetTagNamespace = options.getTagNamespace || no

  transforms = pluckModuleFunction(options.modules, 'transformNode')
  preTransforms = pluckModuleFunction(options.modules, 'preTransformNode')
  postTransforms = pluckModuleFunction(options.modules, 'postTransformNode')

  delimiters = options.delimiters

  const stack = []
  const preserveWhitespace = options.preserveWhitespace !== false
  let root
  let currentParent
  let inVPre = false
  let inPre = false
  let warned = false

  function warnOnce(msg){...}
  function closeElement(element){...}
  parseHTML(...)

  return root
}

As you can see, apart from the call to parseHTML, everything here just declares variables and helper functions, so we only need to dig into parseHTML.
The parseHTML function lives in src/compiler/parser/html-parser.js.

Helper functions inside parseHTML

In the source, parseHTML defines four inner functions. Let's go through them one by one.

advance

// Advance: move forward by n characters
  function advance (n) {
    index += n
    html = html.substring(n)
  }

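It moves index forward by n and drops the first n characters from the HTML string, so html always holds the part that has not been parsed yet.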
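To make the cursor behaviour concrete, here is a minimal stand-alone sketch. The createCursor wrapper is a hypothetical name introduced only for this example; in Vue, index and html are simply variables in the parseHTML closure.

```javascript
// Minimal stand-alone sketch of the cursor logic (not Vue's actual closure).
function createCursor (source) {
  let index = 0      // absolute position in the original template
  let html = source  // the part of the template that is still unparsed
  return {
    advance (n) {
      index += n
      html = html.substring(n)
    },
    get index () { return index },
    get html () { return html }
  }
}

const cursor = createCursor('<div>hi</div>')
cursor.advance(5) // consume "<div>"
console.log(cursor.index, cursor.html) // prints: 5 hi</div>
```

Keeping index separately is what lets the parser report absolute start and end positions even though html keeps shrinking.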

parseStartTag

// Parse a start tag
  function parseStartTag () {
    const start = html.match(startTagOpen)
    if (start) {
      const match = {
        tagName: start[1],
        attrs: [],
        start: index
      }
      advance(start[0].length)
      let end, attr
      while (!(end = html.match(startTagClose)) && (attr = html.match(attribute))) {
        advance(attr[0].length)
        match.attrs.push(attr)
      }
      if (end) {
        match.unarySlash = end[1]
        advance(end[0].length)
        match.end = index
        return match
      }
    }
  }

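This function uses regular expressions to match an HTML start tag and collects the tag's attributes into an array. It returns the tag name, the raw attribute matches, and the tag's start and end positions. For example, for the tag <button v-on:click="hey"> the result looks like this: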

{
        "attrs": [
            [
                " v-on:click='hey'",
                "v-on:click",
                "=",
                "hey",
                "undefined",
                "undefined",
            ]
        ],
        "end": 48,
        "start": 23,
        "tagName": "button",
        "unarySlash": ""
    }
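The result above can be reproduced outside of Vue with a small sketch. The three regular expressions below are simplified patterns assumed for illustration; they are close to, but not guaranteed to be identical to, the ones in html-parser.js. Unlike the original, this version takes html as a parameter so that it is self-contained.

```javascript
// Simplified patterns, assumed for illustration only.
const startTagOpen = /^<([a-zA-Z][\w-]*)/
const startTagClose = /^\s*(\/?)>/
// Capture groups: 1 = name, 2 = "=", 3 = double-quoted, 4 = single-quoted, 5 = unquoted value
const attribute = /^\s*([^\s"'<>\/=]+)(?:\s*(=)\s*(?:"([^"]*)"|'([^']*)'|([^\s"'=<>`]+)))?/

function parseStartTag (html) {
  let index = 0
  const advance = n => { index += n; html = html.substring(n) }

  const start = html.match(startTagOpen)
  if (!start) return
  const match = { tagName: start[1], attrs: [], start: index }
  advance(start[0].length)

  // Keep consuming attributes until the closing ">" or "/>" is reached.
  let end, attr
  while (!(end = html.match(startTagClose)) && (attr = html.match(attribute))) {
    advance(attr[0].length)
    match.attrs.push(attr)
  }
  if (end) {
    match.unarySlash = end[1]
    advance(end[0].length)
    match.end = index
    return match
  }
}

const result = parseStartTag('<button v-on:click="hey" disabled>')
console.log(result.tagName, result.attrs.length, result.end) // prints: button 2 34
```

Each entry in attrs is the raw match array, which is why the value may sit at index 3, 4 or 5 depending on how the attribute was quoted.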

handleStartTag

// Handle a start tag: normalize the attributes extracted from it.
  function handleStartTag (match) {
    const tagName = match.tagName
    const unarySlash = match.unarySlash

    // Implicitly close a previous tag that HTML allows to be left open
    if (expectHTML) {
      if (lastTag === 'p' && isNonPhrasingTag(tagName)) {
        parseEndTag(lastTag)
      }
      if (canBeLeftOpenTag(tagName) && lastTag === tagName) {
        parseEndTag(tagName)
      }
    }

    const unary = isUnaryTag(tagName) || !!unarySlash

    // Extract each attribute's name and value
    const l = match.attrs.length
    const attrs = new Array(l)
    for (let i = 0; i < l; i++) {
      const args = match.attrs[i]
      // hackish work around FF bug https://bugzilla.mozilla.org/show_bug.cgi?id=369778
      if (IS_REGEX_CAPTURING_BROKEN && args[0].indexOf('""') === -1) {
        if (args[3] === '') { delete args[3] }
        if (args[4] === '') { delete args[4] }
        if (args[5] === '') { delete args[5] }
      }
      const value = args[3] || args[4] || args[5] || ''
      const shouldDecodeNewlines = tagName === 'a' && args[1] === 'href'
        ? options.shouldDecodeNewlinesForHref
        : options.shouldDecodeNewlines
      attrs[i] = {
        name: args[1],
        value: decodeAttr(value, shouldDecodeNewlines)
      }
    }

    // Push the tag and its attributes onto the stack
    if (!unary) {
      stack.push({ tag: tagName, lowerCasedTag: tagName.toLowerCase(), attrs: attrs })
      lastTag = tagName
    }
    // Call options.start.
    if (options.start) {
      options.start(tagName, attrs, unary, match.start, match.end)
    }
  }

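This function handles a start tag. If the previous tag is one that may be left open (for example a <p> followed by a non-phrasing element), it first closes that tag; it then walks the attribute matches and builds an array of name/value pairs; if the tag is not unary, it pushes the tag name, the lower-cased tag name and the attributes onto the stack and records the tag as lastTag; finally it calls options.start.
The object pushed onto the stack looks like this: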

{
        "tag": "button",
        "lowerCasedTag": "button",
        "attrs": [
            { 
                "name": "v-on:click",
                "value": "hey"
            }
        ]
    }
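The step from a raw match array to a { name, value } pair can be sketched in isolation. normalizeAttr is a hypothetical helper name, the match arrays are written by hand, and the entity decoding that decodeAttr performs in the real code is left out.

```javascript
// Hypothetical helper mirroring the value selection in handleStartTag.
// Entity decoding (decodeAttr) is omitted in this sketch.
function normalizeAttr (args) {
  // Only one of the three value groups is filled, depending on the quoting style.
  const value = args[3] || args[4] || args[5] || ''
  return { name: args[1], value }
}

// Hand-written match arrays in the shape [full, name, '=', doubleQuoted, singleQuoted, unquoted]
const doubleQuoted = [' v-on:click="hey"', 'v-on:click', '=', 'hey', undefined, undefined]
const singleQuoted = [" class='btn'", 'class', '=', undefined, 'btn', undefined]
const bare = [' disabled', 'disabled', undefined, undefined, undefined, undefined]

const normalized = [doubleQuoted, singleQuoted, bare].map(normalizeAttr)
console.log(normalized)
```

An attribute without a value, such as disabled, ends up with an empty string rather than undefined.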

parseEndTag

// Parse an end tag
  function parseEndTag (tagName, start, end) {
    let pos, lowerCasedTagName
    if (start == null) start = index
    if (end == null) end = index

    if (tagName) {
      lowerCasedTagName = tagName.toLowerCase()
    }

    // Find the closest open tag of the same type on the stack
    if (tagName) {
      for (pos = stack.length - 1; pos >= 0; pos--) {
        if (stack[pos].lowerCasedTag === lowerCasedTagName) {
          break
        }
      }
    } else {
      // If no tag name is provided, clean shop
      pos = 0
    }

    // Call options.end for every open tag on the stack at index >= pos.
    if (pos >= 0) {
      // Close all the open elements, up the stack
      for (let i = stack.length - 1; i >= pos; i--) {
        if (process.env.NODE_ENV !== 'production' &&
          (i > pos || !tagName) &&
          options.warn
        ) {
          options.warn(
            `tag <${stack[i].tag}> has no matching end tag.`
          )
        }
        if (options.end) {
          options.end(stack[i].tag, start, end)
        }
      }

      // Remove the open elements from the stack
      // Pop the elements off the stack and update lastTag
      stack.length = pos
      lastTag = pos && stack[pos - 1].tag
    } else if (lowerCasedTagName === 'br') {
      // A </br> with no matching start tag: treat it as a <br>
      if (options.start) {
        options.start(tagName, [], true, start, end)
      }
    } else if (lowerCasedTagName === 'p') {
      // A </p> with no matching start tag: treat it as an empty <p></p>
      if (options.start) {
        options.start(tagName, [], false, start, end)
      }
      if (options.end) {
        options.end(tagName, start, end)
      }
    }
  }

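This function parses an end tag. It first fills in the start and end positions and the lower-cased tag name; then it walks the stack from the top to find the closest open tag of the same type; it calls options.end for that tag and for every tag above it on the stack; it then removes those tags from the stack and sets lastTag to the tag name of the new top of the stack. If no matching open tag is found, a </br> triggers options.start, and a </p> triggers both options.start and options.end. (These last two cases make me suspect that start and end are the hooks for opening and closing a tag.)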
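The stack unwinding is easier to see on a small example. The sketch below is a reduced version of that logic under a few assumptions: closeTag is a hypothetical name, the callback receives an extra flag that Vue does not pass, and the </br> and </p> special cases are skipped.

```javascript
// Sketch: close a tag by name, implicitly closing anything still open above it.
function closeTag (stack, tagName, onEnd) {
  const lower = tagName.toLowerCase()
  let pos
  // Search from the top of the stack for the closest matching open tag.
  for (pos = stack.length - 1; pos >= 0; pos--) {
    if (stack[pos].lowerCasedTag === lower) break
  }
  if (pos < 0) return // no matching start tag; ignored in this sketch
  for (let i = stack.length - 1; i >= pos; i--) {
    // i > pos means the tag was never closed explicitly (Vue warns in that case).
    onEnd(stack[i].tag, i > pos)
  }
  stack.length = pos // drop everything from pos upward
}

const openTags = [
  { tag: 'div', lowerCasedTag: 'div' },
  { tag: 'ul', lowerCasedTag: 'ul' },
  { tag: 'li', lowerCasedTag: 'li' }
]
const closed = []
closeTag(openTags, 'ul', (tag, implicit) => {
  closed.push(implicit ? tag + ' (unclosed)' : tag)
})
console.log(closed)                   // [ 'li (unclosed)', 'ul' ]
console.log(openTags.map(t => t.tag)) // [ 'div' ]
```

Here </ul> arrives while <li> is still open, so <li> is closed first and only <div> remains on the stack.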

The overall logic of parseHTML

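The options.start and similar calls mentioned above refer to the four callbacks passed in to parseHTML: start, end, chars and comment. They are invoked at specific points inside parseHTML; what they actually do is covered in the next section.
First, let's look at the overall logic of parseHTML: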

// src/compiler/parser/html-parser.js
export function parseHTML (html, options) {
  const stack = []
  const expectHTML = options.expectHTML
  const isUnaryTag = options.isUnaryTag || no
  const canBeLeftOpenTag = options.canBeLeftOpenTag || no
  let index = 0
  let last, lastTag
  while (html) {
    last = html
    // If there is no lastTag, or lastTag is not a plain-text element (script, style, textarea)
    if (!lastTag || !isPlainTextElement(lastTag)) {
      // Find where the text ends by looking for the next <.
      let textEnd = html.indexOf('<')
      // The text ends at position 0, i.e. the remaining html starts with <
      if (textEnd === 0) {
        // Comment
        if (comment.test(html)) {
          const commentEnd = html.indexOf('-->')

          if (commentEnd >= 0) {
            // If comments should be kept, call options.comment
            if (options.shouldKeepComment) {
              options.comment(html.substring(4, commentEnd))
            }
            advance(commentEnd + 3)
            continue
          }
        }

        // http://en.wikipedia.org/wiki/Conditional_comment#Downlevel-revealed_conditional_comment
        // Conditional comment
        if (conditionalComment.test(html)) {
          const conditionalEnd = html.indexOf(']>')

          if (conditionalEnd >= 0) {
            advance(conditionalEnd + 2)
            continue
          }
        }

        // Doctype:
        const doctypeMatch = html.match(doctype)
        if (doctypeMatch) {
          advance(doctypeMatch[0].length)
          continue
        }

        // End tag
        const endTagMatch = html.match(endTag)
        if (endTagMatch) {
          const curIndex = index
          advance(endTagMatch[0].length)
          // Parse the end tag
          parseEndTag(endTagMatch[1], curIndex, index)
          continue
        }

        // Start tag
        const startTagMatch = parseStartTag()
        if (startTagMatch) {
          handleStartTag(startTagMatch)
          if (shouldIgnoreFirstNewline(lastTag, html)) {
            advance(1)
          }
          continue
        }
      }

      // A < was found at position >= 0, so there may be text before it
      let text, rest, next
      if (textEnd >= 0) {
        // Take the remainder of the string, starting at textEnd
        rest = html.slice(textEnd)
        // Skip any < that belongs to plain text rather than to an end tag, start tag, comment or conditional comment
        while (
          !endTag.test(rest) &&
          !startTagOpen.test(rest) &&
          !comment.test(rest) &&
          !conditionalComment.test(rest)
        ) {
          // < in plain text, be forgiving and treat it as text
          next = rest.indexOf('<', 1)
          if (next < 0) break
          textEnd += next
          rest = html.slice(textEnd)
        }
        // Finally, slice out the text content
        text = html.substring(0, textEnd)
        advance(textEnd)
      }

      if (textEnd < 0) {
        text = html
        html = ''
      }
      // Pass the text content to options.chars.
      if (options.chars && text) {
        options.chars(text)
      }
    } else {
      // lastTag is script, style or textarea
      let endTagLength = 0
      const stackedTag = lastTag.toLowerCase()
      const reStackedTag = reCache[stackedTag] || (reCache[stackedTag] = new RegExp('([\\s\\S]*?)(</' + stackedTag + '[^>]*>)', 'i'))
      const rest = html.replace(reStackedTag, function (all, text, endTag) {
        endTagLength = endTag.length
        if (!isPlainTextElement(stackedTag) && stackedTag !== 'noscript') {
          text = text
            .replace(/<!\--([\s\S]*?)-->/g, '$1') // <!--xxx--> 
            .replace(/<!\[CDATA\[([\s\S]*?)]]>/g, '$1') //<!CDATAxxx>
        }
        if (shouldIgnoreFirstNewline(stackedTag, text)) {
          text = text.slice(1)
        }
        // Pass the text content to options.chars.
        if (options.chars) {
          options.chars(text)
        }
        return ''
      })
      index += html.length - rest.length
      html = rest
      // Parse the end tag
      parseEndTag(stackedTag, index - endTagLength, index)
    }

    // Nothing was consumed in this iteration: treat the rest as text and stop
    if (html === last) {
      // Call options.chars
      options.chars && options.chars(html)
      if (process.env.NODE_ENV !== 'production' && !stack.length && options.warn) {
        options.warn(`Mal-formatted tag at end of template: "${html}"`)
      }
      break
    }
  }

  // Clean up any tags that are still open
  parseEndTag()

  ...
}

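The details are explained in the code comments.
In essence, parseHTML loops over the html string, uses regular expressions to recognize what comes next, and calls advance to drop what has been consumed. Along the way it calls the callbacks in options.
So what do those callbacks actually do?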
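Before looking at Vue's own callbacks, the loop can be reduced to a toy version that only reports events. This is a sketch under stated assumptions: miniParseHTML and its two regular expressions are made up for illustration, and comments, doctype, attributes and plain-text elements such as script are not handled.

```javascript
// Simplified patterns (assumptions): tags without attributes only.
const startTag = /^<([a-zA-Z][\w-]*)\s*(\/?)>/
const endTag = /^<\/([a-zA-Z][\w-]*)\s*>/

function miniParseHTML (html, handlers) {
  let index = 0
  const advance = n => { index += n; html = html.substring(n) }

  while (html) {
    const textEnd = html.indexOf('<')
    if (textEnd === 0) {
      const end = html.match(endTag)
      if (end) {
        advance(end[0].length)
        handlers.end(end[1])
        continue
      }
      const start = html.match(startTag)
      if (start) {
        advance(start[0].length)
        handlers.start(start[1], !!start[2])
        continue
      }
    }
    // Everything up to the next "<" (or the rest of the input) is text.
    // A "<" at position 0 that matched nothing is consumed as one character of text.
    const cut = textEnd < 0 ? html.length : Math.max(textEnd, 1)
    const text = html.substring(0, cut)
    advance(cut)
    handlers.chars(text)
  }
}

const events = []
miniParseHTML('<div>hi<br/></div>', {
  start: (tag, unary) => events.push(`start:${tag}${unary ? '/' : ''}`),
  end: tag => events.push(`end:${tag}`),
  chars: text => events.push(`chars:${text}`)
})
console.log(events) // [ 'start:div', 'chars:hi', 'start:br/', 'end:div' ]
```

The parser itself builds nothing; it only turns the string into a sequence of start, end and chars calls, and the caller decides what to do with them.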

The callbacks passed to parseHTML

warn

// src/compiler/parser/index.js
warn = options.warn || baseWarn

If options provides a warn function, it is used; otherwise baseWarn is used.

start

start (tag, attrs, unary) {
      // Determine the namespace
      const ns = (currentParent && currentParent.ns) || platformGetTagNamespace(tag)

      // Work around the IE SVG bug
      if (isIE && ns === 'svg') {
        attrs = guardIESVGBug(attrs)
      }

      // Create the AST element
      let element: ASTElement = createASTElement(tag, attrs, currentParent)
      if (ns) {
        element.ns = ns
      }

      if (isForbiddenTag(element) && !isServerRendering()) {
        element.forbidden = true
      }

      // Run every preTransforms function
      for (let i = 0; i < preTransforms.length; i++) {
        element = preTransforms[i](element, options) || element
      }

      // Process the directives
      if (!inVPre) {
        // v-pre
        processPre(element)
        if (element.pre) {
          inVPre = true
        }
      }
      if (platformIsPreTag(element.tag)) {
        inPre = true
      }
      if (inVPre) {
        // Process raw attributes
        processRawAttrs(element)
      } else if (!element.processed) {
        // v-for v-if v-once
        processFor(element)
        processIf(element)
        processOnce(element)
        // Process the rest of the element
        processElement(element, options)
      }

      // Check the root node constraints
      function checkRootConstraints (el) {
        if (process.env.NODE_ENV !== 'production') {
          if (el.tag === 'slot' || el.tag === 'template') {
            warnOnce(
              `Cannot use <${el.tag}> as component root element because it may ` +
              'contain multiple nodes.'
            )
          }
          if (el.attrsMap.hasOwnProperty('v-for')) {
            warnOnce(
              'Cannot use v-for on stateful component root element because ' +
              'it renders multiple elements.'
            )
          }
        }
      }

      // Tree management
      if (!root) {
        // No root yet
        root = element
        checkRootConstraints(root)
      } else if (!stack.length) {
        // Allow root elements with v-if, v-else-if and v-else
        if (root.if && (element.elseif || element.else)) {
          checkRootConstraints(element)
          // Add the if condition
          addIfCondition(root, {
            exp: element.elseif,
            block: element
          })
        } else if (process.env.NODE_ENV !== 'production') {
          warnOnce(
            `Component template should contain exactly one root element. ` +
            `If you are using v-if on multiple elements, ` +
            `use v-else-if to chain them instead.`
          )
        }
      }
      if (currentParent && !element.forbidden) {
        // v-else-if v-else
        if (element.elseif || element.else) {
          // Process the if conditions
          processIfConditions(element, currentParent)
        } else if (element.slotScope) { // slot-scope
          currentParent.plain = false
          const name = element.slotTarget || '"default"'
          ;(currentParent.scopedSlots || (currentParent.scopedSlots = {}))[name] = element
        } else {
          // Append the element to the parent's children array
          currentParent.children.push(element)
          element.parent = currentParent
        }
      }
      if (!unary) {
        currentParent = element
        stack.push(element)
      } else {
        // Close the element
        closeElement(element)
      }
    },

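The start callback is essentially the process of building an element: determine the namespace; create the AST element; run the pre-transforms; handle the various v- directives; and finally update root, currentParent and stack.
The key piece is createASTElement, which receives tag, attrs and currentParent. The first two should look familiar: they are the tag name and the attribute list that handleStartTag in parseHTML pushed onto its stack and passed to options.start.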

{
        "tag": "button",
        "lowerCasedTag": "button",
        "attrs": [
            { 
                "name": "v-on:click",
                "value": "hey"
            }
        ]
    }

Finally, createASTElement creates a new AST object.

// Create an AST element
export function createASTElement (
  tag: string,
  attrs: Array<Attr>,
  parent: ASTElement | void
): ASTElement {
  return {
    type: 1,
    tag,
    attrsList: attrs,
    attrsMap: makeAttrsMap(attrs),
    parent,
    children: []
  }
}
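The element keeps the attributes twice: as the original list (attrsList) and as a lookup map (attrsMap). The following sketch shows the idea; makeAttrsMapSketch and createElementSketch are stand-in names written for this example, without the Flow types and without any duplicate-attribute handling the real makeAttrsMap may do.

```javascript
// Sketch of the attribute list -> map conversion.
function makeAttrsMapSketch (attrs) {
  const map = {}
  for (const { name, value } of attrs) {
    map[name] = value
  }
  return map
}

// Same shape as the object returned by createASTElement above.
function createElementSketch (tag, attrs, parent) {
  return {
    type: 1, // 1 = element node
    tag,
    attrsList: attrs,
    attrsMap: makeAttrsMapSketch(attrs),
    parent,
    children: []
  }
}

const button = createElementSketch('button', [{ name: 'v-on:click', value: 'hey' }], undefined)
console.log(button.attrsMap['v-on:click']) // prints: hey
```

The map form is what makes checks such as el.attrsMap.hasOwnProperty('v-for') in checkRootConstraints cheap.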

end

end () {
      // Remove trailing whitespace
      const element = stack[stack.length - 1]
      const lastNode = element.children[element.children.length - 1]
      if (lastNode && lastNode.type === 3 && lastNode.text === ' ' && !inPre) {
        element.children.pop()
      }
      // Pop the stack
      stack.length -= 1
      currentParent = stack[stack.length - 1]
      // Close the element
      closeElement(element)
    },

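The end callback is much simpler: it just cleans up when an element finishes.
From this we can see that stack is an ordered array whose last entry is always the current parent element, which is what currentParent refers to. This is easy to understand: HTML elements are collected from the outermost element inward, but they are finished from the innermost element outward. So once the innermost element is done, it is popped off the stack and processing continues with the element that is now innermost.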
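The interplay of stack and currentParent can be shown with a reduced tree builder. This is only a sketch: buildTree and its event objects are invented for the example, and parent pointers, text nodes and all directive handling are left out.

```javascript
// Build a tree from a flat list of start/end events using a stack.
function buildTree (events) {
  const stack = []
  let root
  let currentParent
  for (const ev of events) {
    if (ev.type === 'start') {
      const element = { tag: ev.tag, children: [] }
      if (!root) root = element
      if (currentParent) currentParent.children.push(element)
      // The new element becomes the parent of whatever comes next.
      currentParent = element
      stack.push(element)
    } else if (ev.type === 'end') {
      // Pop the finished element; its parent becomes current again.
      stack.length -= 1
      currentParent = stack[stack.length - 1]
    }
  }
  return root
}

const tree = buildTree([
  { type: 'start', tag: 'div' },
  { type: 'start', tag: 'button' },
  { type: 'end' },
  { type: 'start', tag: 'span' },
  { type: 'end' },
  { type: 'end' }
])
console.log(JSON.stringify(tree))
// {"tag":"div","children":[{"tag":"button","children":[]},{"tag":"span","children":[]}]}
```

After button ends, div is the top of the stack again, which is why span becomes a sibling of button rather than its child.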

chars

chars (text: string) {
      if (!currentParent) {
        return
      }
      // IE textarea placeholder bug
      if (isIE &&
        currentParent.tag === 'textarea' &&
        currentParent.attrsMap.placeholder === text
      ) {
        return
      }
      // Get the parent's children
      const children = currentParent.children
      // Normalize the text content
      text = inPre || text.trim()
        ? isTextTag(currentParent) ? text : decodeHTMLCached(text)
        // only preserve whitespace if its not right after a starting tag
        : preserveWhitespace && children.length ? ' ' : ''
      if (text) {
        let res
        // inVPre is true inside a v-pre block
        if (!inVPre && text !== ' ' && (res = parseText(text, delimiters))) {
          // Expression; converted to something like _s(message)
          children.push({
            type: 2,
            expression: res.expression,
            tokens: res.tokens,
            text
          })
        } else if (text !== ' ' || !children.length || children[children.length - 1].text !== ' ') {
          // Plain text
          children.push({
            type: 3,
            text
          })
        }
      }
    },

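The chars callback handles text that is not an HTML tag. If the text contains an expression, it is parsed with parseText and pushed into the current element's children as an expression node (type 2); plain text is pushed into children directly as a text node (type 3).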
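To give a feel for what parseText produces, here is a simplified stand-in. parseTextSketch is a hypothetical name; it assumes the default {{ }} delimiters, ignores filters, and returns only the expression string, whereas the real function also returns tokens.

```javascript
// Simplified stand-in for parseText: default delimiters only, no filters.
function parseTextSketch (text) {
  const tagRE = /\{\{((?:.|\r?\n)+?)\}\}/g
  if (!tagRE.test(text)) return undefined // plain text: nothing to parse
  tagRE.lastIndex = 0 // test() moved the cursor of the global regex; reset it

  const tokens = []
  let lastIndex = 0
  let match
  while ((match = tagRE.exec(text))) {
    // Literal text before the interpolation becomes a quoted string.
    if (match.index > lastIndex) {
      tokens.push(JSON.stringify(text.slice(lastIndex, match.index)))
    }
    // The interpolation itself is wrapped in _s(...).
    tokens.push(`_s(${match[1].trim()})`)
    lastIndex = match.index + match[0].length
  }
  // Literal text after the last interpolation.
  if (lastIndex < text.length) {
    tokens.push(JSON.stringify(text.slice(lastIndex)))
  }
  return { expression: tokens.join('+') }
}

console.log(parseTextSketch('Hello {{ message }}!').expression)
// prints: "Hello "+_s(message)+"!"
```

For text without any interpolation the function returns undefined, which corresponds to the branch in chars that stores a plain type 3 node.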

comment

comment (text: string) {
      currentParent.children.push({
        type: 3,
        text,
        isComment: true
      })
    }

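The comment callback stores comments that need to be kept in the syntax tree. It works like storing plain text, except that the node also carries isComment: true.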

Generating the syntax tree

I wrote a small demo and captured the final AST that is generated.

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Hey</title>
    <script src="vue.js"></script>
</head>
<body>
    <div id="app">
        <!-- this is vue parse demo -->
        <button v-on:click="hey">{{ message }}</button>
        <span>你好！</span>
    </div>

    <script>
        new Vue({
            el: "#app",
            data: {
                message: "Hey Vue.js"
            },
            methods: {
                hey() {
                    this.message = "Hey Button"
                }
            }
        })
    </script>
</body>
</html>

The result is as follows: