Vue Source Code Analysis: parse

Date: 2019-12-05

Preface

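I have been reading the Vue source code for a while, but much of what I understood at the time faded once I stopped using or revisiting it. So I am starting to write up my own understanding as blog posts, which should also make it easier to review later.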

This post covers the parse step of Vue's template compiler, in three parts:

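From the compile entry point to the parse function (encapsulation, currying)
Lexical analysis in parse (turning the template into JS objects)
Syntax analysis in parse (turning those JS objects into an AST)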

From the compile entry point to the parse function

Content to be added.

Lexical analysis in parse

A brief introduction to parsers

First, a quick look at what a parser is. Put simply, it is a tool that converts source code into target code. Here is how Wikipedia describes a parser.

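A parser usually appears as a component of a compiler or interpreter. Its job is to check syntax and build a data structure from the input tokens (typically a hierarchical structure such as a parse tree or an abstract syntax tree). A parser usually relies on a separate lexer to split the input character stream into individual "words" (tokens), and takes that token stream as its input.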

Vue likewise uses a parser to process template code.

/*!
 * HTML Parser By John Resig (ejohn.org)
 * Modified by Juriy "kangax" Zaytsev
 * Original code by Erik Arvidsson, Mozilla Public License
 * http://erik.eae.net/simplehtmlparser/simplehtmlparser.js
 */

Vue's parser source carries the comment above: it was forked from an open-source HTML parser by John Resig (erik.eae.net/simplehtmlp…) and then extended.

Overview of the flow

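With that background, let's look at the overall flow of parse. The trimmed-down code below shows that parse first processes the options passed in, then calls parseHTML with the template, the options, and a set of hooks.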

export function parse (
  template: string,
  options: CompilerOptions
): ASTElement | void {
  // Process the options passed in (merged onto the vm instance's options)
  dealOptions(options)
  // Pass the template, related options, and hooks to parseHTML
  parseHTML(template, {
    someOptions,
    start (tag, attrs, unary, start) {...},
    end (tag, start, end) {...},
    chars (text: string, start: number, end: number) {...},
    comment (text: string, start, end) {...},
  })
}

Next, what does parseHTML do?

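parseHTML is defined in src/compiler/parser/html-parser.js. The top of the file defines the regular expressions used later on; if regular expressions are unfamiliar, it is worth reading up on them first.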

// Regular Expressions for parsing tags and attributes
// Matches an attribute on a tag
const attribute = /^\s*([^\s"'<>\/=]+)(?:\s*(=)\s*(?:"([^"]*)"+|'([^']*)'+|([^\s"'=<>`]+)))?/
const ncname = `[a-zA-Z_][\\-\\.0-9_a-zA-Z${unicodeLetters}]*`
const qnameCapture = `((?:${ncname}\\:)?${ncname})`
const startTagOpen = new RegExp(`^<${qnameCapture}`)
const startTagClose = /^\s*(\/?)>/
const endTag = new RegExp(`^<\\/${qnameCapture}[^>]*>`)
const doctype = /^<!DOCTYPE [^>]+>/i
// #7298: escape - to avoid being pased as HTML comment when inlined in page
const comment = /^<!\--/
const conditionalComment = /^<!\[/
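
To get a feel for what these patterns capture, here is a small sketch that runs simplified copies of them against pieces of a start tag. The unicodeLetters part of ncname is left out, which is an assumption made only to keep the snippet self-contained, and the sample strings are arbitrary.

```javascript
// Simplified copies of the regexes above. Assumption: the unicodeLetters
// range is omitted so the snippet runs on its own.
const ncname = '[a-zA-Z_][\\-\\.0-9_a-zA-Z]*'
const qnameCapture = `((?:${ncname}\\:)?${ncname})`
const startTagOpen = new RegExp(`^<${qnameCapture}`)
const startTagClose = /^\s*(\/?)>/
const endTag = new RegExp(`^<\\/${qnameCapture}[^>]*>`)
const attribute = /^\s*([^\s"'<>\/=]+)(?:\s*(=)\s*(?:"([^"]*)"+|'([^']*)'+|([^\s"'=<>`]+)))?/

// Opening of a start tag: group 1 is the tag name.
const open = '<div id="app">'.match(startTagOpen)
// One attribute: group 1 is the name, group 3 the double-quoted value.
const attr = ' id="app">'.match(attribute)
// End of a start tag: group 1 is "/" for self-closing tags, "" otherwise.
const close = '>'.match(startTagClose)
const selfClose = ' />'.match(startTagClose)
// End tag: group 1 is the tag name.
const end = '</div>'.match(endTag)

console.log(open[1], attr[1], attr[3], JSON.stringify(close[1]), selfClose[1], end[1])
```

Groups 3, 4 and 5 of attribute correspond to double-quoted, single-quoted and unquoted values, which is why handleStartTag later reads args[3] || args[4] || args[5].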

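Now the pseudocode for parseHTML. Roughly: define the variables needed, loop over the template, match each kind of token with a regex, process the match, and hand the result to the hooks passed in, which build the AST.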

export function parseHTML (html, options) {
  const stack = []
  const expectHTML = options.expectHTML
  const isUnaryTag = options.isUnaryTag || no
  const canBeLeftOpenTag = options.canBeLeftOpenTag || no
  let index = 0
  let last, lastTag
  while (html) {
    if (!lastTag || !isPlainTextElement(lastTag)){
      let textEnd = html.indexOf('<')
      if (textEnd === 0) {
         if(matchComment) {
           advance(commentLength)
           continue
         }
         if(matchDoctype) {
           advance(doctypeLength)
           continue
         }
         if(matchEndTag) {
           advance(endTagLength)
           parseEndTag()
           continue
         }
         if(matchStartTag) {
           parseStartTag()
           handleStartTag()
           continue
         }
      }
      handleText()
      advance(textLength)
    } else {
       handlePlainTextElement()
       parseEndTag()
    }
  }
}
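
The loop shape above can be tried out with a much smaller tokenizer. The sketch below is not Vue's parseHTML: miniParseHTML and its regexes are hypothetical and heavily simplified (no attribute parsing, no doctype, no plain-text elements). It only shows how the cursor advances and when each hook fires.

```javascript
// Minimal tokenizer sketch with the same loop shape as the pseudocode.
// miniParseHTML and its regexes are assumptions made for illustration.
function miniParseHTML (html, hooks) {
  const startTag = /^<([a-zA-Z][\w-]*)([^>]*?)(\/?)>/
  const endTag = /^<\/([a-zA-Z][\w-]*)[^>]*>/
  const comment = /^<!--/
  let index = 0

  // Move the cursor forward and drop the consumed prefix.
  function advance (n) {
    index += n
    html = html.substring(n)
  }

  while (html) {
    const textEnd = html.indexOf('<')
    if (textEnd === 0) {
      if (comment.test(html)) {
        const commentEnd = html.indexOf('-->')
        if (commentEnd >= 0) {
          hooks.comment(html.substring(4, commentEnd))
          advance(commentEnd + 3)
          continue
        }
      }
      const endMatch = html.match(endTag)
      if (endMatch) {
        advance(endMatch[0].length)
        hooks.end(endMatch[1])
        continue
      }
      const startMatch = html.match(startTag)
      if (startMatch) {
        advance(startMatch[0].length)
        // Attributes are passed as one raw string in this sketch.
        hooks.start(startMatch[1], startMatch[2].trim(), startMatch[3] === '/')
        continue
      }
    }
    // Everything up to the next "<" (or all remaining input) is text.
    const text = textEnd > 0 ? html.substring(0, textEnd) : html
    hooks.chars(text)
    advance(text.length)
  }
}

const events = []
miniParseHTML('<div id="app"><!-- note --><p>{{message}}</p></div>', {
  start: (tag, attrs, unary) => events.push(['start', tag, attrs, unary]),
  end: tag => events.push(['end', tag]),
  chars: text => events.push(['chars', text]),
  comment: text => events.push(['comment', text])
})
console.log(events)
```

The hooks receive the tokens in document order: start div, comment, start p, chars, end p, end div. The parser itself never builds a tree; that is left entirely to the hooks.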

Helper functions

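Let's look at how the four helper functions inside parseHTML are implemented: advance, parseStartTag, handleStartTag, and parseEndTag.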

// Add n to the index counter and make html start n characters further in
function advance (n) {
    index += n
    html = html.substring(n)
}
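
A quick standalone run of advance shows how index and html stay in sync. The sample string is arbitrary.

```javascript
// Standalone sketch of the cursor bookkeeping done by advance.
let html = '<div id="app">hello</div>'
let index = 0
const originalLength = html.length

function advance (n) {
  index += n
  html = html.substring(n)
}

advance('<div'.length)       // consume the tag opening
advance(' id="app"'.length)  // consume the attribute
advance('>'.length)          // consume the end of the start tag

console.log(index, JSON.stringify(html))
```

index always counts the characters consumed so far, so index + html.length stays equal to the original template length. That is what lets parseStartTag record match.start and match.end as positions in the original template.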

// Parse a start tag
function parseStartTag () {
  // Match the opening of an HTML start tag with the regex
  const start = html.match(startTagOpen)
  if (start) {
    const match = {
      tagName: start[1],
      attrs: [],
      start: index
    }
    advance(start[0].length)
    let end, attr
    // Collect all attributes of the start tag into an array
    while (!(end = html.match(startTagClose)) && (attr = html.match(attribute))) {
      advance(attr[0].length)
      match.attrs.push(attr)
    }
    if (end) {
      match.unarySlash = end[1]
      advance(end[0].length)
      match.end = index
      return match
    }
  }
}

// Handle the start tag: extract and normalize its attributes.
function handleStartTag (match) {
  const tagName = match.tagName
  const unarySlash = match.unarySlash

  // Implicitly close the previous tag where HTML permits omitting its end tag
  if (expectHTML) {
    if (lastTag === 'p' && isNonPhrasingTag(tagName)) {
      parseEndTag(lastTag)
    }
    if (canBeLeftOpenTag(tagName) && lastTag === tagName) {
      parseEndTag(tagName)
    }
  }

  const unary = isUnaryTag(tagName) || !!unarySlash

  // Parse the attribute names and values of the start tag
  const l = match.attrs.length
  const attrs = new Array(l)
  for (let i = 0; i < l; i++) {
    const args = match.attrs[i]
    // hackish work around FF bug https://bugzilla.mozilla.org/show_bug.cgi?id=369778
    if (IS_REGEX_CAPTURING_BROKEN && args[0].indexOf('""') === -1) {
      if (args[3] === '') { delete args[3] }
      if (args[4] === '') { delete args[4] }
      if (args[5] === '') { delete args[5] }
    }
    const value = args[3] || args[4] || args[5] || ''
    const shouldDecodeNewlines = tagName === 'a' && args[1] === 'href'
      ? options.shouldDecodeNewlinesForHref
      : options.shouldDecodeNewlines
    attrs[i] = {
      name: args[1],
      value: decodeAttr(value, shouldDecodeNewlines)
    }
  }

  // Push the tag and its attributes onto the stack
  if (!unary) {
    stack.push({ tag: tagName, lowerCasedTag: tagName.toLowerCase(), attrs: attrs })
    lastTag = tagName
  }
  // Call the options.start hook.
  if (options.start) {
    options.start(tagName, attrs, unary, match.start, match.end)
  }
}

// Parse an end tag
function parseEndTag (tagName, start, end) {
  let pos, lowerCasedTagName
  if (start == null) start = index
  if (end == null) end = index

  if (tagName) {
    lowerCasedTagName = tagName.toLowerCase()
  }

  // Find the position of the matching start tag in the stack
  if (tagName) {
    for (pos = stack.length - 1; pos >= 0; pos--) {
      if (stack[pos].lowerCasedTag === lowerCasedTagName) {
        break
      }
    }
  } else {
    // If no tag name is provided, clean shop
    pos = 0
  }

  // Call options.end for every open tag at position >= pos in the stack.
  if (pos >= 0) {
    // Close all the open elements, up the stack
    for (let i = stack.length - 1; i >= pos; i--) {
      if (process.env.NODE_ENV !== 'production' &&
        (i > pos || !tagName) &&
        options.warn
      ) {
        options.warn(
          `tag <${stack[i].tag}> has no matching end tag.`
        )
      }
      if (options.end) {
        options.end(stack[i].tag, start, end)
      }
    }

    // Remove the open elements from the stack
    // and update lastTag to the new top of the stack
    stack.length = pos
    lastTag = pos && stack[pos - 1].tag
  } else if (lowerCasedTagName === 'br') {
    // </br> with no matching start tag: treat it as a <br> element
    if (options.start) {
      options.start(tagName, [], true, start, end)
    }
  } else if (lowerCasedTagName === 'p') {
    // </p> with no matching start tag: treat it as an empty <p></p>
    if (options.start) {
      options.start(tagName, [], false, start, end)
    }
    if (options.end) {
      options.end(tagName, start, end)
    }
  }
}
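
The core of parseEndTag is the stack walk: find the nearest open tag with the same name, then close it together with everything opened after it. Here is a minimal sketch of just that part. closeTag and the sample stack are hypothetical, and the stack holds plain tag names rather than objects.

```javascript
// Sketch of the stack walk in parseEndTag. closeTag and the sample data
// are hypothetical names used only for illustration.
function closeTag (stack, tagName, onEnd, onWarn) {
  const lower = tagName.toLowerCase()
  let pos
  // Search from the top of the stack for the nearest open tag of this name.
  for (pos = stack.length - 1; pos >= 0; pos--) {
    if (stack[pos].toLowerCase() === lower) break
  }
  if (pos < 0) return false // stray end tag: nothing on the stack to close

  // Close the match and every element above it; the ones above it
  // never received their own end tag, so they trigger a warning.
  for (let i = stack.length - 1; i >= pos; i--) {
    if (i > pos) onWarn(stack[i])
    onEnd(stack[i])
  }
  stack.length = pos // truncate the stack in place
  return true
}

// <div><ul><li> ... </ul>  (the <li> was never closed)
const stack = ['div', 'ul', 'li']
const closed = []
const warned = []
const found = closeTag(stack, 'ul', tag => closed.push(tag), tag => warned.push(tag))
const stray = closeTag(stack, 'span', tag => closed.push(tag), tag => warned.push(tag))

console.log(found, stray, closed, warned, stack)
```

Closing from the top of the stack downwards means the end hook fires for the innermost element first, which keeps the hook calls properly nested even when the template is not.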

HTML parsing in detail

This section walks through HTML parsing in detail, using the simple template below as the example. It may not cover every case.

<div id="app">
    <!-- comment -->
    <div v-if="show" class="message">{{message}}</div>
</div>

The template above is processed as a string. It begins with <div, so it goes into the parseStartTag function.

const startTagMatch = parseStartTag()
if (startTagMatch) {
  handleStartTag(startTagMatch)
  if (shouldIgnoreFirstNewline(lastTag, html)) {
    advance(1)
  }
  continue
}

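The return value is a plain JS object like the one below, built directly from the regex matches. handleStartTag then drops what is not needed, adds what is missing, and calls the start hook passed in at the beginning; that hook is described later in the AST generation part.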

{
    attrs: [
        {
            0: " id="app"",
            1: "id",
            2: "=",
            3: "app",
            4: undefined,
            5: undefined,
            end: 13,
            groups: undefined,
            index: 0,
            input: " id="app">↵ <!-- comment -->↵ <div v-if="show" class="message">{{message}}</div>↵ </div>",
            start: 4,
        }
    ],
    end: 14,
    start: 0,
    tagName: "div",
    unarySlash: "",
}

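Thanks to the advance function, the remaining html now looks like this (ignoring the leading whitespace, which is handled as text), and the loop continues.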

<!-- comment -->
    <div v-if="show" class="message">{{message}}</div>
</div>

Now we reach the comment. Once it is matched, the comment hook passed in at the beginning is called to create a comment node in the AST.

// Comment match
if (comment.test(html)) {
  const commentEnd = html.indexOf('-->')

  if (commentEnd >= 0) {
    // If comments should be kept, call the options.comment hook
    if (options.shouldKeepComment) {
      options.comment(html.substring(4, commentEnd))
    }
    advance(commentEnd + 3)
    continue
  }
}

The html now looks like the following. Notice one detail: the comment is gone, but there is still whitespace between where it was and the next <div.

<div v-if="show" class="message">{{message}}</div>
</div>

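Because of that, html.indexOf('<') does not return 0, so the textEnd === 0 branch is skipped and the whitespace is handled as a text node. After another pass that handles the start tag <div v-if="show" class="message"> and the text {{message}}, our template has become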

</div>
</div>
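
The scan that decides where a text run ends can be sketched on its own. findTextEnd is a hypothetical helper, and the regexes are simplified versions of the ones in html-parser.js; the logic follows the textEnd handling inside parseHTML.

```javascript
// Sketch of the text scan. findTextEnd is a hypothetical helper and the
// regexes are simplified versions of the ones in html-parser.js.
const endTag = /^<\/[a-zA-Z_][\w\-.]*[^>]*>/
const startTagOpen = /^<[a-zA-Z_][\w\-.]*/
const comment = /^<!--/
const conditionalComment = /^<!\[/

function findTextEnd (html) {
  let textEnd = html.indexOf('<')
  if (textEnd < 0) return html.length // no "<" at all: everything is text
  let rest = html.slice(textEnd)
  while (
    !endTag.test(rest) &&
    !startTagOpen.test(rest) &&
    !comment.test(rest) &&
    !conditionalComment.test(rest)
  ) {
    // This "<" does not start a tag or comment, so keep it as text
    // and look for the next candidate.
    const next = rest.indexOf('<', 1)
    if (next < 0) break
    textEnd += next
    rest = html.slice(textEnd)
  }
  return textEnd
}

const whitespace = '\n    <div v-if="show">'
const lessThan = '1 < 2</p>'

console.log(JSON.stringify(whitespace.substring(0, findTextEnd(whitespace))))
console.log(JSON.stringify(lessThan.substring(0, findTextEnd(lessThan))))
```

In the first input the whitespace before <div becomes the text run. In the second, the bare < is not followed by a tag name, so the scan moves on and the text run ends at </p> instead.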

At this point the end-tag check matches and we go into parseEndTag, which calls the end hook passed in at the beginning.

// End tag
const endTagMatch = html.match(endTag)
if (endTagMatch) {
  const curIndex = index
  advance(endTagMatch[0].length)
  // Parse the end tag
  parseEndTag(endTagMatch[1], curIndex, index)
  continue
}

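With that, the whole template has been parsed. Some branches are not exercised by this example, but the idea is always the same: match with a regex, turn the match into a JS object, and hand it to the hooks passed in to build the AST. Below is the full loop that processes html, with comments.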

while (html) {
  last = html
  // No lastTag yet, or not inside a plain-text element (script, style, textarea)
  if (!lastTag || !isPlainTextElement(lastTag)) {
    // Text ends at the next <.
    let textEnd = html.indexOf('<')
    // The html starts with <, so there is no text before the next tag
    if (textEnd === 0) {
      // Comment match
      if (comment.test(html)) {
        const commentEnd = html.indexOf('-->')

        if (commentEnd >= 0) {
          // If comments should be kept, call the options.comment hook
          if (options.shouldKeepComment) {
            options.comment(html.substring(4, commentEnd))
          }
          advance(commentEnd + 3)
          continue
        }
      }

      // http://en.wikipedia.org/wiki/Conditional_comment#Downlevel-revealed_conditional_comment
      // Conditional comment
      if (conditionalComment.test(html)) {
        const conditionalEnd = html.indexOf(']>')

        if (conditionalEnd >= 0) {
          advance(conditionalEnd + 2)
          continue
        }
      }

      // Doctype:
      const doctypeMatch = html.match(doctype)
      if (doctypeMatch) {
        advance(doctypeMatch[0].length)
        continue
      }

      // End tag
      const endTagMatch = html.match(endTag)
      if (endTagMatch) {
        const curIndex = index
        advance(endTagMatch[0].length)
        // Parse the end tag
        parseEndTag(endTagMatch[1], curIndex, index)
        continue
      }

      // Start tag
      const startTagMatch = parseStartTag()
      if (startTagMatch) {
        handleStartTag(startTagMatch)
        if (shouldIgnoreFirstNewline(lastTag, html)) {
          advance(1)
        }
        continue
      }
    }

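    // textEnd >= 0: there is a < further on, possibly with text before it
    let text, rest, next
    if (textEnd >= 0) {
      // Take the remainder of the string starting at textEnd
      rest = html.slice(textEnd)
      // Skip any < that is plain text rather than a start tag, end tag, comment, or conditional comment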
      while (
        !endTag.test(rest) &&
        !startTagOpen.test(rest) &&
        !comment.test(rest) &&
        !conditionalComment.test(rest)
      ) {
        // < in plain text, be forgiving and treat it as text
        next = rest.indexOf('<', 1)
        if (next < 0) break
        textEnd += next
        rest = html.slice(textEnd)
      }
      // Finally slice out the text content
      text = html.substring(0, textEnd)
      advance(textEnd)
    }

    if (textEnd < 0) {
      text = html
      html = ''
    }
    // Emit the text via the options.chars hook.
    if (options.chars && text) {
      options.chars(text)
    }
  } else {
    // lastTag is script, style, or textarea
    let endTagLength = 0
    const stackedTag = lastTag.toLowerCase()
    const reStackedTag = reCache[stackedTag] || (reCache[stackedTag] = new RegExp('([\\s\\S]*?)(</' + stackedTag + '[^>]*>)', 'i'))
    const rest = html.replace(reStackedTag, function (all, text, endTag) {
      endTagLength = endTag.length
      if (!isPlainTextElement(stackedTag) && stackedTag !== 'noscript') {
        text = text
          .replace(/<!\--([\s\S]*?)-->/g, '$1') // <!--xxx-->
          .replace(/<!\[CDATA\[([\s\S]*?)]]>/g, '$1') // <![CDATA[xxx]]>
      }
      if (shouldIgnoreFirstNewline(stackedTag, text)) {
        text = text.slice(1)
      }
      // Process the text and pass it to the options.chars hook.
      if (options.chars) {
        options.chars(text)
      }
      return ''
    })
    index += html.length - rest.length
    html = rest
    // Parse the end tag
    parseEndTag(stackedTag, index - endTagLength, index)
  }

  // html is unchanged after this iteration: treat what is left as text and stop
  if (html === last) {
    // Call options.chars
    options.chars && options.chars(html)
    if (process.env.NODE_ENV !== 'production' && !stack.length && options.warn) {
      options.warn(`Mal-formatted tag at end of template: "${html}"`)
    }
    break
  }
}

[Figure: flow diagram of the parseHTML loop]

At this point the whole html string has been parsed. Next we look at how the hooks passed in turn the parsed JS objects into an AST.

Syntax analysis in parse

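Once the html has been parsed, syntax analysis takes over and turns the processed JS objects into an AST. Let's start with createASTElement. As the name suggests, it creates an AST element; in Vue, an AST node is simply a JS object with a specific shape.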

export function createASTElement (
  tag: string,
  attrs: Array<ASTAttr>,
  parent: ASTElement | void
): ASTElement {
  return {
    type: 1,                            // node type; type = 1 is an element (DOM) node
    tag,                                // tag name
    attrsList: attrs,                   // attribute list
    attrsMap: makeAttrsMap(attrs),      // map from attribute name to value
    parent,                             // parent node
    children: []                        // child nodes
  }
}
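
To see the shape this produces, here is a runnable sketch without the Flow types. The makeAttrsMap below is a simplified stand-in written for this example (the real helper may do more, such as checks in development builds), and the sample attributes come from our template.

```javascript
// Simplified stand-in for makeAttrsMap: turns the attribute list into a
// name -> value lookup. Assumption: the real helper may do more than this.
function makeAttrsMap (attrs) {
  const map = {}
  for (const attr of attrs) {
    map[attr.name] = attr.value
  }
  return map
}

// Same shape as createASTElement above, minus the type annotations.
function createASTElement (tag, attrs, parent) {
  return {
    type: 1,
    tag,
    attrsList: attrs,
    attrsMap: makeAttrsMap(attrs),
    parent,
    children: []
  }
}

const root = createASTElement('div', [{ name: 'id', value: 'app' }], undefined)
const child = createASTElement('div', [
  { name: 'v-if', value: 'show' },
  { name: 'class', value: 'message' }
], root)
root.children.push(child)

console.log(root.attrsMap, child.attrsMap, child.parent === root)
```

attrsList keeps the attributes in source order, while attrsMap gives quick lookup by name. Because each node points back to its parent, the tree is circular and cannot be passed to JSON.stringify as is.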

start

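The start hook gradually decorates this AST element, adding more properties and flags. Let's look at it in detail. Continuing the example above, the four arguments start receives are: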

tag: div (the element's tag name)
attrs: [ {end: 13, name: "id", start: 5, value: "app"} ] (the element's attributes)
unary: false (whether the tag is unary, i.e. self-closing)
start: 0 (start position)

start (tag, attrs, unary, start) {
  // Get the namespace
  const ns = (currentParent && currentParent.ns) || platformGetTagNamespace(tag)

  // handle IE svg bug
  /* istanbul ignore if */
  if (isIE && ns === 'svg') {
    attrs = guardIESVGBug(attrs)
  }
  
  // Create a basic AST element
  let element: ASTElement = createASTElement(tag, attrs, currentParent)
  if (ns) {
    element.ns = ns
  }
  
  // Mark forbidden tags, unless rendering on the server
  if (isForbiddenTag(element) && !isServerRendering()) {
    element.forbidden = true
    process.env.NODE_ENV !== 'production' && warn(
      'Templates should only be responsible for mapping the state to the ' +
      'UI. Avoid placing tags with side-effects in your templates, such as ' +
      `<${tag}>` + ', as they will not be parsed.',
      { start: element.start }
    )
  }

  // Apply pre-transforms, e.g. for v-model with a dynamic type
  for (let i = 0; i < preTransforms.length; i++) {
    element = preTransforms[i](element, options) || element
  }
  
  // Handle Vue directives: v-pre, v-if, v-for, v-once, slot, key, ref (not covered in detail here)
  if (!inVPre) {
    processPre(element)
    if (element.pre) {
      inVPre = true
    }
  }
  if (platformIsPreTag(element.tag)) {
    inPre = true
  }
  if (inVPre) {
    processRawAttrs(element)
  } else if (!element.processed) {
    // structural directives
    processFor(element)
    processIf(element)
    processOnce(element)
  }

  // The root node must not be a slot or template tag, or carry v-for
  if (!root) {
    root = element
    if (process.env.NODE_ENV !== 'production') {
      checkRootConstraints(root)
    }
  }

  // Push non-unary tags onto the stack; close unary elements right away
  if (!unary) {
    currentParent = element
    stack.push(element)
  } else {
    closeElement(element)
  }
},

After this processing, element looks like the following, which is the AST object shape described above.

{
    attrsList: [{name: "id", value: "app", start: 5, end: 13}]
    attrsMap: {id: "app"}
    children: []
    parent: undefined
    start: 0
    tag: "div"
    type: 1
}

chars

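Next, the run of whitespace in our template enters the chars hook. It is whitespace only, so it ends up as an empty string and is skipped at the if (text) check. In our example, {{message}} also enters this chars hook as text.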

chars (text: string, start: number, end: number) {
  // Check whether there is a parent element
  if (!currentParent) {
    if (process.env.NODE_ENV !== 'production') {
      if (text === template) {
        warnOnce(
          'Component template requires a root element, rather than just text.',
          { start }
        )
      } else if ((text = text.trim())) {
        warnOnce(
          `text "${text}" outside root element will be ignored.`,
          { start }
        )
      }
    }
    return
  }
  // IE textarea placeholder bug
  /* istanbul ignore if */
  if (isIE &&
    currentParent.tag === 'textarea' &&
    currentParent.attrsMap.placeholder === text
  ) {
    return
  }
  // Keep a reference to currentParent's children
  const children = currentParent.children
  if (inPre || text.trim()) {
    text = isTextTag(currentParent) ? text : decodeHTMLCached(text)
  } else if (!children.length) {
    // remove the whitespace-only node right after an opening tag
    text = ''
  } else if (whitespaceOption) {
    if (whitespaceOption === 'condense') {
      // in condense mode, remove the whitespace node if it contains
      // line break, otherwise condense to a single space
      text = lineBreakRE.test(text) ? '' : ' '
    } else {
      text = ' '
    }
  } else {
    text = preserveWhitespace ? ' ' : ''
  }
  if (text) {
    if (whitespaceOption === 'condense') {
      // condense consecutive whitespaces into single space
      text = text.replace(whitespaceRE, ' ')
    }
    let res
    let child: ?ASTNode
    // Parse the text for interpolations (dynamic bindings)
    if (!inVPre && text !== ' ' && (res = parseText(text, delimiters))) {
      child = {
        type: 2,
        expression: res.expression,
        tokens: res.tokens,
        text
      }
    } else if (text !== ' ' || !children.length || children[children.length - 1].text !== ' ') {
      child = {
        type: 3,
        text
      }
    }
    if (child) {
      if (process.env.NODE_ENV !== 'production' && options.outputSourceRange) {
        child.start = start
        child.end = end
      }
      children.push(child)
    }
  }
}
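
The split between type 2 and type 3 nodes depends on whether parseText finds an interpolation. The sketch below uses parseTextSketch, a hypothetical and much simplified stand-in for parseText: it assumes the default {{ }} delimiters, ignores filters, and does not produce tokens. The _s wrapper follows the name of Vue 2's toString render helper.

```javascript
// parseTextSketch is a hypothetical, simplified stand-in for parseText:
// default {{ }} delimiters only, no filters, no tokens array.
function parseTextSketch (text) {
  const tagRE = /\{\{((?:.|\r?\n)+?)\}\}/g
  if (!tagRE.test(text)) return undefined // plain text, no interpolation
  tagRE.lastIndex = 0 // test() moved the cursor; rewind before exec()
  const parts = []
  let lastIndex = 0
  let match
  while ((match = tagRE.exec(text))) {
    // Literal text before the interpolation becomes a quoted string.
    if (match.index > lastIndex) {
      parts.push(JSON.stringify(text.slice(lastIndex, match.index)))
    }
    // The expression itself is wrapped in _s(...).
    parts.push(`_s(${match[1].trim()})`)
    lastIndex = match.index + match[0].length
  }
  // Literal text after the last interpolation.
  if (lastIndex < text.length) {
    parts.push(JSON.stringify(text.slice(lastIndex)))
  }
  return { expression: parts.join('+') }
}

// Mirrors the branch in chars: type 2 for expressions, type 3 for plain text.
function makeTextNode (text) {
  const res = parseTextSketch(text)
  return res
    ? { type: 2, expression: res.expression, text }
    : { type: 3, text }
}

const dynamicNode = makeTextNode('{{message}}')
const mixedNode = makeTextNode('Hi {{ name }}!')
const plainNode = makeTextNode('hello')

console.log(dynamicNode, mixedNode, plainNode)
```

For our example, {{message}} becomes a type 2 node whose expression is _s(message), while text without interpolation stays a type 3 node.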

comment

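The hook that creates the comment node is fairly simple: it builds a node with type 3 (the same type as plain text), marks it with isComment: true, stores the text, and pushes it onto currentParent.children.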

comment (text: string, start, end) {
  const child: ASTText = {
    type: 3,
    text,
    isComment: true
  }
  if (process.env.NODE_ENV !== 'production' && options.outputSourceRange) {
    child.start = start
    child.end = end
  }
  currentParent.children.push(child)
}

end

Finally, the example reaches the end hook.

end (tag, start, end) {
  const element = stack[stack.length - 1]
  if (!inPre) {
    // Remove a trailing whitespace-only text node
    const lastNode = element.children[element.children.length - 1]
    if (lastNode && lastNode.type === 3 && lastNode.text === ' ') {
      element.children.pop()
    }
  }
  // Pop the stack so the enclosing element becomes the current parent again
  stack.length -= 1
  currentParent = stack[stack.length - 1]
  if (process.env.NODE_ENV !== 'production' && options.outputSourceRange) {
    element.end = end
  }
  // Close the current element
  closeElement(element)
},
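
How start, chars and end cooperate through the stack and currentParent can be shown with a stripped-down builder. createTreeBuilder is a hypothetical name, the nodes carry only a subset of the real fields, and closeElement here does nothing except attach the element to its parent, which is an assumption made to keep the sketch short.

```javascript
// Sketch of the hook cooperation. createTreeBuilder is hypothetical and
// the nodes carry only a subset of the real AST fields.
function createTreeBuilder () {
  const stack = []
  let root
  let currentParent

  // Simplified: only attaches the finished element to its parent.
  function closeElement (element) {
    if (currentParent) currentParent.children.push(element)
  }

  return {
    start (tag, attrs, unary) {
      const element = { type: 1, tag, attrsList: attrs, parent: currentParent, children: [] }
      if (!root) root = element
      if (unary) {
        closeElement(element) // unary tags have no children, close at once
      } else {
        currentParent = element
        stack.push(element)
      }
    },
    chars (text) {
      // Whitespace-only text is dropped in this sketch.
      if (currentParent && text.trim()) {
        currentParent.children.push({ type: 3, text })
      }
    },
    end () {
      const element = stack.pop()
      currentParent = stack[stack.length - 1] // back to the enclosing element
      closeElement(element)
    },
    getRoot: () => root
  }
}

// Replay the hook calls our example template would trigger.
const builder = createTreeBuilder()
builder.start('div', [{ name: 'id', value: 'app' }], false)
builder.chars('\n    ')
builder.start('div', [{ name: 'v-if', value: 'show' }, { name: 'class', value: 'message' }], false)
builder.chars('{{message}}')
builder.end()
builder.chars('\n')
builder.end()

const ast = builder.getRoot()
console.log(ast.tag, ast.children.length, ast.children[0].children[0].text)
```

The element on top of the stack is always the one currently being filled, so popping it in end is enough to restore the previous parent.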

At this point we can look at the returned root, which is the AST. It is basically what we expected.