How Vue Template Compilation Works

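Preface

Anyone who has written Vue knows how convenient .vue single-file components are. We also know that Vue renders through a virtual DOM underneath. So how does the template in a .vue file get turned into virtual DOM? This part had always been a black box to me, and I had never dug into it, so today I want to get to the bottom of it.

Vue 3 is about to be released, and my first plan was to read the Vue 3 template compiler directly. But when I opened the Vue 3 source, I realized I did not even know how Vue 2 compiles templates. As Lu Xun supposedly taught us as children, nobody gets fat from a single bite, so I went back to the Vue 2 template compiler source first. Vue 3 can wait until its official release.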

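Vue Builds

Many people only ever use Vue through the boilerplate generated by vue-cli, and do not realize that Vue ships two kinds of builds.

vue.js: the full build, which includes the template compiler;
vue.runtime.js: the runtime-only build, which has no template compiler, so templates must be precompiled by vue-loader.

Put simply, if you use vue-loader you can use vue.runtime.min.js and leave template compilation to vue-loader at build time. If you load Vue in the browser directly through a script tag, you need vue.min.js, which compiles templates at run time.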

Compilation Entry Point

With the builds in mind, let's look at the entry file of the full build (src/platforms/web/entry-runtime-with-compiler.js).

// Parts of the code are omitted; only the key logic is kept
import { compileToFunctions } from './compiler/index'

const mount = Vue.prototype.$mount
Vue.prototype.$mount = function (el, hydrating) {
  const options = this.$options

  // If there is no render function, compile the template
  if (!options.render) {
    let template = options.template
    if (template) {
      // Call compileToFunctions to compile the template into a render function
      const { render, staticRenderFns } = compileToFunctions(template, {
        shouldDecodeNewlines,
        shouldDecodeNewlinesForHref,
        delimiters: options.delimiters,
        comments: options.comments
      }, this)
      // This render function is what produces the virtual DOM
      options.render = render
    }
  }
  return mount.call(this, el, hydrating)
}

Next, let's see where compileToFunctions in ./compiler/index comes from.

import { baseOptions } from './options'
import { createCompiler } from 'compiler/index'

// Create the compile functions through createCompiler
const { compile, compileToFunctions } = createCompiler(baseOptions)
export { compile, compileToFunctions }

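The main logic that follows lives in the compiler module. This part is a bit convoluted, and since this article is not a line-by-line source analysis, I won't paste the full source. Here is a simplified view of the logic.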

export function createCompiler(baseOptions) {
  const baseCompile = (template, options) => {
    // Parse the HTML into an AST
    const ast = parse(template.trim(), options)
    // Optimize the AST by marking static nodes
    optimize(ast, options)
    // Turn the AST into executable code
    const code = generate(ast, options)
    return {
      ast,
      render: code.render,
      staticRenderFns: code.staticRenderFns
    }
  }
  const compile = (template, options) => {
    const tips = []
    const errors = []
    // Collect errors and tips reported during compilation
    options.warn = (msg, tip) => {
      (tip ? tips : errors).push(msg)
    }
    // Compile
    const compiled = baseCompile(template, options)
    compiled.errors = errors
    compiled.tips = tips

    return compiled
  }
  const createCompileToFunctionFn = (compile) => {
    // Compilation cache
    const cache = Object.create(null)
    return (template, options, vm) => {
      // Templates that were already compiled are served from the cache
      if (cache[template]) {
        return cache[template]
      }
      const compiled = compile(template, options)
      return (cache[template] = compiled)
    }
  }
  return {
    compile,
    compileToFunctions: createCompileToFunctionFn(compile)
  }
}
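The caching step can be tried in isolation. The snippet below is a minimal sketch rather than Vue's code: fakeCompile is a hypothetical stand-in for the real compile, and a counter shows that a second call with the same template is served from the cache.

```javascript
// Hypothetical stand-in for compile(); counts how often it actually runs.
let compileCalls = 0
function fakeCompile(template) {
  compileCalls++
  return { render: `/* code for ${template} */` }
}

// Same shape as createCompileToFunctionFn: a closure over a cache object.
function createCachedCompiler(compile) {
  const cache = Object.create(null)
  return function compileToFunctions(template, options) {
    if (cache[template]) {
      return cache[template]
    }
    return (cache[template] = compile(template, options))
  }
}

const cachedCompile = createCachedCompiler(fakeCompile)
const first = cachedCompile('<div>{{ msg }}</div>')
const second = cachedCompile('<div>{{ msg }}</div>')
console.log(first === second, compileCalls) // true 1
```

Using the template string itself as the cache key keeps the sketch short; it assumes the same template is always compiled with the same options.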

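Main Flow

As you can see, most of the compilation logic sits in baseCompile, which breaks down into three steps:

Parse: convert the template source into an AST;
Optimize: mark up the AST so that later virtual DOM updates are cheaper;
Generate: turn the AST into executable code.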

const baseCompile = (template, options) => {
  // Parse the HTML into an AST
  const ast = parse(template.trim(), options)
  // Optimize the AST by marking static nodes
  optimize(ast, options)
  // Turn the AST into executable code
  const code = generate(ast, options)
  return {
    ast,
    render: code.render,
    staticRenderFns: code.staticRenderFns
  }
}

parse

AST

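Let's start with parse. Its main job is to parse the HTML and convert it into an AST (abstract syntax tree). Anyone who has worked with ESLint or Babel will be familiar with ASTs. First, let's see what the AST looks like after parse.

Here is a perfectly ordinary Vue template: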

new Vue({
  el: '#app',
  template: `
    <div>
      <h2 v-if="message">{{message}}</h2>
      <button @click="showName">showName</button>
    </div>
  `,
  data: {
    name: 'shenfq',
    message: 'Hello Vue!'
  },
  methods: {
    showName() {
      alert(this.name)
    }
  }
})

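After parse, the AST is a tree-shaped object in which each level represents a node. The first level is the div (tag: "div"). The div's child nodes live in its children property: the h2 tag, the whitespace between the tags, and the button tag. Notice that each node also has a type property that marks the node kind. The div's type is 1, meaning it is an element node. There are three types in total:

1: element node;
2: expression;
3: text.

The whitespace between the h2 and button tags is a text node with type 3, and the child of the h2 tag is an expression node.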
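For reference, the shape described here looks roughly like the hand-written object below. It is a trimmed approximation for illustration only: the real AST carries more fields (for example parent, attrsList and the event data on the button), and the exact set of fields depends on the Vue version.

```javascript
// Hand-written approximation of the AST for the template above (many fields omitted).
const ast = {
  type: 1, // element
  tag: 'div',
  children: [
    {
      type: 1,
      tag: 'h2',
      if: 'message', // from v-if="message"
      children: [
        { type: 2, expression: '_s(message)', text: '{{message}}' } // expression
      ]
    },
    { type: 3, text: ' ' }, // whitespace between h2 and button
    {
      type: 1,
      tag: 'button',
      children: [{ type: 3, text: 'showName' }] // plain text
    }
  ]
}

// Print the type of each direct child together with its tag (or its text).
const summary = ast.children.map(
  child => `${child.type}:${child.tag || JSON.stringify(child.text)}`
)
console.log(summary.join(' | ')) // 1:h2 | 3:" " | 1:button
```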

Parsing HTML

The overall logic of parse is fairly complex, so let's simplify the code first and look at its flow.

import { parseHTML } from './html-parser'

export function parse(template, options) {
  let root
  parseHTML(template, {
    // some options...
    start() {}, // called when a start tag is parsed
    end() {}, // called when an end tag is parsed
    chars() {}, // called when text is parsed
    comment() {} // called when a comment is parsed
  })
  return root
}

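As you can see, parse does its work mainly through parseHTML. parseHTML itself comes from the open-source library htmlparser.js, with some modifications by the Vue team to fix related issues.

Let's walk through the logic of parseHTML together.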

export function parseHTML(html, options) {
  let index = 0
  let last, lastTag
  const stack = []
  while (html) {
    last = html
    let textEnd = html.indexOf('<')

    // "<" is at the very start of the remaining html
    if (textEnd === 0) {
      // 1. Comment: <!-- -->
      if (/^<!\--/.test(html)) {
        const commentEnd = html.indexOf('-->')
        if (commentEnd >= 0) {
          // Call options.comment with the comment content
          options.comment(html.substring(4, commentEnd))
          // Cut off the comment
          advance(commentEnd + 3)
          continue
        }
      }

      // 2. Conditional comment: <![if !IE]> <![endif]>
      if (/^<!\[/.test(html)) {
        // ... same idea as the comment branch
      }

      // 3. Doctype: <!DOCTYPE html>
      const doctypeMatch = html.match(/^<!DOCTYPE [^>]+>/i)
      if (doctypeMatch) {
        // ... same idea as the comment branch
      }

      // 4. End tag: </div>
      const endTagMatch = html.match(endTag)
      if (endTagMatch) { /* ... covered in detail below */ }

      // 5. Start tag: <div>
      const startTagMatch = parseStartTag()
      if (startTagMatch) { /* ... covered in detail below */ }
    }
    // "<" is somewhere in the middle of the remaining html
    let text, rest, next
    if (textEnd > 0) {
      // Everything from "<" onwards
      rest = html.slice(textEnd)
      // Everything before "<" is treated as text
      text = html.substring(0, textEnd)
      advance(textEnd)
    }
    // There is no "<" in the remaining html
    if (textEnd < 0) {
      text = html
      html = ''
    }

    // If there is text, pass it to the options.chars callback
    if (options.chars && text) {
      options.chars(text)
    }
  }
  // Move forward by cutting n characters off html
  function advance(n) {
    index += n
    html = html.substring(n)
  }
}

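The code above is a simplified parseHTML. On each pass the while loop takes a slice off the html string and uses regular expressions to decide what kind of content it is and how to handle it, much like the finite state machines commonly used in compilers. Each pass looks at the text around the next "<" character: whatever comes before "<" is treated as text, and whatever starts at "<" is tested against regular expressions, which narrows it down to a small number of states.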
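The same idea can be shown with a toy scanner. This is a minimal sketch, not the parser from the Vue source: scan is a hypothetical name, and it only knows text, start tags and end tags (no attributes handling, comments or doctype).

```javascript
// Toy scanner: repeatedly look at the head of the string, decide which
// "state" applies (end tag, start tag, text), emit a token, then advance.
function scan(html) {
  const tokens = []
  while (html) {
    const textEnd = html.indexOf('<')
    let match
    if (textEnd === 0 && (match = html.match(/^<\/([a-zA-Z_][\w\-.]*)[^>]*>/))) {
      tokens.push({ kind: 'end', tag: match[1] })
      html = html.substring(match[0].length)
    } else if (textEnd === 0 && (match = html.match(/^<([a-zA-Z_][\w\-.]*)[^>]*>/))) {
      tokens.push({ kind: 'start', tag: match[1] })
      html = html.substring(match[0].length)
    } else {
      // Everything up to the next "<" is text. A stray "<" that starts no tag
      // is consumed as text too, so the loop always moves forward.
      const next = html.indexOf('<', 1)
      const end = next < 0 ? html.length : next
      tokens.push({ kind: 'text', text: html.substring(0, end) })
      html = html.substring(end)
    }
  }
  return tokens
}

const kinds = scan('<div><h2>hi</h2> tail</div>').map(t => t.kind + ':' + (t.tag || t.text))
console.log(kinds.join(','))
// start:div,start:h2,text:hi,end:h2,text: tail,end:div
```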

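The rest of the logic is not complicated. The important parts are start tags and end tags, so let's first look at the regular expressions involved.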

const ncname = '[a-zA-Z_][\\w\\-\\.]*'
const qnameCapture = `((?:${ncname}\\:)?${ncname})`
const startTagOpen = new RegExp(`^<${qnameCapture}`)

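These regular expressions look long, but they are not hard once untangled. I recommend a regex visualization tool here; viewing startTagOpen in one makes its structure much clearer.

The puzzling part is why a tagName may contain a colon. That is the XML namespace syntax, which is rarely used nowadays, so we can ignore it and simplify the regular expressions: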

const ncname = '[a-zA-Z_][\\w\\-\\.]*'
// The tag name keeps its capture group, because the code below reads match[1]
const startTagOpen = new RegExp(`^<(${ncname})`)
const startTagClose = /^\s*(\/?)>/
const endTag = new RegExp(`^<\\/(${ncname})[^>]*>`)
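To see what these patterns capture, they can be run directly against a few strings. The snippet below uses the unsimplified startTagOpen (with qnameCapture) so that the namespace case is visible; the sample inputs are made up for illustration.

```javascript
const ncname = '[a-zA-Z_][\\w\\-\\.]*'
const qnameCapture = `((?:${ncname}\\:)?${ncname})`
const startTagOpen = new RegExp(`^<${qnameCapture}`)
const startTagClose = /^\s*(\/?)>/

const plain = '<div id="app">'.match(startTagOpen)
const namespaced = '<svg:rect width="1"/>'.match(startTagOpen)
console.log(plain[0], plain[1]) // <div div
console.log(namespaced[1]) // svg:rect

// startTagClose captures "/" for self-closing tags and "" otherwise.
const selfClosing = ' />'.match(startTagClose)[1]
const normal = '>'.match(startTagClose)[1]
console.log(JSON.stringify(selfClosing), JSON.stringify(normal)) // "/" ""
```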

Besides the regular expressions for the start and end of a tag, there is one more for extracting tag attributes, and it is long and ugly.

const attribute = /^\s*([^\s"'<>\/=]+)(?:\s*(=)\s*(?:"([^"]*)"+|'([^']*)'+|([^\s"'=<>`]+)))?/

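Put it into the visualization tool and it becomes clear at a glance: the = sign is the dividing line, with the attribute name before it and the attribute value after it.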
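Running the attribute pattern on a few made-up inputs shows which capture group receives the value in each quoting style. readAttr is a hypothetical helper written only for this demonstration.

```javascript
const attribute = /^\s*([^\s"'<>\/=]+)(?:\s*(=)\s*(?:"([^"]*)"+|'([^']*)'+|([^\s"'=<>`]+)))?/

// Return [name, value] using the group order of the regex:
// group 3 = double-quoted, group 4 = single-quoted, group 5 = unquoted.
function readAttr(source) {
  const m = source.match(attribute)
  return [m[1], m[3] || m[4] || m[5] || '']
}

console.log(readAttr(' id="app"'))    // [ 'id', 'app' ]
console.log(readAttr(" @click='go'")) // [ '@click', 'go' ]
console.log(readAttr(' type=text>'))  // [ 'type', 'text' ]
console.log(readAttr(' disabled>'))   // [ 'disabled', '' ]
```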

With the regular expressions sorted out, the code that follows is easier to read.

while (html) {
  last = html
  let textEnd = html.indexOf('<')

  // "<" is at the very start of the remaining html
  if (textEnd === 0) {
    // some code ...

    // 4. End tag: </div>
    const endTagMatch = html.match(endTag)
    if (endTagMatch) {
      const curIndex = index
      advance(endTagMatch[0].length)
      parseEndTag(endTagMatch[1], curIndex, index)
      continue
    }

    // 5. Start tag: <div>
    const startTagMatch = parseStartTag()
    if (startTagMatch) {
      handleStartTag(startTagMatch)
      continue
    }
  }
}
// Move forward by cutting n characters off html
function advance(n) {
  index += n
  html = html.substring(n)
}

// Check for a start tag; if found, extract the tag name and its attributes
function parseStartTag () {
  // Match "<xxx"
  const start = html.match(startTagOpen)
  if (start) {
    const [fullStr, tag] = start
    const match = {
      attrs: [],
      start: index,
      tagName: tag,
    }
    advance(fullStr.length)
    let end, attr
    // Keep extracting attributes in a loop until ">" or "/>" is reached
    while (
      !(end = html.match(startTagClose)) &&
      (attr = html.match(attribute))
    ) {
      advance(attr[0].length)
      match.attrs.push(attr)
    }
    if (end) {
      // "/>" means a self-closing (unary) tag
      match.unarySlash = end[1]
      advance(end[0].length)
      match.end = index
      return match
    }
  }
}

// Handle a start tag
function handleStartTag (match) {
  const tagName = match.tagName
  const unary = match.unarySlash
  const len = match.attrs.length
  const attrs = new Array(len)
  for (let i = 0; i < len; i++) {
    const args = match.attrs[i]
    // Groups 3, 4 and 5 correspond to the three ways of writing an attribute value
    // 3: attr="xxx" double quotes
    // 4: attr='xxx' single quotes
    // 5: attr=xxx no quotes
    const value = args[3] || args[4] || args[5] || ''
    attrs[i] = {
      name: args[1],
      value
    }
  }

  if (!unary) {
    // Not a unary tag: push it onto the stack
    stack.push({
      tag: tagName,
      lowerCasedTag: tagName.toLowerCase(),
      attrs: attrs
    })
    lastTag = tagName
  }

  if (options.start) {
    // Start tag callback
    options.start(tagName, attrs, unary, match.start, match.end)
  }
}

// Handle an end tag
function parseEndTag (tagName, start, end) {
  let pos, lowerCasedTagName
  if (start == null) start = index
  if (end == null) end = index

  if (tagName) {
    lowerCasedTagName = tagName.toLowerCase()
  }

  // Search the stack for the nearest open tag with the same name
  if (tagName) {
    for (pos = stack.length - 1; pos >= 0; pos--) {
      if (stack[pos].lowerCasedTag === lowerCasedTagName) {
        break
      }
    }
  } else {
    pos = 0
  }

  if (pos >= 0) {
    // Close every tag still open inside this one, innermost first
    for (let i = stack.length - 1; i >= pos; i--) {
      if (options.end) {
        // End tag callback
        options.end(stack[i].tag, start, end)
      }
    }

    // Remove the closed tags from the stack
    stack.length = pos
    lastTag = pos && stack[pos - 1].tag
  }
}

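When a start tag is parsed and it is not a unary tag, it is pushed onto a stack. Whenever an end tag is parsed, the parser searches from the top of the stack downwards until it finds a tag with the same name, and closes that tag together with every tag above it on the stack. Here is an example: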

<div>
  <h2>test</h2>
  <p>
  <p>
</div>

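After the start tags of div and h2 are parsed, the stack holds two elements. When h2 is closed, it is popped off the stack. The two unclosed p tags are parsed next, at which point the stack holds three elements (div, p, p). If the closing tag of div is parsed now, the two unclosed p tags inside div are closed along with div itself, and the stack becomes empty.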
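The walk-through can be replayed with a small model of the stack. This sketch mirrors only the simplified logic in this article (push on open, pop down to the nearest matching tag on close); createTagStack and its methods are hypothetical names.

```javascript
// Minimal model of the tag stack: open() pushes, close() pops everything
// above (and including) the nearest matching tag, innermost first.
function createTagStack() {
  const stack = []
  const closed = []
  return {
    stack,
    closed,
    open(tag) {
      stack.push(tag)
    },
    close(tag) {
      const pos = stack.lastIndexOf(tag)
      if (pos < 0) return // no matching open tag: ignore
      for (let i = stack.length - 1; i >= pos; i--) {
        closed.push(stack[i])
      }
      stack.length = pos
    }
  }
}

const tags = createTagStack()
tags.open('div')
tags.open('h2')
tags.close('h2')
tags.open('p')
tags.open('p')
const beforeDivClose = tags.stack.join(',')
console.log(beforeDivClose) // div,p,p
tags.close('div')
console.log(tags.closed.join(','), tags.stack.length) // h2,p,p,div 0
```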

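[Animation: how the stack changes while the example above is parsed]

With the logic of parseHTML sorted out, let's return to where parseHTML is called. Four callbacks are passed in, corresponding to the start of a tag, the end of a tag, text, and comments.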

parseHTML(template, {
  // some options...

  // Called when a start tag is parsed
  start(tag, attrs, unary) {},
  // Called when an end tag is parsed
  end(tag) {},
  // Called when text is parsed
  chars(text) {},
  // Called when a comment is parsed
  comment(text) {}
})

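Handling Start Tags

When a start tag is parsed, an AST node is created first, then the attributes on the tag are processed, and finally the AST node is placed into the tree.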

function makeAttrsMap(attrs) {
  const map = {}
  for (let i = 0, l = attrs.length; i < l; i++) {
    const { name, value } = attrs[i]
    map[name] = value
  }
  return map
}
function createASTElement(tag, attrs, parent) {
  const attrsList = attrs
  const attrsMap = makeAttrsMap(attrsList)
  return {
    type: 1,       // node type
    tag,           // tag name
    attrsMap,      // attribute map (name -> value)
    attrsList,     // attribute array
    parent,        // parent node
    children: [],  // child nodes
  }
}

const stack = []
let root // root node
let currentParent // the current parent node
parseHTML(template, {
  // some options...

  // Called when a start tag is parsed
  start(tag, attrs, unary) {
    // Create the AST node
    let element = createASTElement(tag, attrs, currentParent)

    // Process directives: v-for v-if v-once
    processFor(element)
    processIf(element)
    processOnce(element)
    processElement(element, options)

    // Build the AST tree
    // If there is no root yet, this element becomes the root
    if (!root) {
      root = element
      checkRootConstraints(root)
    }
    // There is a parent node
    if (currentParent) {
      // Add this element to the parent's children
      currentParent.children.push(element)
      element.parent = currentParent
    }
    if (!unary) {
      // A non-unary tag is pushed onto the stack and becomes the current parent
      currentParent = element
      stack.push(element)
    }
  }
})

Handling End Tags

The logic for an end tag is simpler: take the last unclosed tag off the stack and close it.

parseHTML(template, {
  // some options...

  // Called when an end tag is parsed
  end() {
    const element = stack[stack.length - 1]
    const lastNode = element.children[element.children.length - 1]
    // Drop a trailing whitespace text node
    if (lastNode && lastNode.type === 3 && lastNode.text === ' ') {
      element.children.pop()
    }
    // Pop the stack and reset the current parent
    stack.length -= 1
    currentParent = stack[stack.length - 1]
  }
})
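How the start and end callbacks cooperate is easier to see when they are driven by hand. The following is a minimal sketch under simplifying assumptions (no attributes, directives or text nodes); createTreeBuilder is a hypothetical wrapper, not something from the Vue source.

```javascript
// Minimal model of the start/end callbacks: a stack plus currentParent
// is enough to turn a flat sequence of tag events into a tree.
function createTreeBuilder() {
  const stack = []
  let root
  let currentParent
  return {
    start(tag, unary) {
      const element = { type: 1, tag, parent: currentParent, children: [] }
      if (!root) root = element
      if (currentParent) currentParent.children.push(element)
      if (!unary) {
        currentParent = element
        stack.push(element)
      }
    },
    end() {
      stack.length -= 1
      currentParent = stack[stack.length - 1]
    },
    getRoot: () => root
  }
}

const builder = createTreeBuilder()
builder.start('div', false)
builder.start('h2', false)
builder.end()
builder.start('img', true) // unary tag: attached to its parent but never pushed
builder.start('button', false)
builder.end()
builder.end()

const tree = builder.getRoot()
const childTags = tree.children.map(c => c.tag)
console.log(tree.tag, childTags.join(',')) // div h2,img,button
```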

Handling Text

Once tags are handled, the text inside them still needs processing. There are two cases: text that contains expressions, and purely static text.

parseHTML(template, {
  // some options...

  // Called when text is parsed
  chars(text) {
    if (!currentParent) {
      // Text outside of any parent node is ignored
      return
    }

    const children = currentParent.children
    // Simplified: the real source keeps a single ' ' for whitespace between sibling tags
    text = text.trim()
    if (text) {
      // parseText parses expressions
      // delimiters are the expression markers, ['{{', '}}'] by default
      const res = parseText(text, delimiters)
      if (res) {
        // Expression
        children.push({
          type: 2,
          expression: res.expression,
          tokens: res.tokens,
          text
        })
      } else {
        // Static text
        children.push({
          type: 3,
          text
        })
      }
    }
  }
})

Next, let's see how parseText parses expressions.

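// Escape regex metacharacters so that delimiters are matched literally
const regexEscapeRE = /[-.*+?^${}()|[\]\/\\]/g
// Build the regular expression that matches expressions
const buildRegex = delimiters => {
  const open = delimiters[0].replace(regexEscapeRE, '\\$&')
  const close = delimiters[1].replace(regexEscapeRE, '\\$&')
  return new RegExp(open + '((?:.|\\n)+?)' + close, 'g')
}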

function parseText (text, delimiters){
  // delimiters default to {{ }}
  const tagRE = buildRegex(delimiters || ['{{', '}}'])
  // No expression found: return immediately
  if (!tagRE.test(text)) {
    return
  }
  const tokens = []
  const rawTokens = []
  let lastIndex = tagRE.lastIndex = 0
  let match, index, tokenValue
  while ((match = tagRE.exec(text))) {
    // Position where the expression starts
    index = match.index
    // Static text before the expression goes into the tokens
    if (index > lastIndex) {
      rawTokens.push(tokenValue = text.slice(lastIndex, index))
      tokens.push(JSON.stringify(tokenValue))
    }
    // Take the content inside the delimiters and wrap it in _s()
    const exp = match[1].trim()
    tokens.push(`_s(${exp})`)
    rawTokens.push({ '@binding': exp })
    lastIndex = index + match[0].length
  }
  // Static text after the last expression also goes into the tokens
  if (lastIndex < text.length) {
    rawTokens.push(tokenValue = text.slice(lastIndex))
    tokens.push(JSON.stringify(tokenValue))
  }
  return {
    expression: tokens.join('+'),
    tokens: rawTokens
  }
}


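A regular expression is used first to pick out the expressions.

The code may be a little hard to follow, so let's go straight to an example. Here is a piece of text that contains an expression.

<div>Logged in: {{isLogin ? 'yes' : 'no'}}</div>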
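Running a condensed copy of parseText on an English version of such a text shows what comes out. The copy below is a sketch that hard-codes the default {{ }} delimiters to stay short; otherwise it follows the simplified logic described in this article.

```javascript
// Condensed parseText with the default {{ }} delimiters only.
function parseText(text) {
  const tagRE = /\{\{((?:.|\n)+?)\}\}/g
  if (!tagRE.test(text)) return
  const tokens = []
  const rawTokens = []
  let lastIndex = (tagRE.lastIndex = 0)
  let match
  while ((match = tagRE.exec(text))) {
    if (match.index > lastIndex) {
      // Static text in front of the expression
      const staticText = text.slice(lastIndex, match.index)
      rawTokens.push(staticText)
      tokens.push(JSON.stringify(staticText))
    }
    const exp = match[1].trim()
    tokens.push(`_s(${exp})`)
    rawTokens.push({ '@binding': exp })
    lastIndex = match.index + match[0].length
  }
  if (lastIndex < text.length) {
    // Static text after the last expression
    const staticText = text.slice(lastIndex)
    rawTokens.push(staticText)
    tokens.push(JSON.stringify(staticText))
  }
  return { expression: tokens.join('+'), tokens: rawTokens }
}

const result = parseText("Logged in: {{isLogin ? 'yes' : 'no'}}")
console.log(result.expression) // "Logged in: "+_s(isLogin ? 'yes' : 'no')
console.log(result.tokens) // [ 'Logged in: ', { '@binding': "isLogin ? 'yes' : 'no'" } ]
```

The static part becomes a JSON string literal and the expression is wrapped in _s(), so the joined expression is itself a piece of JavaScript that can be evaluated later inside the render function.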

optimize

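After the series of steps above, we have the AST of the Vue template. Because Vue is reactive by design, the AST then goes through a round of optimization so that static content stays out of the virtual DOM update phase, which improves performance.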

export function optimize (root, options) {
  if (!root) return
  // Mark static nodes
  markStatic(root)
}

Put simply, it sets the static property to true on every static node.

function isStatic (node) {
  if (node.type === 2) { // expression: not static
    return false
  }
  if (node.type === 3) { // plain text: static
    return true
  }
  // Some conditions are omitted here
  return !!(
    !node.hasBindings && // no dynamic bindings
    !node.if && !node.for && // no v-if / v-for
    !isBuiltInTag(node.tag) && // not a built-in tag (slot / component)
    !isDirectChildOfTemplateFor(node) && // not a direct child of a template with v-for
    Object.keys(node).every(isStaticKey) // every key on the node is a static key
  )
}

function markStatic (node) {
  node.static = isStatic(node)
  if (node.type === 1) {
    // For an element node, visit all of its children
    for (let i = 0, l = node.children.length; i < l; i++) {
      const child = node.children[i]
      markStatic(child)
      if (!child.static) {
        // If any child is not static, this node cannot be static either
        node.static = false
      }
    }
  }
}
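The way a dynamic child makes its ancestors dynamic can be checked on a tiny tree. This sketch uses a cut-down isStatic that only looks at the node type and a few flags, so it is an assumption-laden illustration rather than the real check.

```javascript
// Cut-down isStatic: node type plus the hasBindings / if / for flags only.
function isStatic(node) {
  if (node.type === 2) return false // expression
  if (node.type === 3) return true // plain text
  return !node.hasBindings && !node.if && !node.for
}

function markStatic(node) {
  node.static = isStatic(node)
  if (node.type === 1) {
    for (const child of node.children) {
      markStatic(child)
      if (!child.static) node.static = false
    }
  }
}

const tree = {
  type: 1, tag: 'div', children: [
    { type: 1, tag: 'p', children: [{ type: 3, text: 'fixed text' }] },
    { type: 1, tag: 'h2', children: [{ type: 2, expression: '_s(message)', text: '{{message}}' }] }
  ]
}
markStatic(tree)
const flags = tree.children.map(c => `${c.tag}:${c.static}`)
console.log(tree.static, flags.join(',')) // false p:true,h2:false
```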

generate

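With the optimized AST in hand, the next step is to turn it into a render function. Using the same template as before, let's first see what the generated code looks like: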

<div>
  <h2 v-if="message">{{message}}</h2>
  <button @click="showName">showName</button>
</div>

{
  render: "with(this){return _c('div',[(message)?_c('h2',[_v(_s(message))]):_e(),_v(" "),_c('button',{on:{"click":showName}},[_v("showName")])])}"
}

The generated code, expanded:

with (this) {
  return _c(
    'div',
    [
      (message) ? _c('h2', [_v(_s(message))]) : _e(),
      _v(' '),
      _c('button', { on: { click: showName } }, [_v('showName')])
    ]
  )
}

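All those underscores can be baffling at first sight. _c corresponds to the virtual DOM's createElement method. The other underscore helpers are all defined in core/instance/render-helpers, and I won't go into what each of them does.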
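To get a feel for how such a string becomes something callable, it can be turned into a function with new Function and run against an object that supplies the helpers. The helpers below are hypothetical stand-ins that build plain objects instead of real VNodes, so this is only a sketch of the mechanism.

```javascript
// The render string generated for the template above.
const code = `with(this){return _c('div',[(message)?_c('h2',[_v(_s(message))]):_e(),_v(" "),_c('button',{on:{"click":showName}},[_v("showName")])])}`

// Function bodies created by new Function are non-strict, so `with` is allowed.
const render = new Function(code)

// Stand-ins for the helpers; they return plain objects, not real VNodes.
const fakeVm = {
  message: 'Hello Vue!',
  showName() {},
  _c(tag, data, children) {
    // As in the generated code, data may be omitted: _c(tag, children)
    if (Array.isArray(data)) {
      children = data
      data = undefined
    }
    return { tag, data, children: children || [] }
  },
  _v: text => ({ text }),
  _s: value => (value == null ? '' : String(value)),
  _e: () => ({ text: '', isComment: true })
}

const vnode = render.call(fakeVm)
console.log(vnode.tag, vnode.children.length) // div 3
console.log(vnode.children[0].children[0].text) // Hello Vue!
```

Because of with(this), names such as message and showName resolve to properties of the object the function is called on, which is why the template can refer to them without a this prefix.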

The conversion itself is mostly simple string concatenation. Below is a simplified portion of the logic, without much commentary.

export function generate(ast, options) {
  const state = new CodegenState(options)
  const code = ast ? genElement(ast, state) : '_c("div")'
  return {
    render: `with(this){return ${code}}`,
    staticRenderFns: state.staticRenderFns
  }
}

export function genElement (el, state) {
  let code
  const data = genData(el, state)
  const children = genChildren(el, state, true)
  code = `_c('${el.tag}'${
    data ? `,${data}` : '' // data
  }${
    children ? `,${children}` : '' // children
  })`
  return code
}
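The string concatenation can be reproduced with a toy generator. This is a minimal sketch with deliberately narrow scope: it handles only tags, static text and ready-made expressions (no data object, no v-if or v-for, no static render functions), and genNode is a hypothetical helper name.

```javascript
// Toy generator covering only elements, text and {{ }} expressions.
function genNode(node) {
  if (node.type === 1) return genElement(node)
  // type 2 carries a ready-made expression; type 3 becomes a string literal
  return `_v(${node.type === 2 ? node.expression : JSON.stringify(node.text)})`
}

function genElement(el) {
  const children = el.children.length
    ? `,[${el.children.map(genNode).join(',')}]`
    : ''
  return `_c('${el.tag}'${children})`
}

const ast = {
  type: 1, tag: 'div', children: [
    { type: 1, tag: 'h2', children: [{ type: 2, expression: '_s(message)' }] },
    { type: 3, text: ' ' },
    { type: 1, tag: 'br', children: [] }
  ]
}
const generated = `with(this){return ${genElement(ast)}}`
console.log(generated)
// with(this){return _c('div',[_c('h2',[_v(_s(message))]),_v(" "),_c('br')])}
```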

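Summary

That covers the whole Vue template compilation process, with most of the attention on parsing HTML into an AST. This article only outlines the main flow and leaves out a great many details, such as the handling of template/slot and of directives. If you want those details, read the source directly. I hope you got something out of this article.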