前端與編譯原理——用 JS 寫一個 JS 解釋器

時間 2019-11-15

原文原文鏈接

寫於 2018.12.03javascript

提及編譯原理，印象每每只停留在本科時那些枯燥的課程和晦澀的概念。做爲前端開發者，編譯原理彷佛離咱們很遠，對它的理解極可能僅僅侷限於「抽象語法樹（AST）」。但這僅僅是個開頭而已。編譯原理的使用，甚至能讓咱們利用JS直接寫一個能運行JS代碼的解釋器。前端

項目地址：github.com/jrainlau/ca…java

在線體驗：codepen.io/jrainlau/pe…node

1、爲何要用JS寫JS的解釋器

接觸太小程序開發的同窗應該知道，小程序運行的環境禁止new Function，eval等方法的使用，致使咱們沒法直接執行字符串形式的動態代碼。此外，許多平臺也對這些JS自帶的可執行動態代碼的方法進行了限制，那麼咱們是沒有任何辦法了嗎？既然如此，咱們即可以用JS寫一個解析器，讓JS本身去運行本身。git

在開始以前，咱們先簡單回顧一下編譯原理的一些概念。github

2、什麼是編譯器

說到編譯原理，確定離不開編譯器。簡單來講，當一段代碼通過編譯器的詞法分析、語法分析等階段以後，會生成一個樹狀結構的「抽象語法樹（AST）」，該語法樹的每個節點都對應着代碼當中不一樣含義的片斷。express

好比有這麼一段代碼：小程序

const a = 1
console.log(a)
複製代碼

通過編譯器處理後，它的AST長這樣：微信小程序

{
  "type": "Program",
  "start": 0,
  "end": 26,
  "body": [
    {
      "type": "VariableDeclaration",
      "start": 0,
      "end": 11,
      "declarations": [
        {
          "type": "VariableDeclarator",
          "start": 6,
          "end": 11,
          "id": {
            "type": "Identifier",
            "start": 6,
            "end": 7,
            "name": "a"
          },
          "init": {
            "type": "Literal",
            "start": 10,
            "end": 11,
            "value": 1,
            "raw": "1"
          }
        }
      ],
      "kind": "const"
    },
    {
      "type": "ExpressionStatement",
      "start": 12,
      "end": 26,
      "expression": {
        "type": "CallExpression",
        "start": 12,
        "end": 26,
        "callee": {
          "type": "MemberExpression",
          "start": 12,
          "end": 23,
          "object": {
            "type": "Identifier",
            "start": 12,
            "end": 19,
            "name": "console"
          },
          "property": {
            "type": "Identifier",
            "start": 20,
            "end": 23,
            "name": "log"
          },
          "computed": false
        },
        "arguments": [
          {
            "type": "Identifier",
            "start": 24,
            "end": 25,
            "name": "a"
          }
        ]
      }
    }
  ],
  "sourceType": "module"
}
複製代碼

常見的JS編譯器有babylon，acorn等等，感興趣的同窗能夠在AST explorer這個網站自行體驗。bash

能夠看到，編譯出來的AST詳細記錄了代碼中全部語義代碼的類型、起始位置等信息。這段代碼除了根節點Program外，主體包含了兩個節點VariableDeclaration和ExpressionStatement，而這些節點裏面又包含了不一樣的子節點。

正是因爲AST詳細記錄了代碼的語義化信息，因此Babel，Webpack，Sass，Less等工具能夠針對代碼進行很是智能的處理。

3、什麼是解釋器

如同翻譯人員不只能看懂一門外語，也能對其藝術加工後把它翻譯成母語同樣，人們把可以將代碼轉化成AST的工具叫作「編譯器」，而把可以將AST翻譯成目標語言並運行的工具叫作「解釋器」。

在編譯原理的課程中，咱們思考過這麼一個問題：如何讓計算機運行算數表達式1+2+3:

1 + 2 + 3
複製代碼

當機器執行的時候，它可能會是這樣的機器碼：

1 PUSH 1
2 PUSH 2
3 ADD
4 PUSH 3
5 ADD
複製代碼

而運行這段機器碼的程序，就是解釋器。

在這篇文章中，咱們不會搞出機器碼這樣複雜的東西，僅僅是使用JS在其runtime環境下去解釋JS代碼的AST。因爲解釋器使用JS編寫，因此咱們能夠大膽使用JS自身的語言特性，好比this綁定、new關鍵字等等，徹底不須要對它們進行額外處理，也所以讓JS解釋器的實現變得很是簡單。

在回顧了編譯原理的基本概念以後，咱們就能夠着手進行開發了。

4、節點遍歷器

經過分析上文的AST，能夠看到每個節點都會有一個類型屬性type，不一樣類型的節點須要不一樣的處理方式，處理這些節點的程序，就是「節點處理器（nodeHandler）」

定義一個節點處理器：

const nodeHandler = {
  Program () {},
  VariableDeclaration () {},
  ExpressionStatement () {},
  MemberExpression () {},
  CallExpression () {},
  Identifier () {}
}
複製代碼

關於節點處理器的具體實現，會在後文進行詳細探討，這裏暫時不做展開。

有了節點處理器，咱們便須要去遍歷AST當中的每個節點，遞歸地調用節點處理器，直到完成對整棵語法書的處理。

定義一個節點遍歷器（NodeIterator）：

class NodeIterator {
  constructor (node) {
    this.node = node
    this.nodeHandler = nodeHandler
  }

  traverse (node) {
    // 根據節點類型找到節點處理器當中對應的函數
    const _eval = this.nodeHandler[node.type]
    // 若找不到則報錯
    if (!_eval) {
      throw new Error(`canjs: Unknown node type "${node.type}".`)
    }
    // 運行處理函數
    return _eval(node)
  }

}
複製代碼

理論上，節點遍歷器這樣設計就能夠了，但仔細推敲，發現漏了一個很重要的東西——做用域處理。

回到節點處理器的VariableDeclaration()方法，它用來處理諸如const a = 1這樣的變量聲明節點。假設它的代碼以下：

VariableDeclaration (node) {
    for (const declaration of node.declarations) {
      const { name } = declaration.id
      const value = declaration.init ? traverse(declaration.init) : undefined
      // 問題來了，拿到了變量的名稱和值，而後把它保存到哪裏去呢？
      // ...
    }
  },
複製代碼

問題在於，處理完變量聲明節點之後，理應把這個變量保存起來。按照JS語言特性，這個變量應該存放在一個做用域當中。在JS解析器的實現過程當中，這個做用域能夠被定義爲一個scope對象。

改寫節點遍歷器，爲其新增一個scope對象

class NodeIterator {
  constructor (node, scope = {}) {
    this.node = node
    this.scope = scope
    this.nodeHandler = nodeHandler
  }

  traverse (node, options = {}) {
    const scope = options.scope || this.scope
    const nodeIterator = new NodeIterator(node, scope)
    const _eval = this.nodeHandler[node.type]
    if (!_eval) {
      throw new Error(`canjs: Unknown node type "${node.type}".`)
    }
    return _eval(nodeIterator)
  }

  createScope (blockType = 'block') {
    return new Scope(blockType, this.scope)
  }
}
複製代碼

而後節點處理函數VariableDeclaration()就能夠經過scope保存變量了：

VariableDeclaration (nodeIterator) {
    const kind = nodeIterator.node.kind
    for (const declaration of nodeIterator.node.declarations) {
      const { name } = declaration.id
      const value = declaration.init ? nodeIterator.traverse(declaration.init) : undefined
      // 在做用域當中定義變量
      // 若是當前是塊級做用域且變量用var定義，則定義到父級做用域
      if (nodeIterator.scope.type === 'block' && kind === 'var') {
        nodeIterator.scope.parentScope.declare(name, value, kind)
      } else {
        nodeIterator.scope.declare(name, value, kind)
      }
    }
  },
複製代碼

關於做用域的處理，能夠說是整個JS解釋器最難的部分。接下來咱們將對做用域處理進行深刻的剖析。

5、做用域處理

考慮到這樣一種狀況：

const a = 1
{
  const b = 2
  console.log(a)
}
console.log(b)
複製代碼

運行結果必然是可以打印出a的值，而後報錯：Uncaught ReferenceError: b is not defined

這段代碼就是涉及到了做用域的問題。塊級做用域或者函數做用域能夠讀取其父級做用域當中的變量，反之則不行，因此對於做用域咱們不能簡單地定義一個空對象，而是要專門進行處理。

定義一個做用域基類Scope：

class Scope {
  constructor (type, parentScope) {
    // 做用域類型，區分函數做用域function和塊級做用域block
    this.type = type
    // 父級做用域
    this.parentScope = parentScope
    // 全局做用域
    this.globalDeclaration = standardMap
    // 當前做用域的變量空間
    this.declaration = Object.create(null)
  }

  /* * get/set方法用於獲取/設置當前做用域中對應name的變量值 符合JS語法規則，優先從當前做用域去找，若找不到則到父級做用域去找，而後到全局做用域找。 若是都沒有，就報錯 */
  get (name) {
    if (this.declaration[name]) {
      return this.declaration[name]
    } else if (this.parentScope) {
      return this.parentScope.get(name)
    } else if (this.globalDeclaration[name]) {
      return this.globalDeclaration[name]
    }
    throw new ReferenceError(`${name} is not defined`)
  }

  set (name, value) {
    if (this.declaration[name]) {
      this.declaration[name] = value
    } else if (this.parentScope[name]) {
      this.parentScope.set(name, value)
    } else {
      throw new ReferenceError(`${name} is not defined`)
    }
  }

  /** * 根據變量的kind調用不一樣的變量定義方法 */
  declare (name, value, kind = 'var') {
    if (kind === 'var') {
      return this.varDeclare(name, value)
    } else if (kind === 'let') {
      return this.letDeclare(name, value)
    } else if (kind === 'const') {
      return this.constDeclare(name, value)
    } else {
      throw new Error(`canjs: Invalid Variable Declaration Kind of "${kind}"`)
    }
  }

  varDeclare (name, value) {
    let scope = this
    // 若當前做用域存在非函數類型的父級做用域時，就把變量定義到父級做用域
    while (scope.parentScope && scope.type !== 'function') {
      scope = scope.parentScope
    }
    this.declaration[name] = new SimpleValue(value, 'var')
    return this.declaration[name]
  }

  letDeclare (name, value) {
    // 不容許重複定義
    if (this.declaration[name]) {
      throw new SyntaxError(`Identifier ${name} has already been declared`)
    }
    this.declaration[name] = new SimpleValue(value, 'let')
    return this.declaration[name]
  }

  constDeclare (name, value) {
    // 不容許重複定義
    if (this.declaration[name]) {
      throw new SyntaxError(`Identifier ${name} has already been declared`)
    }
    this.declaration[name] = new SimpleValue(value, 'const')
    return this.declaration[name]
  }
}
複製代碼

這裏使用了一個叫作simpleValue()的函數來定義變量值，主要用於處理常量：

class SimpleValue {
  constructor (value, kind = '') {
    this.value = value
    this.kind = kind
  }

  set (value) {
    // 禁止從新對const類型變量賦值
    if (this.kind === 'const') {
      throw new TypeError('Assignment to constant variable')
    } else {
      this.value = value
    }
  }

  get () {
    return this.value
  }
}
複製代碼

處理做用域問題思路，關鍵的地方就是在於JS語言自己尋找變量的特性——優先當前做用域，父做用域次之，全局做用域最後。反過來，在節點處理函數VariableDeclaration()裏，若是遇到塊級做用域且關鍵字爲var，則須要把這個變量也定義到父級做用域當中，這也就是咱們常說的「全局變量污染」。

JS標準庫注入

細心的讀者會發現，在定義Scope基類的時候，其全局做用域globalScope被賦值了一個standardMap對象，這個對象就是JS標準庫。

簡單來講，JS標準庫就是JS這門語言自己所帶有的一系列方法和屬性，如經常使用的setTimeout，console.log等等。爲了讓解析器也可以執行這些方法，因此咱們須要爲其注入標準庫：

const standardMap = {
  console: new SimpleValue(console)
}
複製代碼

這樣就至關於往解析器的全局做用域當中注入了console這個對象，也就能夠直接被使用了。

6、節點處理器

在處理完節點遍歷器、做用域處理的工做以後，即可以來編寫節點處理器了。顧名思義，節點處理器是專門用來處理AST節點的，上文反覆說起的VariableDeclaration()方法即是其中一個。下面將對部分關鍵的節點處理器進行講解。

在開發節點處理器以前，須要用到一個工具，用於判斷JS語句當中的return，break，continue關鍵字。

關鍵字判斷工具`Signal`

定義一個Signal基類：

class Signal {
  constructor (type, value) {
    this.type = type
    this.value = value
  }

  static Return (value) {
    return new Signal('return', value)
  }

  static Break (label = null) {
    return new Signal('break', label)
  }

  static Continue (label) {
    return new Signal('continue', label)
  }

  static isReturn(signal) {
    return signal instanceof Signal && signal.type === 'return'
  }

  static isContinue(signal) {
    return signal instanceof Signal && signal.type === 'continue'
  }

  static isBreak(signal) {
    return signal instanceof Signal && signal.type === 'break'
  }

  static isSignal (signal) {
    return signal instanceof Signal
  }
}
複製代碼

有了它，就能夠對語句當中的關鍵字進行判斷處理，接下來會有大用處。

一、變量定義節點處理器——`VariableDeclaration()`

最經常使用的節點處理器之一，負責把變量註冊到正確的做用域。

VariableDeclaration (nodeIterator) {
    const kind = nodeIterator.node.kind
    for (const declaration of nodeIterator.node.declarations) {
      const { name } = declaration.id
      const value = declaration.init ? nodeIterator.traverse(declaration.init) : undefined
      // 在做用域當中定義變量
      // 若爲塊級做用域且關鍵字爲var，則須要作全局污染
      if (nodeIterator.scope.type === 'block' && kind === 'var') {
        nodeIterator.scope.parentScope.declare(name, value, kind)
      } else {
        nodeIterator.scope.declare(name, value, kind)
      }
    }
  },
複製代碼

二、標識符節點處理器——`Identifier()`

專門用於從做用域中獲取標識符的值。

Identifier (nodeIterator) {
    if (nodeIterator.node.name === 'undefined') {
      return undefined
    }
    return nodeIterator.scope.get(nodeIterator.node.name).value
  },
複製代碼

三、字符節點處理器——`Literal()`

返回字符節點的值。

Literal (nodeIterator) {
    return nodeIterator.node.value
  }
複製代碼

四、表達式調用節點處理器——`CallExpression()`

用於處理表達式調用節點的處理器，如處理func()，console.log()等。

CallExpression (nodeIterator) {
    // 遍歷callee獲取函數體
    const func = nodeIterator.traverse(nodeIterator.node.callee)
    // 獲取參數
    const args = nodeIterator.node.arguments.map(arg => nodeIterator.traverse(arg))

    let value
    if (nodeIterator.node.callee.type === 'MemberExpression') {
      value = nodeIterator.traverse(nodeIterator.node.callee.object)
    }
    // 返回函數運行結果
    return func.apply(value, args)
  },
複製代碼

五、表達式節點處理器——`MemberExpression()`

區分於上面的「表達式調用節點處理器」，表達式節點指的是person.say，console.log這種函數表達式。

MemberExpression (nodeIterator) {
    // 獲取對象，如console
    const obj = nodeIterator.traverse(nodeIterator.node.object)
    // 獲取對象的方法，如log
    const name = nodeIterator.node.property.name
    // 返回表達式，如console.log
    return obj[name]
  }
複製代碼

六、塊級聲明節點處理器——`BlockStatement()`

很是經常使用的處理器，專門用於處理塊級聲明節點，如函數、循環、try...catch...當中的情景。

BlockStatement (nodeIterator) {
    // 先定義一個塊級做用域
    let scope = nodeIterator.createScope('block')

    // 處理塊級節點內的每個節點
    for (const node of nodeIterator.node.body) {
      if (node.type === 'VariableDeclaration' && node.kind === 'var') {
        for (const declaration of node.declarations) {
          scope.declare(declaration.id.name, declaration.init.value, node.kind)
        }
      } else if (node.type === 'FunctionDeclaration') {
        nodeIterator.traverse(node, { scope })
      }
    }

    // 提取關鍵字（return, break, continue）
    for (const node of nodeIterator.node.body) {
      if (node.type === 'FunctionDeclaration') {
        continue
      }
      const signal = nodeIterator.traverse(node, { scope })
      if (Signal.isSignal(signal)) {
        return signal
      }
    }
  }
複製代碼

能夠看到這個處理器裏面有兩個for...of循環。第一個用於處理塊級內語句，第二個專門用於識別關鍵字，如循環體內部的break，continue或者函數體內部的return。

七、函數定義節點處理器——`FunctionDeclaration()`

往做用當中聲明一個和函數名相同的變量，值爲所定義的函數：

FunctionDeclaration (nodeIterator) {
    const fn = NodeHandler.FunctionExpression(nodeIterator)
    nodeIterator.scope.varDeclare(nodeIterator.node.id.name, fn)
    return fn    
  }
複製代碼

八、函數表達式節點處理器——`FunctionExpression()`

用於定義一個函數：

FunctionExpression (nodeIterator) {
    const node = nodeIterator.node
    /** * 一、定義函數須要先爲其定義一個函數做用域，且容許繼承父級做用域 * 二、註冊`this`, `arguments`和形參到做用域的變量空間 * 三、檢查return關鍵字 * 四、定義函數名和長度 */
    const fn = function () {
      const scope = nodeIterator.createScope('function')
      scope.constDeclare('this', this)
      scope.constDeclare('arguments', arguments)

      node.params.forEach((param, index) => {
        const name = param.name
        scope.varDeclare(name, arguments[index])
      })

      const signal = nodeIterator.traverse(node.body, { scope })
      if (Signal.isReturn(signal)) {
        return signal.value
      }
    }

    Object.defineProperties(fn, {
      name: { value: node.id ? node.id.name : '' },
      length: { value: node.params.length }
    })

    return fn
  }
複製代碼

九、this表達式處理器——`ThisExpression()`

該處理器直接使用JS語言自身的特性，把this關鍵字從做用域中取出便可。

ThisExpression (nodeIterator) {
    const value = nodeIterator.scope.get('this')
    return value ? value.value : null
  }
複製代碼

十、new表達式處理器——`NewExpression()`

和this表達式相似，也是直接沿用JS的語言特性，獲取函數和參數以後，經過bind關鍵字生成一個構造函數，並返回。

NewExpression (nodeIterator) {
    const func = nodeIterator.traverse(nodeIterator.node.callee)
    const args = nodeIterator.node.arguments.map(arg => nodeIterator.traverse(arg))
    return new (func.bind(null, ...args))
  }
複製代碼

十一、For循環節點處理器——`ForStatement()`

For循環的三個參數對應着節點的init，test，update屬性，對着三個屬性分別調用節點處理器處理，並放回JS原生的for循環當中便可。

ForStatement (nodeIterator) {
    const node = nodeIterator.node
    let scope = nodeIterator.scope
    if (node.init && node.init.type === 'VariableDeclaration' && node.init.kind !== 'var') {
      scope = nodeIterator.createScope('block')
    }

    for (
      node.init && nodeIterator.traverse(node.init, { scope });
      node.test ? nodeIterator.traverse(node.test, { scope }) : true;
      node.update && nodeIterator.traverse(node.update, { scope })
    ) {
      const signal = nodeIterator.traverse(node.body, { scope })
      
      if (Signal.isBreak(signal)) {
        break
      } else if (Signal.isContinue(signal)) {
        continue
      } else if (Signal.isReturn(signal)) {
        return signal
      }
    }
  }
複製代碼

同理，for...in，while和do...while循環也是相似的處理方式，這裏再也不贅述。

十二、If聲明節點處理器——`IfStatemtnt()`

處理If語句，包括if，if...else，if...elseif...else。

IfStatement (nodeIterator) {
    if (nodeIterator.traverse(nodeIterator.node.test)) {
      return nodeIterator.traverse(nodeIterator.node.consequent)
    } else if (nodeIterator.node.alternate) {
      return nodeIterator.traverse(nodeIterator.node.alternate)
    }
  }
複製代碼

同理，switch語句、三目表達式也是相似的處理方式。

---

上面列出了幾個比較重要的節點處理器，在es5當中還有不少節點須要處理，詳細內容能夠訪問這個地址一探究竟。

7、定義調用方式

通過了上面的全部步驟，解析器已經具有處理es5代碼的能力，接下來就是對這些散裝的內容進行組裝，最終定義一個方便用戶調用的辦法。

const { Parser } = require('acorn')
const NodeIterator = require('./iterator')
const Scope = require('./scope')

class Canjs {
  constructor (code = '', extraDeclaration = {}) {
    this.code = code
    this.extraDeclaration = extraDeclaration
    this.ast = Parser.parse(code)
    this.nodeIterator = null
    this.init()
  }

  init () {
    // 定義全局做用域，該做用域類型爲函數做用域
    const globalScope = new Scope('function')
    // 根據入參定義標準庫以外的全局變量
    Object.keys(this.extraDeclaration).forEach((key) => {
      globalScope.addDeclaration(key, this.extraDeclaration[key])
    })
    this.nodeIterator = new NodeIterator(null, globalScope)
  }

  run () {
    return this.nodeIterator.traverse(this.ast)
  }
}
複製代碼

這裏咱們定義了一個名爲Canjs的基類，接受字符串形式的JS代碼，同時可定義標準庫以外的變量。當運行run()方法的時候就能夠獲得運行結果。

8、後續

至此，整個JS解析器已經完成，能夠很好地運行ES5的代碼（可能還有bug沒有發現）。可是在當前的實現中，全部的運行結果都是放在一個相似沙盒的地方，沒法對外界產生影響。若是要把運行結果取出來，可能的辦法有兩種。第一種是傳入一個全局的變量，把影響做用在這個全局變量當中，藉助它把結果帶出來；另一種則是讓解析器支持export語法，可以把export語句聲明的結果返回，感興趣的讀者能夠自行研究。

最後，這個JS解析器已經在個人Github上開源，歡迎前來交流~

github.com/jrainlau/ca…