Compiler Principles in Practice: Writing a Simple Four-Function Arithmetic Compiler in JavaScript (Part 2): Syntax Analysis

Date: 2019-12-04

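Grammar rules for four-function arithmetic (the rules are layered)

x* means x appears zero or more times
x | y means either x or y appears
( ) parentheses group language constructs

Read the rules below from left to right: the expression on the left can be broken down into the expression on the right, and so on until nothing can be broken down any further.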

expression: addExpression
addExpression: mulExpression (op mulExpression)*
mulExpression: term (op term)*
term: '(' expression ')' | integerConstant
op: '+' | '-' | '*' | '/'

PS: addExpression covers + and - expressions, so its op is + or -; mulExpression covers * and / expressions, so its op is * or /.

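Syntax analysis

Syntax analysis is the process of analyzing the input text against the grammar rules and determining its syntactic structure.

The output of syntax analysis is usually an abstract syntax tree (AST) or a parse tree. Because four-function arithmetic is simple, though, the approach taken here is to generate code and report errors on the fly, so the whole program structure never has to be held in memory.

Let's start by looking at how to analyze the arithmetic expression 1 + 2 * 3.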

The first rule matched is expression. Since expression currently has only one possible expansion, addExpression, it is broken down into addExpression.
Continuing in the same way, the order is: mulExpression, term, 1 (integerConstant), + (op), mulExpression, term, 2 (integerConstant), * (op), term, 3 (integerConstant).

The call trace a little further down lays out this sequence step by step.

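Some readers may wonder why a single expression is made so complicated: expression contains addExpression, and addExpression in turn contains mulExpression.
The layering exists to handle operator precedence. A mulExpression is parsed completely before the enclosing addExpression applies its + or -, so * and / bind more tightly than + and -.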

1 + 2 * 3
compileExpression
   | compileAddExpr
   |  | compileMultExpr
   |  |  | compileTerm
   |  |  |  |_ matches integerConstant        push 1
   |  |  |_
   |  | matches '+'
   |  | compileMultExpr
   |  |  | compileTerm
   |  |  |  |_ matches integerConstant        push 2
   |  |  | matches '*'
   |  |  | compileTerm
   |  |  |  |_ matches integerConstant        push 3
   |  |  |_ compileOp('*')                      *
   |  |_ compileOp('+')                         +
   |_
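To make the effect of the layering concrete, here is a minimal sketch that emits operands and operators for the same tokens twice: once with the layered rules, and once with a single flat rule that treats all four operators alike. The names layeredOrder and flatOrder are made up for this illustration, the tokens are assumed to be plain strings, and parentheses are left out to keep it short.

```javascript
// Hypothetical helpers for illustration only; not part of the article's parser.
function layeredOrder(tokens) {
    let i = 0
    const out = []

    function term() {
        out.push(tokens[i++])              // integerConstant
    }

    function mulExpr() {
        term()
        while (tokens[i] === '*' || tokens[i] === '/') {
            const op = tokens[i++]
            term()
            out.push(op)
        }
    }

    function addExpr() {
        mulExpr()
        while (tokens[i] === '+' || tokens[i] === '-') {
            const op = tokens[i++]
            mulExpr()                      // the right operand is a whole mulExpression
            out.push(op)
        }
    }

    addExpr()
    return out.join(' ')
}

// One flat rule: operand (op operand)*, with no distinction between operators.
function flatOrder(tokens) {
    let i = 0
    const out = [tokens[i++]]
    while (i < tokens.length) {
        const op = tokens[i++]
        out.push(tokens[i++])
        out.push(op)
    }
    return out.join(' ')
}

console.log(layeredOrder(['1', '+', '2', '*', '3'])) // 1 2 3 * +
console.log(flatOrder(['1', '+', '2', '*', '3']))    // 1 2 + 3 *
```

The layered version emits * before +, matching the trace above, so the multiplication is carried out first. The flat version would add 1 and 2 first and then multiply by 3.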

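Many algorithms can be used to build a parse tree; only two are covered here.

Recursive descent parsing

Recursive descent parsing is a top-down method. It works through the token stream recursively, following the grammar rules step by step: whenever it meets a non-terminal it keeps descending, until it reaches a terminal.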
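As a rough sketch of the idea, each non-terminal in the grammar becomes one function and each terminal becomes a check against the current token. The recognize function below only answers whether an array of string tokens is a valid expression; its name and the accept helper are assumptions made for this example, not the article's Parser.

```javascript
// A sketch, not the article's Parser: returns true if the tokens form a valid expression.
function recognize(tokens) {
    let pos = 0

    // Terminal: consume the current token if it is one of the expected strings.
    function accept(...expected) {
        if (pos < tokens.length && expected.includes(tokens[pos])) {
            pos++
            return true
        }
        return false
    }

    // expression: addExpression
    function expression() {
        return addExpression()
    }

    // addExpression: mulExpression (('+' | '-') mulExpression)*
    function addExpression() {
        if (!mulExpression()) return false
        while (accept('+', '-')) {
            if (!mulExpression()) return false
        }
        return true
    }

    // mulExpression: term (('*' | '/') term)*
    function mulExpression() {
        if (!term()) return false
        while (accept('*', '/')) {
            if (!term()) return false
        }
        return true
    }

    // term: '(' expression ')' | integerConstant
    function term() {
        if (accept('(')) {
            return expression() && accept(')')
        }
        if (pos < tokens.length && /^\d+$/.test(tokens[pos])) {
            pos++
            return true
        }
        return false
    }

    // Valid only if the whole input was consumed.
    return expression() && pos === tokens.length
}

console.log(recognize(['(', '1', '+', '2', ')', '*', '3'])) // true
console.log(recognize(['1', '+', '*', '3']))                // false
console.log(recognize(['(', '1', '+', '2']))                // false
```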

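LL(1) parsing

Recursive descent is a simple and efficient algorithm. LL(1) parsing adds one step to it: lookahead. The parser peeks at the upcoming token, without consuming it, to decide which rule applies. (In standard terminology, deciding on one token of lookahead is LL(1); LL(0) would mean no lookahead at all.) When one token is not enough to determine the element type, looking one token further ahead may resolve the ambiguity.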
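One way to picture the lookahead described above is a token stream with separate peek and next operations, so the parser can inspect the upcoming token without consuming it. The TokenStream class and evalAddChain function below are hypothetical names for this sketch. It handles only chains of + and - over integer tokens, and the tokens are assumed to be plain strings.

```javascript
// Hypothetical TokenStream; not part of the article's code.
class TokenStream {
    constructor(tokens) {
        this.tokens = tokens
        this.pos = 0
    }

    // Look at the upcoming token without consuming it (undefined at end of input).
    peek() {
        return this.tokens[this.pos]
    }

    // Consume and return the upcoming token.
    next() {
        return this.tokens[this.pos++]
    }
}

// Sums and differences only: integerConstant (('+' | '-') integerConstant)*
function evalAddChain(tokens) {
    const stream = new TokenStream(tokens)
    let value = Number(stream.next())
    // Peek decides whether the loop continues; nothing is consumed on a mismatch.
    while (stream.peek() === '+' || stream.peek() === '-') {
        const op = stream.next()
        const rhs = Number(stream.next())
        value = op === '+' ? value + rhs : value - rhs
    }
    return value
}

console.log(evalAddChain(['10', '+', '5', '-', '3'])) // 12
```

An equivalent approach is to consume the token first and step the index back when it does not match, which is what the implementation later in this article does.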

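That is a brief introduction to both algorithms; for the details, see the implementation below.

Code generation for expressions

The arithmetic expressions we normally write are infix expressions, but infix is awkward for a computer to evaluate. So in the code generation stage, the infix expression is converted into a postfix expression.

Postfix expressions

A postfix expression, also known as reverse Polish notation, contains no parentheses and places each operator after its two operands. Evaluation proceeds strictly from left to right in the order the operators appear, with no operator precedence rules to consider.

Example:

The infix expression 5 + 5 is converted to the postfix expression 5 5 +, and code is then generated from the postfix form.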

// 5 + 5 becomes 5 5 +, which then generates this code
push 5
push 5
add
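To show why postfix is convenient for a machine, here is a small sketch of a stack-based evaluator: operands are pushed, and each operator pops its two operands and pushes the result. The function name evalPostfix is an assumption for this example, tokens are assumed to be strings, and / is treated as ordinary JavaScript division.

```javascript
// Illustrative helper; the name evalPostfix is not from the article.
function evalPostfix(tokens) {
    const stack = []
    for (const token of tokens) {
        if (/^\d+$/.test(token)) {
            stack.push(Number(token))
        } else {
            // The right operand is on top of the stack, the left one below it.
            const right = stack.pop()
            const left = stack.pop()
            switch (token) {
                case '+': stack.push(left + right); break
                case '-': stack.push(left - right); break
                case '*': stack.push(left * right); break
                case '/': stack.push(left / right); break
                default: throw new Error('Unknown operator: ' + token)
            }
        }
    }
    return stack.pop()
}

console.log(evalPostfix(['5', '5', '+']))           // 10
console.log(evalPostfix(['1', '2', '3', '*', '+'])) // 7
```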

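Implementation

Compiler theory can read like hieroglyphics and often leaves people lost in the fog, but once you actually start building something, you'll find it's fairly simple.

If the theory above doesn't quite make sense, don't worry: read the code first, then go back and connect it to the theory.

Note: the code below depends on the lexical analysis code from the previous article.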

// Assembly code generator
function AssemblyWriter() {
    this.output = ''
}

AssemblyWriter.prototype = {
    writePush(digit) {
        this.output += `push ${digit}\r\n`
    },

    writeOP(op) {
        this.output += op + '\r\n'
    },

    // Return the generated assembly code
    outputStr() {
        return this.output
    }
}

// Parser
function Parser(tokens, writer) {
    this.writer = writer
    this.tokens = tokens
    // Index into the tokens array
    this.i = -1
    this.opMap1 = {
        '+': 'add',
        '-': 'sub',
    }

    this.opMap2 = {
        '/': 'div',
        '*': 'mul'
    }

    this.init()
}

Parser.prototype = {
    init() {
        this.compileExpression()
    },

    compileExpression() {
        this.compileAddExpr()
    },

    compileAddExpr() {
        this.compileMultExpr()
        while (true) {
            this.getNextToken()
            if (this.opMap1[this.token]) {
                let op = this.opMap1[this.token]
                this.compileMultExpr()
                this.writer.writeOP(op)
            } else {
                // No matching operator: the token is neither + nor -
                // Step the token index back by one
                this.i--
                break
            }
        }
    },

    compileMultExpr() {
        this.compileTerm()
        while (true) {
            this.getNextToken()
            if (this.opMap2[this.token]) {
                let op = this.opMap2[this.token]
                this.compileTerm()
                this.writer.writeOP(op)
            } else {
                // No matching operator: the token is neither * nor /
                // Step the token index back by one
                this.i--
                break
            }
        }
    },

    compileTerm() {
        this.getNextToken()
        if (this.token === '(') {
            this.compileExpression()
            this.getNextToken()
            if (this.token !== ')') {
                throw new Error('Missing closing parenthesis: )')
            }
        } else if (/^\d+$/.test(this.token)) {
            this.writer.writePush(this.token)
        } else {
            throw new Error('Unexpected token: token #' + (this.i + 1) + ' (' + this.token + ')')
        }
    },

    getNextToken() {
        this.token = this.tokens[++this.i]
    },

    getInstructions() {
        return this.writer.outputStr()
    }
}

const tokens = lexicalAnalysis('100+10*10')
const writer = new AssemblyWriter()
const parser = new Parser(tokens, writer)
const instructions = parser.getInstructions()
console.log(instructions) // print the generated assembly code
/*
push 100
push 10
push 10
mul
add
*/
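One way to check the generated instructions is to run them on a tiny stack machine. The sketch below is only an illustration: runInstructions is a made-up name, it assumes the instruction text uses exactly the push, add, sub, mul and div mnemonics shown above, and it treats div as ordinary JavaScript division.

```javascript
// A sketch of a stack machine for the instruction text; runInstructions is a made-up name.
function runInstructions(text) {
    const stack = []
    const ops = {
        add: (a, b) => a + b,
        sub: (a, b) => a - b,
        mul: (a, b) => a * b,
        div: (a, b) => a / b
    }
    // The writer ends each line with \r\n; accept a bare \n as well.
    for (const line of text.split(/\r?\n/)) {
        if (line === '') continue
        const [name, arg] = line.split(' ')
        if (name === 'push') {
            stack.push(Number(arg))
        } else if (Object.prototype.hasOwnProperty.call(ops, name)) {
            // The right operand was pushed last, so it is popped first.
            const right = stack.pop()
            const left = stack.pop()
            stack.push(ops[name](left, right))
        } else {
            throw new Error('Unknown instruction: ' + line)
        }
    }
    return stack.pop()
}

console.log(runInstructions('push 100\r\npush 10\r\npush 10\r\nmul\r\nadd\r\n')) // 200
```

The result matches 100 + 10 * 10 evaluated with the usual precedence.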