Let's Build a Simple Interpreter (Part 6)

Date: 2020-03-05

Translated from: https://ruslanspivak.com/lsbasi-part6/
(with the author's permission)

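Today we wrap up our discussion of arithmetic expressions by adding parenthesized expressions to the grammar and implementing an interpreter that can evaluate expressions nested to arbitrary depth.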

Let's get started!

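First, let's modify the grammar to support expressions inside parentheses. As you may remember from Part 5, the factor rule is used for the basic units of an expression. In that article the only basic unit we had was an integer. Today we add another basic unit: a parenthesized expression.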

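Here is our updated grammar:

    expr   : term ((PLUS | MINUS) term)*
    term   : factor ((MUL | DIV) factor)*
    factor : INTEGER | LPAREN expr RPAREN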

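The expr and term productions are exactly the same as in Part 5. The only change is in the factor production, where the terminal LPAREN represents a left parenthesis '(', the terminal RPAREN represents a right parenthesis ')', and the non-terminal expr between the parentheses refers to the expr rule.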

Here is the updated syntax diagram for factor:
[Figure: syntax diagram for the factor rule]

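Because the grammar rules for expr and term have not changed, their syntax diagrams look the same as in Part 5:
[Figure: syntax diagrams for the expr and term rules]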

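Here is an interesting feature of our new grammar: it is recursive. If you try to derive the expression 2 * (7 + 3), you start with the expr start symbol and eventually reach a point where you use the expr rule again, recursively, to derive the (7 + 3) portion.
Let's decompose the expression 2 * (7 + 3) according to the grammar:
[Figure: decomposition of 2 * (7 + 3) according to the grammar]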
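
To make the recursion concrete, here is a minimal sketch that records which grammar rules are visited while parsing an expression. It is not the article's interpreter: the trace_derivation function and its regex-based tokenizer are hypothetical helpers written only for this illustration, and the sketch recognizes the input without computing a value.

```python
import re


def trace_derivation(text):
    """Return the grammar rules visited, in order, while parsing `text`.

    Grammar (from this article):
        expr   : term ((PLUS | MINUS) term)*
        term   : factor ((MUL | DIV) factor)*
        factor : INTEGER | LPAREN expr RPAREN
    """
    # Hypothetical helper tokenizer: keeps integers and single-character
    # operators, and silently skips anything else (such as whitespace).
    tokens = re.findall(r'\d+|[-+*/()]', text)
    pos = 0
    trace = []

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1

    def expr(depth):
        trace.append('  ' * depth + 'expr')
        term(depth + 1)
        while peek() in ('+', '-'):
            advance()
            term(depth + 1)

    def term(depth):
        trace.append('  ' * depth + 'term')
        factor(depth + 1)
        while peek() in ('*', '/'):
            advance()
            factor(depth + 1)

    def factor(depth):
        token = peek()
        if token is not None and token.isdigit():
            trace.append('  ' * depth + 'factor -> INTEGER ' + token)
            advance()
        elif token == '(':
            trace.append('  ' * depth + 'factor -> LPAREN expr RPAREN')
            advance()
            expr(depth + 1)  # the recursive use of the expr rule
            if peek() != ')':
                raise Exception('Invalid syntax')
            advance()
        else:
            raise Exception('Invalid syntax')

    expr(0)
    return trace


if __name__ == '__main__':
    print('\n'.join(trace_derivation('2 * (7 + 3)')))
```

In the printed trace, expr appears twice: once as the start symbol and once, indented more deeply, beneath the factor that opened the parenthesis.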

Okay, let's translate our new, updated grammar into code.

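The following are the main changes to the code from the previous article:

1. The Lexer has been modified to return two more tokens: LPAREN for a left parenthesis and RPAREN for a right parenthesis.

2. The Interpreter's factor method has been modified to parse parenthesized expressions in addition to integers.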
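
As a rough illustration of the first change, the sketch below shows the token stream one would expect for an input containing parentheses. The token type names are taken from the article's code, but the tokenize function and the SINGLE_CHAR_TOKENS table are hypothetical stand-ins for the Lexer class, written only to keep the example self-contained.

```python
# Token type names follow the article; everything else here is a
# hypothetical, simplified stand-in for the article's Lexer class.
INTEGER, PLUS, MINUS, MUL, DIV, LPAREN, RPAREN, EOF = (
    'INTEGER', 'PLUS', 'MINUS', 'MUL', 'DIV', '(', ')', 'EOF'
)

SINGLE_CHAR_TOKENS = {
    '+': PLUS, '-': MINUS, '*': MUL, '/': DIV,
    '(': LPAREN, ')': RPAREN,
}


def tokenize(text):
    """Return a list of (type, value) pairs, ending with an EOF pair."""
    tokens = []
    pos = 0
    while pos < len(text):
        char = text[pos]
        if char.isspace():
            pos += 1
        elif char.isdigit():
            # Consume a (possibly multidigit) integer.
            start = pos
            while pos < len(text) and text[pos].isdigit():
                pos += 1
            tokens.append((INTEGER, int(text[start:pos])))
        elif char in SINGLE_CHAR_TOKENS:
            tokens.append((SINGLE_CHAR_TOKENS[char], char))
            pos += 1
        else:
            raise Exception('Invalid character')
    tokens.append((EOF, None))
    return tokens


if __name__ == '__main__':
    for token in tokenize('2 * (7 + 3)'):
        print(token)
```

The parentheses arrive as ordinary tokens, so the parser needs no special lookahead: the factor rule simply checks whether the current token is an INTEGER or an LPAREN.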

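Here is the complete code of a calculator that can evaluate arithmetic expressions containing integers, any number of addition, subtraction, multiplication and division operators, and parenthesized expressions with arbitrarily deep nesting: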

# Token types
#
# EOF (end-of-file) token is used to indicate that
# there is no more input left for lexical analysis
INTEGER, PLUS, MINUS, MUL, DIV, LPAREN, RPAREN, EOF = (
    'INTEGER', 'PLUS', 'MINUS', 'MUL', 'DIV', '(', ')', 'EOF'
)


class Token(object):
    def __init__(self, type, value):
        self.type = type
        self.value = value

    def __str__(self):
        """String representation of the class instance.

        Examples:
            Token(INTEGER, 3)
            Token(PLUS, '+')
            Token(MUL, '*')
        """
        return 'Token({type}, {value})'.format(
            type=self.type,
            value=repr(self.value)
        )

    def __repr__(self):
        return self.__str__()


class Lexer(object):
    def __init__(self, text):
        # client string input, e.g. "4 + 2 * 3 - 6 / 2"
        self.text = text
        # self.pos is an index into self.text
        self.pos = 0
        self.current_char = self.text[self.pos]

    def error(self):
        raise Exception('Invalid character')

    def advance(self):
        """Advance the `pos` pointer and set the `current_char` variable."""
        self.pos += 1
        if self.pos > len(self.text) - 1:
            self.current_char = None  # Indicates end of input
        else:
            self.current_char = self.text[self.pos]

    def skip_whitespace(self):
        while self.current_char is not None and self.current_char.isspace():
            self.advance()

    def integer(self):
        """Return a (multidigit) integer consumed from the input."""
        result = ''
        while self.current_char is not None and self.current_char.isdigit():
            result += self.current_char
            self.advance()
        return int(result)

    def get_next_token(self):
        """Lexical analyzer (also known as scanner or tokenizer)

        This method is responsible for breaking a sentence
        apart into tokens. One token at a time.
        """
        while self.current_char is not None:

            if self.current_char.isspace():
                self.skip_whitespace()
                continue

            if self.current_char.isdigit():
                return Token(INTEGER, self.integer())

            if self.current_char == '+':
                self.advance()
                return Token(PLUS, '+')

            if self.current_char == '-':
                self.advance()
                return Token(MINUS, '-')

            if self.current_char == '*':
                self.advance()
                return Token(MUL, '*')

            if self.current_char == '/':
                self.advance()
                return Token(DIV, '/')

            if self.current_char == '(':
                self.advance()
                return Token(LPAREN, '(')

            if self.current_char == ')':
                self.advance()
                return Token(RPAREN, ')')

            self.error()

        return Token(EOF, None)


class Interpreter(object):
    def __init__(self, lexer):
        self.lexer = lexer
        # set current token to the first token taken from the input
        self.current_token = self.lexer.get_next_token()

    def error(self):
        raise Exception('Invalid syntax')

    def eat(self, token_type):
        # compare the current token type with the passed token
        # type and if they match then "eat" the current token
        # and assign the next token to the self.current_token,
        # otherwise raise an exception.
        if self.current_token.type == token_type:
            self.current_token = self.lexer.get_next_token()
        else:
            self.error()

    def factor(self):
        """factor : INTEGER | LPAREN expr RPAREN"""
        token = self.current_token
        if token.type == INTEGER:
            self.eat(INTEGER)
            return token.value
        elif token.type == LPAREN:
            self.eat(LPAREN)
            result = self.expr()
            self.eat(RPAREN)
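            return result
        else:
            # Neither an integer nor '(': report a syntax error
            # instead of silently returning None.
            self.error()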

    def term(self):
        """term : factor ((MUL | DIV) factor)*"""
        result = self.factor()

        while self.current_token.type in (MUL, DIV):
            token = self.current_token
            if token.type == MUL:
                self.eat(MUL)
                result = result * self.factor()
            elif token.type == DIV:
                self.eat(DIV)
                # Integer division, so the results match the sample
                # session under both Python 2 and Python 3.
                result = result // self.factor()

        return result

    def expr(self):
        """Arithmetic expression parser / interpreter.

        calc> 7 + 3 * (10 / (12 / (3 + 1) - 1))
        22

        expr   : term ((PLUS | MINUS) term)*
        term   : factor ((MUL | DIV) factor)*
        factor : INTEGER | LPAREN expr RPAREN
        """
        result = self.term()

        while self.current_token.type in (PLUS, MINUS):
            token = self.current_token
            if token.type == PLUS:
                self.eat(PLUS)
                result = result + self.term()
            elif token.type == MINUS:
                self.eat(MINUS)
                result = result - self.term()

        return result


def main():
    while True:
        try:
            # Written for Python 3; to run under Python 2 replace
            # the 'input' call with 'raw_input'
            text = input('calc> ')
        except EOFError:
            break
        if not text:
            continue
        lexer = Lexer(text)
        interpreter = Interpreter(lexer)
        result = interpreter.expr()
        print(result)


if __name__ == '__main__':
    main()

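Save the code above to a file named calc6.py and try it out to see that your new interpreter correctly evaluates arithmetic expressions with different operators and parentheses.

Here is a sample session: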

$ python calc6.py
calc> 3
3
calc> 2 + 7 * 4
30
calc> 7 - 8 / 4
5
calc> 14 + 2 * 3 - 6 / 2
17
calc> 7 + 3 * (10 / (12 / (3 + 1) - 1))
22
calc> 7 + 3 * (10 / (12 / (3 + 1) - 1)) / (2 + 3) - 5 - 3 + (8)
10
calc> 7 + (((3 + 2)))
12

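Here is today's exercise:

1. Write your own version of the arithmetic expression interpreter as described in this article. Remember: repetition is the mother of all learning.

Hey, you read all the way to the end! Congratulations, you have just learned how to build a basic recursive-descent parser/interpreter that can evaluate fairly complex arithmetic expressions.

In the next article I will discuss recursive-descent parsers in more detail. I will also introduce an important and widely used data structure in interpreter and compiler construction, which we will use throughout the series.

Stay tuned and see you soon. Until then, keep working on your own interpreter and, most importantly, enjoy the process!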