Building a Simple Interpreter (Part 3)

Date: 2020-03-04

Translated from: https://ruslanspivak.com/lsbasi-part3/
(with the author's permission)

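I woke up this morning and thought: "Why do we find it so hard to learn a new skill?"

I don't think it is merely a lack of effort. I think one of the reasons may be that we spend a lot of time and energy acquiring knowledge by reading and watching, and not enough time turning that knowledge into skill through practice. Take swimming, for example: you can spend a lot of time reading hundreds of books about swimming, talk for hours with experienced swimmers and coaches, and watch every training video available, and you will still sink like a rock the first time you jump into the pool.

How well we know our subject doesn't really matter; what matters is putting that knowledge into practice so that it turns into skill. To help you practice, I added exercises to Part 1 and Part 2, and I promise to add more exercises to today's article and to future ones :)

Okay, let's get started!

So far, you've learned how to interpret arithmetic expressions that add or subtract two integers, such as "7 + 3" or "12 - 9". Today I'm going to talk about how to parse (recognize) and interpret arithmetic expressions that contain any number of plus and minus operators, for example "7 - 3 + 2 - 1".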

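The arithmetic expressions handled in this article can be represented with the following syntax diagram:

[Syntax diagram: a term, followed by zero or more repetitions of a plus or minus sign and another term]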

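What is a syntax diagram? A syntax diagram is a graphical representation of a programming language's syntax rules. Basically, a syntax diagram shows you visually which statements are allowed in the programming language and which are not.

Syntax diagrams are very easy to read: just follow the paths indicated by the arrows. Some paths indicate choices, and some indicate loops.

Let's read the syntax diagram above: a term is optionally followed by a plus or minus sign and then another term, which in turn is optionally followed by a plus or minus sign and yet another term, and so on. You may be wondering what a term is. In this article, a term is just an integer.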
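One way to convince yourself of this reading is to write the same shape as a regular expression. The sketch below is only an illustration and not part of the calculator: the names EXPR_PATTERN and matches_diagram are made up here, and it assumes a term is an unsigned run of digits with optional whitespace around it.

```python
import re

# term ((+ | -) term)*  where a term is an unsigned integer.
# The starred group is the "loop" in the diagram; [+-] is the "choice".
EXPR_PATTERN = re.compile(r'\s*\d+(\s*[+-]\s*\d+)*\s*')


def matches_diagram(text):
    """Return True if the whole string follows the diagram's shape."""
    return EXPR_PATTERN.fullmatch(text) is not None


for candidate in ('3', '3 + 4', '7 - 3 + 2 - 1', '3 +'):
    print(repr(candidate), matches_diagram(candidate))
```

A regular expression happens to be enough for this particular diagram, but it only answers yes or no. The rest of the article builds a parser instead, which can be extended to evaluate the expression.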

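Syntax diagrams serve two main purposes:

1. They graphically represent the specification (grammar) of a programming language.
2. They can be used to help write a parser: you can map a diagram to code by following simple rules.

You've learned that the process of recognizing a phrase in the stream of tokens is called parsing, and that the part of an interpreter or compiler that performs that job is called a parser. Parsing is also called syntax analysis, and the parser is accordingly also called a syntax analyzer.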
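To make "stream of tokens" concrete, here is a minimal sketch of what the parser gets to look at. It is not the lexer used later in this article: the tokenize function and TOKEN_PATTERN are hypothetical names, and the regular-expression approach is simply a compact stand-in.

```python
import re

# Hypothetical token pattern: one named group per token type.
TOKEN_PATTERN = re.compile(
    r'(?P<INTEGER>\d+)|(?P<PLUS>\+)|(?P<MINUS>-)|(?P<SPACE>\s+)'
)


def tokenize(text):
    """Return a list of (type, value) pairs; whitespace is skipped."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_PATTERN.match(text, pos)
        if match is None:
            raise Exception('Invalid character at position %d' % pos)
        kind = match.lastgroup
        if kind == 'INTEGER':
            tokens.append((kind, int(match.group())))
        elif kind != 'SPACE':
            tokens.append((kind, match.group()))
        pos = match.end()
    # Mark the end of the input, as the article's lexer does with EOF.
    tokens.append(('EOF', None))
    return tokens


print(tokenize('7 - 3 + 2'))
```

The parser never sees the raw characters, only this sequence of typed tokens, and its job is to decide whether the sequence forms a valid phrase.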

According to the syntax diagram above, all of the following arithmetic expressions are valid:

3
3 + 4
7 - 3 + 2 - 1

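Because the syntax rules for arithmetic expressions are very similar across programming languages, we can use a Python shell to "test" the syntax diagram. Launch your Python shell and see for yourself: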

>>> 3
3
>>> 3 + 4
7
>>> 7 - 3 + 2 - 1
5

No surprises here: the results are what we expected.

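Note that the expression "3 +" is not a valid arithmetic expression, because according to the syntax diagram a plus sign must be followed by a term (an integer); otherwise it is a syntax error. Try it in the Python shell again: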

>>> 3 +
  File "<stdin>", line 1
    3 +
      ^
SyntaxError: invalid syntax
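If you would rather run this check from a script than type into the shell, the built-in compile function can parse a string without evaluating it. This is a small sketch with a made-up helper name, is_valid_python_expression. Keep in mind that Python's grammar is much larger than our diagram, so Python also accepts expressions such as "3 * 4" that the diagram does not cover.

```python
def is_valid_python_expression(source):
    """Return True if Python can parse `source` as a single expression."""
    try:
        # mode='eval' asks Python to parse one expression; nothing is executed.
        compile(source, '<string>', 'eval')
    except SyntaxError:
        return False
    return True


for candidate in ('3', '3 + 4', '7 - 3 + 2 - 1', '3 +'):
    print(repr(candidate), is_valid_python_expression(candidate))
```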

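Being able to test with a Python shell is great, but we'd rather map the syntax diagram above to code and test with our own interpreter.

You know from the previous articles (Part 1 and Part 2) that the expr function implements both the parser and the interpreter. The parser only recognizes the structure, making sure it corresponds to some specification, and once the parser has successfully recognized (parsed) the expression, the interpreter actually evaluates it.

The following snippet shows the parser code corresponding to the diagram. The rectangular box in the syntax diagram becomes the term function, which parses an integer, and the expr function simply follows the syntax diagram flow: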

def term(self):
    self.eat(INTEGER)

def expr(self):
    # set current token to the first token taken from the input
    self.current_token = self.get_next_token()

    self.term()
    while self.current_token.type in (PLUS, MINUS):
        token = self.current_token
        if token.type == PLUS:
            self.eat(PLUS)
            self.term()
        elif token.type == MINUS:
            self.eat(MINUS)
            self.term()

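You can see that expr first calls the term function. It then has a while loop that can execute zero or more times, and inside the loop the parser makes a choice based on the token (whether it is a plus sign or a minus sign). As you can see, the code above does indeed follow the flow of the syntax diagram for arithmetic expressions.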
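The snippet above is a pair of methods that depend on the rest of the Interpreter class, so it cannot be run on its own. As a rough standalone illustration of the same flow, here is a sketch that works on a plain list of token types. The recognize function is a hypothetical name, the code targets Python 3 (it uses nonlocal), and unlike the expr method it also insists that nothing is left over after the last term.

```python
INTEGER, PLUS, MINUS, EOF = 'INTEGER', 'PLUS', 'MINUS', 'EOF'


def recognize(token_types):
    """Return True if the token types follow: term ((PLUS | MINUS) term)*."""
    tokens = list(token_types) + [EOF]
    pos = 0

    def eat(expected):
        # Consume the current token if it has the expected type.
        nonlocal pos
        if tokens[pos] != expected:
            raise SyntaxError('expected %s, got %s' % (expected, tokens[pos]))
        pos += 1

    try:
        eat(INTEGER)                        # the first term
        while tokens[pos] in (PLUS, MINUS):  # the loop in the diagram
            eat(tokens[pos])                # the choice: plus or minus
            eat(INTEGER)                    # the term after the operator
        eat(EOF)                            # assumption: no trailing tokens
    except SyntaxError:
        return False
    return True


print(recognize([INTEGER, MINUS, INTEGER, PLUS, INTEGER]))
print(recognize([INTEGER, PLUS]))
```

Each construct in the diagram has a direct counterpart: the box is a call that consumes an INTEGER, the loop is the while statement, and the choice is the operator token that gets consumed inside it.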

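The parser itself does not interpret anything: it only raises a syntax error if it cannot recognize the expression.
Let's modify the expr function and add the interpreter code: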

def term(self):
    """Return an INTEGER token value"""
    token = self.current_token
    self.eat(INTEGER)
    return token.value

def expr(self):
    """Parser / Interpreter """
    # set current token to the first token taken from the input
    self.current_token = self.get_next_token()

    result = self.term()
    while self.current_token.type in (PLUS, MINUS):
        token = self.current_token
        if token.type == PLUS:
            self.eat(PLUS)
            result = result + self.term()
        elif token.type == MINUS:
            self.eat(MINUS)
            result = result - self.term()

    return result

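Because the interpreter needs to evaluate the expression, the term function was modified to return an integer value, and the expr function was modified to perform addition and subtraction at the appropriate places and to return the result of the interpretation.
Even though the code is pretty straightforward, I still recommend spending some time studying it.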
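One thing worth noticing while you study it is that result is updated once per loop iteration, so the expression is evaluated strictly from left to right. The sketch below makes those intermediate values visible. It is a simplified stand-in rather than the article's code: running_results is a hypothetical name, it splits the input with a regular expression, and it assumes the input is already a valid expression.

```python
import re


def running_results(text):
    """Return the value of `result` after the first term and each operator.

    Assumes `text` is a valid expression such as "7 - 3 + 2 - 1".
    """
    tokens = re.findall(r'\d+|[+-]', text)
    result = int(tokens[0])          # result = self.term()
    steps = [result]
    # Operators sit at odd positions, their right-hand terms at even ones.
    for operator, number in zip(tokens[1::2], tokens[2::2]):
        if operator == '+':
            result = result + int(number)
        else:
            result = result - int(number)
        steps.append(result)
    return steps


print(running_results('7 - 3 + 2 - 1'))  # [7, 4, 6, 5]
```

For "7 - 3 + 2 - 1" the running value goes 7, 4, 6, 5, which is why the final answer is 5 and not, say, 7 - (3 + 2) - 1.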

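Now let's look at the complete interpreter code.

Here is the source code of the new version of the calculator, which can handle valid arithmetic expressions containing any number of additions and subtractions of integers: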

# Token types
#
# EOF (end-of-file) token is used to indicate that
# there is no more input left for lexical analysis
INTEGER, PLUS, MINUS, EOF = 'INTEGER', 'PLUS', 'MINUS', 'EOF'


class Token(object):
    def __init__(self, type, value):
        # token type: INTEGER, PLUS, MINUS, or EOF
        self.type = type
        # token value: non-negative integer value, '+', '-', or None
        self.value = value

    def __str__(self):
        """String representation of the class instance.

        Examples:
            Token(INTEGER, 3)
            Token(PLUS, '+')
        """
        return 'Token({type}, {value})'.format(
            type=self.type,
            value=repr(self.value)
        )

    def __repr__(self):
        return self.__str__()


class Interpreter(object):
    def __init__(self, text):
        # client string input, e.g. "3 + 5", "12 - 5 + 3", etc
        self.text = text
        # self.pos is an index into self.text
        self.pos = 0
        # current token instance
        self.current_token = None
        self.current_char = self.text[self.pos]

    ##########################################################
    # Lexer code                                             #
    ##########################################################
    def error(self):
        raise Exception('Invalid syntax')

    def advance(self):
        """Advance the `pos` pointer and set the `current_char` variable."""
        self.pos += 1
        if self.pos > len(self.text) - 1:
            self.current_char = None  # Indicates end of input
        else:
            self.current_char = self.text[self.pos]

    def skip_whitespace(self):
        while self.current_char is not None and self.current_char.isspace():
            self.advance()

    def integer(self):
        """Return a (multidigit) integer consumed from the input."""
        result = ''
        while self.current_char is not None and self.current_char.isdigit():
            result += self.current_char
            self.advance()
        return int(result)

    def get_next_token(self):
        """Lexical analyzer (also known as scanner or tokenizer)

        This method is responsible for breaking a sentence
        apart into tokens. One token at a time.
        """
        while self.current_char is not None:

            if self.current_char.isspace():
                self.skip_whitespace()
                continue

            if self.current_char.isdigit():
                return Token(INTEGER, self.integer())

            if self.current_char == '+':
                self.advance()
                return Token(PLUS, '+')

            if self.current_char == '-':
                self.advance()
                return Token(MINUS, '-')

            self.error()

        return Token(EOF, None)

    ##########################################################
    # Parser / Interpreter code                              #
    ##########################################################
    def eat(self, token_type):
        # compare the current token type with the passed token
        # type and if they match then "eat" the current token
        # and assign the next token to the self.current_token,
        # otherwise raise an exception.
        if self.current_token.type == token_type:
            self.current_token = self.get_next_token()
        else:
            self.error()

    def term(self):
        """Return an INTEGER token value."""
        token = self.current_token
        self.eat(INTEGER)
        return token.value

    def expr(self):
        """Arithmetic expression parser / interpreter."""
        # set current token to the first token taken from the input
        self.current_token = self.get_next_token()

        result = self.term()
        while self.current_token.type in (PLUS, MINUS):
            token = self.current_token
            if token.type == PLUS:
                self.eat(PLUS)
                result = result + self.term()
            elif token.type == MINUS:
                self.eat(MINUS)
                result = result - self.term()

        return result


def main():
    while True:
        try:
            # To run under Python3 replace 'raw_input' call
            # with 'input'
            text = raw_input('calc> ')
        except EOFError:
            break
        if not text:
            continue
        interpreter = Interpreter(text)
        result = interpreter.expr()
        print(result)


if __name__ == '__main__':
    main()

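Save the above code into a file named calc3.py, or download it directly from GitHub. It can handle the arithmetic expressions derived from the syntax diagram shown earlier.

Here is a sample session that I ran on my laptop: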

$ python calc3.py
calc> 3
3
calc> 7 - 4
3
calc> 10 + 5
15
calc> 7 - 3 + 2 - 1
5
calc> 10 + 1 + 2 - 3 + 4 + 6 - 15
5
calc> 3 +
Traceback (most recent call last):
  File "calc3.py", line 147, in <module>
    main()
  File "calc3.py", line 142, in main
    result = interpreter.expr()
  File "calc3.py", line 123, in expr
    result = result + self.term()
  File "calc3.py", line 110, in term
    self.eat(INTEGER)
  File "calc3.py", line 105, in eat
    self.error()
  File "calc3.py", line 45, in error
    raise Exception('Invalid syntax')
Exception: Invalid syntax

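Remember what I mentioned at the beginning of the article? Now it's time to do the exercises:

1. Draw a syntax diagram for arithmetic expressions that contain only multiplication and division, for example "7 * 4 / 2 * 3".
2. Modify the source code of the calculator to interpret arithmetic expressions that contain only multiplication and division, for example "7 * 4 / 2 * 3".
3. Write an interpreter from scratch that handles arithmetic expressions such as "7 - 3 + 2 - 1". Use any programming language you like, and write it from memory without looking at the examples. Think about the components involved: a lexer that takes the input and converts it into a stream of tokens; a parser that feeds on the token stream provided by the lexer and tries to recognize structure in that stream; and an interpreter that produces the result once the parser has successfully parsed (recognized) a valid arithmetic expression. Put these pieces together, and spend some time turning what you've learned into a working interpreter for arithmetic expressions.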

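Finally, check your understanding:

1. What is a syntax diagram?
2. What is syntax analysis?
3. What is a syntax analyzer?

Hey, look! You read all the way to the end. Thanks for hanging out here today, and don't forget to do the exercises. :) Stay tuned.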