Writing a Compiler from Scratch (6): Table-Driven Parsing

Date: 2019-11-24

The complete code for this project is in C2j-Compiler

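Preface

The previous part finished building the finite-state automaton, along with enough information to decide when to reduce. The next task is to build the parse table from this automaton and then implement parsing driven by that table.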

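Reduce information

Before the parse table can be built, one task remains: describing the reduce information that tells the automaton whether it should perform a reduce.

The reduce information is computed inside each ProductionsStateNode. The node iterates over its productions; if the "." symbol sits at the end of a production, the node can derive reduce information from that production and its lookahead set.

The reduce information is represented as a map. The key is a symbol on which a reduce may happen, that is, a symbol in the lookahead set; the value is the number of the production to reduce by.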

public HashMap<Integer, Integer> makeReduce() {
    // key: lookahead symbol, value: number of the production to reduce by
    HashMap<Integer, Integer> map = new HashMap<>();
    reduce(map, this.productions);
    reduce(map, this.mergedProduction);

    return map;
}

private void reduce(HashMap<Integer, Integer> map, ArrayList<Production> productions) {
    for (Production production : productions) {
        // Only a production whose "." sits at the end of its right-hand side can be reduced
        if (production.canBeReduce()) {
            for (Integer symbol : production.getLookAheadSet()) {
                map.put(symbol, production.getProductionNum());
            }
        }
    }
}
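
To make the shape of the reduce map concrete, here is a minimal, self-contained sketch. ItemSketch is a hypothetical stand-in for the project's Production class, and the symbol and production numbers are invented for illustration; the real project uses Token ordinals.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for Production: a production plus a dot position.
class ItemSketch {
    final int productionNum;
    final int rightSize;            // number of symbols on the right-hand side
    final int dotPos;               // position of "." within the right-hand side
    final List<Integer> lookAhead;  // lookahead set for this item

    ItemSketch(int productionNum, int rightSize, int dotPos, List<Integer> lookAhead) {
        this.productionNum = productionNum;
        this.rightSize = rightSize;
        this.dotPos = dotPos;
        this.lookAhead = lookAhead;
    }

    // The item can be reduced once the dot has moved past every right-hand symbol.
    boolean canBeReduced() {
        return dotPos >= rightSize;
    }
}

class ReduceInfoSketch {
    // Maps each lookahead symbol of a completed item to that item's production number.
    static Map<Integer, Integer> makeReduce(List<ItemSketch> items) {
        Map<Integer, Integer> map = new HashMap<>();
        for (ItemSketch item : items) {
            if (!item.canBeReduced()) {
                continue;
            }
            for (int symbol : item.lookAhead) {
                map.put(symbol, item.productionNum);
            }
        }
        return map;
    }

    public static void main(String[] args) {
        List<ItemSketch> items = new ArrayList<>();
        items.add(new ItemSketch(3, 2, 2, Arrays.asList(10, 11))); // dot at the end
        items.add(new ItemSketch(4, 3, 1, Arrays.asList(12)));     // dot in the middle
        System.out.println(makeReduce(items));
    }
}
```

Only the first item contributes entries, because its dot has reached the end; symbol 12 never appears as a key.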

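Building the parse table

The parse table is built mainly in the StateNodeManager class. You can ignore the logic of loadTable and storageTableToFile for now; that part only stores the table in a file so that it can be reused.

The main logic starts at the while loop, which iterates over all nodes. For each node it first takes the transitions and their target nodes from the transition map, then copies each transition symbol (which is really the ordinal of the Token enum defined at the start of the series) together with the target node's number into another map. It then fetches the reduce information and, for each symbol that came from the lookahead set, writes the value -(number of the production to reduce by). The value is negative precisely so that it can be told apart from a shift.

So the parse table is represented by the data structure HashMap<Integer, HashMap<Integer, Integer>>:

The first Integer is the number of the current node
The second Integer is the input symbol
The third Integer is the action: greater than 0 means shift to that state, less than 0 means reduce by the corresponding production (the parse method treats 0 as accept)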

public HashMap<Integer, HashMap<Integer, Integer>> getLrStateTable() {
      File table = new File("lrStateTable.sb");
      if (table.exists()) {
          return loadTable();
      }

      Iterator it;
      if (isTransitionTableCompressed) {
          it = compressedStateList.iterator();
      } else {
          it = stateList.iterator();
      }

      while (it.hasNext()) {
          ProductionsStateNode state = (ProductionsStateNode) it.next();
          HashMap<Integer, ProductionsStateNode> map = transitionMap.get(state);
          HashMap<Integer, Integer> jump = new HashMap<>();

          if (map != null) {
              for (Map.Entry<Integer, ProductionsStateNode> item : map.entrySet()) {
                  jump.put(item.getKey(), item.getValue().stateNum);
              }
          }

          HashMap<Integer, Integer> reduceMap = state.makeReduce();
          if (reduceMap.size() > 0) {
              for (Map.Entry<Integer, Integer> item : reduceMap.entrySet()) {

                  jump.put(item.getKey(), -(item.getValue()));
              }
          }

          lrStateTable.put(state.stateNum, jump);
      }

      storageTableToFile(lrStateTable);

      return lrStateTable;
  }
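
As a small illustration of how one cell of this table is read, the sketch below decodes an entry into a human-readable action. The class name, the describe helper, and the state and symbol numbers are all hypothetical; treating 0 as accept follows the parse method of this article.

```java
import java.util.HashMap;
import java.util.Map;

class TableEntrySketch {
    // Interprets one cell of a state -> (symbol -> action) table.
    static String describe(Map<Integer, Map<Integer, Integer>> table, int state, int symbol) {
        Map<Integer, Integer> row = table.get(state);
        Integer action = (row == null) ? null : row.get(symbol);
        if (action == null) {
            return "error";                       // no entry: the input is rejected
        }
        if (action > 0) {
            return "shift to state " + action;    // positive: number of the target state
        }
        if (action == 0) {
            return "accept";
        }
        return "reduce by production " + (-action); // negative: negated production number
    }

    public static void main(String[] args) {
        Map<Integer, Map<Integer, Integer>> table = new HashMap<>();
        Map<Integer, Integer> row = new HashMap<>();
        row.put(7, 3);   // hypothetical: on symbol 7, shift to state 3
        row.put(8, -2);  // hypothetical: on symbol 8, reduce by production 2
        table.put(1, row);

        System.out.println(describe(table, 1, 7));
        System.out.println(describe(table, 1, 8));
        System.out.println(describe(table, 1, 9));
    }
}
```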

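Table-driven parsing

The main parsing process lives in the LRStateTableParser class and is started by the parse method.

As described in part two, we need an input stack and a state stack; the other pieces are not needed yet. During initialization the start state is pushed onto the state stack, the current input symbol is set to EXT_DEF_LIST, and the parse table is fetched.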

public LRStateTableParser(Lexer lexer) {
    this.lexer = lexer;
    statusStack.push(0);
    valueStack.push(null);
    lexer.advance();
    lexerInput = Token.EXT_DEF_LIST.ordinal();
    lrStateTable = StateNodeManager.getInstance().getLrStateTable();
}

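Parsing steps:

Look up the next action for the current state and the current input symbol: action > 0 is a shift, action < 0 is a reduce
If action > 0, that is, a shift:
1. Push the new state onto the state stack and the input symbol onto the input stack
2. If the current symbol is a terminal, the next symbol can be read from the lexer right away
3. If it is a non-terminal, it is used directly to jump to the next state, and parsing then resumes with the lexer's current lookahead. One point needs attention here: the non-terminal itself must be pushed onto the input stack alongside the new state, so that a later reduce pops the correct symbols
If action < 0, that is, a reduce:
1. Get the corresponding production
2. Pop the states that correspond to the production's right-hand side off the stack
3. Put the symbol produced by the reduce (the production's left-hand side) onto the input stack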

public void parse() {
      while (true) {
          Integer action = getAction(statusStack.peek(), lexerInput);

          if (action == null) {
              ConsoleDebugColor.outlnPurple("Shift for input: " + Token.values()[lexerInput].toString());
              System.err.println("The input is denied");
              return;
          }

          if (action > 0) {
              statusStack.push(action);
              text = lexer.text;

              // if (lexerInput == Token.RELOP.ordinal()) {
              // relOperatorText = text;
              // }

              parseStack.push(lexerInput);

              if (Token.isTerminal(lexerInput)) {
                  ConsoleDebugColor.outlnPurple("Shift for input: " + Token.values()[lexerInput].toString() + " text: " + text);

                  // Object obj = takeActionForShift(lexerInput);

                  lexer.advance();
                  lexerInput = lexer.lookAhead;
                  // valueStack.push(obj);
              } else {
                  lexerInput = lexer.lookAhead;
              }
          } else {
              if (action == 0) {
                  ConsoleDebugColor.outlnPurple("The input can be accepted");
                  return;
              }

              int reduceProduction = -action;
              Production product = ProductionManager.getInstance().getProductionByIndex(reduceProduction);
              ConsoleDebugColor.outlnPurple("reduce by product: ");
              product.debugPrint();

              // takeActionForReduce(reduceProduction);

              int rightSize = product.getRight().size();
              while (rightSize > 0) {
                  parseStack.pop();
                  // valueStack.pop();
                  statusStack.pop();
                  rightSize--;
              }

              lexerInput = product.getLeft();
              parseStack.push(lexerInput);
              // valueStack.push(attributeForParentNode);
          }
      }
  }

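  private Integer getAction(Integer currentState, Integer currentInput) {
      HashMap<Integer, Integer> jump = lrStateTable.get(currentState);
      // A state without a row has no legal action; return null so parse() reports the error
      if (jump == null) {
          return null;
      }
      return jump.get(currentInput);
  }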
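
The loop above can be tried in isolation with a tiny hand-written table. The following is a simplified sketch, not the project's parser: it keeps only the state stack, uses invented symbol numbers instead of Token ordinals, and hard-codes a table for the hypothetical grammar S -> E, E -> E PLUS NUM, E -> NUM. As in parse, a reduce makes the left-hand non-terminal the next input, and shifting a non-terminal restores the pending token.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

class MiniTableParser {
    // Hypothetical symbol numbers; the article uses Token ordinals instead.
    static final int NUM = 1, PLUS = 2, EOF = 3; // terminals
    static final int E = 4, S = 5;               // non-terminals

    // Production 0: S -> E    1: E -> E PLUS NUM    2: E -> NUM
    static final int[] LEFT = {S, E, E};
    static final int[] RIGHT_SIZE = {1, 3, 1};

    static final Map<Integer, Map<Integer, Integer>> TABLE = new HashMap<>();

    static void put(int state, int symbol, int action) {
        TABLE.computeIfAbsent(state, k -> new HashMap<>()).put(symbol, action);
    }

    static {
        put(0, NUM, 1);   put(0, E, 2);     // state 0: start
        put(1, PLUS, -2); put(1, EOF, -2);  // state 1: E -> NUM .
        put(2, PLUS, 3);  put(2, EOF, 0);   // state 2: S -> E . (0 means accept)
        put(3, NUM, 4);                     // state 3: E -> E PLUS . NUM
        put(4, PLUS, -1); put(4, EOF, -1);  // state 4: E -> E PLUS NUM .
    }

    static boolean isTerminal(int symbol) {
        return symbol <= EOF;
    }

    // Returns the token at pos, or EOF once the input is exhausted.
    static int tokenAt(int[] tokens, int pos) {
        return pos < tokens.length ? tokens[pos] : EOF;
    }

    static boolean parse(int[] tokens) {
        Deque<Integer> statusStack = new ArrayDeque<>();
        statusStack.push(0);
        int pos = 0;
        int input = tokenAt(tokens, pos);

        while (true) {
            Map<Integer, Integer> row = TABLE.get(statusStack.peek());
            Integer action = (row == null) ? null : row.get(input);
            if (action == null) {
                return false;                 // no entry: reject
            }
            if (action == 0) {
                return true;                  // accept
            }
            if (action > 0) {                 // shift
                statusStack.push(action);
                if (isTerminal(input)) {
                    pos++;                    // consume the terminal
                }
                input = tokenAt(tokens, pos); // pending token becomes the input again
            } else {                          // reduce
                int production = -action;
                for (int i = 0; i < RIGHT_SIZE[production]; i++) {
                    statusStack.pop();
                }
                input = LEFT[production];     // the non-terminal is the next input
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(parse(new int[] {NUM, PLUS, NUM, EOF}));
        System.out.println(parse(new int[] {NUM, NUM, EOF}));
    }
}
```

The first call walks through shift, reduce, and goto steps and ends on the accept entry in state 2; the second finds no entry for NUM in state 1 and is rejected.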

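Ambiguous grammars

That completes everything needed for syntax analysis; next comes semantic analysis. Before moving on, one more point is worth making: the finite-state automaton we have built handles LALR(1) grammars. LALR(1) is quite powerful, but there are still grammars it cannot handle. If the productions you supply are not LALR(1), this automaton will not parse the input correctly. The grammar given earlier in this series, however, is LALR(1).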
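
One way a grammar outside LALR(1) shows up in practice is as two different actions competing for the same table cell. In getLrStateTable, jump.put lets a reduce entry silently overwrite a shift entry for the same symbol. The sketch below is one possible way to surface such conflicts while filling a row; putChecked is a hypothetical helper, not part of the project.

```java
import java.util.HashMap;
import java.util.Map;

class ConflictCheckSketch {
    // Stores an action unless the cell already holds a different one.
    // Returns null on success, otherwise a description of the conflict.
    // Actions follow the article's encoding: > 0 shift, <= 0 reduce (negated production number).
    static String putChecked(Map<Integer, Integer> row, int symbol, int action) {
        Integer old = row.putIfAbsent(symbol, action);
        if (old == null || old == action) {
            return null;
        }
        // Two different shifts on one symbol cannot occur in a deterministic automaton,
        // so a positive value on either side is reported as shift/reduce.
        String kind = (old > 0 || action > 0) ? "shift/reduce" : "reduce/reduce";
        return kind + " conflict on symbol " + symbol + ": " + old + " vs " + action;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> row = new HashMap<>();
        System.out.println(putChecked(row, 5, 3));   // first entry for symbol 5
        System.out.println(putChecked(row, 5, -2));  // competes with the shift
        System.out.println(putChecked(row, 6, -1));  // first entry for symbol 6
        System.out.println(putChecked(row, 6, -4));  // competes with another reduce
    }
}
```

The helper keeps the first action and only reports the clash; deciding how to resolve it (or rejecting the grammar) is left to the caller.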