從零寫一個編譯器（八）：語義分析之構造符號表

時間 2019-11-09

標籤一個編譯器語義分析構造符號简体版

原文原文鏈接

項目的完整代碼在 C2j-Compilerjava

前言

在以前完成了描述符號表的數據結構，如今就能夠正式構造符號表了。符號表的建立天然是要根據語法分析過程當中走的，因此符號表的建立就在LRStateTableParser裏的takeActionForReduce方法git

不過在此以前，固然還須要一個方便對這個符號表操做的類了github

這一篇主要的兩個文件是數據結構

TypeSystem.java
LRStateTableParser.java

操做符號表

操做符號表的方法都在TypeSystem類裏。函數

TypeSystem裏主要有這幾個方法：ui

類型說明符

邏輯都很簡單code

public TypeLink newType(String typeText) {
      Specifier sp;
      int type = Specifier.NONE;
      boolean isLong = false, isSigned = true;
      switch (typeText.charAt(0)) {
          case 'c':
              if (typeText.charAt(1) == 'h') {
                  type = Specifier.CHAR;
              }
              break;
          case 'd':
          case 'f':
              System.err.println("Floating point Numbers are not supported");
              System.exit(1);
              break;
          case 'i':
              type = Specifier.INT;
              break;
          case 'l':
              isLong = true;
              break;
          case 'u':
              isSigned = false;
              break;
          case 'v':
              if (typeText.charAt(2) == 'i') {
                  type = Specifier.VOID;
              }
              break;
          case 's':
              //ignore short signed
              break;
          default:
              break;
      }

      sp = new Specifier();
      sp.setType(type);
      sp.setLong(isLong);
      sp.setSign(isSigned);

      TypeLink link = new TypeLink(false, false, sp);

      return link;
  }

建立存儲類型

其實這一部分有的到後面解釋執行或者代碼生成的時候，如今這個編譯器是不處理的ci

這一部分的邏輯也很簡單作用域

public TypeLink newClass(String classText) {
      Specifier sp = new Specifier();
      sp.setType(Specifier.NONE);
      setClassType(sp, classText.charAt(0));

      TypeLink link = new TypeLink(false, false, sp);
      return link;
  }

  private void setClassType(Specifier sp, char c) {
      switch(c) {
          case 0:
              sp.setStorageClass(Specifier.FIXED);
              sp.setStatic(false);
              sp.setExternal(false);
              break;
          case 't':
              sp.setStorageClass(Specifier.TYPEDEF);
              break;
          case 'r':
              sp.setStorageClass(Specifier.REGISTER);
              break;
          case 's':
              sp.setStatic(true);
              break;
          case 'e':
              sp.setExternal(true);
              break;
          default:
              System.err.println("Internal error, Invalid Class type");
              System.exit(1);
              break;
      }
  }

給符號添加修飾符

addSpecifierToDeclaration是爲當前整個Symbol鏈都加上修飾符，好比碰見int x,y,z;這種狀況get

public Declarator addDeclarator(Symbol symbol, int declaratorType) {
      Declarator declarator = new Declarator(declaratorType);
      TypeLink link = new TypeLink(true, false, declarator);
      symbol.addDeclarator(link);

      return declarator;
  }

public void addSpecifierToDeclaration(TypeLink specifier, Symbol symbol) {
    while (symbol != null) {
        symbol.addSpecifier(specifier);
        symbol = symbol.getNextSymbol();
    }
}

剩下用到再提

構造符號表

構造符號表的過程在語法分析的過程裏，也就是進行reduce操做的時候。很好理解問什麼符號表的構建會在reduce操做時發生，由於當發生reduce操做的時候就表明產生了一個變量名或者是產生了一個變量定義，這時候是把它們加入符號表的最好時機

因此在語法分析的過程當中加入一個方法來處理reduce時的操做，此外還須要一個屬性堆棧來輔助操做，屬性堆棧的做用就是用來保存以前的操做，以方便後面使用

好比如今產生了一個修飾符，可是語法分析過程尚未讀入變量名，就先把這個修飾符壓入屬性堆棧，等讀入變量名的時候就能夠建立一個Symbol，再把修飾符彈出堆棧連接到Symbol上

takeActionForReduce

takeActionForReduce方法的參數就是作reduce操做依據的產生式的編號

只看一下比較複雜的：

StructSpecifier_TO_TypeSpecifier：

這是結構定義生成一個結構體類型的declartor，對應的推導式是

* TYPE_SPECIFIER -> STRUCT_SPECIFIER
* STTUCT_SPECIFIER -> STRUCT OPT_TAG LC DEF_LIST RC
*                     | STRUCT TAG

先生成一個Specifier再設置它的vStruct屬性，也就是聲明這是一個結構體，以後拿到結構體定義，從這個推導式中咱們能夠看出，結構體定義確定在它的上一步，也就是被放入了屬性堆棧的頂端

SPECIFIERS_TypeOrClass_TO_SPECIFIERS

這裏對應的推導式是

*  SPECIFIERS ->  SPECIFIERS TYPE_OR_CLASS

目的其實就是合併多個specifier

DEFLIST

case SyntaxProductionInit.ExtDeclList_COMMA_ExtDecl_TO_ExtDeclList:
case SyntaxProductionInit.VarList_COMMA_ParamDeclaration_TO_VarList:
case SyntaxProductionInit.DeclList_Comma_Decl_TO_DeclList:
case SyntaxProductionInit.DefList_Def_TO_DefList:

其實這部分是鏈接一系列變量的定義，好比ExtDeclList_COMMA_ExtDecl_TO_ExtDeclList就是對應像x,y這樣用逗號分割的連續定義，把它們的符號鏈接起來

DECLARTOR

case SyntaxProductionInit.OptSpecifier_ExtDeclList_Semi_TO_ExtDef:
case SyntaxProductionInit.TypeNT_VarDecl_TO_ParamDeclaration:
case SyntaxProductionInit.Specifiers_DeclList_Semi_TO_Def:

這裏實際上是完成一個完成的定義，像int x,y;後把拿到Specifier放入到每一個符號去，比較特殊的是拿到Specifier的位置，這是根據reduce次數計算的。

STRUCT和FUNCTION

接下來比較須要注意的就是處理struct和function的方法

在處理連續的變量聲明的時候，若是碰見的類型是結構體的話，就進行這樣一個處理，若是當前是個結構體聲明，咱們就直接把這個結構體裏的符號，也就是structDefine的fields放入argList（這個本來應該是放函數參數的）

private void handleStructVariable(Symbol symbol) {
      if (symbol == null) {
          return;
      }

      boolean isStruct = false;
      TypeLink typeLink = symbol.typeLinkBegin;
      Specifier specifier = null;
      while (typeLink != null) {
          if (!typeLink.isDeclarator) {
              specifier = (Specifier) typeLink.getTypeObject();
              if (specifier.getType() == Specifier.STRUCTURE) {
                  isStruct = true;
                  break;
              }
          }

          typeLink = typeLink.toNext();
      }

      if (isStruct) {
          StructDefine structDefine = specifier.getStruct();
          Symbol copy = null, headCopy = null, original = structDefine.getFields();
          while (original != null) {
              if (copy != null) {
                  Symbol sym = original.copy();
                  copy.setNextSymbol(sym);
                  copy = sym;
              } else {
                  copy = original.copy();
                  headCopy = copy;
              }

              original = original.getNextSymbol();
          }

          symbol.setArgList(headCopy);
      }
  }

這個方法其實就是根據有沒有參數來判斷當前函數的名字在堆棧的哪一個位置

private void setFunctionSymbol(boolean hasArgs) {
    Symbol funcSymbol;
    if (hasArgs) {
        funcSymbol = (Symbol) valueStack.get(valueStack.size() - 4);
    } else {
        funcSymbol = (Symbol) valueStack.get(valueStack.size() - 3);
    }

    typeSystem.addDeclarator(funcSymbol, Declarator.FUNCTION);
    attributeForParentNode = funcSymbol;
}

takeActionForReduce源碼

private void takeActionForReduce(int productionNum) {
      switch (productionNum) {
          case SyntaxProductionInit.TYPE_TO_TYPE_SPECIFIER:
              attributeForParentNode = typeSystem.newType(text);
              break;
          case SyntaxProductionInit.EnumSpecifier_TO_TypeSpecifier:
              attributeForParentNode = typeSystem.newType("int");
              break;
          case SyntaxProductionInit.StructSpecifier_TO_TypeSpecifier:
              attributeForParentNode = typeSystem.newType(text);
              TypeLink link = (TypeLink) attributeForParentNode;
              Specifier sp = (Specifier) link.getTypeObject();
              sp.setType(Specifier.STRUCTURE);
              StructDefine struct = (StructDefine) valueStack.get(valueStack.size() - 1);
              sp.setStruct(struct);
              break;

          case SyntaxProductionInit.SPECIFIERS_TypeOrClass_TO_SPECIFIERS:
              attributeForParentNode = valueStack.peek();
              Specifier last = (Specifier) ((TypeLink) valueStack.get(valueStack.size() - 2)).getTypeObject();
              Specifier dst = (Specifier) ((TypeLink) attributeForParentNode).getTypeObject();
              typeSystem.specifierCopy(dst, last);
              break;
          case SyntaxProductionInit.NAME_TO_NewName:
          case SyntaxProductionInit.Name_TO_NameNT:
              attributeForParentNode = typeSystem.newSymbol(text, nestingLevel);
              break;
          case SyntaxProductionInit.START_VarDecl_TO_VarDecl:
          case SyntaxProductionInit.Start_VarDecl_TO_VarDecl:
              typeSystem.addDeclarator((Symbol) attributeForParentNode, Declarator.POINTER);
              break;

          case SyntaxProductionInit.VarDecl_LB_ConstExpr_RB_TO_VarDecl:
              Declarator declarator = typeSystem.addDeclarator((Symbol) valueStack.get(valueStack.size() - 4), Declarator.ARRAY);
              int arrayNum = (Integer) attributeForParentNode;
              declarator.setElementNum(arrayNum);
              attributeForParentNode = valueStack.get(valueStack.size() - 4);
              break;

          case SyntaxProductionInit.Name_TO_Unary:
              attributeForParentNode = typeSystem.getSymbolByText(text, nestingLevel, symbolScope);
              break;

          case SyntaxProductionInit.ExtDeclList_COMMA_ExtDecl_TO_ExtDeclList:
          case SyntaxProductionInit.VarList_COMMA_ParamDeclaration_TO_VarList:
          case SyntaxProductionInit.DeclList_Comma_Decl_TO_DeclList:
          case SyntaxProductionInit.DefList_Def_TO_DefList: {

              Symbol currentSym = (Symbol) attributeForParentNode;
              Symbol lastSym = null;
              if (productionNum == SyntaxProductionInit.DefList_Def_TO_DefList) {
                  lastSym = (Symbol) valueStack.get(valueStack.size() - 2);
              } else {
                  lastSym = (Symbol) valueStack.get(valueStack.size() - 3);
              }

              currentSym.setNextSymbol(lastSym);
          }
          break;
          case SyntaxProductionInit.OptSpecifier_ExtDeclList_Semi_TO_ExtDef:
          case SyntaxProductionInit.TypeNT_VarDecl_TO_ParamDeclaration:
          case SyntaxProductionInit.Specifiers_DeclList_Semi_TO_Def:
              Symbol symbol = (Symbol) attributeForParentNode;

              TypeLink specifier;
              if (productionNum == SyntaxProductionInit.TypeNT_VarDecl_TO_ParamDeclaration) {
                  specifier = (TypeLink) (valueStack.get(valueStack.size() - 2));
              } else {
                  specifier = (TypeLink) (valueStack.get(valueStack.size() - 3));
              }

              typeSystem.addSpecifierToDeclaration(specifier, symbol);
              typeSystem.addSymbolsToTable(symbol, symbolScope);

              handleStructVariable(symbol);
              break;

          case SyntaxProductionInit.VarDecl_Equal_Initializer_TO_Decl:
              //Here you need to set the Symbol object for the response, otherwise there will be an error above Symbol symbol = (Symbol)attributeForParentNode;
              attributeForParentNode = (Symbol) valueStack.get(valueStack.size() - 2);
              break;

          case SyntaxProductionInit.NewName_LP_VarList_RP_TO_FunctDecl:
              setFunctionSymbol(true);
              Symbol argList = (Symbol) valueStack.get(valueStack.size() - 2);
              ((Symbol) attributeForParentNode).args = argList;
              typeSystem.addSymbolsToTable((Symbol) attributeForParentNode, symbolScope);

              symbolScope = ((Symbol) attributeForParentNode).getName();
              Symbol sym = argList;
              while (sym != null) {
                  sym.addScope(symbolScope);
                  sym = sym.getNextSymbol();
              }
              break;

          case SyntaxProductionInit.NewName_LP_RP_TO_FunctDecl:
              setFunctionSymbol(false);
              typeSystem.addSymbolsToTable((Symbol) attributeForParentNode, symbolScope);
              symbolScope = ((Symbol) attributeForParentNode).getName();

              break;

          case SyntaxProductionInit.OptSpecifiers_FunctDecl_CompoundStmt_TO_ExtDef:
              symbol = (Symbol) valueStack.get(valueStack.size() - 2);
              specifier = (TypeLink) (valueStack.get(valueStack.size() - 3));
              typeSystem.addSpecifierToDeclaration(specifier, symbol);

              symbolScope = GLOBAL_SCOPE;
              break;

          case SyntaxProductionInit.Name_To_Tag:
              symbolScope = text;
              attributeForParentNode = typeSystem.getStructFromTable(text);
              if (attributeForParentNode == null) {
                  attributeForParentNode = new StructDefine(text, nestingLevel, null);
                  typeSystem.addStructToTable((StructDefine) attributeForParentNode);
              }

              break;

          case SyntaxProductionInit.Struct_OptTag_LC_DefList_RC_TO_StructSpecifier:
              Symbol defList = (Symbol) valueStack.get(valueStack.size() - 2);
              StructDefine structObj = (StructDefine) valueStack.get(valueStack.size() - 4);
              structObj.setFields(defList);
              attributeForParentNode = structObj;
              symbolScope = GLOBAL_SCOPE;
              break;

          case SyntaxProductionInit.Enum_TO_EnumNT:
              enumVal = 0;
              break;

          case SyntaxProductionInit.NameNT_TO_Emurator:
              doEnum();
              break;
          case SyntaxProductionInit.Name_Eequal_ConstExpr_TO_Enuerator:
              enumVal = (Integer) (valueStack.get(valueStack.size() - 1));
              attributeForParentNode = (Symbol) (valueStack.get(valueStack.size() - 3));
              doEnum();
              break;
          case SyntaxProductionInit.Number_TO_ConstExpr:
          case SyntaxProductionInit.Number_TO_Unary:
              attributeForParentNode = Integer.valueOf(text);
              break;
          default:
              break;
      }

      astBuilder.buildSyntaxTree(productionNum, text);
  }