項目的完整代碼在 C2j-Compilerjava
在上一篇完成了符號表的構建,下一步就是輸出抽象語法樹(Abstract Syntax Tree,AST)node
抽象語法樹(abstract syntax tree 或者縮寫爲 AST),是源代碼的抽象語法結構的樹狀表現形式,這裏特指編程語言的源代碼。樹上的每一個節點都表示源代碼中的一種結構。git
AST對於編譯器是相當重要的,如今的編譯型語言通常經過AST來生成IR,解釋型語言也能夠不用虛擬機而直接遍歷AST來解釋執行,以後要寫解釋器和編譯器都依賴這個ASTgithub
這一篇主要文件有:編程
public interface AstNode { AstNode addChild(AstNode node); AstNode getParent(); ArrayList<AstNode> getChildren(); void setAttribute(NodeKey key, Object value); Object getAttribute(NodeKey key); boolean isChildrenReverse(); void reverseChildren(); AstNode copy(); }
這是對AstNode接口的實現,而且繼承HashMap,這裏的NodeKey是數組
TokenType, VALUE, SYMBOL, PRODUCTION, TEXT
對應的value,數據結構
public class AstNodeImpl extends HashMap<NodeKey, Object> implements AstNode { private Token type; private AstNodeImpl parent; private ArrayList<AstNode> children; String name; private boolean isChildrenReverse = false; public AstNodeImpl(Token type) { this.type = type; this.parent = null; this.children = new ArrayList<>(); setAttribute(NodeKey.TokenType, type); } @Override public AstNode addChild(AstNode node) { if (node != null) { children.add(node); ((AstNodeImpl) node).parent = this; } return node; } @Override public AstNode getParent() { return parent; } @Override public void reverseChildren() { if (isChildrenReverse) { return; } Collections.reverse(children); isChildrenReverse = true; } @Override public boolean isChildrenReverse() { return isChildrenReverse; } @Override public ArrayList<AstNode> getChildren() { reverseChildren(); return children; } @Override public void setAttribute(NodeKey key, Object value) { if (key == NodeKey.TEXT) { name = (String) value; } put(key, value); } @Override public Object getAttribute(NodeKey key) { return get(key); } @Override public String toString() { String info = ""; if (get(NodeKey.VALUE) != null) { info += "Node Value is " + get(NodeKey.VALUE).toString(); } if (get(NodeKey.TEXT) != null) { info += "\nNode Text is " + get(NodeKey.TEXT).toString(); } if (get(NodeKey.SYMBOL) != null) { info += "\nNode Symbol is " + get(NodeKey.SYMBOL).toString(); } return info + "\n Node Type is " + type.toString(); } @Override public AstNode copy() { AstNodeImpl copy = (AstNodeImpl) NodeFactory.createICodeNode(type); Set<Entry<NodeKey, Object>> attributes = entrySet(); Iterator<Map.Entry<NodeKey, Object>> it = attributes.iterator(); while (it.hasNext()) { Map.Entry<NodeKey, Object> attribute = it.next(); copy.put(attribute.getKey(), attribute.getValue()); } return copy; } }
NodeFactory就是簡單的返回一個節點的實現編程語言
public class NodeFactory { public static AstNode createICodeNode(Token type) { return new AstNodeImpl(type); } }
AST的建立也是須要在語法分析過程當中根據reduce操做進行操做的。也就是在takeActionForReduce方法中調用AstBuilder的buildSyntaxTree方法ide
在AstBuilder裏面仍是須要兩個堆棧來輔助操做函數
private Stack<AstNode> nodeStack = new Stack<>(); private LRStateTableParser parser = null; private TypeSystem typeSystem = null; private Stack<Object> valueStack = null; private String functionName; private HashMap<String, AstNode> funcMap = new HashMap<>(); private static AstBuilder instance;
構造AST的主要邏輯在buildSyntaxTree方法裏,須要注意的是有一些節點在解釋執行和代碼生成的時候是不同的,有時代碼生成須要的節點解釋執行的話並不須要
在這裏提一下UNARY這個非終結符,這個非終結符和NAME很像,可是它通常是表明進行運算和一些操做的時候,好比數組,++,--或者函數調用的時候
其實構建AST的過程和符號表的構建過程有點兒相似,都是根據reduce操做來建立信息和組合信息,符號表是組合修飾符說明符等,而AST則是組合節點間的關係變成一棵樹
咱們只看幾個操做
* DEF -> SPECIFIERS DECL_LIST SEMI * DECL -> VAR_DECL * VAR_DECL -> NEW_NAME * | VAR_DECL LP RP * | VAR_DECL LP VAR_LIST RP * | LP VAR_DECL RP * | START VAR_DECL
從推導式能夠看出,DEF節點的符號應該在valueStack.size() - 3,可是DECL和VAR_DECL沒有作reduce操做,因此符號應該在valueStack.size() - 2。這其實和前面的符號表構建算出以前符號的位置是同樣的。
這裏則是變量、數字或者字符串的節點,若是是個變量的號,這個節點就須要一個Symbol的value了
case SyntaxProductionInit.Number_TO_Unary: case SyntaxProductionInit.Name_TO_Unary: case SyntaxProductionInit.String_TO_Unary: node = NodeFactory.createICodeNode(Token.UNARY); if (production == SyntaxProductionInit.Name_TO_Unary) { assignSymbolToNode(node, text); } node.setAttribute(NodeKey.TEXT, text); break;
其他的節點無非是把一些語句拆分它的邏輯而後組成節點,真正的求值部分像Name_TO_Unary比較少,更可能是好比把一個if else塊分紅if節點、判斷節點、else節點,以後再按照這棵樹進行解釋執行或者代碼生成
public AstNode buildSyntaxTree(int production, String text) { AstNode node = null; Symbol symbol = null; AstNode child = null; if (Start.STARTTYPE == Start.INTERPRETER) { int p1 = SyntaxProductionInit.Specifiers_DeclList_Semi_TO_Def; int p2 = SyntaxProductionInit.Def_To_DefList; int p3 = SyntaxProductionInit.DefList_Def_TO_DefList; boolean isReturn = production == p1 || production == p2 || production == p3; if (isReturn) { return null; } } switch (production) { case SyntaxProductionInit.Specifiers_DeclList_Semi_TO_Def: node = NodeFactory.createICodeNode(Token.DEF); symbol = (Symbol) valueStack.get(valueStack.size() - 2); node.setAttribute(NodeKey.SYMBOL, symbol); break; case SyntaxProductionInit.Def_To_DefList: node = NodeFactory.createICodeNode(Token.DEF_LIST); node.addChild(nodeStack.pop()); break; case SyntaxProductionInit.DefList_Def_TO_DefList: node = NodeFactory.createICodeNode(Token.DEF_LIST); node.addChild(nodeStack.pop()); node.addChild(nodeStack.pop()); break; case SyntaxProductionInit.Number_TO_Unary: case SyntaxProductionInit.Name_TO_Unary: case SyntaxProductionInit.String_TO_Unary: node = NodeFactory.createICodeNode(Token.UNARY); if (production == SyntaxProductionInit.Name_TO_Unary) { assignSymbolToNode(node, text); } node.setAttribute(NodeKey.TEXT, text); break; case SyntaxProductionInit.Unary_LP_RP_TO_Unary: node = NodeFactory.createICodeNode(Token.UNARY); node.addChild(nodeStack.pop()); break; case SyntaxProductionInit.Unary_LP_ARGS_RP_TO_Unary: node = NodeFactory.createICodeNode(Token.UNARY); node.addChild(nodeStack.pop()); node.addChild(nodeStack.pop()); break; case SyntaxProductionInit.Unary_Incop_TO_Unary: case SyntaxProductionInit.Unary_DecOp_TO_Unary: case SyntaxProductionInit.LP_Expr_RP_TO_Unary: case SyntaxProductionInit.Start_Unary_TO_Unary: node = NodeFactory.createICodeNode(Token.UNARY); node.addChild(nodeStack.pop()); break; case SyntaxProductionInit.Unary_LB_Expr_RB_TO_Unary: node = NodeFactory.createICodeNode(Token.UNARY); node.addChild(nodeStack.pop()); node.addChild(nodeStack.pop()); break; case SyntaxProductionInit.Uanry_TO_Binary: node = NodeFactory.createICodeNode(Token.BINARY); child = nodeStack.pop(); node.setAttribute(NodeKey.TEXT, child.getAttribute(NodeKey.TEXT)); node.addChild(child); break; case SyntaxProductionInit.Binary_TO_NoCommaExpr: case SyntaxProductionInit.NoCommaExpr_Equal_NoCommaExpr_TO_NoCommaExpr: node = NodeFactory.createICodeNode(Token.NO_COMMA_EXPR); child = nodeStack.pop(); String t = (String) child.getAttribute(NodeKey.TEXT); node.addChild(child); if (production == SyntaxProductionInit.NoCommaExpr_Equal_NoCommaExpr_TO_NoCommaExpr) { child = nodeStack.pop(); t = (String) child.getAttribute(NodeKey.TEXT); node.addChild(child); } break; case SyntaxProductionInit.Binary_Plus_Binary_TO_Binary: case SyntaxProductionInit.Binary_DivOp_Binary_TO_Binary: case SyntaxProductionInit.Binary_Minus_Binary_TO_Binary: case SyntaxProductionInit.Binary_Start_Binary_TO_Binary: node = NodeFactory.createICodeNode(Token.BINARY); node.addChild(nodeStack.pop()); node.addChild(nodeStack.pop()); break; case SyntaxProductionInit.Binary_RelOP_Binary_TO_Binray: node = NodeFactory.createICodeNode(Token.BINARY); node.addChild(nodeStack.pop()); AstNode operator = NodeFactory.createICodeNode(Token.RELOP); operator.setAttribute(NodeKey.TEXT, parser.getRelOperatorText()); node.addChild(operator); node.addChild(nodeStack.pop()); break; case SyntaxProductionInit.NoCommaExpr_TO_Expr: node = NodeFactory.createICodeNode(Token.EXPR); node.addChild(nodeStack.pop()); break; case SyntaxProductionInit.Expr_Semi_TO_Statement: case SyntaxProductionInit.CompountStmt_TO_Statement: node = NodeFactory.createICodeNode(Token.STATEMENT); node.addChild(nodeStack.pop()); break; case SyntaxProductionInit.LocalDefs_TO_Statement: node = NodeFactory.createICodeNode(Token.STATEMENT); if (Start.STARTTYPE == Start.CODEGEN) { node.addChild(nodeStack.pop()); } break; case SyntaxProductionInit.Statement_TO_StmtList: node = NodeFactory.createICodeNode(Token.STMT_LIST); if (nodeStack.size() > 0) { node.addChild(nodeStack.pop()); } break; case SyntaxProductionInit.FOR_OptExpr_Test_EndOptExpr_Statement_TO_Statement: node = NodeFactory.createICodeNode(Token.STATEMENT); node.addChild(nodeStack.pop()); node.addChild(nodeStack.pop()); node.addChild(nodeStack.pop()); node.addChild(nodeStack.pop()); break; case SyntaxProductionInit.StmtList_Statement_TO_StmtList: node = NodeFactory.createICodeNode(Token.STMT_LIST); node.addChild(nodeStack.pop()); node.addChild(nodeStack.pop()); break; case SyntaxProductionInit.Expr_TO_Test: node = NodeFactory.createICodeNode(Token.TEST); node.addChild(nodeStack.pop()); break; case SyntaxProductionInit.If_Test_Statement_TO_IFStatement: node = NodeFactory.createICodeNode(Token.IF_STATEMENT); node.addChild(nodeStack.pop()); node.addChild(nodeStack.pop()); break; case SyntaxProductionInit.IfElseStatemnt_Else_Statemenet_TO_IfElseStatement: node = NodeFactory.createICodeNode(Token.IF_ELSE_STATEMENT); node.addChild(nodeStack.pop()); node.addChild(nodeStack.pop()); break; case SyntaxProductionInit.While_LP_Test_Rp_TO_Statement: case SyntaxProductionInit.Do_Statement_While_Test_To_Statement: node = NodeFactory.createICodeNode(Token.STATEMENT); node.addChild(nodeStack.pop()); node.addChild(nodeStack.pop()); break; case SyntaxProductionInit.Expr_Semi_TO_OptExpr: case SyntaxProductionInit.Semi_TO_OptExpr: node = NodeFactory.createICodeNode(Token.OPT_EXPR); if (production == SyntaxProductionInit.Expr_Semi_TO_OptExpr) { node.addChild(nodeStack.pop()); } break; case SyntaxProductionInit.Expr_TO_EndOpt: node = NodeFactory.createICodeNode(Token.END_OPT_EXPR); node.addChild(nodeStack.pop()); break; case SyntaxProductionInit.LocalDefs_StmtList_TO_CompoundStmt: node = NodeFactory.createICodeNode(Token.COMPOUND_STMT); node.addChild(nodeStack.pop()); break; case SyntaxProductionInit.NewName_LP_RP_TO_FunctDecl: case SyntaxProductionInit.NewName_LP_VarList_RP_TO_FunctDecl: node = NodeFactory.createICodeNode(Token.FUNCT_DECL); node.addChild(nodeStack.pop()); child = node.getChildren().get(0); functionName = (String) child.getAttribute(NodeKey.TEXT); symbol = assignSymbolToNode(node, functionName); break; case SyntaxProductionInit.NewName_TO_VarDecl: nodeStack.pop(); break; case SyntaxProductionInit.NAME_TO_NewName: node = NodeFactory.createICodeNode(Token.NEW_NAME); node.setAttribute(NodeKey.TEXT, text); break; case SyntaxProductionInit.OptSpecifiers_FunctDecl_CompoundStmt_TO_ExtDef: node = NodeFactory.createICodeNode(Token.EXT_DEF); node.addChild(nodeStack.pop()); node.addChild(nodeStack.pop()); funcMap.put(functionName, node); break; case SyntaxProductionInit.NoCommaExpr_TO_Args: node = NodeFactory.createICodeNode(Token.ARGS); node.addChild(nodeStack.pop()); break; case SyntaxProductionInit.NoCommaExpr_Comma_Args_TO_Args: node = NodeFactory.createICodeNode(Token.ARGS); node.addChild(nodeStack.pop()); node.addChild(nodeStack.pop()); break; case SyntaxProductionInit.Return_Semi_TO_Statement: node = NodeFactory.createICodeNode(Token.STATEMENT); break; case SyntaxProductionInit.Return_Expr_Semi_TO_Statement: node = NodeFactory.createICodeNode(Token.STATEMENT); node.addChild(nodeStack.pop()); break; case SyntaxProductionInit.Unary_StructOP_Name_TO_Unary: node = NodeFactory.createICodeNode(Token.UNARY); node.addChild(nodeStack.pop()); node.setAttribute(NodeKey.TEXT, text); break; default: break; } if (node != null) { node.setAttribute(NodeKey.PRODUCTION, production); nodeStack.push(node); } return node; }
其實構造AST和建立符號表上很是類似,都是依據reduce操做的信息來完成。在AST的構建中的主要任務就是對源代碼語句裏的邏輯進行分塊,好比對於一個ifelse語句:
上面的圖是我依據這個意思話的,和上面構造出來的AST不徹底一致
另外個人github博客:https://dejavudwh.cn/