AST: Abstract Syntax Tree

Date: 2019-11-21

Tags: ast, abstract syntax tree


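Mention the AST (abstract syntax tree) and you may not be very interested. Its use cases, though, may surprise you: it has been at your side all along without your noticing.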

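1. What is an abstract syntax tree

In computer science, an abstract syntax tree (AST), or simply syntax tree, is a tree representation of the abstract syntactic structure of source code, here meaning source code written in a programming language. Each node of the tree denotes a construct that occurs in the source code.

The syntax is called "abstract" because it does not represent every detail that appears in the real syntax. Grouping parentheses, for instance, are implied by the shape of the tree rather than stored as nodes.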

2. Use cases

JS decompilation and syntax parsing
Compiling ES6 syntax with Babel
Code highlighting
Keyword matching
Scope analysis
Code minification

3. AST Explorer

Let's look at how an ES6 parser handles code. Declare the following:

let tips = [
  "Jartto's AST Demo"
];

Here is how it is parsed, in JSON form:

{
  "type": "Program",
  "start": 0,
  "end": 38,
  "body": [
    {
      "type": "VariableDeclaration",
      "start": 0,
      "end": 37,
      "declarations": [
        {
          "type": "VariableDeclarator",
          "start": 4,
          "end": 36,
          "id": {
            "type": "Identifier",
            "start": 4,
            "end": 8,
            "name": "tips"
          },
          "init": {
            "type": "ArrayExpression",
            "start": 11,
            "end": 36,
            "elements": [
              {
                "type": "Literal",
                "start": 15,
                "end": 34,
                "value": "Jartto's AST Demo",
                "raw": "\"Jartto's AST Demo\""
              }
            ]
          }
        }
      ],
      "kind": "let"
    }
  ],
  "sourceType": "module"
}

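Its syntax tree looks roughly like this:
[Figure: syntax tree of the tips declaration]

Every structure is clearly visible, and at this point we notice that it is not very different from a DOM tree. Here is another example:

(1+2)*3

AST:
[Figure: syntax tree of (1+2)*3]

Now remove the parentheses and see how the structure changes. The JSON form makes it obvious: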

{
  "type": "Program",
  "start": 0,
  "end": 6,
  "body": [
    {
      "type": "ExpressionStatement",
      "start": 0,
      "end": 5,
      "expression": {
        "type": "BinaryExpression",
        "start": 0,
        "end": 5,
        "left": {
          "type": "Literal",
          "start": 0,
          "end": 1,
          "value": 1,
          "raw": "1"
        },
        "operator": "+",
        "right": {
          "type": "BinaryExpression",
          "start": 2,
          "end": 5,
          "left": {
            "type": "Literal",
            "start": 2,
            "end": 3,
            "value": 2,
            "raw": "2"
          },
          "operator": "*",
          "right": {
            "type": "Literal",
            "start": 4,
            "end": 5,
            "value": 3,
            "raw": "3"
          }
        }
      }
    }
  ],
  "sourceType": "module"
}

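As you can see, the syntax trees of (1+2)*3 and 1+2*3 differ:
1. Once the statement is identified as an ExpressionStatement, the BinaryExpression is split into three parts, left, operator and right, nested according to operator precedence: in 1+2*3 the multiplication becomes the right operand of the addition;
2. Each part records its type, start and end positions, value and so on;
3. The operator itself is stored in the operator field.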
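
To make the difference concrete, here is a minimal sketch that evaluates both trees. The tree for 1+2*3 is trimmed by hand from the JSON above (position fields dropped); the tree for (1+2)*3 is written by hand in the same shape. `evaluate`, `lit` and `bin` are hypothetical helpers for illustration, not part of any library.

```javascript
// Hypothetical helper: evaluates an ESTree-style expression made of
// numeric Literal and BinaryExpression nodes.
function evaluate(node) {
  switch (node.type) {
    case 'Literal':
      return node.value;
    case 'BinaryExpression': {
      const left = evaluate(node.left);
      const right = evaluate(node.right);
      switch (node.operator) {
        case '+': return left + right;
        case '-': return left - right;
        case '*': return left * right;
        case '/': return left / right;
        default: throw new Error('Unsupported operator: ' + node.operator);
      }
    }
    default:
      throw new Error('Unsupported node type: ' + node.type);
  }
}

// Small constructors so the trees stay readable.
const lit = (value) => ({ type: 'Literal', value });
const bin = (operator, left, right) => ({ type: 'BinaryExpression', operator, left, right });

// 1+2*3: the multiplication is nested as the right operand of "+".
const withoutParens = bin('+', lit(1), bin('*', lit(2), lit(3)));
// (1+2)*3: the addition is nested as the left operand of "*".
const withParens = bin('*', bin('+', lit(1), lit(2)), lit(3));

console.log(evaluate(withoutParens)); // 7
console.log(evaluate(withParens));    // 9
```

The parentheses never appear as nodes; only the nesting changes, and that alone is enough to change the result.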

Next, let's look at the arrow function we use so often:

const mytest = (a,b) => {
  return a+b;
}

The JSON form:

{
  "type": "Program",
  "start": 0,
  "end": 42,
  "body": [
    {
      "type": "VariableDeclaration",
      "start": 0,
      "end": 41,
      "declarations": [
        {
          "type": "VariableDeclarator",
          "start": 6,
          "end": 41,
          "id": {
            "type": "Identifier",
            "start": 6,
            "end": 12,
            "name": "mytest"
          },
          "init": {
            "type": "ArrowFunctionExpression",
            "start": 15,
            "end": 41,
            "id": null,
            "expression": false,
            "generator": false,
            "params": [
              {
                "type": "Identifier",
                "start": 16,
                "end": 17,
                "name": "a"
              },
              {
                "type": "Identifier",
                "start": 18,
                "end": 19,
                "name": "b"
              }
            ],
            "body": {
              "type": "BlockStatement",
              "start": 24,
              "end": 41,
              "body": [
                {
                  "type": "ReturnStatement",
                  "start": 28,
                  "end": 39,
                  "argument": {
                    "type": "BinaryExpression",
                    "start": 35,
                    "end": 38,
                    "left": {
                      "type": "Identifier",
                      "start": 35,
                      "end": 36,
                      "name": "a"
                    },
                    "operator": "+",
                    "right": {
                      "type": "Identifier",
                      "start": 37,
                      "end": 38,
                      "name": "b"
                    }
                  }
                }
              ]
            }
          }
        }
      ],
      "kind": "const"
    }
  ],
  "sourceType": "module"
}

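The AST structure looks like this:
[Figure: syntax tree of the mytest arrow function]

Notice that a few new node types have appeared:

ArrowFunctionExpression
BlockStatement
ReturnStatement

By now the picture is becoming clear:

An abstract syntax tree maps each kind of syntactic construct to a generic node type and assembles those nodes into a tree.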

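4. How it works

Visual tools give us a quick intuitive picture, but how does it work internally?

Take a simple function declaration as an example:

function getAST(){}

Its JSON is simple too: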

{
  "type": "Program",
  "start": 0,
  "end": 19,
  "body": [
    {
      "type": "FunctionDeclaration",
      "start": 0,
      "end": 19,
      "id": {
        "type": "Identifier",
        "start": 9,
        "end": 15,
        "name": "getAST"
      },
      "expression": false,
      "generator": false,
      "params": [],
      "body": {
        "type": "BlockStatement",
        "start": 17,
        "end": 19,
        "body": []
      }
    }
  ],
  "sourceType": "module"
}

Out of curiosity, let's reproduce this in code:

const esprima = require('esprima'); // parses JavaScript source into an AST
const estraverse = require('estraverse'); // traverses an AST
const escodegen = require('escodegen'); // generates code from an AST
let code = `function getAST(){}`;
// Parse the source into a tree
let tree = esprima.parseScript(code);
// Traverse the tree
estraverse.traverse(tree, {
  enter(node) {
    console.log('enter: ' + node.type);
  },
  leave(node) {
    console.log('leave: ' + node.type);
  }
});
// Generate code from the tree
let r = escodegen.generate(tree);
console.log(r);

Running it prints:

enter: Program
enter: FunctionDeclaration
enter: Identifier
leave: Identifier
enter: BlockStatement
leave: BlockStatement
leave: FunctionDeclaration
leave: Program
function getAST() {
}

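The log shows how the syntax tree is traversed: each node is entered, its children are visited, and only then is it left, which is a depth-first traversal.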
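
The same enter/leave order can be reproduced without any library. The sketch below is a simplified stand-in for what a traversal utility does, not estraverse's actual implementation; `traverse` is a hypothetical function, and the tree is the JSON above trimmed by hand (positions and flags dropped).

```javascript
// Hypothetical minimal traversal: visits every object that has a string
// `type` property, calling enter before its children and leave after them.
function traverse(node, visitor) {
  if (visitor.enter) visitor.enter(node);
  for (const key of Object.keys(node)) {
    const child = node[key];
    const children = Array.isArray(child) ? child : [child];
    for (const c of children) {
      if (c && typeof c.type === 'string') traverse(c, visitor);
    }
  }
  if (visitor.leave) visitor.leave(node);
}

// Hand-trimmed AST for `function getAST(){}`.
const tree = {
  type: 'Program',
  body: [{
    type: 'FunctionDeclaration',
    id: { type: 'Identifier', name: 'getAST' },
    params: [],
    body: { type: 'BlockStatement', body: [] }
  }]
};

const log = [];
traverse(tree, {
  enter(node) { log.push('enter: ' + node.type); },
  leave(node) { log.push('leave: ' + node.type); }
});
console.log(log.join('\n'));
```

Because the function recurses into a child before moving on to the next sibling, the log lists the same eight enter/leave lines in the same order as the run above.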

With a small change, we can rename the function from getAST to Jartto:

const esprima = require('esprima'); // parses JavaScript source into an AST
const estraverse = require('estraverse'); // traverses an AST
const escodegen = require('escodegen'); // generates code from an AST
let code = `function getAST(){}`;
// Parse the source into a tree
let tree = esprima.parseScript(code);
// Traverse the tree
estraverse.traverse(tree, {
  enter(node) {
    console.log('enter: ' + node.type);
    // Renames every Identifier; the function name is the only one here
    if (node.type === 'Identifier') {
      node.name = 'Jartto';
    }
  }
});
// Generate code from the modified tree
let r = escodegen.generate(tree);
console.log(r);

Running it prints:

enter: Program
enter: FunctionDeclaration
enter: Identifier
enter: BlockStatement
function Jartto() {
}

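As you can see, our intervention changed the output: in the regenerated code the function is named Jartto.

This is the power of the abstract syntax tree: by transforming the tree before code is generated, we can change essentially anything in the output.

One more note: the full set of node types is roughly as follows: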

(parameter) node: Identifier | SimpleLiteral | RegExpLiteral | Program | FunctionDeclaration | FunctionExpression | ArrowFunctionExpression | SwitchCase | CatchClause | VariableDeclarator | ExpressionStatement | BlockStatement | EmptyStatement | DebuggerStatement | WithStatement | ReturnStatement | LabeledStatement | BreakStatement | ContinueStatement | IfStatement | SwitchStatement | ThrowStatement | TryStatement | WhileStatement | DoWhileStatement | ForStatement | ForInStatement | ForOfStatement | VariableDeclaration | ClassDeclaration | ThisExpression | ArrayExpression | ObjectExpression | YieldExpression | UnaryExpression | UpdateExpression | BinaryExpression | AssignmentExpression | LogicalExpression | MemberExpression | ConditionalExpression | SimpleCallExpression | NewExpression | SequenceExpression | TemplateLiteral | TaggedTemplateExpression | ClassExpression | MetaProperty | AwaitExpression | Property | AssignmentProperty | Super | TemplateElement | SpreadElement | ObjectPattern | ArrayPattern | RestElement | AssignmentPattern | ClassBody | MethodDefinition | ImportDeclaration | ExportNamedDeclaration | ExportDefaultDeclaration | ExportAllDeclaration | ImportSpecifier | ImportDefaultSpecifier | ImportNamespaceSpecifier | ExportSpecifier

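At this point you may already be thinking of Babel, of JS obfuscation, and of more that goes on behind the scenes. Next, let's look at how Babel turns ES6 into ES5.

5. About `Babel`

Because of ES6 compatibility issues, we often compile our code with Babel plugins. Have you ever wondered how Babel works? Let's start with this:

let sum = (a, b)=>{return a+b};

The AST looks roughly like this:
[Figure: syntax tree of the sum arrow function]

The JSON form may be clearer: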

{
  "type": "Program",
  "start": 0,
  "end": 31,
  "body": [
    {
      "type": "VariableDeclaration",
      "start": 0,
      "end": 31,
      "declarations": [
        {
          "type": "VariableDeclarator",
          "start": 4,
          "end": 30,
          "id": {
            "type": "Identifier",
            "start": 4,
            "end": 7,
            "name": "sum"
          },
          "init": {
            "type": "ArrowFunctionExpression",
            "start": 10,
            "end": 30,
            "id": null,
            "expression": false,
            "generator": false,
            "params": [
              {
                "type": "Identifier",
                "start": 11,
                "end": 12,
                "name": "a"
              },
              {
                "type": "Identifier",
                "start": 14,
                "end": 15,
                "name": "b"
              }
            ],
            "body": {
              "type": "BlockStatement",
              "start": 18,
              "end": 30,
              "body": [
                {
                  "type": "ReturnStatement",
                  "start": 19,
                  "end": 29,
                  "argument": {
                    "type": "BinaryExpression",
                    "start": 26,
                    "end": 29,
                    "left": {
                      "type": "Identifier",
                      "start": 26,
                      "end": 27,
                      "name": "a"
                    },
                    "operator": "+",
                    "right": {
                      "type": "Identifier",
                      "start": 28,
                      "end": 29,
                      "name": "b"
                    }
                  }
                }
              ]
            }
          }
        }
      ],
      "kind": "let"
    }
  ],
  "sourceType": "module"
}

That is the rough structure. Now let's reproduce it in code:

const babel = require('babel-core'); // Babel core library
const t = require('babel-types'); // helpers for building and checking AST nodes
let code = `let sum = (a, b)=>{return a+b}`;
let ArrowPlugins = {
  // Visitor pattern
  visitor: {
    // Called for every node of the matching type
    ArrowFunctionExpression(path) {
      let { node } = path;
      let body = node.body;
      let params = node.params;
      // Arguments: id, params, body, generator, async
      let r = t.functionExpression(null, params, body, false, false);
      path.replaceWith(r);
    }
  }
}
let d = babel.transform(code, {
  plugins: [
    ArrowPlugins
  ]
})
console.log(d.code);

Remember to install the babel-core and babel-types packages (these are the Babel 6 names; from Babel 7 on they are published as @babel/core and @babel/types). Then run babel.js, and we see this output:

let sum = function (a, b) {
  return a + b;
};

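Here we have converted the arrow function into a standard function.

That raises another question: what about the shorthand form, like this? Does it still compile?

let sum = (a, b)=>a+b

The structure of the body has changed, so running our babel.js now throws an error:

TypeError: unknown: Property body of FunctionExpression expected node to be of a type ["BlockStatement"] but instead got "BinaryExpression"

The message is clear: the body is now a BinaryExpression rather than a BlockStatement, so we need to make a few changes: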

const babel = require('babel-core'); // Babel core library
const t = require('babel-types'); // helpers for building and checking AST nodes
let code = `let sum = (a, b)=> a+b`;
let ArrowPlugins = {
  // Visitor pattern
  visitor: {
    // Called for every node of the matching type
    ArrowFunctionExpression(path) {
      let { node } = path;
      let params = node.params;
      let body = node.body;
      // Wrap an expression body in `{ return <expression>; }`
      if(!t.isBlockStatement(body)){
        let returnStatement = t.returnStatement(body);
        body = t.blockStatement([returnStatement]);
      }
      let r = t.functionExpression(null, params, body, false, false);
      path.replaceWith(r);
    }
  }
}
let d = babel.transform(code, {
  plugins: [
    ArrowPlugins
  ]
})
console.log(d.code);

Here is the output:

let sum = function (a, b) {
  return a + b;
};

Looks good, just about perfect.
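
The core of that plugin can also be written against plain objects, which makes the node shapes explicit. This is a sketch under assumptions: `arrowToFunction` is a hypothetical helper, the input tree is written by hand, and the result is an ESTree-style object rather than something produced by Babel's builders.

```javascript
// Hypothetical helper: converts an ESTree-style ArrowFunctionExpression
// node into a FunctionExpression node, wrapping an expression body in a
// block that returns it.
function arrowToFunction(node) {
  let body = node.body;
  if (body.type !== 'BlockStatement') {
    body = {
      type: 'BlockStatement',
      body: [{ type: 'ReturnStatement', argument: body }]
    };
  }
  return {
    type: 'FunctionExpression',
    id: null,
    params: node.params,
    body,
    generator: false,
    async: false
  };
}

// Hand-written AST for `(a, b) => a + b`.
const arrow = {
  type: 'ArrowFunctionExpression',
  params: [{ type: 'Identifier', name: 'a' }, { type: 'Identifier', name: 'b' }],
  body: {
    type: 'BinaryExpression',
    operator: '+',
    left: { type: 'Identifier', name: 'a' },
    right: { type: 'Identifier', name: 'b' }
  }
};

const fn = arrowToFunction(arrow);
console.log(fn.type);              // FunctionExpression
console.log(fn.body.type);         // BlockStatement
console.log(fn.body.body[0].type); // ReturnStatement
```

Like the plugin above, this sketch only rewrites the node shape; it does not account for the lexical `this` of arrow functions.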

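6. Going deeper into Babel

The above was a simple demonstration of how Babel compiles code, but there is more to it than that.

Babel uses an AST modified from ESTree; the core specification is part of the Babel project's documentation.

As in the example code above, Babel has three main processing steps: parse, transform and generate.

1. Parse: the parse step takes code and outputs an AST. It has two phases: lexical analysis and syntactic analysis.

Lexical analysis: this phase turns a string of code into a stream of tokens. You can think of tokens as a flat array of syntax pieces: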
```
n * n;
```

For the snippet above, the result looks like this:

[
  { type: { ... }, value: "n", start: 0, end: 1, loc: { ... } },
  { type: { ... }, value: "*", start: 2, end: 3, loc: { ... } },
  { type: { ... }, value: "n", start: 4, end: 5, loc: { ... } },
  ...
]

Each type has a set of properties that describe the token. Like AST nodes, tokens also have start, end and loc properties:

{
  type: {
    label: 'name',
    keyword: undefined,
    beforeExpr: false,
    startsExpr: true,
    rightAssociative: false,
    isLoop: false,
    isAssign: false,
    prefix: false,
    postfix: false,
    binop: null,
    updateContext: null
  },
  ...
}
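
As an illustration of lexical analysis, here is a minimal tokenizer sketch. It is not Babel's tokenizer: `tokenize` is a hypothetical function, it only knows identifiers, integers and a few punctuators, and its `type` is a plain string rather than the type object shown above.

```javascript
// Hypothetical minimal tokenizer for identifiers, integers and a few
// punctuators. Token `type` is a plain string here, not Babel's type object.
function tokenize(code) {
  const tokens = [];
  // Sticky regex: optional whitespace, then exactly one token.
  const pattern = /\s*([A-Za-z_$][\w$]*|\d+|[-+*\/=;()])/y;
  let pos = 0;
  while (pos < code.length) {
    pattern.lastIndex = pos;
    const match = pattern.exec(code);
    if (!match) {
      if (/^\s*$/.test(code.slice(pos))) break; // only trailing whitespace left
      throw new SyntaxError('Unexpected character at ' + pos);
    }
    const value = match[1];
    const end = pattern.lastIndex;
    const start = end - value.length;
    let type = 'punc';
    if (/^\d/.test(value)) type = 'num';
    else if (/^[A-Za-z_$]/.test(value)) type = 'name';
    tokens.push({ type, value, start, end });
    pos = end;
  }
  return tokens;
}

console.log(tokenize('n * n;'));
```

For `n * n;` the first three tokens get the same value, start and end as in the listing above, followed by one more token for the semicolon.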

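Syntactic analysis: this phase turns the token stream into an AST. It uses the information in the tokens to rebuild them as a tree, which is easier to work with in later steps.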
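
A small recursive-descent parser shows how a flat token list becomes a tree. This is a sketch, not Babel's parser: `parseExpression` is a hypothetical function that only understands names, numbers, `+` and `*`, and the token array is written by hand with simplified string types.

```javascript
// Hypothetical recursive-descent parser over a token array. It handles
// names, numbers, "+" and "*", with "*" binding tighter than "+".
// Tokens after the expression (such as ";") are ignored.
function parseExpression(tokens) {
  let pos = 0;

  function parsePrimary() {
    const token = tokens[pos++];
    if (!token) throw new SyntaxError('Unexpected end of input');
    if (token.type === 'name') return { type: 'Identifier', name: token.value };
    if (token.type === 'num') return { type: 'Literal', value: Number(token.value) };
    throw new SyntaxError('Unexpected token: ' + token.value);
  }

  // Parses a left-associative chain: operand (operator operand)*
  function parseBinary(operator, parseOperand) {
    let left = parseOperand();
    while (pos < tokens.length && tokens[pos].value === operator) {
      pos++;
      left = { type: 'BinaryExpression', operator, left, right: parseOperand() };
    }
    return left;
  }

  const parseProduct = () => parseBinary('*', parsePrimary);
  const parseSum = () => parseBinary('+', parseProduct);
  return parseSum();
}

// Hand-written tokens for `1+2*3`.
const tokens = [
  { type: 'num', value: '1' },
  { type: 'punc', value: '+' },
  { type: 'num', value: '2' },
  { type: 'punc', value: '*' },
  { type: 'num', value: '3' }
];
const ast = parseExpression(tokens);
console.log(JSON.stringify(ast, null, 2));
```

Because sums are built out of products, `2*3` is grouped first and ends up as the right operand of `+`, the same nesting as in the 1+2*3 JSON in section 3.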

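2. Transform: this step takes the AST and traverses it, adding, updating and removing nodes along the way. It is the most complex part of Babel, or of any compiler, and it is where plugins do their work.

3. Generate: the code generation step turns the final AST (after all transforms have run) into a string of code, and also creates source maps.

Code generation is conceptually simple: traverse the whole AST depth-first and build a string that represents the transformed code.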
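
To illustrate, here is a minimal generator sketch for a handful of node types. It is not how Babel's generator is implemented: `generate` is a hypothetical function, it produces no source maps, and it only prints numeric literals correctly.

```javascript
// Hypothetical minimal code generator for a handful of ESTree node types.
// Each case builds its string from the strings of its children, so the
// tree is walked depth-first.
function generate(node, indent = '') {
  switch (node.type) {
    case 'Program':
      return node.body.map((stmt) => generate(stmt, indent)).join('\n');
    case 'FunctionDeclaration': {
      const params = node.params.map((p) => generate(p)).join(', ');
      return indent + 'function ' + generate(node.id) + '(' + params + ') ' +
        generate(node.body, indent);
    }
    case 'BlockStatement': {
      if (node.body.length === 0) return '{\n' + indent + '}';
      const inner = node.body.map((stmt) => generate(stmt, indent + '  ')).join('\n');
      return '{\n' + inner + '\n' + indent + '}';
    }
    case 'ReturnStatement':
      return indent + 'return ' + generate(node.argument) + ';';
    case 'BinaryExpression': {
      // Parenthesize nested binary operands so precedence is never lost.
      const wrap = (n) => (n.type === 'BinaryExpression' ? '(' + generate(n) + ')' : generate(n));
      return wrap(node.left) + ' ' + node.operator + ' ' + wrap(node.right);
    }
    case 'Identifier':
      return node.name;
    case 'Literal':
      return String(node.value); // numeric literals only in this sketch
    default:
      throw new Error('Unsupported node type: ' + node.type);
  }
}

// Hand-written AST for a function that squares its argument.
const square = {
  type: 'FunctionDeclaration',
  id: { type: 'Identifier', name: 'square' },
  params: [{ type: 'Identifier', name: 'n' }],
  body: {
    type: 'BlockStatement',
    body: [{
      type: 'ReturnStatement',
      argument: {
        type: 'BinaryExpression',
        operator: '*',
        left: { type: 'Identifier', name: 'n' },
        right: { type: 'Identifier', name: 'n' }
      }
    }]
  }
};

console.log(generate(square));
```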

With these steps in mind, let's revisit the earlier example code:

const babel = require('babel-core'); // Babel core library
const t = require('babel-types'); // helpers for building and checking AST nodes
let code = `let sum = (a, b)=>{return a+b}`;
let ArrowPlugins = {
  // Visitor pattern
  visitor: {
    // Called for every node of the matching type
    ArrowFunctionExpression(path) {
      let { node } = path;
      let body = node.body;
      let params = node.params;
      let r = t.functionExpression(null, params, body, false, false);
      path.replaceWith(r);
    }
  }
}
let d = babel.transform(code, {
  plugins: [
    ArrowPlugins
  ]
})
console.log(d.code);

Doesn't it suddenly seem easy to follow?

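7. About traversal

To transform an AST you need to traverse the tree recursively.

Say we have a FunctionDeclaration node. It has a few properties: id, params and body, each of which contains nested nodes.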

{
  type: "FunctionDeclaration",
  id: {
    type: "Identifier",
    name: "square"
  },
  params: [{
    type: "Identifier",
    name: "n"
  }],
  body: {
    type: "BlockStatement",
    body: [{
      type: "ReturnStatement",
      argument: {
        type: "BinaryExpression",
        operator: "*",
        left: {
          type: "Identifier",
          name: "n"
        },
        right: {
          type: "Identifier",
          name: "n"
        }
      }
    }]
  }
}
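
The recursive walk over this node can be sketched with a visitor keyed by node type, in the spirit of the Babel plugin from section 5. `CHILD_KEYS` and `visit` are hypothetical names for illustration; real traversal libraries keep a much larger table of child keys and pass richer objects to the handlers.

```javascript
// Hypothetical table of child-bearing keys per node type (only the types
// used in this example).
const CHILD_KEYS = {
  FunctionDeclaration: ['id', 'params', 'body'],
  BlockStatement: ['body'],
  ReturnStatement: ['argument'],
  BinaryExpression: ['left', 'right'],
  Identifier: []
};

// Walks the tree depth-first and calls visitor[node.type] when defined.
function visit(node, visitor) {
  const handler = visitor[node.type];
  if (handler) handler(node);
  for (const key of CHILD_KEYS[node.type] || []) {
    const child = node[key];
    if (Array.isArray(child)) child.forEach((c) => visit(c, visitor));
    else if (child) visit(child, visitor);
  }
}

const square = {
  type: 'FunctionDeclaration',
  id: { type: 'Identifier', name: 'square' },
  params: [{ type: 'Identifier', name: 'n' }],
  body: {
    type: 'BlockStatement',
    body: [{
      type: 'ReturnStatement',
      argument: {
        type: 'BinaryExpression',
        operator: '*',
        left: { type: 'Identifier', name: 'n' },
        right: { type: 'Identifier', name: 'n' }
      }
    }]
  }
};

// Collect every identifier name in visiting order.
const names = [];
visit(square, {
  Identifier(node) { names.push(node.name); }
});
console.log(names); // [ 'square', 'n', 'n', 'n' ]
```

The walk starts at the FunctionDeclaration, goes through id, then params, then descends through body down to the two operands of the multiplication.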