Hand-Writing a Compiler: The Lexer (Part 1)

Date: 2019-11-09


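Before writing a compiler, you first need to know what a compiler is; I assume anyone who has found this article already does. As I see it, a compiler is a program that lets a computer understand code. The program defines a set of rules (the grammar of a programming language), and as long as people talk to the computer according to those rules (that is, write programs), the computer can work out what we want it to do.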

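A compiler consists of several modules, which can also be seen as stages: lexical analysis, syntax analysis, intermediate code generation, and so on. I admit my knowledge here is hazy, but "everything starts with lexical analysis" (a saying I made up) cannot be far wrong. So let's begin with the first step, lexical analysis. It is one of the simpler modules of a compiler and also the most fundamental one. Its job is to split the input program text into individual words and symbols for the later modules to use.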

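Enough theory. My goal for now is to write one function that takes the path of a file, splits the code in that file into words and symbols (each of which is called a token), and returns a linked list holding all the tokens. The function is declared as:

TokenList_tag LexExcute(string sourcefilepath); // TokenList_tag is the list's head pointer, pointing to the head node; sourcefilepath is the path of the source file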

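I put the lexer in lex.cpp. Without further ado, here are the first few lines, and my confidence immediately went through the roof. The four using-declarations could be replaced by the single line using namespace std;

#include <iostream>
#include <string>
#include <fstream>
using std::cout;
using std::string;
using std::endl;
using std::ifstream;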

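Strictly speaking, the token should be defined next, but I did not yet know how to define it, so I put that off and dealt with reading the file first. Define a string variable (filepath) to hold the file path and a file stream (in) to read the file's contents, then read one line at a time from the stream into the string variable line_str. That makes the approach clear: take characters out of line_str one at a time and analyze them.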

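ifstream  in;        // file stream
string    filepath;  // path of the source file
string    line_str;  // the line most recently read
int       line_len;  // length of that line
int       line_pos;  // current position while processing character by character
int       line_num;  // current line number in the source file; may be useful later for reporting syntax errors
char      ch;        // current character while processing character by character
string    token_str; // text of the token being built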
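The reading scheme described above (one line at a time, then one character at a time) can be sketched in isolation. This is only an illustration: read_chars and read_chars_from_text are hypothetical helper names, not functions from lex.cpp, and the sketch reads from any std::istream so it can be tried on an in-memory string instead of a file.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: returns every character of the stream in reading
// order, with '\n' appended after each line that was read.
std::vector<char> read_chars(std::istream& in)
{
    std::vector<char> out;
    std::string line_str;                   // holds the line just read
    while (std::getline(in, line_str)) {    // loop ends when no line is left
        line_str += '\n';                   // std::getline drops the newline
        for (std::string::size_type line_pos = 0;
             line_pos < line_str.size(); ++line_pos) {
            out.push_back(line_str[line_pos]);
        }
    }
    return out;
}

// Convenience wrapper so the sketch can be tried without a file.
std::vector<char> read_chars_from_text(const std::string& text)
{
    std::istringstream in(text);
    return read_chars(in);
}
```

Putting the newline back means every line ends with a character that cannot belong to a word or number, so a token never runs across a line boundary.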

To avoid mistakes, all variable initialization goes into one function, init(), as follows:

void init()
{
    cout<<"func enter: init"<<endl;
    line_pos=0;
    line_num=0;
    token_str="";
    ch='\0';
    //end_of_file=false;
    //end_of_lex=false;
    //tokenlist=new TokenNode_tag;
    //tokenlist->token_str="#";
    //tokenlist->next=NULL;
    //listtail=tokenlist;
    in.open(filepath.c_str());
    cout<<"func end: init"<<endl;
}

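filepath is assigned outside this function before it runs. I was too lazy to pass it as a parameter, so I added one statement just before the call:

filepath=sourcefilepath; // sourcefilepath is the parameter passed to LexExcute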
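init() opens the stream but never checks whether the open succeeded. A minimal sketch of such a check follows; open_source is a hypothetical helper that is not part of lex.cpp, and it takes the path as a parameter instead of going through the global filepath.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: opens `path` on `in` and reports whether it worked.
bool open_source(std::ifstream& in, const std::string& path)
{
    in.open(path.c_str());   // c_str() also works with pre-C++11 compilers
    return in.is_open();     // false if the file could not be opened
}
```

A caller could report the error and stop before lexing starts when this returns false.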

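Next, let's define the token data structure. So far I can think of only two fields: token_str and token_type. token_str holds the token's text, and token_type records what kind of token it is, such as identifier, keyword, or number. Before defining the token, define TokenType (an enumeration) as follows: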

enum TokenType
{
    // The order must not change: the keyword entries must match keywords[]
    KEYWORDS_INT,
    KEYWORDS_DOUBLE,
    KEYWORDS_FLOAT,
    KEYWORDS_IF,
    KEYWORDS_ELSE,
    KEYWORDS_ELSIF,
    KEYWORDS_WHILE,
    KEYWORDS_FOR,

    IDENTIFY,
    OP_EQUAL,  // ==
    OP_ASSIGN, // =
    OP_LP,     // (
    OP_RP,     // )
    OP_ADD,    // +
    OP_SUB,    // -
    OP_MUL,    // *
    OP_DIV,    // /
    OP_SEMI,   // ;
    NUM_FLOAT,
    NUM_INT
};

The comments in TokenType should make it clear enough, so I won't go into detail. Next is the definition of TokenNode:

typedef struct TokenNode_tag
{
    string    token_str;
    TokenType token_type;
    struct    TokenNode_tag   *next; // link to the next node in the list
}*TokenList_tag,TokenNode_tag;

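Some may wonder why the names carry a tag suffix. First, the name itself doesn't matter (as long as it isn't a keyword). Second, the suffix separates internal use from external use: internal means the functions inside lex.cpp, and external means the other modules that will call into it later (at that point these structures will be given new names such as TokenNode).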
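To show how a node type like this becomes a list, here is a small sketch of a singly linked list with a dummy head node and a tail pointer. Node and List and the list_* functions are hypothetical stand-ins made up for this illustration (Node drops token_type to stay short); they are not the types used in lex.cpp.

```cpp
#include <cstddef>
#include <string>

// Simplified stand-in for TokenNode_tag (token_type omitted).
struct Node {
    std::string token_str;
    Node*       next;
};

// Dummy head node plus tail pointer.
struct List {
    Node* head;
    Node* tail;
};

List list_create()
{
    List l;
    l.head = new Node;
    l.head->token_str = "#";   // placeholder text, never a real token
    l.head->next = NULL;
    l.tail = l.head;           // empty list: the tail is the head node
    return l;
}

// Appends in constant time: no walk from the head is needed.
void list_append(List& l, const std::string& text)
{
    Node* n = new Node;
    n->token_str = text;
    n->next = NULL;
    l.tail->next = n;
    l.tail = n;
}

// Counts real tokens, skipping the dummy head.
std::size_t list_size(const List& l)
{
    std::size_t count = 0;
    for (const Node* p = l.head->next; p != NULL; p = p->next) ++count;
    return count;
}

// Frees every node, including the dummy head.
void list_destroy(List& l)
{
    Node* p = l.head;
    while (p != NULL) {
        Node* next = p->next;
        delete p;
        p = next;
    }
    l.head = l.tail = NULL;
}
```

The dummy head means appending never has to treat the empty list as a special case, which is probably why a placeholder node is convenient here.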

Now everything is ready except the most important function:

TokenList_tag LexExcute(string sourcefilepath)
{
    cout<<"func enter: main"<<endl;
    filepath=sourcefilepath;
    init();
    // The key code goes here: one big while loop
    cout<<"func end: main"<<endl;
    return NULL; // placeholder; the finished version returns the token list
}

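This is the main function. It works by repeatedly taking characters from line_str and deciding, from each character, whether a word or symbol has just been completed; a space, for example, means the current word has ended (roughly speaking). Yes, I know there is a thing called an automaton behind this, but I can't explain it well, so I'll go straight to my implementation: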

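I divide the lexer's operation into phases, each recorded as a state. When the big while loop runs for the first time, the lexer is in the initial state (STATUS_NON). If the character ch read in this state is a digit (0-9), the lexer moves to the number state (STATUS_NUM). In that state, a digit leaves the state unchanged; a '.' (decimal point) means the number is floating-point, so the state becomes the float state (STATUS_FLOAT); and a blank or any other character means the number has ended, so the state returns to the initial state and the next round begins.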

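What seems clear to me may still be hard for readers, so here is a picture:

[Figure: hand-drawn state transition diagram]

Well, you never know how badly you draw until you try, but setting that aside, it is reasonably clear... Anyway, that's not important. This should already show the outline of the legendary automaton. I won't insist that the picture makes sense; the complete code of the function is at the bottom, and reading the code should settle it.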
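The number part of the automaton described above can be tried on its own. The sketch below is an assumption-laden miniature, not the code from lex.cpp: NumState, NumKind and scan_number are hypothetical names, and the function works on a string and a position instead of the global ch and line_pos.

```cpp
#include <string>

enum NumState { S_NON, S_NUM, S_FLOAT };            // mirrors STATUS_NON/NUM/FLOAT
enum NumKind  { KIND_NONE, KIND_INT, KIND_FLOAT };  // what was recognized

// Scans a number starting at text[pos]. On return, pos is the index of the
// first character that does not belong to the number.
NumKind scan_number(const std::string& text, std::string::size_type& pos)
{
    NumState state = S_NON;
    while (pos < text.size()) {
        char ch = text[pos];
        bool digit = (ch >= '0' && ch <= '9');
        if (state == S_NON) {
            if (!digit) break;             // not a number at all
            state = S_NUM;                 // first digit: enter number state
        } else if (state == S_NUM) {
            if (ch == '.') state = S_FLOAT; // decimal point: now a float
            else if (!digit) break;        // anything else ends the integer
        } else {                           // S_FLOAT
            if (!digit) break;             // anything but a digit ends the float
        }
        ++pos;                             // the character was consumed
    }
    if (state == S_NUM)   return KIND_INT;
    if (state == S_FLOAT) return KIND_FLOAT;
    return KIND_NONE;
}
```

Note that the character which ends the number is not consumed: pos still points at it, so the caller can process it as the start of the next token. The full lexer gets the same effect by stepping back one character.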

Before writing the main function (LexExcute), some groundwork is still needed:

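bool      end_of_file; // initially false; set to true once the file has been read to the end
bool      end_of_lex;  // set to true when the file has ended and the current line is fully analyzed (marks the end of lexing)
TokenList_tag tokenlist; // token list produced by the lexer (head pointer of the list the main function returns)
TokenNode_tag *listtail; // tail pointer, used to append nodes at the end
void getline();   // read one line from the file
void getch();     // get one character
bool ischar();    // is ch in a-z or A-Z
bool isnum();     // is ch in 0-9
void concat();    // append ch to token_str
void backwords(); // step back one character
void tokenlist_insert(TokenNode_tag*); // list insertion
void tokenlist_visit(); // list traversal, only for checking the result; the lexer does not need it
int iskeywords(string); // keyword check: returns the index in keywords[], or -1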
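The reason a step-back function is needed shows up with "=" and "==": after reading one '=', the lexer must read one more character to know which token it has, and if that character is not '=', it has to be handed back. A minimal sketch of the idea follows, with a hypothetical scan_equals function that only looks for these two tokens and ignores everything else.

```cpp
#include <string>
#include <vector>

// Hypothetical demo: collects the "=" and "==" tokens found in `line`.
std::vector<std::string> scan_equals(const std::string& line)
{
    std::vector<std::string> tokens;
    std::string::size_type pos = 0;
    while (pos < line.size()) {
        char ch = line[pos++];            // read, then advance (like getch)
        if (ch != '=') continue;          // this sketch skips everything else
        // Read one more character to decide between "=" and "==".
        char next = (pos < line.size()) ? line[pos++] : '\0';
        if (next == '=') {
            tokens.push_back("==");
        } else {
            tokens.push_back("=");
            if (next != '\0') --pos;      // hand the lookahead back (like backwords)
        }
    }
    return tokens;
}
```

For the input a==b=c= this yields "==", "=", "=" in that order; the 'c' that was read as lookahead is stepped back over and then examined again as an ordinary character.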

The keyword array, the automaton state definitions, and the function implementations are all in the code below:

string keywords[]=
{
    // The order must not change: it must match the keyword entries of TokenType
    "int",
    "double",
    "float",
    "if",
    "else",
    "elsif",
    "while",
    "for"
};

enum LexStatus
{
    STATUS_NON,
    STATUS_NUM,
    STATUS_STR,
    STATUS_FLOAT,
    STATUS_ASSIGN,
};
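The "order must not change" comments exist because the position of a word in the keyword array is cast directly to the token type. The sketch below shows that coupling in miniature; Kind, kKeywords, kKeywordCount and classify_word are hypothetical names chosen for the illustration, not the ones in lex.cpp.

```cpp
#include <cstddef>
#include <string>

// Keyword kinds come first and in the same order as kKeywords below.
enum Kind { K_INT, K_DOUBLE, K_FLOAT, K_IF, K_ELSE, K_ELSIF, K_WHILE, K_FOR,
            K_IDENTIFY };

static const char* const kKeywords[] = {
    "int", "double", "float", "if", "else", "elsif", "while", "for"
};
static const std::size_t kKeywordCount =
    sizeof(kKeywords) / sizeof(kKeywords[0]);

// Returns the keyword kind whose index matches the word's position in
// kKeywords, or K_IDENTIFY when the word is not a keyword.
Kind classify_word(const std::string& word)
{
    for (std::size_t i = 0; i < kKeywordCount; ++i) {
        if (word == kKeywords[i]) return static_cast<Kind>(i);
    }
    return K_IDENTIFY;
}
```

If someone reordered only one of the two lists, every keyword after the change would silently get the wrong type, so checking that kKeywordCount equals K_IDENTIFY is a cheap safeguard.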

TokenList_tag LexExcute(string sourcefilepath)
{
    cout<<"func enter: main"<<endl;
    filepath=sourcefilepath;
    init();
    getline();
    LexStatus lexstatus=STATUS_NON;
    TokenNode_tag *tokennode;
    while(!end_of_lex)
    {
        getch();
        if(lexstatus==STATUS_NON)// initial state
        {
            cout<<"LexStatus: STATUS_NON"<<endl;
            if(ischar()||ch=='_')
            {
                lexstatus=STATUS_STR;
                cout<<"LexStatus: STATUS_NON -> STATUS_STR"<<endl;
                concat();
            }
            else if(isnum())
            {
                lexstatus=STATUS_NUM;
                cout<<"LexStatus: STATUS_NON -> STATUS_NUM"<<endl;
                concat();
            }
            else if(ch=='(')
            {
                concat();
                tokennode=new TokenNode_tag;
                tokennode->token_str=token_str;
                tokennode->token_type=OP_LP;
                tokennode->next=NULL;
                tokenlist_insert(tokennode);
                cout<<"LexStatus: STATUS_NON -> STATUS_NON"<<endl;
                lexstatus=STATUS_NON;
                token_str="";
            }
            else if(ch==')')
            {
                concat();
                tokennode=new TokenNode_tag;
                tokennode->token_str=token_str;
                tokennode->token_type=OP_RP;
                tokennode->next=NULL;
                tokenlist_insert(tokennode);
                cout<<"LexStatus: STATUS_NON -> STATUS_NON"<<endl;
                lexstatus=STATUS_NON;
                token_str="";
            }
            else if(ch==' ')
            {
                cout<<"do nothing"<<endl;
            }
            else if(ch=='\n')
            {
                cout<<"do nothing"<<endl;
            }
            else if(ch=='=')
            {
                concat();// keep '=' in token_str; the next state emits "=" or "=="
                cout<<"LexStatus: STATUS_NON -> STATUS_ASSIGN"<<endl;
                lexstatus=STATUS_ASSIGN;
            }
            else if(ch=='+')
            {
                concat();
                tokennode=new TokenNode_tag;
                tokennode->token_str=token_str;
                tokennode->token_type=OP_ADD;
                tokennode->next=NULL;
                tokenlist_insert(tokennode);
                cout<<"LexStatus: STATUS_NON -> STATUS_NON"<<endl;
                lexstatus=STATUS_NON;
                token_str="";
            }
            else if(ch=='-')
            {
                concat();
                tokennode=new TokenNode_tag;
                tokennode->token_str=token_str;
                tokennode->token_type=OP_SUB;
                tokennode->next=NULL;
                tokenlist_insert(tokennode);
                cout<<"LexStatus: STATUS_NON -> STATUS_NON"<<endl;
                lexstatus=STATUS_NON;
                token_str="";
            }
            else if(ch=='*')
            {
                concat();
                tokennode=new TokenNode_tag;
                tokennode->token_str=token_str;
                tokennode->token_type=OP_MUL;
                tokennode->next=NULL;
                tokenlist_insert(tokennode);
                cout<<"LexStatus: STATUS_NON -> STATUS_NON"<<endl;
                lexstatus=STATUS_NON;
                token_str="";
            }
            else if(ch=='/')
            {
                concat();
                tokennode=new TokenNode_tag;
                tokennode->token_str=token_str;
                tokennode->token_type=OP_DIV;
                tokennode->next=NULL;
                tokenlist_insert(tokennode);
                cout<<"LexStatus: STATUS_NON -> STATUS_NON"<<endl;
                lexstatus=STATUS_NON;
                token_str="";
            }
            else if(ch==';')
            {
                concat();
                tokennode=new TokenNode_tag;
                tokennode->token_str=token_str;
                tokennode->token_type=OP_SEMI;
                tokennode->next=NULL;
                tokenlist_insert(tokennode);
                cout<<"LexStatus: STATUS_NON -> STATUS_NON"<<endl;
                lexstatus=STATUS_NON;
                token_str="";
            }
        }//lexstatus
        else if(lexstatus==STATUS_ASSIGN)
        {
            if(ch=='=')// ==
            {
                concat();
                tokennode=new TokenNode_tag;
                tokennode->token_str=token_str;
                tokennode->token_type=OP_EQUAL;
                tokennode->next=NULL;
                tokenlist_insert(tokennode);
                cout<<"LexStatus: STATUS_ASSIGN -> STATUS_NON"<<endl;
                lexstatus=STATUS_NON;
                token_str="";
            }
            else// =
            {
                tokennode=new TokenNode_tag;
                tokennode->token_str=token_str;
                tokennode->token_type=OP_ASSIGN;
                tokennode->next=NULL;
                tokenlist_insert(tokennode);
                cout<<"LexStatus: STATUS_ASSIGN -> STATUS_NON"<<endl;
                lexstatus=STATUS_NON;
                token_str="";
                backwords();
            }
        }
        else if(lexstatus==STATUS_FLOAT)
        {
            if(isnum())
            {
                concat();
            }
            else
            {
                tokennode=new TokenNode_tag;
                tokennode->token_str=token_str;
                tokennode->token_type=NUM_FLOAT;
                tokennode->next=NULL;
                tokenlist_insert(tokennode);
                cout<<"LexStatus: STATUS_FLOAT -> STATUS_NON"<<endl;
                lexstatus=STATUS_NON;
                token_str="";
                backwords();
            }
        }
        else if(lexstatus==STATUS_NUM)
        {
            if(isnum())
            {
                concat();
            }
            else if(ch=='.')
            {
                concat();
                lexstatus=STATUS_FLOAT;
            }
            else
            {
                tokennode=new TokenNode_tag;
                tokennode->token_str=token_str;
                tokennode->token_type=NUM_INT;
                tokennode->next=NULL;
                tokenlist_insert(tokennode);
                cout<<"LexStatus: STATUS_NUM -> STATUS_NON"<<endl;
                lexstatus=STATUS_NON;
                token_str="";
                backwords();
            }
        }
        else if(lexstatus==STATUS_STR)
        {
            if(ischar()||ch=='_'||isnum())
            {
                concat();
            }
            else
            {
                tokennode=new TokenNode_tag;
                tokennode->token_str=token_str;
                int check=iskeywords(token_str);
                if(check!=-1)tokennode->token_type=(TokenType)check;
                else tokennode->token_type=IDENTIFY;
                tokennode->next=NULL;
                tokenlist_insert(tokennode);
                cout<<"LexStatus: STATUS_STR -> STATUS_NON"<<endl;
                lexstatus=STATUS_NON;
                token_str="";
                backwords();
            }
        }
    }

    cout<<"func end: main"<<endl;
    tokenlist_visit();
    return tokenlist;
}
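The seven single-character branches in the function above all perform the same steps and differ only in the token type. One possible way to shorten them, shown here only as a sketch, is to map the character to its type first. OpType is a trimmed copy of the operator entries of TokenType, and OP_UNKNOWN and single_char_op are hypothetical additions for this illustration.

```cpp
// Trimmed copy of the operator token types, plus a hypothetical OP_UNKNOWN
// value meaning "not a single-character token".
enum OpType { OP_LP, OP_RP, OP_ADD, OP_SUB, OP_MUL, OP_DIV, OP_SEMI,
              OP_UNKNOWN };

// Maps one character to its token type; one switch replaces seven branches.
OpType single_char_op(char ch)
{
    switch (ch) {
        case '(': return OP_LP;
        case ')': return OP_RP;
        case '+': return OP_ADD;
        case '-': return OP_SUB;
        case '*': return OP_MUL;
        case '/': return OP_DIV;
        case ';': return OP_SEMI;
        default:  return OP_UNKNOWN;
    }
}
```

With such a function, the initial state would need a single branch for all of these characters: look up the type, and if it is not OP_UNKNOWN, build and insert the node exactly once.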

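Complete code (stitched together from several files, so it should run... I think). It feels very long; an expert could probably do it in far fewer lines, which is a little embarrassing.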

#include <cstddef> // NULL
#include <iostream>
#include <string>
#include <fstream>
using std::cout;
using std::string;
using std::endl;
using std::ifstream;
string keywords[]=
{
    // The order must not change: it must match the keyword entries of TokenType
    "int",
    "double",
    "float",
    "if",
    "else",
    "elsif",
    "while",
    "for"
};

enum LexStatus
{
    STATUS_NON,
    STATUS_NUM,
    STATUS_STR,
    STATUS_FLOAT,
    STATUS_ASSIGN,
};
enum TokenType
{
    // The order must not change: the keyword entries must match keywords[]
    KEYWORDS_INT,
    KEYWORDS_DOUBLE,
    KEYWORDS_FLOAT,
    KEYWORDS_IF,
    KEYWORDS_ELSE,
    KEYWORDS_ELSIF,
    KEYWORDS_WHILE,
    KEYWORDS_FOR,

    IDENTIFY,
    OP_EQUAL,  // ==
    OP_ASSIGN, // =
    OP_LP,     // (
    OP_RP,     // )
    OP_ADD,    // +
    OP_SUB,    // -
    OP_MUL,    // *
    OP_DIV,    // /
    OP_SEMI,   // ;
    NUM_FLOAT,
    NUM_INT
};
typedef struct TokenNode_tag
{
    string    token_str;
    TokenType token_type;
    struct    TokenNode_tag   *next;
}*TokenList_tag,TokenNode_tag;


ifstream  in;
string    filepath;
string    line_str;
int       line_len;
int       line_pos;
int       line_num;
char      ch;
string    token_str;

bool      end_of_file;
bool      end_of_lex;
TokenList_tag tokenlist; // token list produced by the lexer
TokenNode_tag *listtail;
void init();
void getline();
void getch();
bool ischar();
bool isnum();
void concat();
void backwords();
void tokenlist_insert(TokenNode_tag*);
void tokenlist_visit();
int iskeywords(string);
TokenList_tag LexExcute(string sourcefilepath)
{
    cout<<"func enter: main"<<endl;
    filepath=sourcefilepath;
    init();
    getline();
    LexStatus lexstatus=STATUS_NON;
    TokenNode_tag *tokennode;
    while(!end_of_lex)
    {
        getch();
        if(lexstatus==STATUS_NON)// initial state
        {
            cout<<"LexStatus: STATUS_NON"<<endl;
            if(ischar()||ch=='_')
            {
                lexstatus=STATUS_STR;
                cout<<"LexStatus: STATUS_NON -> STATUS_STR"<<endl;
                concat();
            }
            else if(isnum())
            {
                lexstatus=STATUS_NUM;
                cout<<"LexStatus: STATUS_NON -> STATUS_NUM"<<endl;
                concat();
            }
            else if(ch=='(')
            {
                concat();
                tokennode=new TokenNode_tag;
                tokennode->token_str=token_str;
                tokennode->token_type=OP_LP;
                tokennode->next=NULL;
                tokenlist_insert(tokennode);
                cout<<"LexStatus: STATUS_NON -> STATUS_NON"<<endl;
                lexstatus=STATUS_NON;
                token_str="";
            }
            else if(ch==')')
            {
                concat();
                tokennode=new TokenNode_tag;
                tokennode->token_str=token_str;
                tokennode->token_type=OP_RP;
                tokennode->next=NULL;
                tokenlist_insert(tokennode);
                cout<<"LexStatus: STATUS_NON -> STATUS_NON"<<endl;
                lexstatus=STATUS_NON;
                token_str="";
            }
            else if(ch==' ')
            {
                cout<<"do nothing"<<endl;
            }
            else if(ch=='\n')
            {
                cout<<"do nothing"<<endl;
            }
            else if(ch=='=')
            {
                concat();// keep '=' in token_str; the next state emits "=" or "=="
                cout<<"LexStatus: STATUS_NON -> STATUS_ASSIGN"<<endl;
                lexstatus=STATUS_ASSIGN;
            }
            else if(ch=='+')
            {
                concat();
                tokennode=new TokenNode_tag;
                tokennode->token_str=token_str;
                tokennode->token_type=OP_ADD;
                tokennode->next=NULL;
                tokenlist_insert(tokennode);
                cout<<"LexStatus: STATUS_NON -> STATUS_NON"<<endl;
                lexstatus=STATUS_NON;
                token_str="";
            }
            else if(ch=='-')
            {
                concat();
                tokennode=new TokenNode_tag;
                tokennode->token_str=token_str;
                tokennode->token_type=OP_SUB;
                tokennode->next=NULL;
                tokenlist_insert(tokennode);
                cout<<"LexStatus: STATUS_NON -> STATUS_NON"<<endl;
                lexstatus=STATUS_NON;
                token_str="";
            }
            else if(ch=='*')
            {
                concat();
                tokennode=new TokenNode_tag;
                tokennode->token_str=token_str;
                tokennode->token_type=OP_MUL;
                tokennode->next=NULL;
                tokenlist_insert(tokennode);
                cout<<"LexStatus: STATUS_NON -> STATUS_NON"<<endl;
                lexstatus=STATUS_NON;
                token_str="";
            }
            else if(ch=='/')
            {
                concat();
                tokennode=new TokenNode_tag;
                tokennode->token_str=token_str;
                tokennode->token_type=OP_DIV;
                tokennode->next=NULL;
                tokenlist_insert(tokennode);
                cout<<"LexStatus: STATUS_NON -> STATUS_NON"<<endl;
                lexstatus=STATUS_NON;
                token_str="";
            }
            else if(ch==';')
            {
                concat();
                tokennode=new TokenNode_tag;
                tokennode->token_str=token_str;
                tokennode->token_type=OP_SEMI;
                tokennode->next=NULL;
                tokenlist_insert(tokennode);
                cout<<"LexStatus: STATUS_NON -> STATUS_NON"<<endl;
                lexstatus=STATUS_NON;
                token_str="";
            }
        }//lexstatus
        else if(lexstatus==STATUS_ASSIGN)
        {
            if(ch=='=')// ==
            {
                concat();
                tokennode=new TokenNode_tag;
                tokennode->token_str=token_str;
                tokennode->token_type=OP_EQUAL;
                tokennode->next=NULL;
                tokenlist_insert(tokennode);
                cout<<"LexStatus: STATUS_ASSIGN -> STATUS_NON"<<endl;
                lexstatus=STATUS_NON;
                token_str="";
            }
            else// =
            {
                tokennode=new TokenNode_tag;
                tokennode->token_str=token_str;
                tokennode->token_type=OP_ASSIGN;
                tokennode->next=NULL;
                tokenlist_insert(tokennode);
                cout<<"LexStatus: STATUS_ASSIGN -> STATUS_NON"<<endl;
                lexstatus=STATUS_NON;
                token_str="";
                backwords();
            }
        }
        else if(lexstatus==STATUS_FLOAT)
        {
            if(isnum())
            {
                concat();
            }
            else
            {
                tokennode=new TokenNode_tag;
                tokennode->token_str=token_str;
                tokennode->token_type=NUM_FLOAT;
                tokennode->next=NULL;
                tokenlist_insert(tokennode);
                cout<<"LexStatus: STATUS_FLOAT -> STATUS_NON"<<endl;
                lexstatus=STATUS_NON;
                token_str="";
                backwords();
            }
        }
        else if(lexstatus==STATUS_NUM)
        {
            if(isnum())
            {
                concat();
            }
            else if(ch=='.')
            {
                concat();
                lexstatus=STATUS_FLOAT;
            }
            else
            {
                tokennode=new TokenNode_tag;
                tokennode->token_str=token_str;
                tokennode->token_type=NUM_INT;
                tokennode->next=NULL;
                tokenlist_insert(tokennode);
                cout<<"LexStatus: STATUS_NUM -> STATUS_NON"<<endl;
                lexstatus=STATUS_NON;
                token_str="";
                backwords();
            }
        }
        else if(lexstatus==STATUS_STR)
        {
            if(ischar()||ch=='_'||isnum())
            {
                concat();
            }
            else
            {
                tokennode=new TokenNode_tag;
                tokennode->token_str=token_str;
                int check=iskeywords(token_str);
                if(check!=-1)tokennode->token_type=(TokenType)check;
                else tokennode->token_type=IDENTIFY;
                tokennode->next=NULL;
                tokenlist_insert(tokennode);
                cout<<"LexStatus: STATUS_STR -> STATUS_NON"<<endl;
                lexstatus=STATUS_NON;
                token_str="";
                backwords();
            }
        }
    }

    cout<<"func end: main"<<endl;
    tokenlist_visit();
    return tokenlist;
}//main
int iskeywords(string str)
{
    int i=0;
    int len=sizeof(keywords)/sizeof(string);
    cout<<"length of keyword[]: "<<len<<endl;
    while(i<len)
    {
        if(str==keywords[i])
        {
            cout<<str<<" is keywords"<<endl;
            return i;
        }
        ++i;
    }
    cout<<str<<" isn't keywords"<<endl;
    return -1;
}
void tokenlist_visit()
{
    TokenNode_tag *tn;
    tn=tokenlist;
    while(tn->next!=NULL)
    {
        cout<<tn->next->token_str<<"   token_type"<<tn->next->token_type<<endl;
        tn=tn->next;
    }

}
void tokenlist_insert(TokenNode_tag* token)
{
    cout<<"func enter: tokenlist_insert"<<endl;
    listtail->next=token;
    listtail=listtail->next;
    cout<<"func end: tokenlist_insert"<<endl;
}
bool ischar()
{
    cout<<"func enter: ischar  ->ch:"<<ch<<endl;
    if((ch>='a'&&ch<='z')||(ch>='A'&&ch<='Z'))
    {
        cout<<"func end: ischar ->"<<ch<<" is char"<<endl;
        return true;
    }
    else
    {
        cout<<"func end: ischar ->"<<ch<<" isn't char"<<endl;
        return false;
    }
}
bool isnum()
{
    cout<<"func enter: isnum ->ch:"<<ch<<endl;
    if(ch>='0'&&ch<='9')
    {
        cout<<"func end: isnum ->"<<ch<<" is num"<<endl;
        return true;
    }
    else
    {
        cout<<"func end: isnum ->"<<ch<<" isn't num"<<endl;
        return false;
    }

}
/**
Append ch to the end of token_str
*/
void concat()
{
    cout<<"func enter: concat ->token:"<<token_str<<" ch:"<<ch<<endl;
    token_str+=ch;
    cout<<"func end: concat ->token:"<<token_str<<endl;
}
void backwords()
{
    cout<<"func enter: backwords -> pos:"<<line_pos<<endl;
    if(line_pos>0)line_pos--;
    else cout<<"this is first ch in this line,can not backwords!";
    cout<<"func end: backwords -> pos:"<<line_pos<<endl;
}
void getch()
{
    cout<<"func enter: getch"<<endl;
    if(line_pos<line_len)// take one character from the current line
    {
        ch=line_str[line_pos++];
        cout<<"ch: "<<ch<<endl;
    }
    else// this line is finished
    {
        cout<<"end of line"<<endl;
        if(end_of_file)
        {
            cout<<"file over!!!"<<endl;
            end_of_lex=true;
            ch='\0';// end marker
        }
        else// the file has not ended yet, fetch a new line
        {
            cout<<"new line"<<endl;
            getline();
            getch();
        }
    }
    cout<<"func end: getch"<<endl;
}
void init()
{
    cout<<"func enter: init"<<endl;
    line_pos=0;
    line_num=0;
    token_str="";
    ch='\0';
    end_of_file=false;
    end_of_lex=false;
    tokenlist=new TokenNode_tag;
    tokenlist->token_str="#";
    tokenlist->next=NULL;
    listtail=tokenlist;
    in.open(filepath.c_str());
    cout<<"func end: init"<<endl;
}

void getline()
{
    cout<<"func enter: getline"<<endl;
    if(std::getline(in,line_str))// a line was actually read
    {
        line_num++;
        line_str+='\n';// std::getline drops the newline, so put it back
        line_len=line_str.length();
        cout<<"line"<<line_num<<": "<<line_str<<endl;
        cout<<"length: "<<line_len<<endl;
        line_pos=0;
    }
    else// nothing left to read (or the file could not be opened)
    {
        cout<<"end of file"<<endl;
        end_of_file=true;
    }
    cout<<"func end: getline"<<endl;
}


If you spot a mistake, please point it out; comments are welcome!
