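In this part we start with a simple lexical scanner that recognizes the four arithmetic operators and integer values. What it does is equally simple: it reads the file we give it, identifies the tokens in that file, and prints them.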
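This simple scanner supports only five lexical elements: the four arithmetic operators +, -, * and /, and integer literals.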
We need to define each token in advance, using an enumerated type to represent them:
// defs.h
// Tokens
enum {
  T_PLUS, T_MINUS, T_STAR, T_SLASH, T_INTLIT
};
Once a token has been scanned, it is stored in the structure below. When the token is T_INTLIT (an integer literal), the intvalue field holds the integer value we scanned:
// defs.h
// Token structure
struct token {
  int token;
  int intvalue;
};
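To make the role of the two fields concrete, here is a minimal sketch of how tokens might be constructed. The helpers make_op() and make_intlit() are hypothetical names introduced only for illustration; they are not part of the scanner described here.

```c
// Token kinds and token structure, repeated from defs.h
// so that this sketch stands on its own.
enum { T_PLUS, T_MINUS, T_STAR, T_SLASH, T_INTLIT };

struct token {
  int token;
  int intvalue;
};

// Hypothetical helper: build an operator token.
// intvalue is unused for operators, so it is set to 0.
static struct token make_op(int kind) {
  struct token t = { kind, 0 };
  return t;
}

// Hypothetical helper: build an integer-literal token
// that carries the scanned value in intvalue.
static struct token make_intlit(int value) {
  struct token t = { T_INTLIT, value };
  return t;
}
```

With these helpers, the input 34 would be represented by make_intlit(34), and a multiplication sign by make_op(T_STAR). Only T_INTLIT tokens give intvalue a meaning.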
Now suppose we have a file whose contents are a single arithmetic expression:
2 + 34 * 5 - 8 / 3
What we want to do is read each valid token in it and print it, like this:
Token intlit, value 2
Token +
Token intlit, value 34
Token *
Token intlit, value 5
Token -
Token intlit, value 8
Token /
Token intlit, value 3
Now that we have seen the goal, let's work through the functions we need step by step.
// Get the next character from the input file.
static int next(void) {
  int c;

  if (Putback) {                // Use the character put
    c = Putback;                // back if there is one
    Putback = 0;
    return c;
  }

  c = fgetc(Infile);            // Read from input file
  if ('\n' == c)
    Line++;                     // Increment line count
  return c;
}
// Skip past input that we don't need to deal with,
// i.e. whitespace, newlines. Return the first
// character we do need to deal with.
static int skip(void) {
  int c;

  c = next();
  while (' ' == c || '\t' == c || '\n' == c || '\r' == c || '\f' == c) {
    c = next();
  }
  return (c);
}
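next() checks a Putback variable, and the integer scanner calls putback(), but neither definition is shown in this text. The following is a minimal sketch of how that one-character pushback could work. It assumes an in-memory string (the hypothetical Input pointer and set_input() helper) in place of the Infile file handle, so that it can be tried without a file.

```c
#include <stdio.h>   // EOF

// Assumption: in-memory stand-ins for the scanner's global state.
static const char *Input = "";  // Next unread character of the source text
static int Putback = 0;         // Character put back, or 0 if there is none
static int Line = 1;            // Current line number

// Hypothetical helper: point the reader at a new string and reset its state.
static void set_input(const char *s) {
  Input = s;
  Putback = 0;
  Line = 1;
}

// Get the next character, preferring a put-back character if there is one.
static int next(void) {
  int c;

  if (Putback) {
    c = Putback;
    Putback = 0;
    return c;
  }
  if (*Input == '\0')
    return EOF;                 // End of the string plays the role of end of file
  c = (unsigned char)*Input++;
  if ('\n' == c)
    Line++;                     // Count lines only when a newline is first read
  return c;
}

// Put back an unwanted character so the next call to next() returns it.
static void putback(int c) {
  Putback = c;
}

// Skip whitespace and return the first character that matters.
static int skip(void) {
  int c = next();

  while (' ' == c || '\t' == c || '\n' == c || '\r' == c || '\f' == c)
    c = next();
  return c;
}
```

Because the put-back path returns before the newline check, a newline that is put back and read again is not counted twice. The sketch holds only one character of pushback, which is all this scanner needs.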
// Return the position of character c
// in string s, or -1 if c not found
static int chrpos(char *s, int c) {
  char *p;

  p = strchr(s, c);
  return (p ? p - s : -1);
}

// Scan and return an integer literal
// value from the input file.
static int scanint(int c) {
  int k, val = 0;

  // Convert each character into an int value
  while ((k = chrpos("0123456789", c)) >= 0) {
    val = val * 10 + k;
    c = next();
  }

  // We hit a non-integer character, put it back.
  putback(c);
  return val;
}
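As a worked example of the digit-by-digit conversion, the sketch below runs scanint() over an in-memory string. The Input pointer is an assumption made for illustration and replaces the file handle. This chrpos() also guards against a zero or negative character, because strchr() matches the string terminator when asked to find 0.

```c
#include <stdio.h>   // EOF, NULL
#include <string.h>  // strchr

static const char *Input = "";  // Assumption: in-memory input instead of a file
static int Putback = 0;         // Character put back, or 0 if there is none

// Get the next character, preferring a put-back character.
static int next(void) {
  int c;

  if (Putback) {
    c = Putback;
    Putback = 0;
    return c;
  }
  return *Input ? (unsigned char)*Input++ : EOF;
}

// Remember one character to be returned by the next call to next().
static void putback(int c) {
  Putback = c;
}

// Return the position of character c in string s, or -1 if not found.
// c <= 0 (EOF or NUL) is rejected up front and never reaches strchr().
static int chrpos(const char *s, int c) {
  const char *p = (c > 0) ? strchr(s, c) : NULL;

  return p ? (int)(p - s) : -1;
}

// Scan an integer literal whose first digit is c and return its value.
static int scanint(int c) {
  int k, val = 0;

  // Each digit's position in "0123456789" is its numeric value.
  while ((k = chrpos("0123456789", c)) >= 0) {
    val = val * 10 + k;
    c = next();
  }

  // We read one character past the literal, so put it back.
  putback(c);
  return val;
}
```

For the input "34 * 5", the loop computes 0 * 10 + 3 = 3, then 3 * 10 + 4 = 34, stops at the space, and puts the space back for the next read. The sketch does not check for integer overflow.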
So now we can read characters while skipping whitespace, and if we read one character too far we can put it back. We can now write our first lexical scanner:
int scan(struct token *t) {
  int c;

  // Skip whitespace
  c = skip();

  // Determine the token based on
  // the input character
  switch (c) {
  case EOF:
    return (0);
  case '+':
    t->token = T_PLUS;
    break;
  case '-':
    t->token = T_MINUS;
    break;
  case '*':
    t->token = T_STAR;
    break;
  case '/':
    t->token = T_SLASH;
    break;
  default:
    // If it's a digit, scan the
    // literal integer value in
    if (isdigit(c)) {
      t->intvalue = scanint(c);
      t->token = T_INTLIT;
      break;
    }

    printf("Unrecognised character %c on line %d\n", c, Line);
    exit(1);
  }

  // We found a token
  return (1);
}
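The pieces so far can be assembled into a self-contained sketch that scans an in-memory string. Several things here are assumptions made for illustration and differ from the code above: the input is a string instead of a file, scan() returns -1 on an unrecognised character instead of calling exit(), scanint() uses isdigit() in place of chrpos(), and the Tokens array and scan_string() helper are hypothetical.

```c
#include <ctype.h>   // isdigit
#include <stdio.h>   // EOF

enum { T_PLUS, T_MINUS, T_STAR, T_SLASH, T_INTLIT };

struct token {
  int token;
  int intvalue;
};

#define MAXTOKENS 32
static struct token Tokens[MAXTOKENS];  // Hypothetical buffer of scanned tokens
static const char *Input = "";          // Assumption: in-memory input
static int Putback = 0;                 // Character put back, or 0 if none

static int next(void) {
  int c;

  if (Putback) {
    c = Putback;
    Putback = 0;
    return c;
  }
  return *Input ? (unsigned char)*Input++ : EOF;
}

static void putback(int c) {
  Putback = c;
}

static int skip(void) {
  int c = next();

  while (' ' == c || '\t' == c || '\n' == c || '\r' == c || '\f' == c)
    c = next();
  return c;
}

static int scanint(int c) {
  int val = 0;

  while (c != EOF && isdigit(c)) {
    val = val * 10 + (c - '0');
    c = next();
  }
  putback(c);                   // One character too far, so put it back
  return val;
}

// Return 1 if a token was found, 0 at end of input,
// or -1 on an unrecognised character.
static int scan(struct token *t) {
  int c = skip();

  switch (c) {
  case EOF: return 0;
  case '+': t->token = T_PLUS;  break;
  case '-': t->token = T_MINUS; break;
  case '*': t->token = T_STAR;  break;
  case '/': t->token = T_SLASH; break;
  default:
    if (!isdigit(c))
      return -1;
    t->intvalue = scanint(c);
    t->token = T_INTLIT;
  }
  return 1;
}

// Scan src into Tokens and return the token count, or -1 on error.
// Scanning stops early once MAXTOKENS tokens have been stored.
static int scan_string(const char *src) {
  int n = 0, r = 0;

  Input = src;
  Putback = 0;
  while (n < MAXTOKENS && (r = scan(&Tokens[n])) == 1)
    n++;
  return (r < 0) ? -1 : n;
}
```

Calling scan_string("2 + 34 * 5 - 8 / 3") fills Tokens with nine entries: five integer literals and four operators, in the same order as the expected output shown earlier. Returning an error code instead of exiting is a design choice that makes the sketch easier to test; the scanner in this text exits on the first bad character.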
Now we can read tokens and return them to the caller.
The main() function opens a file and then scans it for tokens:
int main(int argc, char *argv[]) {
  ...
  init();
  ...
  Infile = fopen(argv[1], "r");
  ...
  scanfile();
  exit(0);
}
scanfile() loops while scan() returns a new token and prints the details of each one:
// List of printable tokens
char *tokstr[] = { "+", "-", "*", "/", "intlit" };

// Loop scanning in all the tokens in the input file.
// Print out details of each token found.
static void scanfile() {
  struct token T;

  while (scan(&T)) {
    printf("Token %s", tokstr[T.token]);
    if (T.token == T_INTLIT)
      printf(", value %d", T.intvalue);
    printf("\n");
  }
}
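That is all for this part. In the next part, we will build a parser to interpret the grammar of our input file, and then compute and print the final value of each file.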