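Lex is a lexical analyzer generator and one of the best-known tools in the Unix world. Originally written by Mike Lesk and Eric Schmidt, it is the standard lexical analyzer generator on many UNIX systems, and its behavior is specified as part of the POSIX standard.
Lex works roughly as follows: it builds an NFA from the regular expressions, converts the NFA into a DFA, minimizes the DFA, and then generates a lexical analyzer that simulates the minimized DFA.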
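To make the last step concrete, here is a minimal sketch of what "simulating a DFA" can look like in C. It is hand-written for the single pattern `[+-]?[0-9]+` and is not the code Lex actually emits; the state names, the `classify` helper, and the function `dfa_matches_integer` are all invented for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Character classes the DFA distinguishes (illustrative names). */
enum { C_SIGN, C_DIGIT, C_OTHER, N_CLASSES };

/* States: start, after a sign, inside digits (accepting), dead. */
enum { S_START, S_SIGN, S_DIGITS, S_DEAD, N_STATES };

static int classify(char c) {
    if (c == '+' || c == '-') return C_SIGN;
    if (c >= '0' && c <= '9') return C_DIGIT;
    return C_OTHER;
}

/* Transition table: next_state[state][character class]. */
static const int next_state[N_STATES][N_CLASSES] = {
    /* S_START  */ { S_SIGN, S_DIGITS, S_DEAD },
    /* S_SIGN   */ { S_DEAD, S_DIGITS, S_DEAD },
    /* S_DIGITS */ { S_DEAD, S_DIGITS, S_DEAD },
    /* S_DEAD   */ { S_DEAD, S_DEAD,   S_DEAD },
};

/* Return 1 if the whole string s matches [+-]?[0-9]+, else 0. */
int dfa_matches_integer(const char *s) {
    assert(s != NULL);
    int state = S_START;
    for (; *s != '\0'; s++)
        state = next_state[state][classify(*s)];
    return state == S_DIGITS;
}
```

A generated scanner follows the same idea on a larger scale: one table covers all rules at once, and the accepting state reached determines which action runs.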
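The main job of Lex is to generate the C source code of a lexical analyzer (scanner) from rules written as regular expressions. The file that describes the scanner, *.l, is processed by lex to produce a file named lex.yy.c, which a C compiler then compiles into the scanner. Put simply, a scanner converts the characters of its input into the corresponding tokens, which are much easier for later stages such as a Yacc or Bison parser to process. The process is shown in the figure:
On Linux systems the tool most commonly used is Flex (fast lexical analyzer generator), an alternative implementation of Lex. It is often used together with the free-software parser generator Bison. Flex was originally written in C by Vern Paxson in 1987. Flex is commonly described as follows: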
FLEX (fast lexical analyzer generator) is a tool for generating lexical analyzers (scanners or lexers), written by Vern Paxson in C around 1987. It is used together with the Berkeley Yacc or GNU Bison parser generators. Flex and Bison are both more flexible than Lex and Yacc and produce faster code.
Bison produces a parser from the input file provided by the user. Flex automatically generates the function yylex() when it is given a .l file, and the parser calls yylex() to retrieve tokens from the current token stream.
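The calling pattern between parser and scanner can be sketched in plain C. The function `demo_yylex` below is a hand-written stand-in, not something Flex generates: a real `yylex()` takes no arguments and reads from `yyin`, whereas this version takes a cursor into a string so that the example stays self-contained. The token codes `TOK_NUMBER` and `TOK_WORD` are hypothetical; in a real project the grammar file would define them. The one convention carried over from Flex is that 0 signals end of input.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token codes; 0 means end of input, as with yylex(). */
enum { TOK_EOF = 0, TOK_NUMBER = 1, TOK_WORD = 2 };

/* Stand-in scanner: skips whitespace, then returns the code of the
   next token and advances *cursor past it. */
int demo_yylex(const char **cursor) {
    assert(cursor != NULL && *cursor != NULL);
    const char *p = *cursor;
    while (isspace((unsigned char)*p)) p++;
    if (*p == '\0') { *cursor = p; return TOK_EOF; }
    if (isdigit((unsigned char)*p)) {
        while (isdigit((unsigned char)*p)) p++;
        *cursor = p;
        return TOK_NUMBER;
    }
    /* Anything else: consume up to the next whitespace as a word. */
    while (*p != '\0' && !isspace((unsigned char)*p)) p++;
    *cursor = p;
    return TOK_WORD;
}

/* Parser-side loop: keep asking for tokens until the scanner returns 0,
   counting how many tokens of the wanted kind were seen. */
int count_tokens(const char *input, int wanted) {
    int n = 0, tok;
    while ((tok = demo_yylex(&input)) != TOK_EOF)
        if (tok == wanted) n++;
    return n;
}
```

A generated parser does the same thing inside `yyparse()`: it requests one token at a time and stops when the scanner reports end of input.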
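A Flex input file consists of three sections: definitions, rules, and user code. The sections are separated by lines that contain only two consecutive percent signs ("%%"):

```
definitions
%%
rules
%%
user code
```

The three sections of a Flex input file are explained below: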
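1 Definitions section: this section contains variable declarations, regular definitions, and manifest constants. C text in this section is placed between "%{" and "%}", and everything enclosed by these brackets is copied verbatim into lex.yy.c.
Syntax:

```
%{
// Definitions
%}
```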
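2 Rules section: this section contains a series of rules of the form "pattern action". The pattern must start at the beginning of the line with no indentation, and the action must begin on the same line. The rules section is enclosed between the two "%%" lines.
Syntax:

```
%%
pattern  action
%%
```

The table below shows some example patterns.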
Pattern | Matches |
---|---|
[0-9] | any digit from 0 to 9 |
[0+9] | either 0, + or 9 |
[0, 9] | either 0, a comma, a space or 9 |
[0 9] | either 0, a space or 9 |
[-09] | either -, 0 or 9 |
[-0-9] | either - or any digit from 0 to 9 |
[0-9]+ | one or more digits from 0 to 9 |
[^a] | any character except a |
[^A-Z] | any character except the upper-case letters |
a{2,4} | either aa, aaa or aaaa |
a{2,} | two or more occurrences of a |
a{4} | exactly four a's, i.e. aaaa |
. | any character except newline |
a* | zero or more occurrences of a |
a+ | one or more occurrences of a |
[a-z] | any lower-case letter |
[a-zA-Z] | any alphabetic letter |
w(x\|y)z | wxz or wyz |
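3 User code section: this section contains C statements and additional functions. These functions can also be compiled separately and linked with the lexical analyzer.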
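Several of the patterns in the table can be tried out without Flex, using the POSIX regular expression functions from `<regex.h>`. This is only an approximation: POSIX extended regular expressions agree with Flex on the character classes, repetition counts, and alternation shown in the table, but the two syntaxes differ elsewhere (for example in quoting and in how `.` treats newlines). The helper name `full_match` and the 256-byte buffer are choices made for this sketch.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>

/* Return 1 if text matches pattern in full, 0 if it does not,
   and -1 if the pattern is too long or fails to compile. */
int full_match(const char *pattern, const char *text) {
    assert(pattern != NULL && text != NULL);
    char anchored[256];
    regex_t re;

    /* Anchor the pattern so the whole string has to match. */
    int len = snprintf(anchored, sizeof anchored, "^(%s)$", pattern);
    if (len < 0 || (size_t)len >= sizeof anchored) return -1;

    if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0) return -1;
    int ok = (regexec(&re, text, 0, NULL, 0) == 0);
    regfree(&re);
    return ok;
}
```

For instance, `full_match("a{2,4}", "aaa")` succeeds while `full_match("a{2,4}", "a")` does not, which matches the corresponding table row.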
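To run a program, first save it with the extension .l or .lex. Then run the following commands in a terminal.
Step 1: lex filename.l
or lex filename.lex
depending on the file extension
Step 2: gcc lex.yy.c
Step 3: ./a.out
Step 4: Provide input to the program when it is needed
Note: Press Ctrl + D, or rely on a rule that ends scanning, to stop the program from accepting input. If anything about running the programs is unclear, refer to the output shown for the programs below.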
```lex
/*** Definition Section has one variable
which can be accessed inside yylex()
and main() ***/
%{
#include <stdio.h>
int count = 0;
%}

/*** Rule Section has three rules: the first matches
capital letters, the second matches any character
except newline, and the third stops scanning
when Enter is pressed ***/
%%
[A-Z]   { printf("%s capital letter\n", yytext); count++; }
.       { printf("%s not a capital letter\n", yytext); }
\n      { return 0; }
%%

/*** Code Section prints the number of
capital letters present in the given input ***/

/* yywrap() is called at end of input; returning 1 means there is no more input */
int yywrap(void) { return 1; }

int main(void) {
    /* yyin    - the file pointer the scanner reads its input from
       yylex() - the main flex function, which runs the Rule Section
       yytext  - the text of the current match */

    /* Uncomment the lines below to take input from a file */
    // FILE *fp;
    // char filename[50];
    // printf("Enter the filename: \n");
    // scanf("%49s", filename);
    // fp = fopen(filename, "r");
    // yyin = fp;

    yylex();
    printf("\nNumber of capital letters "
           "in the given input - %d\n", count);
    return 0;
}
```
Run:
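```lex
%{
/* Use flex to find and print every integer in the input text */
#include <stdio.h>
int count = 0;
%}

%%
[+-]?[0-9]+   { count++; printf("%s\n", yytext); /* print integers */ }
\n            { /* newline: do nothing */ }
.             { /* any other character: do nothing */ }
%%

int main(void) {
    yylex();
    printf("Number count is %d\n", count);
    return 0;
}

/* Returning 1 tells the scanner that there is no more input */
int yywrap(void) { return 1; }
```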
Run:
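For comparison, the matching behavior of the integer-counting scanner can be approximated in plain C. This is a hand-written sketch, not what Flex generates, and the function name `count_integers` is invented here. It assumes the usual scanner behavior: at each position the longest match is taken, and characters that do not start an integer are consumed one at a time by the catch-all rules.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Count the substrings of s that match [+-]?[0-9]+, scanning left to
   right and taking the longest match at each position. */
int count_integers(const char *s) {
    assert(s != NULL);
    int count = 0;
    while (*s != '\0') {
        const char *p = s;
        if (*p == '+' || *p == '-') p++;             /* optional sign */
        if (isdigit((unsigned char)*p)) {
            while (isdigit((unsigned char)*p)) p++;  /* longest digit run */
            count++;
            s = p;                                   /* resume after the match */
        } else {
            s++;                                     /* like the "." and "\n" rules */
        }
    }
    return count;
}
```

Note that a sign counts as part of an integer only when a digit follows it directly, so in "+-5" the "+" is skipped and "-5" is the single match.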
References: