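Author: 史寧寧 (snsn1984)
The source code of Clang's Lexer (lexical analyzer) lives mainly in the following locations:
clang/lib/Lex: the main Lexer implementation;
clang/include/clang/Lex: the Lexer header files.
The Lexer also uses the clangBasic library, so analyzing the Lexer code involves some of the clangBasic code (clang/lib/Basic) as well.
Let's start with the Lexer itself.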
clang/include/clang/Lex/Lexer.h
clang::Lexer:

    //===--------------------------------------------------------------------===//
    // Context-specific lexing flags set by the preprocessor.
    //

    /// ExtendedTokenMode - The lexer can optionally keep comments and whitespace
    /// and return them as tokens.  This is used for -C and -CC modes, and
    /// whitespace preservation can be useful for some clients that want to lex
    /// the file in raw mode and get every character from the file.
    ///
    /// When this is set to 2 it returns comments and whitespace.  When set to 1
    /// it returns comments, when it is set to 0 it returns normal tokens only.
    unsigned char ExtendedTokenMode;

    //===--------------------------------------------------------------------===//

This member variable stores one piece of lexer state. Depending on its value (0, 1 or 2), the lexer returns only normal tokens, returns comments as well, or returns both comments and whitespace.
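To make the three values concrete, here is a minimal standalone sketch of the rule described in the quoted comment. It is not Clang code: the `Category` enum and the `isReturned` and `countReturned` helpers are hypothetical names introduced only for illustration, and the real Lexer makes this decision inside its lexing routines rather than through a filter like this.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical token categories for illustration (not Clang's tok::TokenKind).
enum class Category { Normal, Comment, Whitespace };

// Returns true if a token of category C would be handed back to the client
// under the given ExtendedTokenMode value (0, 1 or 2).
inline bool isReturned(Category C, unsigned char ExtendedTokenMode) {
  switch (C) {
  case Category::Normal:
    return true;                  // normal tokens are always returned
  case Category::Comment:
    return ExtendedTokenMode > 0; // modes 1 and 2 keep comments
  case Category::Whitespace:
    return ExtendedTokenMode > 1; // only mode 2 keeps whitespace
  }
  return false;
}

// Counts how many tokens of a sequence survive under a given mode.
inline std::size_t countReturned(const std::vector<Category> &Toks,
                                 unsigned char Mode) {
  std::size_t N = 0;
  for (Category C : Toks)
    if (isReturned(C, Mode))
      ++N;
  return N;
}
```

For a sequence such as identifier, space, comment, space, identifier, this model keeps two tokens in mode 0, three in mode 1 and all five in mode 2.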
The member functions that query and change this mode:

    /// isKeepWhitespaceMode - Return true if the lexer should return tokens for
    /// every character in the file, including whitespace and comments.  This
    /// should only be used in raw mode, as the preprocessor is not prepared to
    /// deal with the excess tokens.
    bool isKeepWhitespaceMode() const {
      return ExtendedTokenMode > 1;
    }

    /// SetKeepWhitespaceMode - This method lets clients enable or disable
    /// whitespace retention mode.
    void SetKeepWhitespaceMode(bool Val) {
      assert((!Val || LexingRawMode || LangOpts.TraditionalCPP) &&
             "Can only retain whitespace in raw mode or -traditional-cpp");
      ExtendedTokenMode = Val ? 2 : 0;
    }

    /// inKeepCommentMode - Return true if the lexer should return comments as
    /// tokens.
    bool inKeepCommentMode() const {
      return ExtendedTokenMode > 0;
    }

    /// SetCommentRetentionMode - Change the comment retention mode of the lexer
    /// to the specified mode.  This is really only useful when lexing in raw
    /// mode, because otherwise the lexer needs to manage this.
    void SetCommentRetentionState(bool Mode) {
      assert(!isKeepWhitespaceMode() &&
             "Can't play with comment retention state when retaining whitespace");
      ExtendedTokenMode = Mode ? 1 : 0;
    }

    /// Sets the extended token mode back to its initial value, according to the
    /// language options and preprocessor.  This controls whether the lexer
    /// produces comment and whitespace tokens.
    ///
    /// This requires the lexer to have an associated preprocessor.  A standalone
    /// lexer has nothing to reset to.
    void resetExtendedTokenMode();

About raw mode:

    /// \brief True if in raw mode.
    ///
    /// Raw mode disables interpretation of tokens and is a far faster mode to
    /// lex in than non-raw-mode.  This flag:
    ///  1. If EOF of the current lexer is found, the include stack isn't popped.
    ///  2. Identifier information is not looked up for identifier tokens.  As an
    ///     effect of this, implicit macro expansion is naturally disabled.
    ///  3. "#" tokens at the start of a line are treated as normal tokens, not
    ///     implicitly transformed by the lexer.
    ///  4. All diagnostic messages are disabled.
    ///  5. No callbacks are made into the preprocessor.
    ///
    /// Note that in raw mode that the PP pointer may be null.
    bool LexingRawMode;

This flag indicates whether the Lexer is in raw mode. The comment also explains what raw mode does.
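The way the raw-mode flag interacts with the comment and whitespace setters can be sketched as a small standalone model. This is an assumption-laden illustration, not clang::Lexer: the class name `FlagModel` and its `setRawMode` and `isRawMode` members are hypothetical, the `-traditional-cpp` escape hatch in the real assertion is left out, and only the assertions and assignments quoted above are mirrored.

```cpp
#include <cassert>

// Hypothetical stand-in that mirrors only the flag logic quoted above.
class FlagModel {
  bool LexingRawMode = false;          // true while lexing in raw mode
  unsigned char ExtendedTokenMode = 0; // 0: normal, 1: +comments, 2: +whitespace

public:
  // Hypothetical accessors for the raw-mode flag in this model.
  void setRawMode(bool Raw) { LexingRawMode = Raw; }
  bool isRawMode() const { return LexingRawMode; }

  bool isKeepWhitespaceMode() const { return ExtendedTokenMode > 1; }
  bool inKeepCommentMode() const { return ExtendedTokenMode > 0; }

  // Whitespace retention may only be switched on in raw mode
  // (the -traditional-cpp case of the real assertion is omitted here).
  void SetKeepWhitespaceMode(bool Val) {
    assert((!Val || LexingRawMode) && "Can only retain whitespace in raw mode");
    ExtendedTokenMode = Val ? 2 : 0;
  }

  // Comment retention cannot be changed while whitespace is being retained.
  void SetCommentRetentionState(bool Mode) {
    assert(!isKeepWhitespaceMode() &&
           "Can't change comment retention while retaining whitespace");
    ExtendedTokenMode = Mode ? 1 : 0;
  }
};
```

Two consequences of the encoding show up in this model: turning whitespace retention on also turns comment retention on, because 2 is greater than 0, and turning whitespace retention off resets the mode to 0 rather than back to 1.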
The definition of clang::Lexer shows that it is a subclass of clang::PreprocessorLexer, and the raw mode snippet above was in fact quoted from clang::PreprocessorLexer. Let's look at the code of clang::PreprocessorLexer next.
clang/include/clang/Lex/PreprocessorLexer.h

    namespace clang {

    class FileEntry;
    class Preprocessor;

This shows that clang::PreprocessorLexer uses the two classes declared above. The places in the header where they are used are:

    class PreprocessorLexer {
      virtual void anchor();
    protected:
      Preprocessor *PP;              // Preprocessor object controlling lexing.

and

      /// getFileEntry - Return the FileEntry corresponding to this FileID.  Like
      /// getFileID(), this only works for lexers with attached preprocessors.
      const FileEntry *getFileEntry() const;

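As the code shows, one of these two classes is used as the type of a member variable, and the other as the return type of a member function. Following the code to their implementations, FileEntry is fairly simple and its contents are easy to read, while the Preprocessor class is more complex and touches many other parts, so it is not analyzed here for now and will be covered in a later post.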
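The header can get away with the two forward declarations because it only ever names these classes through pointers. A minimal sketch of that C++ pattern follows. All names here (`Record`, `Driver`, `Reader`) are hypothetical and only mirror the shape of the PreprocessorLexer header, not its real contents.

```cpp
// Forward declarations: both types are incomplete at this point.
class Record;
class Driver;

class Reader {
protected:
  Driver *D;       // pointer member: an incomplete type is sufficient
  const Record *R; // likewise for a pointer to const

public:
  Reader(Driver *Drv, const Record *Rec) : D(Drv), R(Rec) {}

  // Declared only; returning a pointer needs no complete type either.
  const Record *getRecord() const;

  bool hasDriver() const { return D != nullptr; }
};

// The full definitions would normally live in their own headers and only be
// included by the .cpp file that needs to look inside the objects.
class Record {
public:
  int Id = 0;
};
class Driver {};

const Record *Reader::getRecord() const { return R; }
```

Keeping the header down to forward declarations means that files including it do not have to pull in the full definitions of the pointed-to classes, which is presumably the motivation in the Clang header as well.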