Author: Shi Ningning (史寧寧, snsn1984)
Source file: clang/lib/Lex/Lexer.cpp
Online source: http://clang.llvm.org/doxygen/Lexer_8cpp_source.html
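Lexer.cpp is the main file of the lexer in the Clang front end: it contains the implementation of the Lexer class. The comment at the top of the file describes it as "This file implements the Lexer and Token interfaces.", but only two simple Token functions are implemented there; everything else is the Lexer implementation. To understand how Clang's lexer works, you therefore need a thorough understanding of this file.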
Let's start with the Lexer's initialization function, InitLexer:
```cpp
void Lexer::InitLexer(const char *BufStart, const char *BufPtr,
                      const char *BufEnd) {
  BufferStart = BufStart;
  BufferPtr = BufPtr;
  BufferEnd = BufEnd;

  assert(BufEnd[0] == 0 &&
         "We assume that the input buffer has a null character at the end"
         " to simplify lexing!");

  // Check whether we have a BOM in the beginning of the buffer. If yes - act
  // accordingly. Right now we support only UTF-8 with and without BOM, so, just
  // skip the UTF-8 BOM if it's present.
  if (BufferStart == BufferPtr) {
    // Determine the size of the BOM.
    StringRef Buf(BufferStart, BufferEnd - BufferStart);
    size_t BOMLength = llvm::StringSwitch<size_t>(Buf)
      .StartsWith("\xEF\xBB\xBF", 3) // UTF-8 BOM
      .Default(0);

    // Skip the BOM.
    BufferPtr += BOMLength;
  }

  Is_PragmaLexer = false;
  CurrentConflictMarkerState = CMK_None;

  // Start of the file is a start of line.
  IsAtStartOfLine = true;
  IsAtPhysicalStartOfLine = true;

  HasLeadingSpace = false;
  HasLeadingEmptyMacro = false;

  // We are not after parsing a #.
  ParsingPreprocessorDirective = false;

  // We are not after parsing #include.
  ParsingFilename = false;

  // We are not in raw mode. Raw mode disables diagnostics and interpretation
  // of tokens (e.g. identifiers, thus disabling macro expansion). It is used
  // to quickly lex the tokens of the buffer, e.g. when handling a "#if 0" block
  // or otherwise skipping over tokens.
  LexingRawMode = false;

  // Default to not keeping comments.
  ExtendedTokenMode = 0;
}
```
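To make the BOM check and the NUL-sentinel assumption in InitLexer concrete, here is a standalone sketch that reproduces just that part of the logic without LLVM. The function name skipUtf8Bom is hypothetical, and a std::string_view comparison stands in for llvm::StringSwitch; it illustrates the idea and is not Clang code.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical standalone version of the BOM check in Lexer::InitLexer.
// Returns the position where lexing should actually begin.
const char *skipUtf8Bom(const char *BufStart, const char *BufPtr,
                        const char *BufEnd) {
  // Same precondition as InitLexer: a NUL sentinel sits at BufEnd.
  assert(BufEnd[0] == 0 && "input buffer must end with a NUL character");

  // Only a lexer that starts at the very beginning of the buffer looks for a BOM.
  if (BufStart == BufPtr) {
    std::string_view Buf(BufStart,
                         static_cast<std::size_t>(BufEnd - BufStart));
    // "\xEF\xBB\xBF" is the 3-byte UTF-8 byte order mark;
    // substr + compare plays the role of StringSwitch::StartsWith.
    std::size_t BOMLength = Buf.substr(0, 3) == "\xEF\xBB\xBF" ? 3 : 0;
    BufPtr += BOMLength;
  }
  return BufPtr;
}
```

Because the check only runs when BufPtr == BufStart, a lexer that is started in the middle of a buffer (which the raw constructor allows) never skips three bytes by mistake.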
This initialization function is called from the two constructors of the Lexer class. The code is as follows:
```cpp
/// Lexer constructor - Create a new lexer object for the specified buffer
/// with the specified preprocessor managing the lexing process. This lexer
/// assumes that the associated file buffer and Preprocessor objects will
/// outlive it, so it doesn't take ownership of either of them.
Lexer::Lexer(FileID FID, const llvm::MemoryBuffer *InputFile, Preprocessor &PP)
  : PreprocessorLexer(&PP, FID),
    FileLoc(PP.getSourceManager().getLocForStartOfFile(FID)),
    LangOpts(PP.getLangOpts()) {

  InitLexer(InputFile->getBufferStart(), InputFile->getBufferStart(),
            InputFile->getBufferEnd());

  resetExtendedTokenMode();
}

void Lexer::resetExtendedTokenMode() {
  assert(PP && "Cannot reset token mode without a preprocessor");
  if (LangOpts.TraditionalCPP)
    SetKeepWhitespaceMode(true);
  else
    SetCommentRetentionState(PP->getCommentRetentionState());
}

/// Lexer constructor - Create a new raw lexer object. This object is only
/// suitable for calls to 'LexFromRawLexer'. This lexer assumes that the text
/// range will outlive it, so it doesn't take ownership of it.
Lexer::Lexer(SourceLocation fileloc, const LangOptions &langOpts,
             const char *BufStart, const char *BufPtr, const char *BufEnd)
  : FileLoc(fileloc), LangOpts(langOpts) {

  InitLexer(BufStart, BufPtr, BufEnd);

  // We *are* in raw mode.
  LexingRawMode = true;
}
```
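The two constructors differ mainly in where their settings come from. The first takes a Preprocessor and derives the comment/whitespace mode from it through resetExtendedTokenMode; the raw one has no preprocessor and switches LexingRawMode on only after InitLexer has reset it to false. The sketch below models just that control flow with hypothetical ToyLexer, ToyPreprocessor and ToyLangOptions types. The two boolean flags are a simplification of Clang's ExtendedTokenMode field, not its real encoding.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for clang::LangOptions and clang::Preprocessor.
struct ToyLangOptions {
  bool TraditionalCPP = false;
};

struct ToyPreprocessor {
  ToyLangOptions LangOpts;
  bool KeepComments = false; // plays the role of getCommentRetentionState()
};

class ToyLexer {
public:
  // Mirrors the preprocessor-driven constructor: lex the whole buffer,
  // with token-mode settings taken from the preprocessor.
  ToyLexer(const char *Buf, std::size_t Len, ToyPreprocessor &ThePP)
      : PP(&ThePP), LangOpts(ThePP.LangOpts) {
    initLexer(Buf, Buf, Buf + Len);
    resetExtendedTokenMode();
  }

  // Mirrors the raw constructor: no preprocessor, may start mid-buffer.
  ToyLexer(const ToyLangOptions &Opts, const char *BufStart,
           const char *BufPtr, const char *BufEnd)
      : LangOpts(Opts) {
    initLexer(BufStart, BufPtr, BufEnd);
    // Must come after initLexer, which resets raw mode to false.
    LexingRawMode = true;
  }

  bool isRawMode() const { return LexingRawMode; }
  bool keepsWhitespace() const { return KeepWhitespace; }
  bool keepsComments() const { return KeepComments; }
  std::size_t offset() const {
    return static_cast<std::size_t>(BufferPtr - BufferStart);
  }

private:
  // Shared initialization, in the spirit of Lexer::InitLexer.
  void initLexer(const char *Start, const char *Ptr, const char *End) {
    BufferStart = Start;
    BufferPtr = Ptr;
    BufferEnd = End;
    LexingRawMode = false;
    KeepWhitespace = false;
    KeepComments = false;
  }

  // Same decision as Lexer::resetExtendedTokenMode, simplified to two flags.
  void resetExtendedTokenMode() {
    assert(PP && "Cannot reset token mode without a preprocessor");
    if (LangOpts.TraditionalCPP)
      KeepWhitespace = true;
    else
      KeepComments = PP->KeepComments;
  }

  ToyPreprocessor *PP = nullptr; // stays null for the raw constructor
  ToyLangOptions LangOpts;
  const char *BufferStart = nullptr;
  const char *BufferPtr = nullptr;
  const char *BufferEnd = nullptr;
  bool LexingRawMode = false;
  bool KeepWhitespace = false;
  bool KeepComments = false;
};
```

Sharing one init routine keeps both constructors consistent; each constructor then applies only the settings that differ for its use case.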
Within the Lexer class itself, these constructors are called from the following functions:
Create_PragmaLexer: Lexer constructor - Create a new lexer object for _Pragma expansion.
http://clang.llvm.org/doxygen/classclang_1_1Lexer.html#ac7f3b1ce4f2eeaec8d787d22bf197cd0
getSpelling - This method is used to get the spelling of a token into a preallocated buffer, instead of as an std::string.
http://clang.llvm.org/doxygen/classclang_1_1Lexer.html#a94f2c5710332ae19d7955c609ac37adb
getRawToken
http://clang.llvm.org/doxygen/classclang_1_1Lexer.html#adac8b8cf001621ec3b109d82a7074f05
getBeginningOfFileToken
http://clang.llvm.org/doxygen/Lexer_8cpp.html#a4845396d18432c436e605303b057dbb4
findLocationAfterToken
http://clang.llvm.org/doxygen/classclang_1_1Lexer.html#a099b99b2d19ef5cdd8fcb80d8cf4064e
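Several of these callers (getRawToken, for instance) need to lex a single token that starts somewhere in the middle of an existing buffer, which appears to be why the raw constructor accepts BufPtr separately from BufStart. The sketch below illustrates that pattern with a hypothetical RawScanner class and relexTokenAt helper. Its token rules (identifiers, digit runs, single punctuation characters) are deliberately simplistic and are not Clang's.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical minimal "raw" scanner: no preprocessor, no diagnostics,
// it just scans characters starting at BufPtr.
class RawScanner {
public:
  RawScanner(const char *BufStart, const char *BufPtr, const char *BufEnd)
      : BufferStart(BufStart), BufferPtr(BufPtr), BufferEnd(BufEnd) {}

  // Skip whitespace, then return the next token's spelling: an identifier,
  // a run of digits, or a single other character. Empty string at end.
  std::string lexRawToken() {
    while (BufferPtr != BufferEnd &&
           std::isspace(static_cast<unsigned char>(*BufferPtr)))
      ++BufferPtr;
    const char *TokStart = BufferPtr;
    if (BufferPtr == BufferEnd)
      return "";
    unsigned char C = static_cast<unsigned char>(*BufferPtr);
    if (std::isalpha(C) || C == '_') {
      while (BufferPtr != BufferEnd &&
             (std::isalnum(static_cast<unsigned char>(*BufferPtr)) ||
              *BufferPtr == '_'))
        ++BufferPtr;
    } else if (std::isdigit(C)) {
      while (BufferPtr != BufferEnd &&
             std::isdigit(static_cast<unsigned char>(*BufferPtr)))
        ++BufferPtr;
    } else {
      ++BufferPtr; // single punctuation character
    }
    return std::string(TokStart, BufferPtr);
  }

  // Distance of the current position from the start of the whole buffer.
  std::size_t offset() const {
    return static_cast<std::size_t>(BufferPtr - BufferStart);
  }

private:
  const char *BufferStart;
  const char *BufferPtr;
  const char *BufferEnd;
};

// Hypothetical getRawToken-style helper: relex the single token found at
// Offset inside an existing buffer. Requires Offset <= Buffer.size().
std::string relexTokenAt(const std::string &Buffer, std::size_t Offset) {
  assert(Offset <= Buffer.size());
  const char *Start = Buffer.data();
  RawScanner Scanner(Start, Start + Offset, Start + Buffer.size());
  return Scanner.lexRawToken();
}
```

For example, relexTokenAt("int value = 42;", 4) yields "value" without re-scanning "int". Keeping BufferStart alongside BufferPtr lets the scanner still report positions relative to the whole buffer.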