In-Depth Study of Clang (5): Clang Lexer Code Reading Notes on Lexer

Author: 史寧寧 (snsn1984)


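The source code of Clang's Lexer (lexical analyzer) lives mainly in the following locations:

clang/lib/Lex    the core Lexer implementation;

clang/include/clang/Lex   the Lexer header files.

The Lexer also uses the clangBasic library, so analyzing the Lexer code requires some of the clangBasic code (clang/lib/Basic) as well.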


Let's start with Lexer.


clang/include/clang/Lex/Lexer.h

clang::Lexer:

  //===--------------------------------------------------------------------===//
  // Context-specific lexing flags set by the preprocessor.
  //

  /// ExtendedTokenMode - The lexer can optionally keep comments and whitespace
  /// and return them as tokens.  This is used for -C and -CC modes, and
  /// whitespace preservation can be useful for some clients that want to lex
  /// the file in raw mode and get every character from the file.
  ///
  /// When this is set to 2 it returns comments and whitespace.  When set to 1
  /// it returns comments, when it is set to 0 it returns normal tokens only.
  unsigned char ExtendedTokenMode;

  //===--------------------------------------------------------------------===//
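This member variable stores one piece of the lexer's state. Depending on its value (0, 1, or 2), the lexer returns only normal tokens; comments and normal tokens; or whitespace, comments, and normal tokens.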
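To make the three values concrete, here is a small standalone sketch. It is not Clang code: `toyLex`, `ToyToken`, and `Kind` are hypothetical names made up for illustration, and only `//` line comments are recognized. It merely mimics the documented filtering rule of `ExtendedTokenMode`.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy types; Clang's real lexer is far more involved.
enum class Kind { Normal, Comment, Whitespace };

struct ToyToken {
  Kind K;
  std::string Text;
};

// Splits Src into tokens and keeps only those allowed by ExtendedTokenMode:
//   0 -> normal tokens only, 1 -> plus comments, 2 -> plus whitespace.
// Only `//` line comments are recognized, to keep the sketch short.
std::vector<ToyToken> toyLex(const std::string &Src,
                             unsigned char ExtendedTokenMode) {
  std::vector<ToyToken> Out;
  std::size_t I = 0;
  while (I < Src.size()) {
    std::size_t Start = I;
    Kind K;
    if (std::isspace(static_cast<unsigned char>(Src[I]))) {
      K = Kind::Whitespace;
      while (I < Src.size() &&
             std::isspace(static_cast<unsigned char>(Src[I])))
        ++I;
    } else if (Src.compare(I, 2, "//") == 0) {
      K = Kind::Comment;
      while (I < Src.size() && Src[I] != '\n')
        ++I;
    } else {
      K = Kind::Normal;
      // A normal token ends at whitespace or at the start of a comment.
      while (I < Src.size() &&
             !std::isspace(static_cast<unsigned char>(Src[I])) &&
             Src.compare(I, 2, "//") != 0)
        ++I;
    }
    // The same thresholds as inKeepCommentMode / isKeepWhitespaceMode.
    bool Keep = K == Kind::Normal ||
                (K == Kind::Comment && ExtendedTokenMode > 0) ||
                (K == Kind::Whitespace && ExtendedTokenMode > 1);
    if (Keep)
      Out.push_back({K, Src.substr(Start, I - Start)});
  }
  return Out;
}
```

With mode 2, concatenating the `Text` of all returned tokens reproduces the input exactly, which is the "get every character from the file" behaviour the comment describes.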


Below are several functions that operate on this member variable. They are essentially a getter, a setter, and a reset.

The code is not complicated:

  /// isKeepWhitespaceMode - Return true if the lexer should return tokens for
  /// every character in the file, including whitespace and comments.  This
  /// should only be used in raw mode, as the preprocessor is not prepared to
  /// deal with the excess tokens.
  bool isKeepWhitespaceMode() const {
    return ExtendedTokenMode > 1;
  }

  /// SetKeepWhitespaceMode - This method lets clients enable or disable
  /// whitespace retention mode.
  void SetKeepWhitespaceMode(bool Val) {
    assert((!Val || LexingRawMode || LangOpts.TraditionalCPP) &&
           "Can only retain whitespace in raw mode or -traditional-cpp");
    ExtendedTokenMode = Val ? 2 : 0;
  }

  /// inKeepCommentMode - Return true if the lexer should return comments as
  /// tokens.
  bool inKeepCommentMode() const {
    return ExtendedTokenMode > 0;
  }

  /// SetCommentRetentionMode - Change the comment retention mode of the lexer
  /// to the specified mode.  This is really only useful when lexing in raw
  /// mode, because otherwise the lexer needs to manage this.
  void SetCommentRetentionState(bool Mode) {
    assert(!isKeepWhitespaceMode() &&
           "Can't play with comment retention state when retaining whitespace");
    ExtendedTokenMode = Mode ? 1 : 0;
  }

  /// Sets the extended token mode back to its initial value, according to the
  /// language options and preprocessor. This controls whether the lexer
  /// produces comment and whitespace tokens.
  ///
  /// This requires the lexer to have an associated preprocessor. A standalone
  /// lexer has nothing to reset to.
  void resetExtendedTokenMode();
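The interaction between these accessors is easier to see in isolation. The following is a minimal sketch, assuming a stripped-down stand-in for the relevant slice of `clang::Lexer`: `ExtendedTokenModeModel` and its `TraditionalCPP` field are hypothetical (the latter stands in for `LangOpts.TraditionalCPP`), while the method bodies follow the header quoted above.

```cpp
#include <cassert>

// Hypothetical stand-in for the part of clang::Lexer shown above.
struct ExtendedTokenModeModel {
  bool LexingRawMode = false;
  bool TraditionalCPP = false;  // stands in for LangOpts.TraditionalCPP
  unsigned char ExtendedTokenMode = 0;

  bool isKeepWhitespaceMode() const { return ExtendedTokenMode > 1; }
  bool inKeepCommentMode() const { return ExtendedTokenMode > 0; }

  // Enabling whitespace retention is only legal in raw mode or with
  // -traditional-cpp; disabling it drops straight back to mode 0.
  void SetKeepWhitespaceMode(bool Val) {
    assert((!Val || LexingRawMode || TraditionalCPP) &&
           "Can only retain whitespace in raw mode or -traditional-cpp");
    ExtendedTokenMode = Val ? 2 : 0;
  }

  // Comment retention may only be toggled while whitespace is not retained.
  void SetCommentRetentionState(bool Mode) {
    assert(!isKeepWhitespaceMode() &&
           "Can't play with comment retention state when retaining whitespace");
    ExtendedTokenMode = Mode ? 1 : 0;
  }
};
```

Two consequences follow from the thresholds: keep-whitespace mode implies keep-comment mode (2 > 0), and turning whitespace retention off also turns comment retention off, because the value is reset to 0 rather than 1.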

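About raw mode:
Keep-whitespace mode (ExtendedTokenMode = 2) is tied to raw mode: the assertion in SetKeepWhitespaceMode only allows it in raw mode or with -traditional-cpp. In that mode the Lexer outputs every character, including whitespace, comments, and normal tokens. clang::PreprocessorLexer, the parent class of Lexer, has a member variable: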
  /// \brief True if in raw mode.
  ///
  /// Raw mode disables interpretation of tokens and is a far faster mode to
  /// lex in than non-raw-mode.  This flag:
  ///  1. If EOF of the current lexer is found, the include stack isn't popped.
  ///  2. Identifier information is not looked up for identifier tokens.  As an
  ///     effect of this, implicit macro expansion is naturally disabled.
  ///  3. "#" tokens at the start of a line are treated as normal tokens, not
  ///     implicitly transformed by the lexer.
  ///  4. All diagnostic messages are disabled.
  ///  5. No callbacks are made into the preprocessor.
  ///
  /// Note that in raw mode that the PP pointer may be null.
  bool LexingRawMode;
It indicates whether the Lexer is in raw mode. The comment above also explains what raw mode does.
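Point 2 of that comment (no identifier lookup, hence no macro expansion) can be illustrated with a toy model. This is only a sketch of the idea, not how Clang implements it: `lexIdentifier`, `ToyIdentTok`, and the kind strings are hypothetical names, and the macro table is a plain `std::map` standing in for the preprocessor's state.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical toy model; none of these names are Clang APIs.
struct ToyIdentTok {
  std::string Kind;  // "raw_identifier", "identifier" or "macro_expansion"
  std::string Text;
};

// In raw mode the spelling is returned untouched and no table is consulted.
// Otherwise the identifier is looked up, and a macro name is replaced by
// its (single-token) body.
ToyIdentTok lexIdentifier(const std::string &Spelling, bool LexingRawMode,
                          const std::map<std::string, std::string> &Macros) {
  if (LexingRawMode)
    return {"raw_identifier", Spelling};
  auto It = Macros.find(Spelling);
  if (It != Macros.end())
    return {"macro_expansion", It->second};
  return {"identifier", Spelling};
}
```

Skipping the lookup entirely is also part of why the comment calls raw mode "a far faster mode to lex in": the lexer does no work beyond splitting the characters into tokens.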

The definition of clang::Lexer shows that it is a subclass of clang::PreprocessorLexer, and the raw mode discussion above already quoted code from clang::PreprocessorLexer. Let's look at the code of clang::PreprocessorLexer next.

clang/include/clang/Lex/PreprocessorLexer.h

namespace clang {

class FileEntry;
class Preprocessor;
This shows that clang::PreprocessorLexer uses the two classes declared above. They are used at the following places in the header:

class PreprocessorLexer {
  virtual void anchor();
protected:
  Preprocessor *PP;              // Preprocessor object controlling lexing.
and

  /// getFileEntry - Return the FileEntry corresponding to this FileID.  Like
  /// getFileID(), this only works for lexers with attached preprocessors.
  const FileEntry *getFileEntry() const;

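As the code shows, one of these two classes is used as the type of a member variable and the other as the return type of a member function. Following the code to their implementations, FileEntry is fairly simple and easy to understand, whereas Preprocessor is more complex and touches a lot more code, so I will not analyze it here and will come back to it in a later post.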
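The header gets away with two forward declarations because both uses only involve pointers. The sketch below shows the same pattern with hypothetical `Toy*` names; the body of `getFileEntry` is a placeholder that always returns null and does not reflect what Clang actually does.

```cpp
#include <cassert>

// Forward declarations: both types stay incomplete in this snippet.
class ToyFileEntry;
class ToyPreprocessor;

class ToyPreprocessorLexer {
protected:
  ToyPreprocessor *PP;  // pointer member: an incomplete type is enough

public:
  explicit ToyPreprocessorLexer(ToyPreprocessor *P) : PP(P) {}

  // Pointer return type: also fine with an incomplete type.
  // Placeholder body for illustration only.
  const ToyFileEntry *getFileEntry() const { return nullptr; }

  // A lexer may have no preprocessor attached (PP may be null).
  bool hasPreprocessor() const { return PP != nullptr; }
};
```

Declaring instead of including keeps the header light: files that include it do not pull in the full definitions of the two classes unless they need them.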
