LLVM Coding Standards
This document describes coding standards that are used in the LLVM project. Although no coding standards should be regarded as absolute requirements to be followed in all instances, coding standards are particularly important for large-scale code bases that follow a library-based design (like LLVM).
While this document may provide guidance for some mechanical formatting issues, whitespace, or other "microscopic details", these are not fixed standards. Always follow the golden rule: If you are extending, enhancing, or bug fixing already implemented code, use the style that is already being used so that the source is uniform and easy to follow.
Note that some code bases (e.g. libc++) have special reasons to deviate from the coding standards. For example, in the case of libc++, this is because the naming and other conventions are dictated by the C++ standard.
There are some conventions that are not uniformly followed in the code base (e.g. the naming convention). This is because they are relatively new, and a lot of code was written before they were put in place. Our long term goal is for the entire codebase to follow the convention, but we explicitly do not want patches that do large-scale reformatting of existing code. On the other hand, it is reasonable to rename the methods of a class if you're about to change it in some other way. Please commit such changes separately to make code review easier.
The ultimate goal of these guidelines is to increase the readability and maintainability of our common source base.
Most source code in LLVM and other LLVM projects using these coding standards is C++ code. There are some places where C code is used either due to environment restrictions, historical restrictions, or due to third-party source code imported into the tree. Generally, our preference is for standards conforming, modern, and portable C++ code as the implementation language of choice.
Unless otherwise documented, LLVM subprojects are written using standard C++14 code and avoid unnecessary vendor-specific extensions.
Nevertheless, we restrict ourselves to features which are available in the major toolchains supported as host compilers (see Getting Started with the LLVM System page, section Software).
Each toolchain provides a good reference for what it accepts:
Instead of implementing custom data structures, we encourage the use of C++ standard library facilities or LLVM support libraries whenever they are available for a particular task. LLVM and related projects emphasize and rely on the standard library facilities and the LLVM support libraries as much as possible.
LLVM support libraries (for example, ADT) implement specialized data structures or functionality missing in the standard library. Such libraries are usually implemented in the llvm namespace and follow the expected standard interface, when there is one.
When both C++ and the LLVM support libraries provide similar functionality, and there isn’t a specific reason to favor the C++ implementation, it is generally preferable to use the LLVM library. For example, llvm::DenseMap should almost always be used instead of std::map or std::unordered_map, and llvm::SmallVector should usually be used instead of std::vector.
We explicitly avoid some standard facilities, like the I/O streams, and instead use LLVM’s streams library (raw_ostream). More detailed information on these subjects is available in the LLVM Programmer’s Manual.
For more information about LLVM’s data structures and the tradeoffs they make, please consult that section of the programmer’s manual.
Any code written in the Go programming language is not subject to the formatting rules below. Instead, we adopt the formatting rules enforced by the gofmt tool.
Go code should strive to be idiomatic. Two good sets of guidelines for what this means are Effective Go and Go Code Review Comments.
Comments are important for readability and maintainability. When writing comments, write them as English prose, using proper capitalization, punctuation, etc. Aim to describe what the code is trying to do and why, not how it does it at a micro level. Here are a few important things to document:
Every source file should have a header on it that describes the basic purpose of the file. The standard header looks like this:

    //===-- llvm/Instruction.h - Instruction class definition -------*- C++ -*-===//
    //
    // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
    // See https://llvm.org/LICENSE.txt for license information.
    // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
    //
    //===----------------------------------------------------------------------===//
    ///
    /// \file
    /// This file contains the declaration of the Instruction class, which is the
    /// base class for all of the VM instructions.
    ///
    //===----------------------------------------------------------------------===//

A few things to note about this particular format: The "-*- C++ -*-" string on the first line is there to tell Emacs that the source file is a C++ file, not a C file (Emacs assumes .h files are C files by default).
This tag is not necessary in .cpp files. The name of the file is also on the first line, along with a very short description of the purpose of the file.
The next section in the file is a concise note that defines the license that the file is released under. This makes it perfectly clear what terms the source code can be distributed under and should not be modified in any way.
The main body is a Doxygen comment (identified by the /// comment marker instead of the usual //) describing the purpose of the file. The first sentence (or a passage beginning with \brief) is used as an abstract. Any additional information should be separated by a blank line. If an algorithm is based on a paper or is described in another source, provide a reference.
Classes are a fundamental part of an object-oriented design. As such, a class definition should have a comment block that explains what the class is used for and how it works. Every non-trivial class is expected to have a doxygen comment block.
Methods and global functions should also be documented. A quick note about what it does and a description of the edge cases is all that is necessary here. The reader should be able to understand how to use interfaces without reading the code itself.
Good things to talk about here are what happens when something unexpected happens, for instance, does the method return null?
In general, prefer C++-style comments (// for normal comments, /// for doxygen documentation comments). There are a few cases when it is useful to use C-style (/* */) comments however:
One such case is documenting a constant passed as an argument, where the call alone does not show what the value means:

    Object.emitName(nullptr);

An in-line C-style comment makes the intent obvious:

    Object.emitName(/*Prefix=*/nullptr);

Commenting out large blocks of code is discouraged, but if you really have to do this (for documentation purposes or as a suggestion for debug printing), use #if 0 and #endif. These nest properly and are better behaved in general than C style comments.
Use the \file command to turn the standard file header into a file-level comment.
Include descriptive paragraphs for all public interfaces (public classes, member and non-member functions). Avoid restating the information that can be inferred from the API name. The first sentence (or a paragraph beginning with \brief) is used as an abstract. Try to use a single sentence as the \brief adds visual clutter. Put detailed discussion into separate paragraphs.
To refer to parameter names inside a paragraph, use the \p name command. Don’t use the \arg name command since it starts a new paragraph that contains documentation for the parameter.
Wrap non-inline code examples in \code ... \endcode.
To document a function parameter, start a new paragraph with the \param name command. If the parameter is used as an out or an in/out parameter, use the \param [out] name or \param [in,out] name command, respectively.
To describe function return value, start a new paragraph with the \returns command.
A minimal documentation comment:

    /// Sets the xyzzy property to \p Baz.
    void setXyzzy(bool Baz);

A documentation comment that uses all Doxygen features in a preferred way:

    /// Does foo and bar.
    ///
    /// Does not do foo the usual way if \p Baz is true.
    ///
    /// Typical usage:
    /// \code
    ///   fooBar(false, "quux", Res);
    /// \endcode
    ///
    /// \param Quux kind of foo to do.
    /// \param [out] Result filled with bar sequence on foo success.
    ///
    /// \returns true on success.
    bool fooBar(bool Baz, StringRef Quux, std::vector<int> &Result);
Don’t duplicate the documentation comment in the header file and in the implementation file. Put the documentation comments for public APIs into the header file. Documentation comments for private APIs can go to the implementation file. In any case, implementation files can include additional comments (not necessarily in Doxygen markup) to explain implementation details as needed.
Don’t duplicate function or class name at the beginning of the comment. For humans it is obvious which function or class is being documented; automatic documentation processing tools are smart enough to bind the comment to the correct declaration.
Avoid:

    // Example.h:

    // example - Does something important.
    void example();

    // Example.cpp:

    // example - Does something important.
    void example() { ... }

Preferred:

    // Example.h:

    /// Does something important.
    void example();

    // Example.cpp:

    /// Builds a B-tree in order to do foo. See paper by...
    void example() { ... }
Clear diagnostic messages are important to help users identify and fix issues in their inputs. Use succinct but correct English prose that gives the user the context needed to understand what went wrong. Also, to match error message styles commonly produced by other tools, start the first sentence with a lower-case letter, and finish the last sentence without a period, if it would end in one otherwise. Sentences which end with different punctuation, such as "did you forget ';'?", should still do so.

For example, this is a good error message:

    error: file.o: section header 3 is corrupt. Size is 10 when it should be 20

This is a bad message, since it does not provide useful information and uses the wrong style:

    error: file.o: Corrupt section header.
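The trailing-period rule can be sketched in a few lines. The helper below is hypothetical (`formatError` is an assumed name, not an LLVM API) and covers only that one rule; real tools should use their existing diagnostic functions or the handlers in Support/WithColor.h.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not an LLVM API: prefixes a message with "error: "
// and the file name, and drops a trailing period if there is one.
std::string formatError(const std::string &File, const std::string &Message) {
  std::string Text = Message;
  // Only a final '.' is removed; other punctuation such as '?' is kept.
  if (!Text.empty() && Text.back() == '.')
    Text.pop_back();
  return "error: " + File + ": " + Text;
}
```

The sketch leaves capitalization to the caller, since lower-casing automatically could damage identifiers or proper nouns in the message.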
As with other coding standards, individual projects, such as the Clang Static Analyzer, may have preexisting styles that do not conform to this. If a different formatting scheme is used consistently throughout the project, use that style instead. Otherwise, this standard applies to all LLVM tools, including clang, clang-tidy, and so on.
If the tool or project does not have existing functions to emit warnings or errors, use the error and warning handlers provided in Support/WithColor.h to ensure they are printed in the appropriate style, rather than printing to stderr directly.
When using report_fatal_error, follow the same standards for the message as regular error messages. Assertion messages and llvm_unreachable calls do not necessarily need to follow these same styles as they are automatically formatted, and thus these guidelines may not be suitable.
Immediately after the header file comment (and include guards if working on a header file), the minimal list of #includes required by the file should be listed. We prefer these #includes to be listed in this order:

1. Main Module Header
2. Local/Private Headers
3. LLVM project/subproject headers (clang/..., lldb/..., llvm/..., etc)
4. System #includes

and each category should be sorted lexicographically by the full path.
The Main Module Header file applies to .cpp files which implement an interface defined by a .h file. This #include should always be included first regardless of where it lives on the file system. By including a header file first in the .cpp files that implement the interfaces, we ensure that the header does not have any hidden dependencies which are not explicitly #included in the header, but should be. It is also a form of documentation in the .cpp file to indicate where the interfaces it implements are defined.
LLVM project and subproject headers should be grouped from most specific to least specific, for the same reasons described above. For example, LLDB depends on both clang and LLVM, and clang depends on LLVM. So an LLDB source file should include lldb headers first, followed by clang headers, followed by llvm headers, to reduce the possibility (for example) of an LLDB header accidentally picking up a missing include due to the previous inclusion of that header in the main source file or some earlier header file. clang should similarly include its own headers before including llvm headers. This rule applies to all LLVM subprojects.
Write your code to fit within 80 columns.
There must be some limit to the width of the code in order to allow developers to have multiple files side-by-side in windows on a modest display. If you are going to pick a width limit, it is somewhat arbitrary but you might as well pick something standard. Going with 90 columns (for example) instead of 80 columns wouldn’t add any significant value and would be detrimental to printing out code. Also many other projects have standardized on 80 columns, so some people have already configured their editors for it (vs something else, like 90 columns).
In all cases, prefer spaces to tabs in source files. People have different preferred indentation levels, and different styles of indentation that they like; this is fine. What isn’t fine is that different editors/viewers expand tabs out to different tab stops. This can cause your code to look completely unreadable, and it is not worth dealing with.
As always, follow the Golden Rule above: follow the style of existing code if you are modifying and extending it.
Do not add trailing whitespace. Some common editors will automatically remove trailing whitespace when saving a file which causes unrelated changes to appear in diffs and commits.
When formatting a multi-line lambda, format it like a block of code. If there is only one multi-line lambda in a statement, and there are no expressions lexically after it in the statement, drop the indent to the standard two space indent for a block of code, as if it were an if-block opened by the preceding part of the statement:
    std::sort(foo.begin(), foo.end(), [&](Foo a, Foo b) -> bool {
      if (a.blah < b.blah)
        return true;
      if (a.baz < b.baz)
        return true;
      return a.bam < b.bam;
    });
To take best advantage of this formatting, if you are designing an API which accepts a continuation or single callable argument (be it a function object, or a std::function), it should be the last argument if at all possible.
If there are multiple multi-line lambdas in a statement, or additional parameters after the lambda, indent the block two spaces from the indent of the []:
    dyn_switch(V->stripPointerCasts(),
               [] (PHINode *PN) {
                 // process phis...
               },
               [] (SelectInst *SI) {
                 // process selects...
               },
               [] (LoadInst *LI) {
                 // process loads...
               },
               [] (AllocaInst *AI) {
                 // process allocas...
               });
Starting from C++11, there are significantly more uses of braced lists to perform initialization. For example, they can be used to construct aggregate temporaries in expressions. They now have a natural way of ending up nested within each other and within function calls in order to build up aggregates (such as option structs) from local variables.
The historically common formatting of braced initialization of aggregate variables does not mix cleanly with deep nesting, general expression contexts, function arguments, and lambdas. We suggest new code use a simple rule for formatting braced initialization lists: act as-if the braces were parentheses in a function call. The formatting rules exactly match those already well understood for formatting nested function calls. Examples:
    foo({a, b, c}, {1, 2, 3});

    llvm::Constant *Mask[] = {
        llvm::ConstantInt::get(llvm::Type::getInt32Ty(getLLVMContext()), 0),
        llvm::ConstantInt::get(llvm::Type::getInt32Ty(getLLVMContext()), 1),
        llvm::ConstantInt::get(llvm::Type::getInt32Ty(getLLVMContext()), 2)};

This formatting scheme also makes it particularly easy to get predictable, consistent, and automatic formatting with tools like Clang Format.
Compiler warnings are often useful and help improve the code. Those that are not useful can often be suppressed with a small code change. For example, an assignment in the if condition is often a typo:

    if (V = getValue()) {
      ...
    }
Several compilers will print a warning for the code above. It can be suppressed by adding parentheses:
    if ((V = getValue())) {
      ...
    }
In almost all cases, it is possible to write completely portable code. When you need to rely on non-portable code, put it behind a well-defined and well-documented interface.
In an effort to reduce code and executable size, LLVM does not use exceptions or RTTI (runtime type information, for example, dynamic_cast<>).
That said, LLVM does make extensive use of a hand-rolled form of RTTI that uses templates like isa<>, cast<>, and dyn_cast<>. This form of RTTI is opt-in and can be added to any class.
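To make the opt-in idea concrete, here is a minimal sketch of how such hand-rolled RTTI can work in plain C++. The `Shape` hierarchy and the helpers `isaSketch` and `dynCastSketch` are hypothetical stand-ins written for this example; LLVM's real isa<> and dyn_cast<> templates are more general.

```cpp
#include <cassert>

// Base class that records the dynamic kind of each object.
class Shape {
public:
  enum ShapeKind { SK_Circle, SK_Square };
  explicit Shape(ShapeKind K) : Kind(K) {}
  ShapeKind getKind() const { return Kind; }

private:
  const ShapeKind Kind;
};

class Circle : public Shape {
public:
  explicit Circle(double R) : Shape(SK_Circle), Radius(R) {}
  // Opt-in hook: reports whether a Shape is really a Circle.
  static bool classof(const Shape *S) { return S->getKind() == SK_Circle; }
  double Radius;
};

class Square : public Shape {
public:
  explicit Square(double S) : Shape(SK_Square), Side(S) {}
  static bool classof(const Shape *S) { return S->getKind() == SK_Square; }
  double Side;
};

// Hypothetical helper: true if S is dynamically a To.
template <typename To> bool isaSketch(const Shape *S) { return To::classof(S); }

// Hypothetical helper: checked downcast that returns nullptr on mismatch.
template <typename To> const To *dynCastSketch(const Shape *S) {
  return To::classof(S) ? static_cast<const To *>(S) : nullptr;
}
```

No virtual functions or compiler RTTI are involved: a class takes part simply by storing a kind tag and providing `classof`.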
Static constructors and destructors (e.g., global variables whose types have a constructor or destructor) should not be added to the code base, and should be removed wherever possible.
Globals in different source files are initialized in arbitrary order (see https://yosefk.com/c++fqa/ctors.html#fqa-10.12), making the code more difficult to reason about.
Static constructors have negative impact on launch time of programs that use LLVM as a library. We would really like for there to be zero cost for linking in an additional LLVM target or other library into an application, but static constructors undermine this goal.
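One common way to avoid a static constructor, shown here as a sketch rather than as LLVM's prescribed mechanism, is to construct the object on first use. The names `registeredNames` and `registerName` are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Avoid: a namespace-scope object with a constructor runs before main(),
// in an unspecified order relative to globals in other source files.
//   static std::vector<std::string> RegisteredNames;

// Sketch of an alternative: the vector is created on the first call.
// It is intentionally never destroyed, so no static destructor runs either.
std::vector<std::string> &registeredNames() {
  static std::vector<std::string> *Names = new std::vector<std::string>();
  return *Names;
}

// Appends a name to the lazily created registry.
void registerName(const std::string &Name) {
  registeredNames().push_back(Name);
}
```

A program that links this code but never calls `registeredNames` pays nothing at startup.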
In C++, the class and struct keywords can be used almost interchangeably. The only difference is when they are used to declare a class: class makes all members private by default while struct makes all members public by default.
All declarations and definitions of a given class or struct must use the same keyword. For example:

    // Avoid if `Example` is defined as a struct.
    class Example;

    // OK.
    struct Example;

    struct Example { ... };

Use struct when all members are declared public:

    // Avoid using `struct` here, use `class` instead.
    struct Foo {
    private:
      int Data;
    public:
      Foo() : Data(0) { }
      int getData() const { return Data; }
      void setData(int D) { Data = D; }
    };

    // OK to use `struct`: all members are public.
    struct Bar {
      int Data;
      Bar() : Data(0) { }
    };
Starting from C++11 there is a "generalized initialization syntax" which allows calling constructors using braced initializer lists. Do not use these to call constructors with non-trivial logic or if you care that you're calling some particular constructor. Those should look like function calls using parentheses rather than like aggregate initialization. Similarly, if you need to explicitly name the type and call its constructor to create a temporary, don't use a braced initializer list. Instead, use a braced initializer list (without any type for temporaries) when doing aggregate initialization or something notionally equivalent. Examples:

    class Foo {
    public:
      // Construct a Foo by reading data from the disk in the whizbang format, ...
      Foo(std::string filename);

      // Construct a Foo by looking up the Nth element of some global data ...
      Foo(int N);

      // ...
    };

    // The Foo constructor call is reading a file, don't use braces to call it.
    std::fill(foo.begin(), foo.end(), Foo("name"));

    // The pair is being constructed like an aggregate, use braces.
    bar_map.insert({my_key, my_value});
If you use a braced initializer list when initializing a variable, use an equals before the open curly brace:
    int data[] = {0, 1, 2, 3};
Some are advocating a policy of "almost always auto" in C++11, however LLVM uses a more moderate stance. Use auto if and only if it makes the code more readable or easier to maintain. Don't "almost always" use auto, but do use auto with initializers like cast<Foo>(...) or other places where the type is already obvious from the context. Another time when auto works well is when the type would have been abstracted away anyway, often behind a container's typedef such as std::vector<T>::iterator.
Similarly, C++14 adds generic lambda expressions where parameter types can be auto. Use these where you would have used a template.
The convenience of auto makes it easy to forget that its default behavior is a copy. Particularly in range-based for loops, careless copies are expensive.
Use auto & for values and auto * for pointers unless you need to make a copy.
    // Typically there's no reason to copy.
    for (const auto &Val : Container) { observe(Val); }
    for (auto &Val : Container) { Val.change(); }

    // Remove the reference if you really want a new copy.
    for (auto Val : Container) { Val.change(); saveSomewhere(Val); }

    // Copy pointers, but make it clear that they're pointers.
    for (const auto *Ptr : Container) { observe(*Ptr); }
    for (auto *Ptr : Container) { Ptr->change(); }
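The cost of a careless copy can be made visible by counting copy-constructor calls. The `Tracked` type and the two helper functions below are hypothetical and exist only for this sketch.

```cpp
#include <cassert>
#include <vector>

// Hypothetical type that counts how often it is copy-constructed.
struct Tracked {
  static int Copies;
  Tracked() = default;
  Tracked(const Tracked &) { ++Copies; }
};
int Tracked::Copies = 0;

// Iterates by value: each element is copied once per iteration.
int copiesWhenIteratingByValue(const std::vector<Tracked> &Items) {
  Tracked::Copies = 0;
  for (auto Item : Items)
    (void)Item;
  return Tracked::Copies;
}

// Iterates by const reference: no element is copied.
int copiesWhenIteratingByReference(const std::vector<Tracked> &Items) {
  Tracked::Copies = 0;
  for (const auto &Item : Items)
    (void)Item;
  return Tracked::Copies;
}
```

For a three-element vector the by-value loop performs three copies and the by-reference loop performs none, which matters when the element type is expensive to copy.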
In general, there is no relative ordering among pointers. As a result, when unordered containers like sets and maps are used with pointer keys the iteration order is undefined. Hence, iterating such containers may result in non-deterministic code generation. While the generated code might work correctly, non-determinism can make it harder to reproduce bugs and debug the compiler.
In case an ordered result is expected, remember to sort an unordered container before iteration. Or use ordered containers like vector/MapVector/SetVector if you want to iterate pointer keys.
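As a sketch of the "sort before iterating" advice, the function below copies a pointer-keyed unordered set into a vector and orders it by a stable, address-independent key. The `Node` type, its `ID` field, and `inDeterministicOrder` are assumptions made for this example, and IDs are assumed to be unique.

```cpp
#include <algorithm>
#include <cassert>
#include <unordered_set>
#include <vector>

// Hypothetical node type with a stable ID that does not depend on addresses.
struct Node {
  int ID;
};

// Returns the nodes ordered by ID, so the result does not depend on the
// iteration order of the pointer-keyed set.
std::vector<const Node *>
inDeterministicOrder(const std::unordered_set<const Node *> &Nodes) {
  std::vector<const Node *> Sorted(Nodes.begin(), Nodes.end());
  std::sort(Sorted.begin(), Sorted.end(),
            [](const Node *A, const Node *B) { return A->ID < B->ID; });
  return Sorted;
}
```

Code that walks the returned vector behaves the same from run to run, even though the addresses of the nodes, and therefore the set's internal order, may differ.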
std::sort uses a non-stable sorting algorithm in which the order of equal elements is not guaranteed to be preserved. Thus using std::sort for a container having equal elements may result in non-deterministic behavior. To uncover such instances of non-determinism, LLVM has introduced a new llvm::sort wrapper function. For an EXPENSIVE_CHECKS build this will randomly shuffle the container before sorting. Default to using llvm::sort instead of std::sort.
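When llvm::sort exposes such an issue, one possible fix is to make the comparator a total order by adding a tie-breaker. The sketch below uses std::sort only so that it compiles without LLVM headers; LLVM code should call llvm::sort as stated above. The `Symbol` type and `sortSymbols` are hypothetical, and names are assumed to be unique.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical record type used only for illustration.
struct Symbol {
  int Priority;
  std::string Name;
};

// Comparing on Priority alone would leave symbols of equal priority in an
// unspecified relative order. Using Name as a tie-breaker makes the result
// independent of the input order.
void sortSymbols(std::vector<Symbol> &Symbols) {
  std::sort(Symbols.begin(), Symbols.end(),
            [](const Symbol &A, const Symbol &B) {
              return std::tie(A.Priority, A.Name) <
                     std::tie(B.Priority, B.Name);
            });
}
```

Two inputs holding the same symbols in different orders now sort to the same sequence.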
Header files should be self-contained (compile on their own) and end in .h. Non-header files that are meant for inclusion should end in .inc and be used sparingly.
All header files should be self-contained. Users and refactoring tools should not have to adhere to special conditions to include the header. Specifically, a header should have header guards and include all other headers it needs.
There are rare cases where a file designed to be included is not self-contained. These are typically intended to be included at unusual locations, such as the middle of another file. They might not use header guards, and might not include their prerequisites. Name such files with the .inc extension. Use sparingly, and prefer self-contained headers when possible.
In general, a header should be implemented by one or more .cpp files. Each of these .cpp files should include the header that defines their interface first. This ensures that all of the dependences of the header have been properly added to the header itself, and are not implicit. System headers should be included after user headers for a translation unit.
A directory of header files (for example include/llvm/Foo) defines a library (Foo). Dependencies between libraries are defined by the LLVMBuild.txt file in their implementation (lib/Foo). One library (both its headers and implementation) should only use things from the libraries listed in its dependencies.
Some of this constraint can be enforced by classic Unix linkers (Mac & Windows linkers, as well as lld, do not enforce this constraint). A Unix linker searches left to right through the libraries specified on its command line and never revisits a library. In this way, no circular dependencies between libraries can exist.
This doesn't fully enforce all inter-library dependencies, and importantly doesn't enforce header file circular dependencies created by inline functions. A good way to answer the "is this layered correctly" question would be to consider whether a Unix linker would succeed at linking the program if all inline functions were defined out-of-line. This should hold for all valid orderings of dependencies: since linking resolution is linear, it's possible that some implicit dependencies can sneak through. For example, if A depends on B and C, the valid orderings are "C B A" or "B C A", and in both cases the explicit dependencies come before their use. But in the first case, B could still link successfully if it implicitly depended on C, and the opposite holds in the second case.
#include hurts compile time performance. Don't do it unless you have to, especially in header files.
But wait! Sometimes you need to have the definition of a class to use it, or to inherit from it. In these cases go ahead and #include that header file. Be aware however that there are many cases where you don’t need to have the full definition of a class. If you are using a pointer or reference to a class, you don’t need the header file. If you are simply returning a class instance from a prototyped function or method, you don’t need it. In fact, for most cases, you simply don’t need the definition of a class. And not #includeing speeds up compilation.
It is easy to try to go too overboard on this recommendation, however. You must include all of the header files that you are using — you can include them either directly or indirectly through another header file. To make sure that you don’t accidentally forget to include a header file in your module header, make sure to include your module header first in the implementation file (as mentioned above). This way there won’t be any hidden dependencies that you’ll find out about later.
#include
下降了編譯的性能,在不是必需的時候不要包含,尤爲是在頭文件中。
有一些狀況是你須要獲取到類的定義再去使用或者繼承它,這些狀況則在文件的首部進行 #include
。然而也要想到,存在很是多的狀況是不須要擁有類的完整定義的。若是正在使用類指針或引用,則沒必要包含該頭文件。若是隻是的從原型函數或者方法中返回一個類的實例,也不須要頭文件。實際上,在大多數的狀況下,你根本不須要類的定義。不進行 #include
能夠加速編譯。
這個建議很容易致使偏激的態度,然而,你必須包含全部正在使用的頭文件,不管是直接仍是間接從其它文件包含。爲確保你不會意外的忘記在你的模塊頭文件中包含頭文件,確認在實現文件中第一個包含你的模塊頭文件(就像上面提到的)。這樣,你後面就會發現不會再有隱藏的依賴了。
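As a rough illustration of the pointer-and-reference case, here is a minimal single-file sketch. The names Engine, Car, and horsepowerOf are hypothetical and not part of LLVM. In a real project the full Engine definition would live in its own header, and only the implementation file would need to include it.

```cpp
#include <cassert>

// Forward declaration: sufficient for pointers, references, and prototypes.
class Engine;

// Only mentions Engine through a pointer, so no definition is needed yet.
int horsepowerOf(const Engine *E);

class Car {
public:
  explicit Car(const Engine *E) : TheEngine(E) {}
  int power() const { return horsepowerOf(TheEngine); }

private:
  const Engine *TheEngine; // A pointer member works with an incomplete type.
};

// Hypothetical full definition; in practice this would sit in Engine.h and
// be included only where the members of Engine are actually used.
class Engine {
public:
  explicit Engine(int Power) : HP(Power) {}
  int horsepower() const { return HP; }

private:
  int HP;
};

int horsepowerOf(const Engine *E) { return E->horsepower(); }
```

Car compiles against the forward declaration alone. Only horsepowerOf, which calls a member of Engine, needs the complete type.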
Many modules have a complex implementation that causes them to use more than one implementation (.cpp) file. It is often tempting to put the internal communication interface (helper classes, extra functions, etc.) in the public module header file. Don't do this!
If you really need to do something like this, put a private header file in the same directory as the source files, and include it locally. This ensures that your private interface remains private and undisturbed by outsiders.
It's okay to put extra implementation methods in a public class itself. Just make them private (or protected) and all is well.
When providing an out-of-line implementation of a function in a source file, do not open namespace blocks in the source file. Instead, use namespace qualifiers to help ensure that your definition matches an existing declaration. Do this:
    // Foo.h
    namespace llvm {
    int foo(const char *s);
    }

    // Foo.cpp
    #include "Foo.h"
    using namespace llvm;
    int llvm::foo(const char *s) {
      // ...
    }
Doing this helps to avoid bugs where the definition does not match the declaration from the header. For example, the following C++ code defines a new overload of llvm::foo instead of providing a definition for the existing function declared in the header:
    // Foo.cpp
    #include "Foo.h"
    namespace llvm {
    int foo(char *s) { // Mismatch between "const char *" and "char *"
    }
    } // end namespace llvm
This error will not be caught until the build is nearly complete, when the linker fails to find a definition for any uses of the original function. If the function were instead defined with a namespace qualifier, the error would have been caught immediately when the definition was compiled.
Class method implementations must already name the class, and new overloads cannot be introduced out of line, so this recommendation does not apply to them.
When reading code, keep in mind how much state and how many previous decisions the reader has to remember to understand a block of code. Aim to reduce indentation where possible, as long as doing so doesn't make the code harder to understand. One great way to do this is by making use of early exits and the continue keyword in long loops. Consider this code that does not use an early exit:
    Value *doSomething(Instruction *I) {
      if (!I->isTerminator() &&
          I->hasOneUse() && doOtherThing(I)) {
        ... some long code ....
      }

      return 0;
    }
This code has several problems if the body of the 'if' is large. First, when you're looking at the top of the function, it isn't immediately clear that this only does interesting things with non-terminator instructions, and only applies to things with the other predicates. Second, it is relatively difficult to describe (in comments) why these predicates are important, because the if statement makes it difficult to lay out the comments. Third, when you're deep within the body of the code, it is indented an extra level. Finally, when reading the top of the function, it isn't clear what the result is if the predicate isn't true; you have to read to the end of the function to know that it returns null.
It is much preferred to format the code like this:
    Value *doSomething(Instruction *I) {
      // Terminators never need 'something' done to them because ...
      if (I->isTerminator())
        return 0;

      // We conservatively avoid transforming instructions with multiple uses
      // because goats like cheese.
      if (!I->hasOneUse())
        return 0;

      // This is really just here for example.
      if (!doOtherThing(I))
        return 0;

      ... some long code ....
    }
This fixes these problems. A similar problem frequently happens in for loops. A silly example is something like this:
    for (Instruction &I : BB) {
      if (auto *BO = dyn_cast<BinaryOperator>(&I)) {
        Value *LHS = BO->getOperand(0);
        Value *RHS = BO->getOperand(1);
        if (LHS != RHS) {
          ...
        }
      }
    }
When you have very, very small loops, this sort of structure is fine. But if the loop exceeds 10-15 lines, it becomes difficult for people to read and understand at a glance. The problem with this sort of code is that it gets very nested very quickly, which means the reader has to keep a lot of context in their brain to remember what is going on in the loop, because they don't know if or when the if conditions will have elses. It is strongly preferred to structure the loop like this:
    for (Instruction &I : BB) {
      auto *BO = dyn_cast<BinaryOperator>(&I);
      if (!BO) continue;

      Value *LHS = BO->getOperand(0);
      Value *RHS = BO->getOperand(1);
      if (LHS == RHS) continue;

      ...
    }
This has all the benefits of using early exits for functions: it reduces nesting of the loop, it makes it easier to describe why the conditions are true, and it makes it obvious to the reader that there is no else coming up that they have to push context into their brain for. If a loop is large, this can be a big understandability win.
For similar reasons as above (reduction of indentation and easier reading), please do not use 'else' or 'else if' after something that interrupts control flow, like return, break, continue, goto, etc. For example, this is bad:
    case 'J': {
      if (Signed) {
        Type = Context.getsigjmp_bufType();
        if (Type.isNull()) {
          Error = ASTContext::GE_Missing_sigjmp_buf;
          return QualType();
        } else {
          break; // Unnecessary.
        }
      } else {
        Type = Context.getjmp_bufType();
        if (Type.isNull()) {
          Error = ASTContext::GE_Missing_jmp_buf;
          return QualType();
        } else {
          break; // Unnecessary.
        }
      }
    }
It is better to write it like this:
    case 'J':
      if (Signed) {
        Type = Context.getsigjmp_bufType();
        if (Type.isNull()) {
          Error = ASTContext::GE_Missing_sigjmp_buf;
          return QualType();
        }
      } else {
        Type = Context.getjmp_bufType();
        if (Type.isNull()) {
          Error = ASTContext::GE_Missing_jmp_buf;
          return QualType();
        }
      }
      break;
Better yet, for this particular example:
    case 'J':
      if (Signed)
        Type = Context.getsigjmp_bufType();
      else
        Type = Context.getjmp_bufType();

      if (Type.isNull()) {
        Error = Signed ? ASTContext::GE_Missing_sigjmp_buf :
                         ASTContext::GE_Missing_jmp_buf;
        return QualType();
      }
      break;
The idea is to reduce indentation and the amount of code you have to keep track of when reading the code.
It is very common to write small loops that just compute a boolean value. There are a number of ways that people commonly write these, but an example of this sort of thing is:
    bool FoundFoo = false;
    for (unsigned I = 0, E = BarList.size(); I != E; ++I)
      if (BarList[I]->isFoo()) {
        FoundFoo = true;
        break;
      }

    if (FoundFoo) {
      ...
    }
Instead of this sort of loop, we prefer to use a predicate function (which may be static) that uses early exits:
    /// \returns true if the specified list has an element that is a foo.
    static bool containsFoo(const std::vector<Bar*> &List) {
      for (unsigned I = 0, E = List.size(); I != E; ++I)
        if (List[I]->isFoo())
          return true;
      return false;
    }
    ...

    if (containsFoo(BarList)) {
      ...
    }
There are many reasons for doing this: it reduces indentation and factors out code which can often be shared by other code that checks for the same predicate. More importantly, it forces you to pick a name for the function, and forces you to write a comment for it. In this silly example, this doesn't add much value. However, if the condition is complex, this can make it a lot easier for the reader to understand the code that queries for this predicate. Instead of being faced with the in-line details of how we check whether BarList contains a foo, we can trust the function name and continue reading with better locality.
Poorly chosen names can mislead the reader and cause bugs. We cannot stress enough how important it is to use descriptive names. Pick names that match the semantics and role of the underlying entities, within reason. Avoid abbreviations unless they are well known. After picking a good name, make sure to use consistent capitalization for the name, as inconsistency requires clients to either memorize the APIs or look them up to find the exact spelling.
In general, names should be in camel case (e.g. TextFileReader and isLValue()). Different kinds of declarations have different rules:
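- Type names (including classes, structs, enums, typedefs, etc.) should be nouns and start with an upper-case letter (e.g. TextFileReader).
- Variable names should be nouns (as they represent state). The name should be camel case and start with an upper-case letter (e.g. Leader or Boats).
- Function names should be verb phrases (as they represent actions), and command-like functions should be imperative. The name should be camel case and start with a lower-case letter (e.g. openFile() or isFoo()).
- Enum declarations (e.g. enum { ... }) are types, so they should follow the naming conventions for types. A common use for enums is as a discriminator for a union or an indicator of a subclass. When an enum is used for something like this, it should have a Kind suffix (e.g. ValueKind).
- Enumerators (e.g. enum { Foo, Bar }) and public member variables should start with an upper-case letter, just like types. Unless the enumerators are defined in their own small namespace or inside a class, they should have a prefix corresponding to the enum declaration name. For example, enum ValueKind { ... }; may contain enumerators like VK_Argument, VK_BasicBlock, etc. Enumerators that are just convenience constants are exempt from the prefix requirement, for instance: enum { MaxSize = 42, Density = 12 };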
As an exception, classes that mimic STL classes can have member names in STL's style of lower-case words separated by underscores (e.g. begin(), push_back(), and empty()). Classes that provide multiple iterators should add a singular prefix to begin() and end() (e.g. global_begin() and use_begin()).
Here are some examples:
    class VehicleMaker {
      ...
      Factory<Tire> F;            // Avoid: a non-descriptive abbreviation.
      Factory<Tire> Factory;      // Better: more descriptive.
      Factory<Tire> TireFactory;  // Even better: if VehicleMaker has more than one
                                  // kind of factories.
    };

    Vehicle makeVehicle(VehicleType Type) {
      VehicleMaker M;                         // Might be OK if scope is small.
      Tire Tmp1 = M.makeTire();               // Avoid: 'Tmp1' provides no information.
      Light Headlight = M.makeLight("head");  // Good: descriptive.
      ...
    }
Use the "assert" macro to its fullest. Check all of your preconditions and assumptions; you never know when a bug (not necessarily even yours) might be caught early by an assertion, which reduces debugging time dramatically. The <cassert> header file is probably already included by the header files you are using, so it doesn't cost anything to use it.
To further assist with debugging, make sure to put some kind of error message in the assertion statement, which is printed if the assertion is tripped. This helps the poor debugger make sense of why an assertion is being made and enforced, and hopefully what to do about it. Here is one complete example:
    inline Value *getOperand(unsigned I) {
      assert(I < Operands.size() && "getOperand() out of range!");
      return Operands[I];
    }
Here are more examples:
    assert(Ty->isPointerType() && "Can't allocate a non-pointer type!");

    assert((Opcode == Shl || Opcode == Shr) && "ShiftInst Opcode invalid!");

    assert(idx < getNumSuccessors() && "Successor # out of range!");

    assert(V1.getType() == V2.getType() && "Constant types must be identical!");

    assert(isa<PHINode>(Succ->front()) && "Only works on PHId BBs!");
In the past, asserts were used to indicate a piece of code that should not be reached. These were typically of the form:
    assert(0 && "Invalid radix for integer literal");
This has a few issues, the main one being that some compilers might not understand the assertion, or might warn about a missing return in builds where assertions are compiled out.
Today, we have something much better, llvm_unreachable:
    llvm_unreachable("Invalid radix for integer literal");
When assertions are enabled, this will print the message if it's ever reached and then exit the program. When assertions are disabled (i.e. in release builds), llvm_unreachable becomes a hint to compilers to skip generating code for this branch. If the compiler does not support this, it will fall back to the "abort" implementation.
Use llvm_unreachable to mark a specific point in code that should never be reached. This is especially desirable for addressing warnings about unreachable branches, etc., but can be used whenever reaching a particular code path is unconditionally a bug of some kind (not originating from user input; see below). Use of assert should always include a testable predicate (as opposed to assert(false)).
Neither assertions nor llvm_unreachable will abort the program on a release build. If the error condition can be triggered by user input, then the recoverable error mechanism described in the LLVM Programmer's Manual should be used instead. In cases where this is not practical, report_fatal_error may be used.
Another issue is that values used only by assertions will produce an "unused value" warning when assertions are disabled. For example, this code will warn:
    unsigned Size = V.size();
    assert(Size > 42 && "Vector smaller than it should be");

    bool NewToSet = Myset.insert(Value);
    assert(NewToSet && "The value shouldn't be in the set yet");
These are two interestingly different cases. In the first case, the call to V.size() is only useful for the assert, and we don't want it executed when assertions are disabled. Code like this should move the call into the assert itself. In the second case, the side effects of the call must happen whether the assert is enabled or not. In this case, the value should be cast to void to disable the warning. To be specific, it is preferred to write the code like this:
    assert(V.size() > 42 && "Vector smaller than it should be");

    bool NewToSet = Myset.insert(Value); (void)NewToSet;
    assert(NewToSet && "The value shouldn't be in the set yet");
In LLVM, we prefer to explicitly prefix all identifiers from the standard namespace with an "std::" prefix, rather than rely on "using namespace std;".
In header files, adding a 'using namespace XXX' directive pollutes the namespace of any source file that #includes the header, creating maintenance issues.
In implementation files (e.g. .cpp files), the rule is more of a stylistic rule, but is still important. Basically, using explicit namespace prefixes makes the code clearer, because it is immediately obvious what facilities are being used and where they are coming from. It also makes the code more portable, because namespace clashes cannot occur between LLVM code and other namespaces. The portability rule is important because different standard library implementations expose different symbols (potentially ones they shouldn't), and future revisions to the C++ standard will add more symbols to the std namespace. As such, we never use 'using namespace std;' in LLVM.
The exception to the general rule (i.e. it's not an exception for the std namespace) is for implementation files. For example, all of the code in the LLVM project implements code that lives in the 'llvm' namespace. As such, it is ok, and actually clearer, for the .cpp files to have a 'using namespace llvm;' directive at the top, after the #includes. This reduces indentation in the body of the file for source editors that indent based on braces, and keeps the conceptual context cleaner. The general form of this rule is that any .cpp file that implements code in any namespace may use that namespace (and its parents'), but should not use any others.
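The two rules can be seen side by side in a small sketch. The namespace mylib and the function joinNames are hypothetical stand-ins for 'llvm' and a real LLVM function. The "header" and "implementation" parts are shown in one file only so that the example builds on its own.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Header part (would be a header such as mylib/Join.h): no using-directives.
namespace mylib {
std::string joinNames(const std::vector<std::string> &Names);
} // end namespace mylib

// Implementation part (would be Join.cpp).
using namespace mylib; // Fine: this file implements code in mylib.

// Standard-library names keep their explicit std:: prefix.
std::string mylib::joinNames(const std::vector<std::string> &Names) {
  std::string Result;
  for (const std::string &Name : Names) {
    if (!Result.empty())
      Result += ", ";
    Result += Name;
  }
  return Result;
}
```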
If a class is defined in a header file and has a vtable (either it has virtual methods or it derives from classes with virtual methods), it must always have at least one out-of-line virtual method in the class. Without this, the compiler will copy the vtable and RTTI into every .o file that #includes the header, bloating .o file sizes and increasing link times.
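One way to satisfy this is a small virtual method that is declared in the class but defined in the implementation file. The sketch below assumes hypothetical classes Shape and Triangle and uses the method name anchor() as an illustrative convention. Both "files" are shown together so the example is self-contained.

```cpp
#include <cassert>

// Header part (would be Shape.h).
class Shape {
public:
  virtual ~Shape() = default;
  virtual int sides() const { return 0; }

private:
  // Declared here, defined out of line below. It is the one out-of-line
  // virtual method the guideline asks for.
  virtual void anchor();
};

class Triangle : public Shape {
public:
  int sides() const override { return 3; }

private:
  void anchor() override;
};

// Implementation part (would be Shape.cpp): the single home of these methods.
void Shape::anchor() {}
void Triangle::anchor() {}
```

With a typical toolchain, the vtable is then emitted only in the object file that defines anchor(), rather than in every file that includes the header.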
-Wswitch warns if a switch over an enumeration, without a default label, does not cover every enumeration value. If you write a default label on a fully covered switch over an enumeration, then the -Wswitch warning won't fire when new elements are added to that enumeration. To help avoid adding these kinds of defaults, Clang has the warning -Wcovered-switch-default, which is off by default but turned on when building LLVM with a version of Clang that supports the warning.
A knock-on effect of this stylistic requirement is that when building LLVM with GCC you may get warnings related to "control may reach end of non-void function" if you return from each case of a covered switch-over-enum, because GCC assumes that the enum expression may take any representable value, not just those of individual enumerators. To suppress this warning, use llvm_unreachable after the switch.
(Translator's note: this reads a bit like a dig at GCC.)
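A fully covered switch with no default label might look like the sketch below. The enum Color, the function colorName, and the helper unreachable() are hypothetical. In particular, unreachable() is only a stand-in for llvm_unreachable so that the example builds without LLVM headers.

```cpp
#include <cassert>
#include <cstdlib>

enum class Color { Red, Green, Blue };

// Hypothetical stand-in for llvm_unreachable: simply aborts if ever reached.
[[noreturn]] inline void unreachable(const char *) { std::abort(); }

const char *colorName(Color C) {
  // No default label, so -Wswitch can flag this switch if an enumerator
  // is added to Color later.
  switch (C) {
  case Color::Red:
    return "red";
  case Color::Green:
    return "green";
  case Color::Blue:
    return "blue";
  }
  // Keeps compilers from warning that control may reach the end of a
  // non-void function.
  unreachable("Unknown Color enumerator");
}
```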
The introduction of range-based for loops in C++11 means that explicit manipulation of iterators is rarely necessary. We use range-based for loops wherever possible for all newly added code. For example:
    BasicBlock *BB = ...
    for (Instruction &I : *BB)
      ... use I ...
In cases where range-based for loops can't be used and it is necessary to write an explicit iterator-based loop, pay close attention to whether end() is re-evaluated on each loop iteration. One common mistake is to write a loop in this style:
    BasicBlock *BB = ...
    for (auto I = BB->begin(); I != BB->end(); ++I)
      ... use I ...
The problem with this construct is that it evaluates "BB->end()" every time through the loop. Instead of writing the loop like this, we strongly prefer loops to be written so that they evaluate it once before the loop starts. A convenient way to do this is like so:
    BasicBlock *BB = ...
    for (auto I = BB->begin(), E = BB->end(); I != E; ++I)
      ... use I ...
The observant may quickly point out that these two loops may have different semantics: if the container (a basic block in this case) is being mutated, then "BB->end()" may change its value every time through the loop and the second loop may not in fact be correct. If you actually do depend on this behavior, please write the loop in the first form and add a comment indicating that you did it intentionally.
Why do we prefer the second form (when correct)? Writing the loop in the first form has two problems. First, it may be less efficient than evaluating it at the start of the loop. In this case, the cost is probably minor: a few extra loads every time through the loop. However, if the base expression is more complex, then the cost can rise quickly. I've seen loops where the end expression was actually something like "SomeMap[X]->end()", and map lookups really aren't cheap. By writing it in the second form consistently, you eliminate the issue entirely and don't even have to think about it.
The second (even bigger) issue is that writing the loop in the first form hints to the reader that the loop is mutating the container (a fact that a comment would handily confirm!). If you write the loop in the second form, it is immediately obvious without even looking at the body of the loop that the container isn't being modified, which makes it easier to read the code and understand what it does.
While the second form of the loop is a few extra keystrokes, we do strongly prefer it.
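The difference in semantics is easiest to see with a loop that grows its container. The sketch below uses index-based loops over std::vector rather than LLVM iterators, and the function names are hypothetical. The first function caches the bound, while the second deliberately re-evaluates it and says so in a comment.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Bound evaluated once: only the original elements are visited, even though
// the loop body appends to V.
std::vector<int> withCachedEnd(std::vector<int> V) {
  for (std::size_t I = 0, E = V.size(); I != E; ++I)
    V.push_back(V[I] * 2);
  return V;
}

// Bound re-evaluated on every iteration: appended elements are visited too.
std::vector<int> withReevaluatedEnd(std::vector<int> V, int Limit) {
  // Intentionally re-evaluates V.size(): the loop grows V while iterating
  // and stops appending once doubled values reach Limit.
  for (std::size_t I = 0; I != V.size(); ++I)
    if (V[I] * 2 < Limit)
      V.push_back(V[I] * 2);
  return V;
}
```

Starting from {1, 2, 3}, the cached form appends exactly three elements, while the re-evaluating form keeps processing the values it has just appended.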
The use of #include <iostream> in library files is hereby forbidden, because many common implementations transparently inject a static constructor into every translation unit that includes it.
Note that using the other stream headers (<sstream>, for example) is not problematic in this regard; only <iostream> is. However, raw_ostream provides various APIs that perform better for almost every use than std::ostream-style APIs.
New code should always use raw_ostream for writing, or the llvm::MemoryBuffer API for reading files.
LLVM includes a lightweight, simple, and efficient stream implementation in llvm/Support/raw_ostream.h, which provides all of the common features of std::ostream. All new code should use raw_ostream instead of ostream.
Unlike std::ostream, raw_ostream is not a template and can be forward declared as class raw_ostream. Public headers should generally not include the raw_ostream header, but use forward declarations and constant references to raw_ostream instances.
The std::endl modifier, when used with iostreams, outputs a newline to the output stream specified. In addition to doing this, however, it also flushes the output stream. In other words, these are equivalent:
    std::cout << std::endl;
    std::cout << '\n' << std::flush;
Most of the time, you probably have no reason to flush the output stream, so it's better to use a literal '\n'.
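The equivalence of the written text can be checked with a string stream. This is a minimal sketch with hypothetical function names, and it uses std::ostringstream from <sstream>, which the guidelines above note is not problematic. Both functions produce the same characters; the std::endl version additionally flushes after every line.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Writes one line per entry using '\n'; no flush is requested per line.
std::string renderWithNewline(const std::vector<std::string> &Lines) {
  std::ostringstream OS;
  for (const std::string &Line : Lines)
    OS << Line << '\n';
  return OS.str();
}

// Same text, but std::endl also flushes the stream after every line.
std::string renderWithEndl(const std::vector<std::string> &Lines) {
  std::ostringstream OS;
  for (const std::string &Line : Lines)
    OS << Line << std::endl;
  return OS.str();
}
```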
A member function defined in a class definition is implicitly inline, so don't put the inline keyword in this case.
Don't:
    class Foo {
    public:
      inline void bar() {
        // ...
      }
    };
Do:
    class Foo {
    public:
      void bar() {
        // ...
      }
    };
This section describes preferred low-level formatting guidelines along with reasoning on why we prefer them.
Put a space before an open parenthesis only in control flow statements, but not in normal function call expressions and function-like macros. For example:
    if (X) ...
    for (I = 0; I != 100; ++I) ...
    while (LLVMRocks) ...

    somefunc(42);
    assert(3 != 4 && "laws of math are failing me");

    A = foo(42, 92) + bar(X);
The reason for doing this is not completely arbitrary. This style makes control flow operators stand out more, and makes expressions flow better.
Hard and fast rule: preincrement (++X) may be no slower than postincrement (X++) and could very well be a lot faster. Use preincrement whenever possible.
The semantics of postincrement include making a copy of the value being incremented, returning it, and then preincrementing the "work value". For primitive types, this isn't a big deal. But for iterators, it can be a huge issue (for example, some iterators contain stack and set objects; copying an iterator could invoke the copy constructors of these as well). In general, get in the habit of always using preincrement, and you won't have a problem.
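The extra copy can be made visible with a toy iterator that counts how often it is copied. CountingIterator and the two helper functions are hypothetical and exist only for this sketch. The exact number of copies per postincrement depends on whether the compiler elides the copy on return, so the sketch only relies on there being at least one per step.

```cpp
#include <cassert>

// Toy iterator that records every copy made of it.
struct CountingIterator {
  int Pos = 0;
  static int Copies;

  CountingIterator() = default;
  CountingIterator(const CountingIterator &Other) : Pos(Other.Pos) { ++Copies; }
  CountingIterator &operator=(const CountingIterator &) = default;

  // Preincrement: advances in place and returns a reference, no copy.
  CountingIterator &operator++() {
    ++Pos;
    return *this;
  }

  // Postincrement: must copy the old value before advancing.
  CountingIterator operator++(int) {
    CountingIterator Old(*this);
    ++Pos;
    return Old;
  }
};

int CountingIterator::Copies = 0;

// Returns how many copies were made while advancing Steps times with ++It.
int copiesWithPreincrement(int Steps) {
  CountingIterator::Copies = 0;
  CountingIterator It;
  for (int I = 0; I != Steps; ++I)
    ++It;
  return CountingIterator::Copies;
}

// Returns how many copies were made while advancing Steps times with It++.
int copiesWithPostincrement(int Steps) {
  CountingIterator::Copies = 0;
  CountingIterator It;
  for (int I = 0; I != Steps; ++I)
    It++;
  return CountingIterator::Copies;
}
```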
In general, we strive to reduce indentation wherever possible. This is useful because we want code to fit into 80 columns without excessive wrapping, but also because it makes it easier to understand the code. To facilitate this and avoid some insanely deep nesting on occasion, don't indent namespaces. If it helps readability, feel free to add a comment indicating what namespace is being closed by a }. For example:
    namespace llvm {
    namespace knowledge {

    /// This class represents things that Smith can have an intimate
    /// understanding of and contains the data associated with it.
    class Grokable {
    ...
    public:
      explicit Grokable() { ... }
      virtual ~Grokable() = 0;

      ...

    };

    } // end namespace knowledge
    } // end namespace llvm
Feel free to skip the closing comment when the namespace being closed is obvious for any reason. For example, the outer-most namespace in a header file is rarely a source of confusion. But namespaces, both anonymous and named, that are closed halfway through a source file could probably use clarification.
After talking about namespaces in general, you may be wondering about anonymous namespaces in particular. Anonymous namespaces are a great language feature that tells the C++ compiler that the contents of the namespace are only visible within the current translation unit, allowing more aggressive optimization and eliminating the possibility of symbol name collisions. Anonymous namespaces are to C++ as "static" is to C functions and global variables. While "static" is available in C++, anonymous namespaces are more general: they can make entire classes private to a file.
The problem with anonymous namespaces is that they naturally want to encourage indentation of their body, and they reduce locality of reference: if you see a random function definition in a C++ file, it is easy to see if it is marked static, but seeing if it is in an anonymous namespace requires scanning a big chunk of the file.
Because of this, we have a simple guideline: make anonymous namespaces as small as possible, and only use them for class declarations. For example:
    namespace {
    class StringSort {
    ...
    public:
      StringSort(...)
      bool operator<(const char *RHS) const;
    };
    } // end anonymous namespace

    static void runHelper() {
      ...
    }

    bool StringSort::operator<(const char *RHS) const {
      ...
    }
Avoid putting declarations other than classes into anonymous namespaces:
    namespace {

    // ... many declarations ...

    void runHelper() {
      ...
    }

    // ... many declarations ...

    } // end anonymous namespace
When you are looking at "runHelper" in the middle of a large C++ file, you have no immediate way to tell if this function is local to the file. In contrast, when the function is marked static, you don't need to cross-reference faraway places in the file to tell that the function is local.
A lot of these comments and recommendations have been culled from other sources. Two particularly important books for our work are:
If you get some free time, and you haven’t read them: do so, you might learn something.