【Notes】Demystifying __builtin_expect

Date: 2019-11-11



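I recently ran into this while reading GLib code. A quick search showed that plenty of people have already written about it, and I am only getting to it today, which is a little embarrassing. This fills a gap in my knowledge, and I am keeping this note as a record.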

First, here is the most authoritative explanation:

======
likely() and unlikely()

What are they?
In Linux kernel code, one often finds calls to likely() and unlikely() in conditions, like:

bvl = bvec_alloc(gfp_mask, nr_iovecs, &idx);
if (unlikely(!bvl)) {
  mempool_free(bio, bio_pool);
  bio = NULL;
  goto out;
}

In fact, these macros are hints to the compiler: knowing which branch is the likeliest lets it optimize the branch correctly. Their definitions, found in include/linux/compiler.h, are the following:

#define likely(x)       __builtin_expect(!!(x), 1)
#define unlikely(x)     __builtin_expect(!!(x), 0)
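Why the !!? According to the GCC text quoted below, __builtin_expect returns EXP unchanged and expresses the expectation EXP == C, so a nonzero value such as 4 passed in directly would not match the constant 1. The !! collapses any nonzero value to exactly 1 first. A minimal sketch; raw_hint and normalized_hint are hypothetical helpers written only for illustration:

```c
/* Same definitions as in include/linux/compiler.h (quoted above). */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical helper: without "!!", the hint is compared against 1
 * as-is, so a "true" value such as 4 does not match the expectation.
 * The built-in still returns its first argument unchanged. */
long raw_hint(long flags)
{
    return __builtin_expect(flags, 1);   /* returns flags, e.g. 4 */
}

/* Hypothetical helper: likely() normalizes first, so the value the
 * compiler reasons about is always 0 or 1. */
long normalized_hint(long flags)
{
    return likely(flags);                /* returns 0 or 1 */
}
```

In a condition both forms select the same branch; only the value the hint is compared against differs.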

The GCC documentation explains the role of __builtin_expect() :

-- Built-in Function: long __builtin_expect (long EXP, long C)
     You may use `__builtin_expect' to provide the compiler with branch
     prediction information.  In general, you should prefer to use
     actual profile feedback for this (`-fprofile-arcs'), as
     programmers are notoriously bad at predicting how their programs
     actually perform.  However, there are applications in which this
     data is hard to collect.

     The return value is the value of EXP, which should be an integral
     expression.  The value of C must be a compile-time constant.  The
     semantics of the built-in are that it is expected that EXP == C.
     For example:

          if (__builtin_expect (x, 0))
            foo ();

     would indicate that we do not expect to call `foo', since we
     expect `x' to be zero.  Since you are limited to integral
     expressions for EXP, you should use constructions such as

          if (__builtin_expect (ptr != NULL, 1))
            error ();

     when testing pointer or floating-point values.
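Following the manual's advice, here is a minimal sketch of turning pointer and floating-point tests into integral comparisons before handing them to the built-in. read_or_default and safe_ratio are hypothetical functions, not part of any library:

```c
#include <stddef.h>

/* Hypothetical helper: a pointer is not integral, so pass the
 * comparison "ptr != NULL" (an int) as EXP instead of ptr itself. */
int read_or_default(const int *ptr, int fallback)
{
    if (__builtin_expect(ptr != NULL, 1))   /* expect a valid pointer */
        return *ptr;
    return fallback;
}

/* Hypothetical helper: likewise, compare the double and hint the
 * integral result of the comparison. */
double safe_ratio(double num, double den)
{
    if (__builtin_expect(den == 0.0, 0))    /* expect a nonzero divisor */
        return 0.0;
    return num / den;
}
```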

How does it optimize things?
It optimizes things by ordering the generated assembly code so as to make good use of the processor pipeline. To do so, the compiler arranges the code so that the likeliest branch falls through without executing any jump instruction (a taken jump redirects instruction fetch, and a mispredicted one flushes the processor pipeline).

To see how it works, let's compile the following simple C user-space program with gcc -O2:

#include <stdio.h>
#include <stdlib.h>

#define likely(x)    __builtin_expect(!!(x), 1)
#define unlikely(x)  __builtin_expect(!!(x), 0)

int main(int argc, char *argv[])
{
   int a;

   /* Get the value from somewhere GCC can't optimize (run with one argument) */
   a = atoi (argv[1]);

   if (unlikely (a == 2))
      a++;
   else
      a--;

   printf ("%d\n", a);

   return 0;
}

Now, disassemble the resulting binary using objdump -S (comments added by me):

080483b0 <main>:
 // Prologue
 80483b0:       55                      push   %ebp
 80483b1:       89 e5                   mov    %esp,%ebp
 80483b3:       50                      push   %eax
 80483b4:       50                      push   %eax
 80483b5:       83 e4 f0                and    $0xfffffff0,%esp
 //             Call atoi()
 80483b8:       8b 45 08                mov    0x8(%ebp),%eax
 80483bb:       83 ec 1c                sub    $0x1c,%esp
 80483be:       8b 48 04                mov    0x4(%eax),%ecx
 80483c1:       51                      push   %ecx
 80483c2:       e8 1d ff ff ff          call   80482e4 <atoi@plt>
 80483c7:       83 c4 10                add    $0x10,%esp
 //             Test the value
 80483ca:       83 f8 02                cmp    $0x2,%eax
 //             --------------------------------------------------------
 //             If 'a' is equal to 2 (which is unlikely), then jump;
 //             otherwise continue directly, without a jump, so that it
 //             doesn't flush the pipeline.
 //             --------------------------------------------------------
 80483cd:       74 12                   je     80483e1 <main+0x31>
 80483cf:       48                      dec    %eax
 //             Call printf
 80483d0:       52                      push   %edx
 80483d1:       52                      push   %edx
 80483d2:       50                      push   %eax
 80483d3:       68 c8 84 04 08          push   $0x80484c8
 80483d8:       e8 f7 fe ff ff          call   80482d4 <printf@plt>
 //             Return 0 and go out.
 80483dd:       31 c0                   xor    %eax,%eax
 80483df:       c9                      leave
 80483e0:       c3                      ret

Now, in the previous program, replace unlikely() with likely(), recompile it, and disassemble it again (again, comments added by me):

080483b0 <main>:
 //             Prologue
 80483b0:       55                      push   %ebp
 80483b1:       89 e5                   mov    %esp,%ebp
 80483b3:       50                      push   %eax
 80483b4:       50                      push   %eax
 80483b5:       83 e4 f0                and    $0xfffffff0,%esp
 //             Call atoi()
 80483b8:       8b 45 08                mov    0x8(%ebp),%eax
 80483bb:       83 ec 1c                sub    $0x1c,%esp
 80483be:       8b 48 04                mov    0x4(%eax),%ecx
 80483c1:       51                      push   %ecx
 80483c2:       e8 1d ff ff ff          call   80482e4 <atoi@plt>
 80483c7:       83 c4 10                add    $0x10,%esp
 //             --------------------------------------------------
 //             If 'a' equals 2 (which is likely), we continue
 //             without branching, so without flushing the pipeline. The
 //             jump only occurs when a != 2, which is unlikely.
 //             ---------------------------------------------------
 80483ca:       83 f8 02                cmp    $0x2,%eax
 80483cd:       75 13                   jne    80483e2 <main+0x32>
 //             Here gcc has folded a++ into the constant 3 (a is known to be 2)
 80483cf:       b0 03                   mov    $0x3,%al
 //             Call printf()
 80483d1:       52                      push   %edx
 80483d2:       52                      push   %edx
 80483d3:       50                      push   %eax
 80483d4:       68 c8 84 04 08          push   $0x80484c8
 80483d9:       e8 f6 fe ff ff          call   80482d4 <printf@plt>
 //             Return 0 and go out.
 80483de:       31 c0                   xor    %eax,%eax
 80483e0:       c9                      leave
 80483e1:       c3                      ret
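To reproduce this comparison without editing and recompiling the program, here is a sketch that puts both variants in one file; adjust_unlikely and adjust_likely are hypothetical names. Compiling it with gcc -O2 -c and disassembling the object with objdump -d should show the two layouts side by side. The exact instructions depend on the gcc version and target, and a recent gcc may emit branch-free code for a body this small, in which case the hint has nothing to reorder.

```c
#include <stdlib.h>

#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical variant 1: same logic as main() above, a == 2 is rare. */
int adjust_unlikely(const char *arg)
{
    int a = atoi(arg);
    if (unlikely(a == 2))
        a++;
    else
        a--;
    return a;
}

/* Hypothetical variant 2: identical except that a == 2 is expected. */
int adjust_likely(const char *arg)
{
    int a = atoi(arg);
    if (likely(a == 2))
        a++;
    else
        a--;
    return a;
}
```

Both functions compute the same results for every input; only the code layout the compiler chooses may differ.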

How should I use it?
You should use it only when the likeliest branch is very, very likely, or when the unlikeliest branch is very, very unlikely; a wrong hint moves the common path behind a jump and can make the code slower.
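As a userspace illustration of a branch that qualifies, here is a sketch modeled on the kernel's allocation-failure check quoted at the top; dup_bytes is a hypothetical helper. Allocation failure is rare enough that marking it unlikely() is a reasonable bet, and the normal path stays straight-line:

```c
#include <stdlib.h>
#include <string.h>

#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical helper: copy len bytes of src into a new
 * NUL-terminated buffer. malloc() failing is the rare path. */
char *dup_bytes(const char *src, size_t len)
{
    char *copy = malloc(len + 1);
    if (unlikely(copy == NULL))
        return NULL;            /* rare error path */
    memcpy(copy, src, len);     /* common path continues without a jump */
    copy[len] = '\0';
    return copy;
}
```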

======

With the authoritative version covered, here are some "unofficial" explanations from the community:

======
The likely and unlikely macros and the GCC built-in function __builtin_expect()

The GCC manual describes __builtin_expect() as follows:

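Because programmers are notoriously bad at predicting how their programs actually behave, GCC recommends collecting real profile feedback with the '-fprofile-arcs' option where possible; for cases where that data is hard to collect, it provides this built-in function so programmers can pass branch-prediction information to the compiler. The first argument, exp, is an integral expression, and it is also the built-in's return value; c must be a compile-time constant. The semantics are that you expect exp to equal the constant c, so GCC can optimize the program by placing the branch that satisfies this condition in the appropriate place.
Because the built-in only accepts integral expressions, pointer or floating-point values must first be turned into one with a comparison such as ptr != NULL.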

likely and unlikely are macros built on this GCC extension (the Linux kernel defines them in include/linux/compiler.h):

#define  likely(x)        __builtin_expect(!!(x), 1) 
#define  unlikely(x)      __builtin_expect(!!(x), 0)

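Modern processors are pipelined, and some contain several logic units, so the system can fetch multiple instructions ahead of time and process them in parallel. A jump, however, forces instructions to be fetched again from a new address, which is slower than continuing without re-fetching.
That is where likely and unlikely come in: their purpose is to tell the compiler which way a conditional branch usually goes. The CPU preloads upcoming instructions and, on reaching a conditional jump, predicts a branch and preloads its instructions. unlikely states that you are confident the condition rarely holds; likely states that it holds most of the time. The compiler then generates code laid out to make the CPU run more efficiently.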

So when writing code, programmers can use how often each condition holds to optimize the processor's instruction fetching.
For example:

int x, y; 
if(unlikely(x > 0)) 
    y = 1; 
else 
    y = -1;

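In the code above, gcc places the y = -1 instructions on the fall-through path right after the comparison, so they are the ones fetched ahead of time; this suits the case where x is rarely greater than 0. If x is greater than 0 most of the time, use likely(x > 0) instead, and the compiled code will put y = 1 on the fall-through path. This reduces instruction re-fetching at run time.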

======
likely() and unlikely() in the kernel

First, be clear that the hints never change which branch is taken:

if(likely(value)) is equivalent to if(value)
if(unlikely(value)) is also equivalent to if(value)

__builtin_expect() is provided by GCC (version >= 2.96) so that programmers can pass branch information to the compiler, letting it optimize the code to reduce the performance loss caused by jumps.
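Since the built-in is GCC-specific, code meant for other compilers usually wraps it. Here is a minimal sketch, assuming hypothetical macro names MY_LIKELY/MY_UNLIKELY and a hypothetical clamp_index() caller. The version test mirrors the ">= 2.96" requirement above (Clang also defines __GNUC__ and supports the built-in). The fallback is just the plain expression, which is safe because the hint never changes which branch runs:

```c
/* Hypothetical wrappers: use the built-in on GCC >= 2.96 (and
 * compilers that claim GCC compatibility), else a no-op. */
#if defined(__GNUC__) && \
    (__GNUC__ > 2 || (__GNUC__ == 2 && __GNUC_MINOR__ >= 96))
#  define MY_LIKELY(x)   __builtin_expect(!!(x), 1)
#  define MY_UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#  define MY_LIKELY(x)   (x)
#  define MY_UNLIKELY(x) (x)
#endif

/* Hypothetical caller: out-of-range indices are expected to be rare. */
int clamp_index(int i, int size)
{
    if (MY_UNLIKELY(i < 0))
        return 0;
    if (MY_UNLIKELY(i >= size))
        return size - 1;
    return i;
}
```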

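__builtin_expect((x),1) means x is more likely to be true;
__builtin_expect((x),0) means x is more likely to be false.
(Strictly, the built-in states that x is expected to equal the constant, which is why likely() and unlikely() normalize x with !! first.)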

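In other words, with likely() the statement after the if is more likely to run, and with unlikely() the statement after the else is more likely to run. This way, during compilation the compiler places the more likely code immediately after the preceding code, reducing the performance cost of jumps.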

======

After reading what others have written, I should contribute something concrete myself. Here is the relevant code from gmacros.h in GLib-2.35.4:

/*
 * The G_LIKELY and G_UNLIKELY macros let the programmer give hints to 
 * the compiler about the expected result of an expression. Some compilers
 * can use this information for optimizations.
 *
 * The _G_BOOLEAN_EXPR macro is intended to trigger a gcc warning when
 * putting assignments in g_return_if_fail ().  
 */
#if defined(__GNUC__) && (__GNUC__ > 2) && defined(__OPTIMIZE__)
#define _G_BOOLEAN_EXPR(expr)                   \
 G_GNUC_EXTENSION ({                            \
   int _g_boolean_var_;                         \
   if (expr)                                    \
      _g_boolean_var_ = 1;                      \
   else                                         \
      _g_boolean_var_ = 0;                      \
   _g_boolean_var_;                             \
})
// Give the compiler the result the programmer expects from a condition, for optimization
#define G_LIKELY(expr) (__builtin_expect (_G_BOOLEAN_EXPR(expr), 1))
#define G_UNLIKELY(expr) (__builtin_expect (_G_BOOLEAN_EXPR(expr), 0))
#else
#define G_LIKELY(expr) (expr)
#define G_UNLIKELY(expr) (expr)
#endif

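As shown above, GLib uses _G_BOOLEAN_EXPR(expr) in place of !!(expr). Functionally the two are the same, since both yield 0 or 1; according to the comment in the code, _G_BOOLEAN_EXPR additionally lets gcc warn when an assignment is passed as the condition, as in g_return_if_fail ().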
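To compare the two forms in isolation, here is a sketch that re-creates GLib's macros under hypothetical MY_* names (G_GNUC_EXTENSION expands to gcc's __extension__, used directly here; the ({ ... }) statement expression is itself a GNU extension). Both forms return 0 or 1. The practical difference is that an assignment such as MY_LIKELY(a = b) ends up as a bare if (a = b) inside the statement expression, which gcc's -Wparentheses (enabled by -Wall) flags, whereas the extra parentheses in !!(a = b) typically keep it quiet:

```c
/* Hypothetical re-creation of GLib's _G_BOOLEAN_EXPR / G_LIKELY. */
#define MY_BOOLEAN_EXPR(expr)        \
    __extension__ ({                 \
        int my_boolean_var_;         \
        if (expr)                    \
            my_boolean_var_ = 1;     \
        else                         \
            my_boolean_var_ = 0;     \
        my_boolean_var_;             \
    })

#define MY_LIKELY(expr)   (__builtin_expect(MY_BOOLEAN_EXPR(expr), 1))
#define MY_UNLIKELY(expr) (__builtin_expect(MY_BOOLEAN_EXPR(expr), 0))

/* Hypothetical helpers: both normalize any value to 0 or 1. */
long via_glib_style(long v)  { return MY_LIKELY(v); }
long via_double_bang(long v) { return __builtin_expect(!!(v), 1); }
```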
