Why strlen is so efficient

Date: 2019-11-17


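Calling the string functions of the C standard library directly carries some risk: a small slip can cause memory errors. This week I used my spare time to write a small safe string library, but testing showed that my own implementation had a serious performance problem.
I ran a simple first comparison on Solaris. The figures below are the ones for strlen:
String length 10, 1,000,000 calls:
strlen execution time: 32762 ms
my_strlen execution time: 491836 ms
String length 20, 1,000,000 calls:
strlen execution time: 35075 ms
my_strlen execution time: 770397 ms
Clearly the standard library strlen costs less than a tenth of what my_strlen costs. Its cost also does not grow anywhere near linearly with the string length (about 7% more for twice the length), whereas my_strlen grows by roughly 57%. As you may have guessed, my_strlen uses the traditional implementation, testing each byte in turn against '\0', which matches the measurements. Wanting to get to the bottom of this, I found the source of strlen in the GNU C library online, to see what technique GLIBC uses to reach that level of performance. To be honest, I am still fairly inexperienced at performance optimization, and it is an area I intend to work on.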
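For reference, the traditional byte-at-a-time approach can be sketched as follows. This is only a minimal sketch: the original my_strlen source is not shown in this post, and the names naive_strlen and naive_strlen_check are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for my_strlen: the classic byte-at-a-time scan.
   Every byte costs one load, one compare and one branch, so the running
   time grows roughly linearly with the string length. */
size_t naive_strlen(const char *s)
{
    const char *p = s;

    while (*p != '\0')
        ++p;
    return (size_t) (p - s);
}

/* Small self-check; call it from main() to try the sketch. */
void naive_strlen_check(void)
{
    assert(naive_strlen("") == 0);
    assert(naive_strlen("0123456789") == 10);
}
```

The loop body is as small as it can be, so any further speedup has to come from examining more than one byte per iteration.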
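I downloaded the complete GLIBC source package, which is not small. strlen.c is in the string subdirectory. This is the portable C implementation of strlen that GLIBC ships, used on Linux and by GNU software wherever no architecture-specific version replaces it. The code was written by Torbjorn Granlund (who also implemented memcpy), with help and commentary from Jim Blandy and Dan Sahlin. Including comments, GLIBC's strlen runs to nearly 130 lines. On a first skim I did not follow it, but a patient, careful read paid off. Below is an abridged version of the strlen source, followed by my understanding of it: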
  1 /* Return the length of the null-terminated string STR.  Scan for
  2    the null terminator quickly by testing four bytes at a time.  */
  3 size_t strlen (str)  const char *str;
  4 {
  5         const char *char_ptr;
  6         const unsigned long int *longword_ptr;
  7         unsigned long int longword, magic_bits, himagic, lomagic;
  8
  9         /* Handle the first few characters by reading one character at a time.
10            Do this until CHAR_PTR is aligned on a longword boundary.  */
11
12         for (char_ptr = str; ((unsigned long int) char_ptr
13              & (sizeof (longword) - 1)) != 0;
14              ++char_ptr)
 15                 if (*char_ptr == '\0')
16                         return char_ptr - str;
17
18         /* All these elucidatory comments refer to 4-byte longwords,
19            but the theory applies equally well to 8-byte longwords.  */
20
21         longword_ptr = (unsigned long int *) char_ptr;
22
23         himagic = 0x80808080L;
24         lomagic = 0x01010101L;
25
26         if (sizeof (longword) > 8)
27                 abort ();
28
29         /* Instead of the traditional loop which tests each character,
30            we will test a longword at a time.  The tricky part is testing
31            if *any of the four* bytes in the longword in question are zero.  */
32
33         for (;;)
34         {
35                 longword = *longword_ptr++;
36
37                 if ( ((longword - lomagic) & himagic) != 0)
38                 {
39                         /* Which of the bytes was the zero?  If none of them were, it was
40                            a misfire; continue the search.  */
41
42                         const char *cp = (const char *) (longword_ptr - 1);
43
44                         if (cp[0] == 0)
45                                 return cp - str;
46                         if (cp[1] == 0)
47                                 return cp - str + 1;
48                         if (cp[2] == 0)
49                                 return cp - str + 2;
50                         if (cp[3] == 0)
51                                 return cp - str + 3;
52                         if (sizeof (longword) > 4)
53                         {
54                                 if (cp[4] == 0)
55                                         return cp - str + 4;
56                                 if (cp[5] == 0)
57                                         return cp - str + 5;
58                                 if (cp[6] == 0)
59                                         return cp - str + 6;
60                                 if (cp[7] == 0)
61                                         return cp - str + 7;
62                         }
63                 }
64         }
65 }
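The author's comment at the top of the code gives the principle behind this strlen: test four bytes at a time instead of one byte at a time as the traditional implementation does. With that understood, two problems remain:
1) The C standard library must be highly portable and run correctly on the vast majority of architectures. Reading four bytes at a time (as an unsigned long int) raises the question of memory alignment, because the address of the first character of the string passed in is not necessarily a multiple of 4;
2) How to test four bytes at once and find out whether any one of them is all zero bits. This is a matter of technique.
Lines 12-21 of the code solve the first problem: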
        for (char_ptr = str; ((unsigned long int) char_ptr
             & (sizeof (longword) - 1)) != 0;
             ++char_ptr)
                if (*char_ptr == '\0')
                        return char_ptr - str;
        /* All these elucidatory comments refer to 4-byte longwords,
           but the theory applies equally well to 8-byte longwords.  */
        longword_ptr = (unsigned long int *) char_ptr;
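The author uses a for loop to find the first character of the string whose address is a multiple of sizeof (longword), which is 4 in this discussion. Because that address is aligned, the cast on the last line is safe as far as alignment is concerned. The aligned address could be computed directly with a rounding expression, but a '\0' may lie within that initial stretch, so comparing character by character is unavoidable. On many strictly aligned architectures (Sun's SPARC, for example), the compiler usually places strings at aligned addresses at compile time, so in practice the for loop rarely executes even a single step when strlen runs.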
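The rounding arithmetic mentioned above can be sketched like this. It is an illustration only: bytes_until_aligned and its check function are hypothetical names, the pointer-to-integer conversion is implementation-defined, and the check assumes that unsigned long is aligned to its own size, as on mainstream platforms.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of bytes the prologue loop would step over before p reaches a
   multiple of align (align must be a power of two).  Converting a pointer
   to uintptr_t is implementation-defined; this assumes the usual flat
   address space. */
size_t bytes_until_aligned(const void *p, size_t align)
{
    size_t misalignment = (size_t) ((uintptr_t) p & (align - 1));

    return (align - misalignment) & (align - 1);
}

/* Small self-check; call it from main() to try the sketch. */
void bytes_until_aligned_check(void)
{
    /* An unsigned long array is suitably aligned for unsigned long. */
    unsigned long storage[2] = { 0, 0 };
    const char *base = (const char *) storage;
    size_t word = sizeof (unsigned long);

    assert(bytes_until_aligned(base, word) == 0);
    assert(bytes_until_aligned(base + 1, word) == word - 1);
    assert(bytes_until_aligned(base + word, word) == 0);
}
```

Knowing the distance is not enough to skip those bytes, since each of them may be the terminator. That is why the listing walks them one at a time.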
The author solves the second problem with a trick that comes "with a precondition". Two mask variables are defined:
himagic = 0x80808080L;
lomagic = 0x01010101L;
and a single conditional expression performs the check for an all-zero byte among the four: ((longword - lomagic) & himagic) != 0
Expanding himagic and lomagic bit by bit:
himagic   1000 0000 1000 0000 1000 0000 1000 0000
lomagic   0000 0001 0000 0001 0000 0001 0000 0001
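For code like this there seems to be no theory to lean on. It has to be understood by trying it out. At first I constructed a longword with no all-zero byte, such as:
longword  1000 0001 1000 0001 1000 0001 1000 0001
Evaluating the conditional expression on it also satisfied != 0. Was the author's logic wrong? On second thought, the test is only a fast filter that rests on a "precondition". It is exact when every byte has its top bit clear, as in ASCII text with values in [0, 127]. strlen does accept arbitrary bytes, and for bytes above 0x80 the test can fire without any zero byte being present. The source comment at lines 39-40 calls this a "misfire": the byte-by-byte checks that follow find no zero and the loop simply continues, so the result is still correct. For the analysis below, assume the ASCII case, so that longword looks like this: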
longword  0xxx xxxx 0xxx xxxx 0xxx xxxx 0xxx xxxx
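Under that precondition, consider two cases:
When longword contains no all-zero byte, for example:
longword 0000 0001 0000 0001 0000 0001 0000 0001
the result of the calculation is 0, so the condition is not met.
When longword contains an all-zero byte, for example:
longword 0000 0000 0000 0001 0000 0001 0000 0001
the top bit of the most significant byte is certainly 1 after the calculation, so the != 0 condition holds and the all-zero byte is detected. In other words, whenever there is an all-zero byte, subtracting lomagic must produce a borrow: the top bit of that byte flips from 0 to 1 after the subtraction, so ANDing with himagic gives a nonzero result. That is how the zero byte is detected.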
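The cases discussed above can be checked with a short sketch. The function names are hypothetical, and has_zero_byte is a commonly used exact variant added here for comparison. It does not appear in the excerpt.

```c
#include <assert.h>
#include <stdint.h>

#define LOMAGIC UINT32_C(0x01010101)
#define HIMAGIC UINT32_C(0x80808080)

/* The test from the listing: nonzero whenever some byte is zero, but it
   can also fire for bytes above 0x80 (a "misfire"). */
int may_have_zero_byte(uint32_t w)
{
    return ((uint32_t) (w - LOMAGIC) & HIMAGIC) != 0;
}

/* A commonly used exact variant (an addition here, not in the excerpt):
   masking with ~w ignores bytes whose top bit was already set. */
int has_zero_byte(uint32_t w)
{
    return ((uint32_t) (w - LOMAGIC) & ~w & HIMAGIC) != 0;
}

/* Small self-check; call it from main() to try the sketch. */
void zero_byte_check(void)
{
    assert(!may_have_zero_byte(UINT32_C(0x01010101))); /* no zero byte */
    assert(may_have_zero_byte(UINT32_C(0x00010101)));  /* zero byte found */
    assert(may_have_zero_byte(UINT32_C(0x81818181)));  /* misfire */
    assert(!has_zero_byte(UINT32_C(0x81818181)));      /* exact variant */
    assert(has_zero_byte(UINT32_C(0x00010101)));
}
```

The filter never misses a zero byte: every byte below the lowest zero byte is at least 1, so no borrow reaches that byte, and 0 minus 1 then sets its top bit.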
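The method works equally well on 64-bit platforms. The excerpt above omits the special handling for 64-bit platforms to keep the logic clearer and easier to read. As shown, himagic and lomagic cover only four bytes, so the excerpt is complete only where unsigned long int is 4 bytes wide.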
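A minimal sketch of what the 64-bit handling amounts to, assuming it simply repeats each magic pattern in the upper four bytes of the word. The names widen_magic and may_have_zero_byte64 are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Repeat a 32-bit magic pattern in both halves of a 64-bit word.
   Shifting in two 16-bit steps is a habit that also stays valid for a
   type that is only 32 bits wide, where a single 32-bit shift would be
   undefined. */
uint64_t widen_magic(uint64_t magic32)
{
    return ((magic32 << 16) << 16) | magic32;
}

/* The same filter as in the listing, applied to eight bytes at once. */
int may_have_zero_byte64(uint64_t w)
{
    uint64_t lomagic = widen_magic(UINT64_C(0x01010101));
    uint64_t himagic = widen_magic(UINT64_C(0x80808080));

    return ((w - lomagic) & himagic) != 0;
}

/* Small self-check; call it from main() to try the sketch. */
void widen_magic_check(void)
{
    uint64_t w = UINT64_C(0x0001010101010101); /* zero in the top byte */

    assert(widen_magic(UINT64_C(0x80808080)) == UINT64_C(0x8080808080808080));
    assert(!may_have_zero_byte64(UINT64_C(0x0101010101010101)));
    assert(may_have_zero_byte64(w));
    /* The unwidened 32-bit constants would miss this zero byte. */
    assert(((w - UINT64_C(0x01010101)) & UINT64_C(0x80808080)) == 0);
}
```

The last assertion shows why the constants have to be widened: with the 32-bit values, a zero byte in the upper half of an 8-byte word goes undetected.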
---------------------
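Author: hashmat
Source: CSDN
Original: https://blog.csdn.net/Hashmat/article/details/6054046
Copyright notice: This is an original article by the blogger. Please include a link to the original post when reposting.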
