Investigating how GCC implements variable-length arrays

Date: 2019-11-15


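I have long been curious about variable-length arrays (VLAs) in the C99 standard and did not know how the compiler implements them. My guess was as follows.

Take the following code as an example: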

#include <stdio.h>
int main(){
    int len;
    scanf("%d",&len);
    char buff[len];
    
    int size;
    size = sizeof(buff);
    printf("%d",size);
    return 0;
}

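When the program reaches the array declaration, char buff[len], the stack frame of main grows by len*sizeof(char) bytes to hold buff. But a guess is only a guess; disassembling the program and reading the assembly gives a much clearer picture.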

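After compiling the code above, I disassembled it with IDA and got the listing below. It took me a whole morning to understand it and add comments (please don't laugh, I'm still a beginner).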

.text:00401350 sub_401350      proc near
.text:00401350
.text:00401350 func_arg_0      = dword ptr -40h
.text:00401350 func_arg_1      = dword ptr -3Ch
.text:00401350 var_38          = dword ptr -38h
.text:00401350 buff_len        = dword ptr -28h
.text:00401350 buff_size       = dword ptr -24h
.text:00401350 var_20          = dword ptr -20h
.text:00401350 buff_len_sub_1  = dword ptr -1Ch
.text:00401350 var_C           = dword ptr -0Ch
.text:00401350 arg_0           = dword ptr  4
.text:00401350
.text:00401350                 lea     ecx, [esp+arg_0]
.text:00401354                 and     esp, 0FFFFFFF0h
.text:00401357                 push    dword ptr [ecx-4]
.text:0040135A                 push    ebp
.text:0040135B                 mov     ebp, esp
.text:0040135D                 push    esi
.text:0040135E                 push    ebx
.text:0040135F                 push    ecx
.text:00401360                 sub     esp, 2Ch        ; char *
.text:00401363                 call    sub_401990
.text:00401368                 mov     eax, esp
.text:0040136A                 mov     ebx, eax
.text:0040136C                 lea     eax, [ebp+buff_len]
.text:0040136F                 mov     [esp+40h+func_arg_1], eax ; address of buff_len
.text:00401373                 mov     [esp+40h+func_arg_0], offset aD ; "%d"
.text:0040137A                 call    scanf
.text:0040137F                 mov     ecx, [ebp+buff_len] ; ecx=buff_len
.text:00401382                 lea     eax, [ecx-1]    ; eax=buff_len-1
.text:00401385                 mov     [ebp+buff_len_sub_1], eax
.text:00401388                 mov     edx, ecx        ; edx=buff_len
.text:0040138A                 mov     eax, 10h
.text:0040138F                 sub     eax, 1          ; eax=15d  (d denotes decimal)
.text:00401392                 add     eax, edx        ; eax=buff_len+15d
.text:00401394                 mov     esi, 10h        ; esi=16d
.text:00401399                 mov     edx, 0          ; zero the high half of the dividend
.text:0040139E                 div     esi             ; eax = (buff_len + 15d) / 16d
.text:0040139E                                         ; edx = (buff_len + 15d) % 16d
.text:004013A0                 imul    eax, 10h        ; eax = eax * 16d
.text:004013A3                 call    sub_401BD0
.text:004013A8                 sub     esp, eax        ; reserve stack space for buff
.text:004013A8                                         ; eax is the size actually reserved for buff
.text:004013AA                 lea     eax, [esp+40h+var_38]
.text:004013AE                 add     eax, 0
.text:004013B1                 mov     [ebp+var_20], eax
.text:004013B4                 mov     [ebp+buff_size], ecx
.text:004013B7                 mov     eax, [ebp+buff_size]
.text:004013BA                 mov     [esp+40h+func_arg_1], eax
.text:004013BE                 mov     [esp+40h+func_arg_0], offset aD ; "%d"
.text:004013C5                 call    printf
.text:004013CA                 mov     eax, 0
.text:004013CF                 mov     esp, ebx
.text:004013D1                 lea     esp, [ebp-0Ch]
.text:004013D4                 pop     ecx
.text:004013D5                 pop     ebx
.text:004013D6                 pop     esi
.text:004013D7                 pop     ebp
.text:004013D8                 lea     esp, [ecx-4]
.text:004013DB                 retn
.text:004013DB sub_401350      endp

The listing is not very long; here is the key part:

buff_len        = dword ptr -28h
mov     ecx, [ebp+buff_len] ; ecx=requested array length
mov     edx, ecx        ; edx=buff_len
mov     eax, 10h
sub     eax, 1          ; eax=15d
add     eax, edx        ; eax=buff_len+15d
mov     esi, 10h        ; esi=16d
mov     edx, 0          ; zero the high half of the dividend
div     esi             ; eax = (buff_len + 15d) / 16d
                        ; edx = (buff_len + 15d) % 16d
imul    eax, 10h        ; eax = eax * 16d
sub     esp, eax        ; reserve stack space for buff
                        ; eax is the size actually reserved for buff

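These are the operations the program performs after scanf returns; by then the length entered by the user has already been stored in [ebp+buff_len].

A quick look shows that the space reserved for the variable-length array is computed with a fixed formula: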

memory reserved for the array = ((requested size + 15 bytes) / 16) * 16    (integer division)
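The formula above can be written directly in C. This is only a sketch that mirrors the div/imul sequence in the listing; the function names are hypothetical and are not anything GCC emits or exposes. The second function shows the equivalent bit-mask form, which works because 16 is a power of two.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: rounds a requested byte count up to the next
   multiple of 16, the same way the add/div/imul sequence does. */
unsigned vla_reserved_bytes(unsigned requested)
{
    assert(requested <= UINT_MAX - 15u);   /* keep the sketch free of wrap-around */
    return ((requested + 15u) / 16u) * 16u;
}

/* Equivalent mask form: clearing the low four bits rounds down to a
   multiple of 16, so adding 15 first rounds up. */
unsigned vla_reserved_bytes_mask(unsigned requested)
{
    assert(requested <= UINT_MAX - 15u);
    return (requested + 15u) & ~15u;
}
```

For instance, vla_reserved_bytes(1) gives 16 and vla_reserved_bytes(17) gives 32, so the reserved size only changes at multiples of 16.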

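At this point it is clear that my guess was wrong. The space for the array actually grows in multiples of 16 bytes, so even if you request a 1-byte array, 16 bytes are reserved for it. At first I did not see the point of this, and it looks a little wasteful. It is probably about alignment: the prologue already aligns esp to 16 bytes with and esp, 0FFFFFFF0h, and subtracting a multiple of 16 keeps it aligned.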

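Yet when I ran the program, sizeof printed exactly the value we requested. Shouldn't it print the memory the array actually occupies? Only after reading the disassembly did I realize it is a trick! The size of the array cannot be known at compile time. sizeof works because the compiled program keeps the array length in a variable ([ebp+buff_size] in the listing, which holds the requested size, not the size actually reserved), and any later use of sizeof simply reads that variable.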
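The behaviour described here can be checked with a small function. This is a minimal sketch, assuming a compiler with C99 VLA support such as GCC; vla_sizeof is a hypothetical name introduced only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: declares a VLA of `len` chars and returns the
   value sizeof reports for it. */
size_t vla_sizeof(int len)
{
    assert(len > 0);        /* a VLA with a non-positive length is undefined */
    char buff[len];
    return sizeof(buff);    /* evaluated at run time: len * sizeof(char) */
}
```

Calling vla_sizeof(1) yields 1, not 16: sizeof reports the requested length, regardless of how many bytes the compiler reserves on the stack.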

I can't help marveling at how messy things are underneath so many polished surfaces.