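I wrote this program in NASM, and it ran on 32-bit Windows and Linux hosts. Later the requirements grew and it also had to run on 64-bit Windows and Linux. Windows has the WoW64 (Windows on Windows) mechanism, so a 32-bit program runs on a 64-bit machine with no porting at all. Linux has no "LoL" mechanism (that's Linux on Linux, not "laugh out loud", heh), but you can get the same effect by installing the ia32-libs package (IA stands for Intel Architecture, as in IA-32). Still, building ELF64 and WIN64 object files is something I was curious about anyway, so I decided to port the program!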
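The first step is getting to know the CPU and its registers. Essentially all the 32-bit registers have been upgraded: eax became rax, ebx became rbx, and so on. They are now 64 bits wide, which is naturally nicer to work with: one instruction handles 8 bytes, so a single step can do what used to take several. There are also new registers, r8 through r15. With this many registers you need far fewer memory temporaries, which helps efficiency again. The new ones you can treat as callee-saved are r12-r15. Previously there were usually only three such registers, esi, edi and ebx; now, with r12-r15 plus rbx, there are five (not counting the frame pointer rbp). Why not rsi and rdi? Good question: on Linux, 64-bit code uses these two registers to pass arguments, so they are generally no longer preserved across calls. They still matter, though: string instructions such as lodsb and stosb still take their source and destination addresses from rsi and rdi. I think this was a poor choice. Why not pass arguments in the newly added registers instead of my beloved rsi and rdi? (Strictly speaking this is a decision of the ABI, not of the CPU.) I can't design a CPU, but I can still complain! Complaints aside, if you want the code to port easily, it is best to avoid instructions like lodsb and to access memory directly with base-plus-index addressing.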
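Next comes function calling. The System V AMD64 ABI used on Unix passes the first six integer or pointer arguments in rdi, rsi, rdx, rcx, r8 and r9. With fewer than six, use as many registers as you need, in that order. With more than six, the first six go in those registers and the rest are pushed on the stack from last to first. Then, if the callee is variadic (printf, for example), set al to the number of vector registers used for arguments, which means rax=0 when no floating-point arguments are passed. Finally, invoke the function with call; the ABI expects rsp to be 16-byte aligned at that point. If there were more than six arguments, the stack has to be fixed up after the function returns: move the stack pointer back by 8 bytes for every argument you pushed. Note that the Windows ABI is different again: it passes the first four arguments in rcx, rdx, r8 and r9, and the caller reserves 32 bytes of shadow space on the stack.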
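The Unix register assignment described above can be modelled in a few lines of C. This is only a sketch of the rule for integer and pointer arguments, assuming every stack argument occupies one 8-byte slot; the helper names `sysv_arg_register` and `sysv_stack_cleanup_bytes` are made up for illustration and belong to no ABI or library.

```c
#include <stddef.h>

/* Integer/pointer argument registers on Unix 64-bit, in passing order. */
static const char *const SYSV_INT_ARG_REGS[] = {
    "rdi", "rsi", "rdx", "rcx", "r8", "r9"
};

enum {
    SYSV_INT_ARG_REG_COUNT = 6, /* arguments that fit in registers      */
    SYSV_STACK_SLOT_BYTES  = 8  /* assumed size of one pushed argument  */
};

/* Hypothetical helper: register that carries the 0-based argument `index`,
 * or NULL when that argument is passed on the stack instead. */
const char *sysv_arg_register(size_t index)
{
    return index < SYSV_INT_ARG_REG_COUNT ? SYSV_INT_ARG_REGS[index] : NULL;
}

/* Hypothetical helper: number of bytes the caller adds back to rsp after
 * the call, given the total number of integer arguments it passed. */
size_t sysv_stack_cleanup_bytes(size_t arg_count)
{
    if (arg_count <= SYSV_INT_ARG_REG_COUNT) {
        return 0; /* everything went into registers, nothing was pushed */
    }
    return (arg_count - SYSV_INT_ARG_REG_COUNT) * SYSV_STACK_SLOT_BYTES;
}
```

For a call with eight integer arguments, for instance, `sysv_stack_cleanup_bytes(8)` gives 16: two arguments were pushed, so the caller adds 16 to rsp after the call returns.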
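Also, in 64-bit mode a 32-bit register cannot be pushed directly, so, sorry, your push eax no longer assembles; use push rax and pop rax instead. Adjusting the stack pointer (esp/rsp) directly, however, is an approach that assembles for both 32-bit and 64-bit targets without causing problems. Moreover, when several values have to be pushed in a row (for a function call, say), subtracting from esp/rsp once and then storing the arguments with base-plus-displacement addressing is often faster than pushing them one at a time. That is how GCC implements API calls, so writing assembly is really no match for using gcc: if you are not careful, a C program compiled by GCC will run faster than the one you wrote in assembly. For real projects I generally use C, but NASM lets me understand things more deeply, and there is no arguing with that!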
Functions you implement yourself can still use the old stack-based C calling style, as follows:
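    Function:
        %define param1 rbp+16   ; first stack argument, above saved rbp and return address
        %define param2 rbp+24
        %define param3 rbp+32
        enter 16,0              ; save rbp, set up the frame, reserve 16 bytes of locals
        %define local1 rbp-8
        %define local2 rbp-16
        ;.....
        leave                   ; release the locals and restore rbp
        ret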
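Finally, the problem that troubled me during the port: C function return values. In 64-bit code a C function that returns int leaves its result in eax, the low half of rax, and not in edx:eax (that pair is how 32-bit code returns a 64-bit value; 64-bit code returns a 64-bit value in rax). Most functions cause no trouble; the problem shows up when one returns -1. Then eax is -1 but rax is not: its high 32 bits are typically all 0 and its low 32 bits all 1. So either compare against eax, or sign-extend it first (cdqe or movsxd) before testing a 64-bit register for -1.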
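The -1 pitfall can be reproduced in plain C without any assembly. The sketch below assumes the common case in which the callee wrote its result through a 32-bit operation, which clears the upper half of the 64-bit register; the function names are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

/* Hypothetical model of reading all of rax after a C function returned a
 * 32-bit int: the upper 32 bits are zero, the lower 32 bits hold the value. */
uint64_t rax_after_int_return(int32_t eax)
{
    return (uint64_t)(uint32_t)eax; /* zero-extension, no sign bits copied */
}

/* Hypothetical model of what cdqe (or movsxd rax, eax) produces:
 * the 32-bit result sign-extended to 64 bits. */
int64_t rax_after_sign_extension(int32_t eax)
{
    return (int64_t)eax;
}

/* Returns 1 when a naive 64-bit comparison against -1 would succeed. */
int naive_compare_sees_minus_one(int32_t eax)
{
    return (int64_t)rax_after_int_return(eax) == -1;
}
```

With `eax` equal to -1, the first function yields 0x00000000FFFFFFFF, so the naive 64-bit comparison fails, while the sign-extended value compares equal to -1 as expected.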
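I'm short on time right now, so I'll write another article to discuss this in detail.
Before finishing, here is part of a reference document on C calling conventions.
==========================================
Interfacing HLL code with asm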
C calling convention – standard stack frame
Arguments passed to a C function are pushed onto the stack, right to left, before the function is called. The first thing the called function does is push the (E)BP register, then copy (E)SP into it. This creates a data structure called the standard C stack frame.
| | 32-bit code | 16-bit code, TINY, SMALL, or COMPACT memory models | 16-bit code, MEDIUM, LARGE, or HUGE memory models |
| --- | --- | --- | --- |
| Create standard stack frame, allocate 16 bytes for local variables, save registers | push ebp / mov ebp,esp / sub esp,16 / push edi / push esi / … | push bp / mov bp,sp / sub sp,16 / push di / push si / … | push bp / mov bp,sp / sub sp,16 / push di / push si / … |
| Restore registers, destroy stack frame, and return | … / pop esi / pop edi / mov esp,ebp / pop ebp / ret | … / pop si / pop di / mov sp,bp / pop bp / ret | … / pop si / pop di / mov sp,bp / pop bp / retf |
| Size of ‘slots’ in stack frame, i.e. stack width | 32 bits | 16 bits | 16 bits |
| Location of stack frame ‘slots’ | [ebp + 8], [ebp + 12], [ebp + 16]… | [bp + 4], [bp + 6], [bp + 8]… | [bp + 6], [bp + 8], [bp + 10]… |
If an argument passed to a function is wider than the stack, it will occupy more than one ‘slot’ in the stack frame. A 64-bit value passed to a function (long long or double) will occupy 2 stack slots in 32-bit code or 4 stack slots in 16-bit code.
Function arguments are accessed with positive offsets from the BP or EBP registers. Local variables are accessed with negative offsets. The previous value of BP or EBP is stored at [bp + 0] or [ebp + 0]. The return address (IP or EIP) is stored at [bp + 2] or [ebp + 4].
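The slot arithmetic above follows one pattern: skip the saved frame pointer, skip the return address, then step one stack width per slot. The C sketch below models that pattern; the `frame_model` type, its constants and both helper functions are hypothetical names chosen for this illustration, and the 64-bit entry simply mirrors the rbp+16 offsets used earlier in the article.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one frame layout. */
typedef struct {
    size_t slot_bytes;        /* stack width, also the size of saved (E)BP */
    size_t return_addr_bytes; /* near return address, or far CS:IP         */
} frame_model;

static const frame_model FRAME_32      = { 4, 4 }; /* 32-bit code            */
static const frame_model FRAME_16_NEAR = { 2, 2 }; /* TINY, SMALL, COMPACT   */
static const frame_model FRAME_16_FAR  = { 2, 4 }; /* MEDIUM, LARGE, HUGE    */
static const frame_model FRAME_64      = { 8, 8 }; /* 64-bit frame, as above */

/* Offset from the frame pointer to the 0-based argument slot `slot`. */
size_t arg_slot_offset(frame_model m, size_t slot)
{
    return m.slot_bytes + m.return_addr_bytes + slot * m.slot_bytes;
}

/* Number of slots taken by a value of `value_bytes` bytes, rounded up. */
size_t slots_needed(frame_model m, size_t value_bytes)
{
    assert(value_bytes > 0);
    return (value_bytes + m.slot_bytes - 1) / m.slot_bytes;
}
```

For example, `arg_slot_offset(FRAME_16_FAR, 0)` is 6 because a far call pushes both CS and IP, and `slots_needed(FRAME_32, 8)` is 2, matching the statement that a 64-bit value takes two slots in 32-bit code.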
C calling convention – return values
A C function usually stores its return value in one or more registers.
| | 32-bit code | 16-bit code, all memory models |
| --- | --- | --- |
| 8-bit return value | AL | AL |
| 16-bit return value | AX | AX |
| 32-bit return value | EAX | DX:AX |
| 64-bit return value | EDX:EAX | space for the return value is allocated on the stack of the calling function, and a ‘hidden’ pointer to this space is passed to the called function |
| 128-bit return value | hidden pointer | hidden pointer |
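In the register pairs DX:AX and EDX:EAX, the register named first holds the high half of the value. A caller written in assembly has to glue the halves together itself; the C sketch below shows the arithmetic, with hypothetical function names and with the register contents passed in as ordinary parameters.

```c
#include <stdint.h>

/* 32-bit value returned by 16-bit code: DX is the high word, AX the low. */
uint32_t value_from_dx_ax(uint16_t dx, uint16_t ax)
{
    return ((uint32_t)dx << 16) | ax;
}

/* 64-bit value returned by 32-bit code: EDX is the high dword, EAX the low. */
uint64_t value_from_edx_eax(uint32_t edx, uint32_t eax)
{
    return ((uint64_t)edx << 32) | eax;
}
```

A 64-bit -1 returned by 32-bit code therefore arrives with both EDX and EAX set to 0xFFFFFFFF.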
C calling convention – saving registers
GCC expects functions to preserve the callee-save registers:
EBX, EDI, ESI, EBP, DS, ES, SS
You need not save these registers:
EAX, ECX, EDX, FS, GS, EFLAGS, floating point registers
In some OSes, FS or GS may be used as a pointer to thread local storage (TLS), and must be saved if you modify it.
C calling convention – leading underscores
Some C compilers (those for DOS and Windows, and those with COFF output) prepend an underscore to the names of C functions and global variables. If a C global variable, e.g. conv_mem_size, is accessed by asm code, it should be declared with a leading underscore in the asm code:
EXTERN _conv_mem_size ; NASM syntax
mov [_conv_mem_size],ax
Linux ELF does NOT use underscores. Watcom C uses trailing underscores for function names, and leading underscores for global variables.
If your GCC supports it, leading underscores can be turned off with the compiler option -fno-leading-underscore
Pascal calling conventions
Function arguments are pushed onto the stack from left to right before the function is called. C-style variable-length argument lists are not possible in Pascal. (Look in file STDARG.H and think about it.)
In C, the calling function must ‘clean up the stack’ (remove function arguments from the stack after the called function returns). In Pascal, the called function must do this, before returning.
Pascal identifiers are case-insensitive. MyKewlProc() will be stored in the object code file as MYKEWLPROC
Other calling conventions
The __stdcall calling convention, used by Windows, is a hybrid of the C and Pascal calling conventions. Like C, function arguments are pushed right-to-left. Like Pascal, the called function must clean up the stack. Exception: the caller must clean up the stack for functions that accept a variable number of arguments, e.g. printf(const char *format, …);
Watcom C uses a register-based calling convention. See sections 7.4, 7.5, 10.4, and 10.5 in cuserguide.pdf in the Watcom documentation. Individual functions can be declared to use the normal, stack-based calling convention.
GCC can be made to use a register calling convention by compiling with gcc -mregparm=NNN … See the GCC documentation for details.