Brennan's Guide to Inline Assembly (translated)

Date: 2019-12-20

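Translator's note: A word up front. This is a translated article, and my English is fairly limited, but inline assembly is indispensable knowledge for studying operating systems and I often look things up on the topic. I came across this introduction to inline assembly while doing MIT's JOS labs. It covers the syntax of the commonly used (AT&T-format) inline assembly, and at the end it lists how to write some commonly used inline assembly snippets. I found it very useful; experts need not bother reading it. Corrections of any mistranslations or infelicities are very welcome.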

PS: The original article is at http://www.delorie.com/djgpp/doc/brennan/brennan_att_inline_djgpp.html

PS: All translator's notes are my own additions.

Brennan's Guide to Inline Assembly

by Brennan "Bas" Underwood

Document version 1.1.2.2

Ok. This is meant to be an introduction to inline assembly under DJGPP. DJGPP is based on GCC, so it uses the AT&T/UNIX syntax and has a somewhat unique method of inline assembly. I spent many hours figuring some of this stuff out and told Info that I hate it, many times.
Hopefully if you already know Intel syntax, the examples will be helpful to you. I've put variable names, register names and other literals in bold type.

The Syntax

So, DJGPP uses the AT&T assembly syntax. What does that mean to you?

Register naming:

AT&T:  %eax
Intel: eax

Source/Destination Ordering:

In AT&T syntax (which is the UNIX standard, BTW) the source is always on the left, and the destination is always on the right. So let's load ebx with the value in eax:

AT&T:  movl %eax, %ebx
Intel: mov ebx, eax

Constant value/immediate value format:

You must prefix all constant/immediate values with "$". Let's load eax with the address of the "C" variable booga, which is static:

AT&T:  movl $_booga, %eax
Intel: mov eax, _booga

Now let's load ebx with 0xd00d:

AT&T:  movl $0xd00d, %ebx
Intel: mov ebx, 0d00dh
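
To try these two rules from C, here is a minimal sketch using GCC's extended asm, which is explained further down. It assumes GCC (or a compatible compiler) targeting x86 or x86-64; the function names are hypothetical, %0 and %1 stand for registers GCC picks, and the #else branches are plain-C fallbacks added only so the sketch builds on other CPUs.

```c
#include <assert.h>

/* "$" marks the immediate; the destination (%0) is on the right. */
static unsigned int load_d00d(void)
{
    unsigned int r;
#if defined(__i386__) || defined(__x86_64__)
    __asm__("movl $0xd00d, %0" : "=r"(r));
#else
    r = 0xd00d;                       /* non-x86 fallback */
#endif
    return r;
}

/* AT&T order is "movl source, destination", so this copies %1 into %0. */
static unsigned int copy_via_mov(unsigned int src)
{
    unsigned int dst;
#if defined(__i386__) || defined(__x86_64__)
    __asm__("movl %1, %0" : "=r"(dst) : "r"(src));
#else
    dst = src;                        /* non-x86 fallback */
#endif
    return dst;
}
```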

Operator size specification:

You must suffix the instruction with one of b, w, or l to specify the width of the destination as a byte, word or longword. If you omit this, GAS (the GNU assembler) will attempt to guess, typically from a register operand. You don't want GAS to guess, and guess wrong! Don't forget it.

AT&T:  movw %ax, %bx
Intel: mov bx, ax

The equivalent forms for Intel are byte ptr, word ptr, and dword ptr, but that is for when you are...
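
The suffix matters most when no register operand tells GAS the size, as when storing an immediate straight to memory. A small sketch, assuming GCC extended asm on x86 or x86-64 (store_sizes is a hypothetical name; "=m" hands the asm a memory operand, and the #else branch is a plain-C fallback for other CPUs):

```c
#include <assert.h>
#include <stdint.h>

/* Each store has only an immediate and a memory operand, so the
   b/w/l suffix is what tells GAS how many bytes to write. */
static void store_sizes(uint8_t *b, uint16_t *w, uint32_t *l)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__("movb $0x11, %0\n\t"          /* 1 byte  */
            "movw $0x2222, %1\n\t"        /* 2 bytes */
            "movl $0x33333333, %2"        /* 4 bytes */
            : "=m"(*b), "=m"(*w), "=m"(*l));
#else
    *b = 0x11;                            /* non-x86 fallback */
    *w = 0x2222;
    *l = 0x33333333u;
#endif
}
```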

Referencing memory:

DJGPP uses 386 protected mode, so you can forget all that real-mode addressing junk, including the restrictions on which register has what default segment and which registers can be base or index pointers. Now we just get 6 general-purpose registers. (7 if you use ebp, but be sure to restore it yourself or compile with -fomit-frame-pointer.)

Here is the canonical format for 32-bit addressing:

AT&T:  immed32(basepointer,indexpointer,indexscale)
Intel: [basepointer + indexpointer*indexscale + immed32]

You could think of the formula to calculate the address as:

immed32 + basepointer + indexpointer * indexscale

You don't have to use all those fields, but you do need at least one of immed32 and basepointer, and you MUST add the size suffix to the operator!
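
The formula can be checked with leal, which evaluates an address expression and stores the result instead of reading memory. A sketch, assuming GCC extended asm on x86 or x86-64, with immed32 = 12 and indexscale = 4 picked arbitrarily (effective_address is a hypothetical name, and the #else branch is a plain-C fallback):

```c
#include <assert.h>

/* leal computes immed32 + basepointer + indexpointer*indexscale
   (here 12 + base + index*4) without touching memory. */
static unsigned int effective_address(unsigned int base, unsigned int index)
{
    unsigned int ea;
#if defined(__i386__) || defined(__x86_64__)
    __asm__("leal 12(%1,%2,4), %0" : "=r"(ea) : "r"(base), "r"(index));
#else
    ea = 12u + base + index * 4u;     /* the same formula in plain C */
#endif
    return ea;
}
```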

Let's see some simple forms of memory addressing:

Addressing a particular C variable:

AT&T:  _booga
Intel: [_booga]

Note: the underscore ("_") is how you get at static (global) C variables from assembler, because DJGPP prefixes C symbol names with an underscore. (ELF targets such as Linux do not add it, so there the symbol would be plain booga.) This only works with global variables. Otherwise, you can use extended asm to have variables preloaded into registers for you. I address that farther down.

Addressing what a register points to:

AT&T:  (%eax)
Intel: [eax]

Addressing a variable offset by a value in a register:

AT&T: _variable(%eax)
Intel: [eax + _variable]

Addressing a value in an array of integers (scaling up by 4):

AT&T:  _array(,%eax,4)
Intel: [eax*4 + _array]

You can also do offsets with the immediate value:

C code: *(p+1) where p is a char *

AT&T:  1(%eax) where eax has the value of p
Intel: [eax + 1]

You can do some simple math on the immediate value:

AT&T: _struct_pointer+8

I assume you can do that with Intel format as well.

Addressing a particular char in an array of 8-character records:

eax holds the number of the record desired. ebx has the wanted char's offset within the record.

AT&T:  _array(%ebx,%eax,8)
Intel: [ebx + eax*8 + _array]

Whew. Hopefully that covers all the addressing you'll need to do. As a note, you can put esp into the address, but only as the base register (never as the index).
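
Here is a sketch of two of those forms used from C, assuming GCC extended asm on x86 or x86-64. Because the array's address arrives in a register rather than as a symbol like _array, a base register takes the symbol's place; ptrdiff_t keeps the index the same width as the pointer, and the "memory" clobber tells GCC that the asm reads memory it was not told about. The function names are hypothetical, and the #else branches are plain-C fallbacks.

```c
#include <assert.h>
#include <stddef.h>

/* (base,index,scale): loads arr[i], like _array(,%eax,4) but with the
   array's address in a register instead of a symbol. */
static int int_at(const int *arr, ptrdiff_t i)
{
    int v;
#if defined(__i386__) || defined(__x86_64__)
    __asm__("movl (%1,%2,4), %0"
            : "=r"(v)
            : "r"(arr), "r"(i)
            : "memory");              /* reads memory GCC can't see */
#else
    v = arr[i];                       /* non-x86 fallback */
#endif
    return v;
}

/* displacement(base): loads *(p+1), like 1(%eax). */
static char next_char(const char *p)
{
    char c;
#if defined(__i386__) || defined(__x86_64__)
    __asm__("movb 1(%1), %0" : "=q"(c) : "r"(p) : "memory");
#else
    c = p[1];                         /* non-x86 fallback */
#endif
    return c;
}
```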

Basic inline assembly

The format for basic inline assembly is very simple, and much like Borland's method.

asm ("statements");

Pretty simple, no? So

asm ("nop");

will do nothing of course, and

asm ("cli");

will stop interrupts, with

asm ("sti");

of course enabling them. You can use __asm__ instead of asm if the keyword asm conflicts with something in your program.

When it comes to simple stuff like this, basic inline assembly is fine. You can even push your registers onto the stack, use them, and put them back, like this:

asm ("pushl %eax\n\t"
     "movl $0, %eax\n\t"
     "popl %eax");

(The \n's and \t's are there so the .s file that GCC generates and hands to GAS comes out right when you've got multiple statements per asm.)
Basic inline assembly is really meant for issuing instructions that have no equivalent in C and that don't touch the registers.

But if you do touch the registers, and don't fix things at the end of your asm statement, like so:

asm ("movl %eax, %ebx");
asm ("xorl %ebx, %edx");
asm ("movl $0, _booga");

then your program will probably blow things to hell. This is because GCC hasn't been told that your asm statement clobbered ebx and edx and booga, which it might have been keeping in a register and might plan on using later. For that, you need:

Extended inline assembly

The basic format of the inline assembly stays much the same, but now gets Watcom-like extensions to allow input arguments and output arguments.

Here is the basic format:

asm ( "statements" : output_registers : input_registers : clobbered_registers);

Let's just jump straight to a nifty example, which I'll then explain:

asm ("cld\n\t"
     "rep\n\t"
     "stosl"
     : /* no output registers */
     : "c" (count), "a" (fill_value), "D" (dest)
     : "%ecx", "%edi" );

The above stores the value in fill_value count times to the pointer dest.

Let's look at this bit by bit.

asm ("cld\n\t"

We are clearing the direction bit of the flags register. You never know what this is going to be left at, and it costs you all of 1 or 2 cycles.

"rep\n\t"

"stosl"

Notice that GAS requires the rep prefix to occupy a line of its own. (Current versions of GAS also accept rep stosl on a single line.) Notice also that stos has the l suffix to make it move longwords.

: /* no output registers */

Well, there aren't any in this function.

: "c" (count), "a" (fill_value), "D" (dest)

Here we load ecx with count, eax with fill_value, and edi with dest. Why make GCC do it instead of doing it ourselves? Because GCC, in its register allocating, might be able to arrange for, say, fill_value to already be in eax. If this is in a loop, it might be able to preserve eax through the loop, and save a movl once per loop.

: "%ecx", "%edi" );

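And here's where we specify to GCC, "you can no longer count on the values you loaded into ecx or edi to be valid." This doesn't mean they will be reloaded for certain. This is the clobberlist.

Current versions of GCC reject this particular form, because a clobber may not name a register that an input or output operand also uses. There, the same thing is expressed by making count and dest read-write operands, "+c" (count) and "+D" (dest), and, since stosl writes through dest, by adding "memory" to the clobber list.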

Seem funky? Well, it really helps when optimizing, when GCC can know exactly what you're doing with the registers before and after. It folds your assembly code into the code it generates (whose rules for generation look remarkably like the above) and then optimizes. It's even smart enough to know that if you tell it to put (x+1) in a register, then if you don't clobber it, and later C code refers to (x+1), and it was able to keep that register free, it will reuse the computation. Whew.
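
For comparison, here is a sketch of the fill example written the way current GCC accepts it: count and dest become read-write operands instead of clobbers, "memory" is listed because stosl writes through dest, and volatile is kept because the updated dest and count are never used afterwards, so GCC could otherwise discard the statement. It assumes x86 or x86-64 (size_t then matches ecx or rcx); fill_longs is a hypothetical name and the #else branch is a plain-C fallback.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stores fill_value into count consecutive 32-bit words starting at dest. */
static void fill_longs(uint32_t *dest, uint32_t fill_value, size_t count)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__("cld\n\t"
                         "rep stosl"
                         : "+D"(dest), "+c"(count) /* rep stosl changes both */
                         : "a"(fill_value)
                         : "memory");              /* it writes through dest */
#else
    while (count--)                                /* non-x86 fallback */
        *dest++ = fill_value;
#endif
}
```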

Here's the list of register loading codes that you'll be likely to use:

a        eax
b        ebx
c        ecx
d        edx
S        esi
D        edi
I        constant value (0 to 31)
q,r      dynamically allocated register (see below)
g        any general register, a memory operand, or an immediate
A        eax and edx combined into a 64-bit integer (use long longs; on x86-64, "A" no longer means the edx:eax pair)
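
Two of the less obvious codes in action, as a sketch assuming GCC extended asm on x86 or x86-64: "I" supplies a small constant (substituted as $3), and "q" is used because sete writes a single byte, which on 32-bit x86 only eax, ebx, ecx and edx can provide. shl3 and equal are hypothetical names; the #else branches are plain-C fallbacks.

```c
#include <assert.h>

/* "I": a constant from 0 to 31, substituted as an immediate ($3). */
static unsigned int shl3(unsigned int x)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__("shll %2, %0" : "=r"(x) : "0"(x), "I"(3) : "cc");
#else
    x <<= 3;                          /* non-x86 fallback */
#endif
    return x;
}

/* "q": sete writes one byte, so the output needs a register with a byte
   part (al, bl, cl or dl on 32-bit x86). */
static unsigned char equal(int a, int b)
{
    unsigned char r;
#if defined(__i386__) || defined(__x86_64__)
    __asm__("cmpl %2, %1\n\t"
            "sete %0"
            : "=q"(r)
            : "r"(a), "r"(b)
            : "cc");
#else
    r = (unsigned char)(a == b);      /* non-x86 fallback */
#endif
    return r;
}
```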

Note that you can't directly refer to the byte registers (ah, al, etc.) or the word registers (ax, bx, etc.) when you're loading this way. Once you've got it in there, though, you can specify ax or whatever all you like.

The codes have to be in quotes, and the expressions to load in have to be in parentheses.

When you do the clobber list, you specify the registers as above with the %. If you write to a variable, you must include "memory" as one of The Clobbered. This is in case you wrote to a variable that GCC thought it had in a register: "memory" tells GCC that memory may have changed, so it must not rely on copies of memory values it is keeping in registers across the asm statement. While I've never run into a problem with it, you might also want to add "cc" as a clobber if you change the condition codes (the bits in the flags register the jnz, je, etc. operators look at.)
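
Putting the clobber rules together, here is a sketch of how the three basic-asm statements shown earlier (movl %eax, %ebx / xorl %ebx, %edx / movl $0, _booga) could be written so that GCC knows what they touch. It assumes GCC extended asm on x86 or x86-64; mix_and_clear is hypothetical, a and d stand in for eax and edx, b for ebx, and booga is passed as a "=m" memory output, which also avoids depending on the leading-underscore symbol name.

```c
#include <assert.h>

static int booga = 1;                 /* stands in for the guide's static variable */

/* Everything the asm changes (b, d, booga, the flags) is declared. */
static int mix_and_clear(int a, int d)
{
    int b;
#if defined(__i386__) || defined(__x86_64__)
    __asm__("movl %3, %0\n\t"         /* b = a     */
            "xorl %0, %1\n\t"         /* d ^= b    */
            "movl $0, %2"             /* booga = 0 */
            : "=r"(b), "+r"(d), "=m"(booga)
            : "r"(a)
            : "cc");                  /* xorl changes the flags */
#else
    b = a;                            /* non-x86 fallback */
    d ^= b;
    booga = 0;
#endif
    (void)b;
    return d;
}
```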

Now, that's all fine and good for loading specific registers. But what if you specify, say, ebx and ecx, and GCC can't arrange for the values to be in those registers without having to stash the previous values? It's possible to let GCC pick the register(s). You do this:

asm ("leal (%1,%1,4), %0"
     : "=r" (x)
     : "0" (x) );

The above example multiplies x by 5 really quickly (1 cycle on the Pentium). Now, we could have specified, say, eax. But unless we really need a specific register (like when using rep movsl or rep stosl, which are hardcoded to use ecx, edi, and esi), why not let GCC pick an available one? So when GCC generates the output code for GAS, %0 will be replaced by the register it picked.

Translator's note: lea (load effective address) evaluates the address expression on the left and stores the result in the register on the right, without touching memory. Here that computes %0 = %1 + %1*4, and since "0" puts x in the same register as %0, x is multiplied by 5.
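
The same trick wrapped in a function, as a sketch assuming GCC extended asm on x86 or x86-64 (times5_tied is a hypothetical name; the #else branch is a plain-C fallback):

```c
#include <assert.h>

/* %0 = %1 + %1*4; "0" loads x into the same register as output %0. */
static int times5_tied(int x)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__("leal (%1,%1,4), %0" : "=r"(x) : "0"(x));
#else
    x *= 5;                           /* non-x86 fallback */
#endif
    return x;
}
```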

And where did "q" and "r" come from? Well, "q" causes GCC to allocate from eax, ebx, ecx, and edx. "r" lets GCC also consider esi and edi. So make sure, if you use "r", that it would be possible to use esi or edi in that instruction. If not, use "q".

Now, you might wonder how the %n tokens get allocated to the arguments. It's a straightforward first-come-first-served, left-to-right thing: operands are numbered from %0 in the order they appear, outputs first and then inputs, whatever their codes. But if you want to reuse a register allocated with a "q" or "r", you use "0", "1", "2"... etc.

You don't need to put a GCC-allocated register on the clobberlist, as GCC knows that you're messing with it.
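
A sketch of the numbering, assuming GCC extended asm on x86 or x86-64: the single output is %0, the two inputs are %1 and %2, and "0" makes %1 reuse %0's register. add_tied is a hypothetical name; the #else branch is a plain-C fallback.

```c
#include <assert.h>

/* %0 = r (output), %1 = a (input; "0" puts it in %0's register),
   %2 = b (input, any register). */
static int add_tied(int a, int b)
{
    int r;
#if defined(__i386__) || defined(__x86_64__)
    __asm__("addl %2, %0"             /* %0 already holds a, so r = a + b */
            : "=r"(r)
            : "0"(a), "r"(b)
            : "cc");
#else
    r = a + b;                        /* non-x86 fallback */
#endif
    return r;
}
```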

Now for output registers.

asm ("leal (%1,%1,4), %0"
     : "=r" (x_times_5)
     : "r" (x) );

Note the use of = to specify an output register. You just have to do it that way. If you want 1 variable to stay in 1 register for both in and out, you have to respecify the register allocated to it on the way in with the "0" type codes as mentioned above. (Current GCC versions also accept "+r", which marks a single operand as both input and output.)

asm ("leal (%0,%0,4), %0"
     : "=r" (x)
     : "0" (x) );

Translator's note: here "0" makes the input use the same register as %0.

This also works, by the way:

asm ("leal (%%ebx,%%ebx,4), %%ebx"
     : "=b" (x)
     : "b" (x) );

2 things here:

Note that we don't have to put ebx on the clobberlist; GCC knows it goes into x. Therefore, since it can know the value of ebx, it isn't considered clobbered. Notice also that in extended asm, you must prefix registers with %% instead of just %. Why, you ask? Because as GCC parses along for %0's and %1's and so on, it would interpret %edx as a %e parameter, see that that's non-existent, and ignore it. Then it would bitch about finding a symbol named dx, which isn't valid because it's not prefixed with % and it's not the one you meant anyway.
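
When a register is named literally rather than through an operand, it needs both the %% prefix and a clobber entry. A sketch assuming GCC extended asm on x86 or x86-64 (times5_via_eax is a hypothetical name; the #else branch is a plain-C fallback); because eax is listed as clobbered, GCC will not place x or the result there.

```c
#include <assert.h>

/* eax is written %%eax inside extended asm and declared clobbered. */
static unsigned int times5_via_eax(unsigned int x)
{
    unsigned int r;
#if defined(__i386__) || defined(__x86_64__)
    __asm__("movl %1, %%eax\n\t"              /* eax = x        */
            "leal (%%eax,%%eax,4), %%eax\n\t" /* eax = x + x*4  */
            "movl %%eax, %0"                  /* result = eax   */
            : "=r"(r)
            : "r"(x)
            : "%eax");
#else
    r = x * 5u;                               /* non-x86 fallback */
#endif
    return r;
}
```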

Important note: If your assembly statement must execute where you put it (i.e. must not be moved out of a loop as an optimization), put the keyword volatile after asm and before the ()'s. To be ultra-careful, use __asm__ __volatile__ (...whatever...);

However, I would like to point out that if your assembly's only purpose is to calculate the output registers, with no other side effects, you should leave off the volatile keyword so your statement can take part in GCC's optimizations, such as common subexpression elimination.
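
The two cases side by side, as a sketch assuming GCC extended asm on x86 or x86-64: cpu_relax exists only for its effect, so it is volatile, while mul9 only computes an output and is left non-volatile so GCC may merge or drop repeated calls. The names are hypothetical, pause is a spin-wait hint that older CPUs execute as a plain nop, and the #else branch is a plain-C fallback.

```c
#include <assert.h>

/* No outputs; it exists only for its effect, so it is volatile. The
   "memory" clobber also keeps GCC from moving loads and stores across it. */
static void cpu_relax(void)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__("pause" : : : "memory");
#endif
}

/* Only computes an output, so it is not volatile: GCC may merge or
   delete repeated calls with the same argument. */
static unsigned int mul9(unsigned int x)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__("leal (%0,%0,8), %0" : "=r"(x) : "0"(x));
#else
    x *= 9u;                          /* non-x86 fallback */
#endif
    return x;
}

/* Hypothetical spin loop that calls cpu_relax n times. */
static int spin(int n)
{
    int i;
    for (i = 0; i < n; i++)
        cpu_relax();
    return i;
}
```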

Some useful examples

#define disable() __asm__ __volatile__ ("cli")

#define enable() __asm__ __volatile__ ("sti")

Of course, libc has these defined too.

#define times3(arg1, arg2) \
__asm__ ( \
  "leal (%0,%0,2),%0" \
  : "=r" (arg2) \
  : "0" (arg1) )

#define times5(arg1, arg2) \
__asm__ ( \
  "leal (%0,%0,4),%0" \
  : "=r" (arg2) \
  : "0" (arg1) )

#define times9(arg1, arg2) \
__asm__ ( \
  "leal (%0,%0,8),%0" \
  : "=r" (arg2) \
  : "0" (arg1) )

These multiply arg1 by 3, 5, or 9 and put the result in arg2. You should be OK to do times5(x,x); as well.

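#define rep_movsl(src, dest, numwords) \
do { \
  const void *rm_src_ = (src); \
  void *rm_dest_ = (dest); \
  unsigned long rm_n_ = (numwords); \
  __asm__ __volatile__ ( \
    "cld\n\t" \
    "rep\n\t" \
    "movsl" \
    : "+S" (rm_src_), "+D" (rm_dest_), "+c" (rm_n_) \
    : : "memory"); \
} while (0)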

Helpful Hint: If you say memcpy() with a constant length parameter, GCC will inline it to a rep movsl like above. But if you need a variable length version that inlines and you're always moving dwords, there ya go.

#define rep_stosl(value, dest, numwords) \
do { \
  void *rs_dest_ = (dest); \
  unsigned long rs_n_ = (numwords); \
  __asm__ __volatile__ ( \
    "cld\n\t" \
    "rep\n\t" \
    "stosl" \
    : "+D" (rs_dest_), "+c" (rs_n_) \
    : "a" (value) \
    : "memory"); \
} while (0)

Same as above but for memset(), which doesn't get inlined no matter what (for now).

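/* Reads edx and eax separately: on x86-64, "=A" does not mean the edx:eax pair. */
#define RDTSC(llptr) ({ \
unsigned int rdtsc_lo_, rdtsc_hi_; \
__asm__ __volatile__ ( \
        ".byte 0x0f; .byte 0x31" /* rdtsc */ \
        : "=a" (rdtsc_lo_), "=d" (rdtsc_hi_)); \
(llptr) = ((unsigned long long) rdtsc_hi_ << 32) | rdtsc_lo_; })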

Reads the TimeStampCounter on the Pentium and puts the 64-bit result into llptr (which, despite its name, is an unsigned long long variable rather than a pointer).

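Translator's note: on multi-core machines the rdtscp instruction may be more reliable, although it takes a few more cycles to execute. For example: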

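#include <stdint.h>

static __inline__ uint64_t perf_counter(void)
{
  uint32_t lo, hi, aux;
  // Read the time stamp counter. rdtscp waits until all earlier instructions
  // have executed before reading it (later ones may still start early), and it
  // is much cheaper than serializing with CPUID. It also writes IA32_TSC_AUX
  // into ecx, so ecx has to be declared as an output too.
  __asm__ __volatile__ (
      "rdtscp" : "=a"(lo), "=d"(hi), "=c"(aux)
      );
  (void)aux;
  return ((uint64_t)lo) | (((uint64_t)hi) << 32);
}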

The End

"The End"?! Yah, I guess so.

If you're wondering, I personally am a big fan of AT&T/UNIX syntax now. (It might have helped that I cut my teeth on SPARC assembly. Of course, that machine actually had a decent number of general registers.) It might seem weird to you at first, but it's really more logical than Intel format, and has no ambiguities.

If I still haven't answered a question of yours, look in the Info pages for more information, particularly on the input/output registers. You can do some funky stuff like use "A" to allocate two registers at once for 64-bit math or "m" for static memory locations, and a bunch more that aren't really used as much as "q" and "r".

Alternately, mail me, and I'll see what I can do. (If you find any errors in the above, please, e-mail me and tell me about it! It's frustrating enough to learn without buggy docs!) Or heck, mail me to say "boogabooga."

(Translator's note: I honestly have no idea what the author means by this last sentence.)

It's the least you can do.