GCC x64 asm: experimenting with inline assembly

Date: 2019-12-07


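char* getUrlFromBuffer(char *buffer) {
    char *a = "testok";
    char *result;

    /* "=r" tells GCC that the asm writes result; the original wrote %rax
       without declaring it and never returned a value. */
    __asm__ __volatile__("mov %1, %0" : "=r"(result) : "r"(a));
    return result;
}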

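asm volatile(assembler template : output : input : clobber);

clobber: the registers that the assembly code overwrites, for example "%ecx", "%esi", "%edi". GCC will not keep live values in the listed registers across the asm statement and, where the calling convention requires it, saves and restores them in the function prologue and epilogue; it does not wrap the asm itself in a push/pop pair.

Sometimes we want to use inline assembly in C/C++ code because C has no corresponding function or syntax. For example, when I was recently writing an FIR routine on ARM, the final result had to be saturated, but gcc did not provide a function such as ssat, so I had to embed assembly instructions in the C code.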

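1. Getting started

The biggest problem with embedding assembly in C is how to associate C variables with instruction operands. Naturally, gcc has already worked this out for us. Here is a simple example.

asm("fsinx %1, %0" : "=f"(result) : "f"(angle));

We do not need to care what the fsinx instruction does here; it is enough to know that it takes two floating-point registers as operands. As a compiler for C, gcc has no way of knowing what kind of operands the fsinx assembly instruction needs, so the programmer has to tell it. That is the job of the "=f" and "f" after the instruction, which say that both operands are floating-point registers. These are called operand constraints. A constraint prefixed with "=" marks an output operand; otherwise the operand is an input. The parenthesized expression after the constraint is the variable associated with that register. With this information gcc knows how to turn the inline assembly statement into actual assembly instructions:

fsinx: the assembly instruction name

%1, %0: the instruction operands

"=f"(result): operand %0 is a floating-point register associated with the variable result (for an output operand, "associated" means that after executing the instruction gcc copies the contents of register %0 into result)

"f"(angle): operand %1 is a floating-point register associated with the variable angle (for an input operand, "associated" means that before executing the instruction gcc loads the value of angle into register %1)

So this inline assembly statement turns into at least three assembly instructions (without optimization):

1> load the value of angle into register %1

2> the fsinx instruction, with source register %1 and destination register %0

3> store the value of register %0 into result

At higher optimization levels, of course, this description may not hold; for example, the source operand may already be in a floating-point register.

This also shows what the "=" in front of a constraint is for: gcc needs to know whether the operand is loaded from the variable into the register before the inline assembly runs, or stored from the register into the variable afterwards.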
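For readers on x86-64, the same one-input, one-output pattern can be sketched with the SSE2 sqrtsd instruction. This is only an illustration under stated assumptions: the helper name asm_sqrt is made up, the "x" constraint (any SSE register) plays the role that "f" plays in the fsinx example, and the Newton-iteration branch exists only so the sketch still builds on targets without SSE2.

```c
/* Hypothetical helper: square root of a double, one input and one output. */
double asm_sqrt(double value)
{
    double result;
#if defined(__SSE2__)
    /* "=x"(result): %0 is an SSE register that the asm writes.
       "x"(value):   %1 is an SSE register that GCC loads with value first. */
    __asm__("sqrtsd %1, %0" : "=x"(result) : "x"(value));
#else
    /* Portable stand-in (Newton iteration); not the point of the example. */
    result = value > 1.0 ? value : 1.0;
    for (int i = 0; i < 64; i++)
        result = 0.5 * (result + value / result);
#endif
    return result;
}
```

Calling asm_sqrt(9.0) goes through the load, instruction, store sequence described above; with optimization enabled GCC may keep both values in registers throughout.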

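Commonly used constraints include the following (see the gcc manual for details):

m memory operand

r general-purpose register operand

i immediate operand (integer)

f floating-point register operand (target dependent)

F immediate operand (floating point)

The example also shows the basic format of inline assembly:

asm("instruction" : "=output constraint"(variable) : "input constraint"(variable));

Output operands must be lvalues, which should be obvious.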

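2. Multiple operands, or no output operand

What if an instruction has several input or output operands? Many ARM instructions, for example, take three operands. In that case, separate the constraints with commas:

asm("add %0, %1, %2" : "=r"(sum) : "r"(a), "r"(b));

The constraints correspond, in order, to operands %0, %1 and %2.

When there is no output operand, the output section after the instruction is left empty, so two consecutive colons appear, followed by the input constraints.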
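Both cases can be sketched on x86-64, where lea is one of the few instructions that takes two source registers and a separate destination. The function names asm_add3 and asm_touch are made up for this illustration, and the plain C branch is only a fallback for other targets.

```c
/* Hypothetical helper: sum of two longs with one three-operand instruction. */
long asm_add3(long a, long b)
{
    long sum;
#if defined(__x86_64__)
    /* Operands are numbered in the order the constraints appear:
       %0 = sum (output), %1 = a, %2 = b. */
    __asm__("lea (%1,%2), %0" : "=r"(sum) : "r"(a), "r"(b));
#else
    sum = a + b;
#endif
    return sum;
}

/* No output operand: the output section is empty, so two colons follow
   the template. The empty template emits no instruction; it only makes
   GCC place value in a register at this point. */
void asm_touch(long value)
{
    __asm__("" : : "r"(value));
}
```

For example, asm_add3(2, 40) yields 42, and asm_touch(1) compiles to nothing more than loading 1 into some register.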

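3. Input-output (read-write) operands

Sometimes an operand is both an input and an output, as in this x86 instruction:

add %eax, %ebx

Note that the instruction is written in AT&T syntax, not Intel syntax. Register ebx serves as both an input and an output operand. For such an operand, put a "+" in front of the constraint:

asm("add %1, %0" : "+r"(a) : "r"(b));

This corresponds to the C statement a=a+b.

Such an operand must not use "=": on seeing "=", gcc treats the operand as output-only and therefore does not load the value of a into register %0 before the instruction runs.

Another approach is to split the read-write operand logically into two operands:

asm("add %2, %0" : "=r"(a) : "0"(a), "r"(b));

The "logical" input operand 1 is given the digit constraint "0", meaning that it occupies the same "location" (the same register) as operand 0. The feature of this method is that the two "logical" operands can be associated with two different C variables:

asm("add %2, %0" : "=r"(c) : "0"(a), "r"(b));

This corresponds to the C statement c=a+b.

Digit constraints can be used only on input operands, and they must refer to an output operand. In the example above, the digit constraint "0" sits in the input section and refers to output operand 0, while the digit constraint itself takes operand number 1.

Note that using the same C variable for two operands does not guarantee that they occupy the same "location". The following, for instance, does not work:

(incorrect) asm("add %2, %0" : "=r"(a) : "r"(a), "r"(b));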
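The two working forms can be put side by side in a small x86 sketch. The function names add_rw and add_tied are made up here, and the "cc" clobber is added because add changes the flags; the non-x86 branches are plain C fallbacks.

```c
/* a = a + b with a single read-write operand ("+r"). */
int add_rw(int a, int b)
{
#if defined(__x86_64__) || defined(__i386__)
    __asm__("addl %1, %0" : "+r"(a) : "r"(b) : "cc");
#else
    a += b;
#endif
    return a;
}

/* c = a + b with the operand split in two: output %0 is c, and input %1
   carries the digit constraint "0", so a is loaded into the same register
   that will hold c. */
int add_tied(int a, int b)
{
    int c;
#if defined(__x86_64__) || defined(__i386__)
    __asm__("addl %2, %0" : "=r"(c) : "0"(a), "r"(b) : "cc");
#else
    c = a + b;
#endif
    return c;
}
```

Both functions return the sum of their arguments; the difference is only in whether the result overwrites a or lands in a separate variable.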

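4. Specifying registers

Sometimes an instruction must use particular registers. The typical example is a system call, where the system call number and its arguments have to be placed in specific registers. To do this, use the extended syntax when declaring the variables:

register int a asm("%eax") = 1; // statement 1
register int b asm("%ebx") = 2; // statement 2
asm("add %1, %0" : "+r"(a) : "r"(b)); // statement 3

Note that a is only known to be in eax, and b in ebx, while the assembly instruction executes; at any other time the locations of a and b are unknown.

Also take care that statement 2 does not overwrite eax when it executes. Suppose statement 2 is changed to the following:

register int b asm("%ebx") = func(); // statement 2

The calling convention places the return value of func() in eax, which destroys the value assigned to a in statement 1. In that case, first store the return value of func in a temporary variable:

int t = func();
register int a asm("%eax") = 1; // statement 1
register int b asm("%ebx") = t; // statement 2
asm("add %1, %0" : "+r"(a) : "r"(b)); // statement 3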
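A compilable sketch of this pattern on x86-64 might look as follows. The names add_in_fixed_regs, add_with_call and func are hypothetical, func is just a stand-in for any function call, and __asm__ is used instead of asm so the code also builds with strict -std options.

```c
/* Hypothetical stand-in for any function whose result we need. */
static int func(void) { return 40; }

/* Pin a to eax and b to ebx for the duration of the asm statement. */
int add_in_fixed_regs(int x, int y)
{
#if defined(__x86_64__)
    register int a __asm__("eax") = x;
    register int b __asm__("ebx") = y;
    __asm__("addl %1, %0" : "+r"(a) : "r"(b) : "cc");
    return a;
#else
    return x + y;
#endif
}

/* Call func() first and keep its result in an ordinary temporary, so the
   call cannot overwrite a register variable that was already set up. */
int add_with_call(void)
{
#if defined(__x86_64__)
    int t = func();
    register int a __asm__("eax") = 2;
    register int b __asm__("ebx") = t;
    __asm__("addl %1, %0" : "+r"(a) : "r"(b) : "cc");
    return a;
#else
    return 2 + func();
#endif
}
```

The ordering in add_with_call is the design point: every function call happens before the register variables are initialized, so nothing runs between their setup and the asm.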

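5. Implicitly modified registers

Some assembly instructions implicitly modify registers that are not among their operands. To let gcc know about this, list the implicitly modified (clobbered) registers after the input constraints. Here is an example for the VAX:

asm volatile("movc3 %0,%1,%2"
             : /* no outputs */
             : "g"(from), "g"(to), "g"(count)
             : "r0", "r1", "r2", "r3", "r4", "r5");

(movc3 is a block-move instruction, "Move characters".)

Note that the registers used by the input and output constraints must not overlap with the registers in the clobber list. In the example above, the "g" constraint may not be satisfied by any of r0-r5. Registers occupied by variables declared with the register-specifying syntax must not overlap with the clobber list either. This is easy to understand: the clobber list tells gcc about additional registers it has to take care of, so naturally they cannot coincide with the input and output registers.

In addition, if you name a register explicitly in the instruction and the instruction modifies it, that register must also appear in the clobber list (a little convoluted, admittedly). As said above, gcc does not understand the assembly instructions themselves, so a register you name explicitly in the instruction is implicit from gcc's point of view and must be listed as a clobber. Also, an explicit register in the instruction needs an extra %, as in %%eax.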
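As a small x86 illustration of both rules (the extra % and the clobber entry), a variable shift needs its count in cl, so the template names ecx directly. The function name shift_left is made up for this sketch, and the C branch mirrors the hardware behaviour of masking the count to 5 bits.

```c
/* Hypothetical helper: v << n, with the count moved into ecx by hand. */
unsigned shift_left(unsigned v, unsigned n)
{
#if defined(__x86_64__) || defined(__i386__)
    __asm__("movl %1, %%ecx\n\t"   /* explicit register: written as %%ecx   */
            "shll %%cl, %0"        /* shift count must be in cl             */
            : "+r"(v)
            : "r"(n)
            : "ecx", "cc");        /* ecx and the flags are overwritten     */
#else
    v <<= (n & 31);
#endif
    return v;
}
```

Because ecx is in the clobber list, GCC will not place v or n in it, which is exactly the non-overlap rule described above.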

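6. volatile

asm volatile tells gcc that your assembly has side effects and must not be optimized away, as in the example above.

If the instructions only perform a computation, volatile is not needed and gcc is free to optimize the statement. Otherwise, simply adding volatile to every asm may be a reasonable approach.

[Author: byeyear; first published on cnblogs; Email: east3@163.com; please credit the source when reposting]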

6.45.2 Extended Asm - Assembler Instructions with C Expression Operands

With extended asm you can read and write C variables from assembler and perform jumps from assembler code to C labels. Extended asm syntax uses colons (‘:’) to delimit the operand parameters after the assembler template:

asm [volatile] ( AssemblerTemplate 
                 : OutputOperands 
                 [ : InputOperands
                 [ : Clobbers ] ])

asm [volatile] goto ( AssemblerTemplate 
                      : 
                      : InputOperands
                      : Clobbers
                      : GotoLabels)

The asm keyword is a GNU extension. When writing code that can be compiled with -ansi and the various -std options, use __asm__ instead of asm (see Alternate Keywords).

Qualifiers

volatile

The typical use of extended asm statements is to manipulate input values to produce output values. However, your asm statements may also produce side effects. If so, you may need to use the volatile qualifier to disable certain optimizations. See Volatile.

goto

This qualifier informs the compiler that the asm statement may perform a jump to one of the labels listed in the GotoLabels. See GotoLabels.

Parameters

AssemblerTemplate

This is a literal string that is the template for the assembler code. It is a combination of fixed text and tokens that refer to the input, output, and goto parameters. See AssemblerTemplate.

OutputOperands

A comma-separated list of the C variables modified by the instructions in the AssemblerTemplate. An empty list is permitted. See OutputOperands.

InputOperands

A comma-separated list of C expressions read by the instructions in the AssemblerTemplate. An empty list is permitted. See InputOperands.

Clobbers

A comma-separated list of registers or other values changed by the AssemblerTemplate, beyond those listed as outputs. An empty list is permitted. See Clobbers and Scratch Registers.

GotoLabels

When you are using the goto form of asm, this section contains the list of all C labels to which the code in the AssemblerTemplate may jump. See GotoLabels.

asm statements may not perform jumps into other asm statements, only to the listed GotoLabels. GCC’s optimizers do not know about other jumps; therefore they cannot take account of them when deciding how to optimize.

The total number of input + output + goto operands is limited to 30.

Remarks

The asm statement allows you to include assembly instructions directly within C code. This may help you to maximize performance in time-sensitive code or to access assembly instructions that are not readily available to C programs.

Note that extended asm statements must be inside a function. Only basic asm may be outside functions (see Basic Asm). Functions declared with the naked attribute also require basic asm (see Function Attributes).

While the uses of asm are many and varied, it may help to think of an asm statement as a series of low-level instructions that convert input parameters to output parameters. So a simple (if not particularly useful) example for i386 using asm might look like this:

int src = 1;
int dst;   

asm ("mov %1, %0\n\t"
    "add $1, %0"
    : "=r" (dst) 
    : "r" (src));

printf("%d\n", dst);

This code copies src to dst and adds 1 to dst.

6.45.2.1 Volatile

GCC’s optimizers sometimes discard asm statements if they determine there is no need for the output variables. Also, the optimizers may move code out of loops if they believe that the code will always return the same result (i.e. none of its input values change between calls). Using the volatile qualifier disables these optimizations. asm statements that have no output operands, including asm goto statements, are implicitly volatile.

This i386 code demonstrates a case that does not use (or require) the volatile qualifier. If it is performing assertion checking, this code uses asm to perform the validation. Otherwise, dwRes is unreferenced by any code. As a result, the optimizers can discard the asm statement, which in turn removes the need for the entire DoCheck routine. By omitting the volatile qualifier when it isn’t needed you allow the optimizers to produce the most efficient code possible.

void DoCheck(uint32_t dwSomeValue)
{
   uint32_t dwRes;

   // Assumes dwSomeValue is not zero.
   asm ("bsfl %1,%0"
     : "=r" (dwRes)
     : "r" (dwSomeValue)
     : "cc");

   assert(dwRes > 3);
}

The next example shows a case where the optimizers can recognize that the input (dwSomeValue) never changes during the execution of the function and can therefore move the asm outside the loop to produce more efficient code. Again, using volatile disables this type of optimization.

void do_print(uint32_t dwSomeValue)
{
   uint32_t dwRes;

   for (uint32_t x=0; x < 5; x++)
   {
      // Assumes dwSomeValue is not zero.
      asm ("bsfl %1,%0"
        : "=r" (dwRes)
        : "r" (dwSomeValue)
        : "cc");

      printf("%u: %u %u\n", x, dwSomeValue, dwRes);
   }
}

The following example demonstrates a case where you need to use the volatile qualifier. It uses the x86 rdtsc instruction, which reads the computer’s time-stamp counter. Without the volatile qualifier, the optimizers might assume that the asm block will always return the same value and therefore optimize away the second call.

uint64_t msr;

asm volatile ( "rdtsc\n\t"    // Returns the time in EDX:EAX.
        "shl $32, %%rdx\n\t"  // Shift the upper bits left.
        "or %%rdx, %0"        // 'Or' in the lower bits.
        : "=a" (msr)
        : 
        : "rdx");

printf("msr: %llx\n", (unsigned long long) msr);

// Do other work...

// Reprint the timestamp
asm volatile ( "rdtsc\n\t"    // Returns the time in EDX:EAX.
        "shl $32, %%rdx\n\t"  // Shift the upper bits left.
        "or %%rdx, %0"        // 'Or' in the lower bits.
        : "=a" (msr)
        : 
        : "rdx");

printf("msr: %llx\n", (unsigned long long) msr);

GCC’s optimizers do not treat this code like the non-volatile code in the earlier examples. They do not move it out of loops or omit it on the assumption that the result from a previous call is still valid.

Note that the compiler can move even volatile asm instructions relative to other code, including across jump instructions. For example, on many targets there is a system register that controls the rounding mode of floating-point operations. Setting it with a volatile asm, as in the following PowerPC example, does not work reliably.

asm volatile("mtfsf 255, %0" : : "f" (fpenv));
sum = x + y;

The compiler may move the addition back before the volatile asm. To make it work as expected, add an artificial dependency to the asm by referencing a variable in the subsequent code, for example:

asm volatile ("mtfsf 255,%1" : "=X" (sum) : "f" (fpenv));
sum = x + y;

Under certain circumstances, GCC may duplicate (or remove duplicates of) your assembly code when optimizing. This can lead to unexpected duplicate symbol errors during compilation if your asm code defines symbols or labels. Using ‘%=’ (see AssemblerTemplate) may help resolve this problem.

6.45.2.2 Assembler Template

An assembler template is a literal string containing assembler instructions. The compiler replaces tokens in the template that refer to inputs, outputs, and goto labels, and then outputs the resulting string to the assembler. The string can contain any instructions recognized by the assembler, including directives. GCC does not parse the assembler instructions themselves and does not know what they mean or even whether they are valid assembler input. However, it does count the statements (see Size of an asm).

You may place multiple assembler instructions together in a single asm string, separated by the characters normally used in assembly code for the system. A combination that works in most places is a newline to break the line, plus a tab character to move to the instruction field (written as ‘\n\t’). Some assemblers allow semicolons as a line separator. However, note that some assembler dialects use semicolons to start a comment.

Do not expect a sequence of asm statements to remain perfectly consecutive after compilation, even when you are using the volatile qualifier. If certain instructions need to remain consecutive in the output, put them in a single multi-instruction asm statement.

Accessing data from C programs without using input/output operands (such as by using global symbols directly from the assembler template) may not work as expected. Similarly, calling functions directly from an assembler template requires a detailed understanding of the target assembler and ABI.

Since GCC does not parse the assembler template, it has no visibility of any symbols it references. This may result in GCC discarding those symbols as unreferenced unless they are also listed as input, output, or goto operands.

Special format strings

In addition to the tokens described by the input, output, and goto operands, these tokens have special meanings in the assembler template:

‘%%’

Outputs a single ‘%’ into the assembler code.

‘%=’

Outputs a number that is unique to each instance of the asm statement in the entire compilation. This option is useful when creating local labels and referring to them multiple times in a single template that generates multiple assembler instructions.
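A minimal x86 sketch of ‘%=’ in use follows. It assumes an ELF target, where a ‘.L’ prefix makes the label local to the object file; the function name asm_abs is hypothetical, and the C branch is a fallback for other targets.

```c
/* Hypothetical helper: absolute value with a local label in the template. */
int asm_abs(int v)
{
#if (defined(__x86_64__) || defined(__i386__)) && defined(__ELF__)
    /* %= expands to a number unique to this asm instance, so the label
       stays unique even if GCC inlines or duplicates the statement. */
    __asm__("testl %0, %0\n\t"
            "jns .Lnonneg%=\n\t"
            "negl %0\n"
            ".Lnonneg%=:"
            : "+r"(v)
            :
            : "cc");
#else
    if (v < 0)
        v = -v;
#endif
    return v;
}
```

Had the label been written as a fixed name, two inlined copies of asm_abs in one translation unit could produce a duplicate symbol error from the assembler.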

‘%{’

‘%|’

‘%}’

Outputs ‘{’, ‘|’, and ‘}’ characters (respectively) into the assembler code. When unescaped, these characters have special meaning to indicate multiple assembler dialects, as described below.

Multiple assembler dialects in `asm` templates

On targets such as x86, GCC supports multiple assembler dialects. The -masm option controls which dialect GCC uses as its default for inline assembler. The target-specific documentation for the -masm option contains the list of supported dialects, as well as the default dialect if the option is not specified. This information may be important to understand, since assembler code that works correctly when compiled using one dialect will likely fail if compiled using another. See x86 Options.

If your code needs to support multiple assembler dialects (for example, if you are writing public headers that need to support a variety of compilation options), use constructs of this form:

{ dialect0 | dialect1 | dialect2... }

This construct outputs dialect0 when using dialect #0 to compile the code, dialect1 for dialect #1, etc. If there are fewer alternatives within the braces than the number of dialects the compiler supports, the construct outputs nothing.

For example, if an x86 compiler supports two dialects (‘att’, ‘intel’), an assembler template such as this:

"bt{l %[Offset],%[Base] | %[Base],%[Offset]}; jc %l2"

is equivalent to one of

"btl %[Offset],%[Base] ; jc %l2"   /* att dialect */
"bt %[Base],%[Offset]; jc %l2"     /* intel dialect */

Using that same compiler, this code:

"xchg{l}\t{%%}ebx, %1"

corresponds to either

"xchgl\t%%ebx, %1"                 /* att dialect */
"xchg\tebx, %1"                    /* intel dialect */

There is no support for nesting dialect alternatives.
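As an illustration, here is a sketch of a complete x86 statement that assembles under both -masm=att and -masm=intel. The function name add_both_dialects is made up, and the C branch is a fallback for non-x86 targets.

```c
/* Hypothetical helper: a += b, written once for both x86 dialects. */
int add_both_dialects(int a, int b)
{
#if defined(__x86_64__) || defined(__i386__)
    /* att:   addl  %1, %0   (source first, size suffix on the mnemonic)
       intel: add   %0, %1   (destination first, no suffix)              */
    __asm__("add{l}\t{%1, %0|%0, %1}"
            : "+r"(a)
            : "r"(b)
            : "cc");
#else
    a += b;
#endif
    return a;
}
```

The operand references themselves need no alternatives: GCC prints registers with or without the ‘%’ prefix according to the dialect in effect.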

6.45.2.3 Output Operands

An asm statement has zero or more output operands indicating the names of C variables modified by the assembler code.

In this i386 example, old (referred to in the template string as %0) and *Base (as %1) are outputs and Offset (%2) is an input:

bool old;

__asm__ ("btsl %2,%1\n\t" // Turn on zero-based bit #Offset in Base.
         "sbb %0,%0"      // Use the CF to calculate old.
   : "=r" (old), "+rm" (*Base)
   : "Ir" (Offset)
   : "cc");

return old;

Operands are separated by commas. Each operand has this format:

[ [asmSymbolicName] ] constraint (cvariablename)

asmSymbolicName

Specifies a symbolic name for the operand. Reference the name in the assembler template by enclosing it in square brackets (i.e. ‘%[Value]’). The scope of the name is the asm statement that contains the definition. Any valid C variable name is acceptable, including names already defined in the surrounding code. No two operands within the same asm statement can use the same symbolic name.

When not using an asmSymbolicName, use the (zero-based) position of the operand in the list of operands in the assembler template. For example if there are three output operands, use ‘%0’ in the template to refer to the first, ‘%1’ for the second, and ‘%2’ for the third.

constraint

A string constant specifying constraints on the placement of the operand; See Constraints, for details.

Output constraints must begin with either ‘=’ (a variable overwriting an existing value) or ‘+’ (when reading and writing). When using ‘=’, do not assume the location contains the existing value on entry to the asm, except when the operand is tied to an input; see Input Operands.

After the prefix, there must be one or more additional constraints (see Constraints) that describe where the value resides. Common constraints include ‘r’ for register and ‘m’ for memory. When you list more than one possible location (for example, "=rm"), the compiler chooses the most efficient one based on the current context. If you list as many alternates as the asm statement allows, you permit the optimizers to produce the best possible code. If you must use a specific register, but your Machine Constraints do not provide sufficient control to select the specific register you want, local register variables may provide a solution (see Local Register Variables).

cvariablename

Specifies a C lvalue expression to hold the output, typically a variable name. The enclosing parentheses are a required part of the syntax.

When the compiler selects the registers to use to represent the output operands, it does not use any of the clobbered registers (see Clobbers and Scratch Registers).

Output operand expressions must be lvalues. The compiler cannot check whether the operands have data types that are reasonable for the instruction being executed. For output expressions that are not directly addressable (for example a bit-field), the constraint must allow a register. In that case, GCC uses the register as the output of the asm, and then stores that register into the output.

Operands using the ‘+’ constraint modifier count as two operands (that is, both as input and output) towards the total maximum of 30 operands per asm statement.

Use the ‘&’ constraint modifier (see Modifiers) on all output operands that must not overlap an input. Otherwise, GCC may allocate the output operand in the same register as an unrelated input operand, on the assumption that the assembler code consumes its inputs before producing outputs. This assumption may be false if the assembler code actually consists of more than one instruction.

The same problem can occur if one output parameter (a) allows a register constraint and another output parameter (b) allows a memory constraint. The code generated by GCC to access the memory address in b can contain registers which might be shared by a, and GCC considers those registers to be inputs to the asm. As above, GCC assumes that such input registers are consumed before any outputs are written. This assumption may result in incorrect behavior if the asm writes to a before using b. Combining the ‘&’ modifier with the register constraint on a ensures that modifying a does not affect the address referenced by b. Otherwise, the location of b is undefined if a is modified before using b.
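The overlap hazard can be sketched with a two-instruction x86 template. The function name add_early_clobber is hypothetical; the point is only the ‘&’ on the output, and the C branch is a fallback for other targets.

```c
/* Hypothetical helper: out = a + b, computed in two instructions. */
int add_early_clobber(int a, int b)
{
    int out;
#if defined(__x86_64__) || defined(__i386__)
    /* The first instruction writes %0 while %2 is still needed by the
       second one. Without '&', GCC would be free to put out and b in the
       same register, and the mov would destroy b before the add reads it. */
    __asm__("movl %1, %0\n\t"
            "addl %2, %0"
            : "=&r"(out)
            : "r"(a), "r"(b)
            : "cc");
#else
    out = a + b;
#endif
    return out;
}
```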

asm supports operand modifiers on operands (for example ‘%k2’ instead of simply ‘%2’). Typically these qualifiers are hardware dependent. The list of supported modifiers for x86 is found at x86 Operand modifiers.
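A brief x86-64 sketch of two such modifiers: ‘b’ prints the 8-bit name of a register operand and ‘k’ its 32-bit name. The function name low_byte is made up for the illustration, and the C branch is a fallback for other targets.

```c
/* Hypothetical helper: zero-extend the low byte of x. */
unsigned long low_byte(unsigned long x)
{
    unsigned long r;
#if defined(__x86_64__)
    /* %b1 is printed as the byte register of operand 1 (for example %al),
       %k0 as the 32-bit register of operand 0 (for example %eax). Writing
       a 32-bit register on x86-64 also clears the upper 32 bits. */
    __asm__("movzbl %b1, %k0" : "=r"(r) : "r"(x));
#else
    r = x & 0xff;
#endif
    return r;
}
```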

If the C code that follows the asm makes no use of any of the output operands, use volatile for the asm statement to prevent the optimizers from discarding the asm statement as unneeded (see Volatile).

This code makes no use of the optional asmSymbolicName. Therefore it references the first output operand as %0 (were there a second, it would be %1, etc). The number of the first input operand is one greater than that of the last output operand. In this i386 example, that makes Mask referenced as %1:

uint32_t Mask = 1234;
uint32_t Index;

  asm ("bsfl %1, %0"
     : "=r" (Index)
     : "r" (Mask)
     : "cc");

That code overwrites the variable Index (‘=’), placing the value in a register (‘r’). Using the generic ‘r’ constraint instead of a constraint for a specific register allows the compiler to pick the register to use, which can result in more efficient code. This may not be possible if an assembler instruction requires a specific register.

The following i386 example uses the asmSymbolicName syntax. It produces the same result as the code above, but some may consider it more readable or more maintainable since reordering index numbers is not necessary when adding or removing operands. The names aIndex and aMask are only used in this example to emphasize which names get used where. It is acceptable to reuse the names Index and Mask.

uint32_t Mask = 1234;
uint32_t Index;

  asm ("bsfl %[aMask], %[aIndex]"
     : [aIndex] "=r" (Index)
     : [aMask] "r" (Mask)
     : "cc");

Here are some more examples of output operands.

uint32_t c = 1;
uint32_t d;
uint32_t *e = &c;

asm ("mov %[e], %[d]"
   : [d] "=rm" (d)
   : [e] "rm" (*e));

Here, d may either be in a register or in memory. Since the compiler might already have the current value of the uint32_t location pointed to by e in a register, you can enable it to choose the best location for d by specifying both constraints.

6.45.2.4 Flag Output Operands

Some targets have a special register that holds the "flags" for the result of an operation or comparison. Normally, the contents of that register are either unmodified by the asm, or the asm is considered to clobber the contents.

On some targets, a special form of output operand exists by which conditions in the flags register may be outputs of the asm. The set of conditions supported are target specific, but the general rule is that the output variable must be a scalar integer, and the value is boolean. When supported, the target defines the preprocessor symbol __GCC_ASM_FLAG_OUTPUTS__.

Because of the special nature of the flag output operands, the constraint may not include alternatives.

Most often, the target has only one flags register, and thus is an implied operand of many instructions. In this case, the operand should not be referenced within the assembler template via %0 etc, as there’s no corresponding text in the assembly language.

x86 family

The flag output constraints for the x86 family are of the form ‘=@cccond’, that is, the prefix ‘=@cc’ followed by cond, where cond is one of the standard conditions defined in the ISA manual for jcc or setcc.

a

"above" or unsigned greater than

ae

"above or equal" or unsigned greater than or equal

b

"below" or unsigned less than

be

"below or equal" or unsigned less than or equal

c

carry flag set

e, z

"equal" or zero flag set

g

signed greater than

ge

signed greater than or equal

l

signed less than

le

signed less than or equal

o

overflow flag set

p

parity flag set

s

sign flag set

na, nae, nb, nbe, nc, ne, ng, nge, nl, nle, no, np, ns, nz

"not" flag, or inverted versions of those above
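A short sketch of a flag output on x86, using the ‘c’ condition to report the carry out of an addition. The function name add_carries is hypothetical, and the code falls back to plain C when __GCC_ASM_FLAG_OUTPUTS__ is not defined or the target is not x86.

```c
/* Hypothetical helper: *sum = a + b, returns 1 if the addition carried. */
int add_carries(unsigned a, unsigned b, unsigned *sum)
{
#if defined(__GCC_ASM_FLAG_OUTPUTS__) && \
    (defined(__x86_64__) || defined(__i386__))
    int carry;
    /* Operand 1 is the flag output; it is not referenced in the template
       because the carry flag has no textual form in the instruction. */
    __asm__("addl %2, %0"
            : "+r"(a), "=@ccc"(carry)
            : "r"(b));
    *sum = a;
    return carry;
#else
    *sum = a + b;
    return *sum < a;
#endif
}
```

Compared with a setc instruction followed by a test in C, the flag output lets the compiler branch on the carry flag directly.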

6.45.2.5 Input Operands

Input operands make values from C variables and expressions available to the assembly code.

Operands are separated by commas. Each operand has this format:

[ [asmSymbolicName] ] constraint (cexpression)

asmSymbolicName

When not using an asmSymbolicName, use the (zero-based) position of the operand in the list of operands in the assembler template. For example if there are two output operands and three inputs, use ‘%2’ in the template to refer to the first input operand, ‘%3’ for the second, and ‘%4’ for the third.

constraint

A string constant specifying constraints on the placement of the operand; See Constraints, for details.

Input constraint strings may not begin with either ‘=’ or ‘+’. When you list more than one possible location (for example, ‘"irm"’), the compiler chooses the most efficient one based on the current context. If you must use a specific register, but your Machine Constraints do not provide sufficient control to select the specific register you want, local register variables may provide a solution (see Local Register Variables).

Input constraints can also be digits (for example, "0"). This indicates that the specified input must be in the same place as the output constraint at the (zero-based) index in the output constraint list. When using asmSymbolicName syntax for the output operands, you may use these names (enclosed in brackets ‘[]’) instead of digits.

cexpression

This is the C variable or expression being passed to the asm statement as input. The enclosing parentheses are a required part of the syntax.

When the compiler selects the registers to use to represent the input operands, it does not use any of the clobbered registers (see Clobbers and Scratch Registers).

If there are no output operands but there are input operands, place two consecutive colons where the output operands would go:

__asm__ ("some instructions"
   : /* No outputs. */
   : "r" (Offset / 8));

Warning: Do not modify the contents of input-only operands (except for inputs tied to outputs). The compiler assumes that on exit from the asm statement these operands contain the same values as they had before executing the statement. It is not possible to use clobbers to inform the compiler that the values in these inputs are changing. One common work-around is to tie the changing input variable to an output variable that never gets used. Note, however, that if the code that follows the asm statement makes no use of any of the output operands, the GCC optimizers may discard the asm statement as unneeded (see Volatile).

In this example using the fictitious combine instruction, the constraint "0" for input operand 1 says that it must occupy the same location as output operand 0. Only input operands may use numbers in constraints, and they must each refer to an output operand. Only a number (or the symbolic assembler name) in the constraint can guarantee that one operand is in the same place as another. The mere fact that foo is the value of both operands is not enough to guarantee that they are in the same place in the generated assembler code.

asm ("combine %2, %0" 
   : "=r" (foo) 
   : "0" (foo), "g" (bar));

Here is an example using symbolic names.

asm ("cmoveq %1, %2, %[result]" 
   : [result] "=r"(result) 
   : "r" (test), "r" (new), "[result]" (old));
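Since cmoveq is not an x86 instruction, a comparable sketch for x86-64 is given here, again tying an input to an output through its symbolic name. The function name select_if_nonzero and the operand names cond, alt and res are made up for the illustration.

```c
/* Hypothetical helper: returns alt if cond is nonzero, otherwise old. */
int select_if_nonzero(int cond, int old, int alt)
{
    int res;
#if defined(__x86_64__)
    /* "[res]"(old) places old in the register chosen for res, so res
       already holds the default value when cmovne decides whether to
       replace it. */
    __asm__("testl %[cond], %[cond]\n\t"
            "cmovne %[alt], %[res]"
            : [res] "=r"(res)
            : [cond] "r"(cond), [alt] "r"(alt), "[res]"(old)
            : "cc");
#else
    res = cond ? alt : old;
#endif
    return res;
}
```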

6.45.2.6 Clobbers and Scratch Registers

While the compiler is aware of changes to entries listed in the output operands, the inline asm code may modify more than just the outputs. For example, calculations may require additional registers, or the processor may overwrite a register as a side effect of a particular assembler instruction. In order to inform the compiler of these changes, list them in the clobber list. Clobber list items are either register names or the special clobbers (listed below). Each clobber list item is a string constant enclosed in double quotes and separated by commas.

Clobber descriptions may not in any way overlap with an input or output operand. For example, you may not have an operand describing a register class with one member when listing that register in the clobber list. Variables declared to live in specific registers (see Explicit Register Variables) and used as asm input or output operands must have no part mentioned in the clobber description. In particular, there is no way to specify that input operands get modified without also specifying them as output operands.

When the compiler selects which registers to use to represent input and output operands, it does not use any of the clobbered registers. As a result, clobbered registers are available for any use in the assembler code.

Here is a realistic example for the VAX showing the use of clobbered registers:

asm volatile ("movc3 %0, %1, %2"
                   : /* No outputs. */
                   : "g" (from), "g" (to), "g" (count)
                   : "r0", "r1", "r2", "r3", "r4", "r5", "memory");

Also, there are two special clobber arguments:

"cc"

The "cc" clobber indicates that the assembler code modifies the flags register. On some machines, GCC represents the condition codes as a specific hardware register; "cc" serves to name this register. On other machines, condition code handling is different, and specifying "cc" has no effect. But it is valid no matter what the target.

"memory"

The "memory" clobber tells the compiler that the assembly code performs memory reads or writes to items other than those listed in the input and output operands (for example, accessing the memory pointed to by one of the input parameters). To ensure memory contains correct values, GCC may need to flush specific register values to memory before executing the asm. Further, the compiler does not assume that any values read from memory before an asm remain unchanged after that asm; it reloads them as needed. Using the "memory" clobber effectively forms a read/write memory barrier for the compiler.

Note that this clobber does not prevent the processor from doing speculative reads past the asm statement. To prevent that, you need processor-specific fence instructions.
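The most common use of this clobber is an empty asm that acts purely as a compiler barrier. The sketch below is portable because the template is empty; the names payload, ready, publish and consume are hypothetical, and the barrier constrains only the compiler, not the processor.

```c
static int payload;
static int ready;

/* Store the data, then the flag; the barrier keeps the compiler from
   reordering or merging the two stores across it. */
void publish(int value)
{
    payload = value;
    __asm__ __volatile__("" : : : "memory");
    ready = 1;
}

/* Returns the payload once it has been published, otherwise -1. */
int consume(void)
{
    return ready ? payload : -1;
}
```

On a multiprocessor, code that shares data between threads would still need a hardware fence or atomic operations in addition to this barrier.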

Flushing registers to memory has performance implications and may be an issue for time-sensitive code. You can provide better information to GCC to avoid this, as shown in the following examples. At a minimum, aliasing rules allow GCC to know what memory doesn’t need to be flushed.

Here is a fictitious sum of squares instruction, that takes two pointers to floating point values in memory and produces a floating point register output. Notice that x, and y both appear twice in the asm parameters, once to specify memory accessed, and once to specify a base register used by the asm. You won’t normally be wasting a register by doing this as GCC can use the same register for both purposes. However, it would be foolish to use both %1 and %3 for x in this asm and expect them to be the same. In fact, %3 may well not be a register. It might be a symbolic memory reference to the object pointed to by x.

asm ("sumsq %0, %1, %2"
     : "+f" (result)
     : "r" (x), "r" (y), "m" (*x), "m" (*y));

Here is a fictitious *z++ = *x++ * *y++ instruction. Notice that the x, y and z pointer registers must be specified as input/output because the asm modifies them.

asm ("vecmul %0, %1, %2"
     : "+r" (z), "+r" (x), "+r" (y), "=m" (*z)
     : "m" (*x), "m" (*y));

An x86 example where the string memory argument is of unknown length.

asm("repne scasb"
    : "=c" (count), "+D" (p)
    : "m" (*(const char (*)[]) p), "0" (-1), "a" (0));

If you know the above will only be reading a ten byte array then you could instead use a memory input like: "m" (*(const char (*)[10]) p).
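Wrapped in a function, the statement above gives a strlen-like routine. This is a sketch for 64-bit x86-64 only: the name asm_strlen is made up, the -1 is cast to size_t so that the tied input has the same size as count, and the direction flag is assumed clear, as the ABI guarantees on function entry.

```c
#include <stddef.h>

/* Hypothetical helper: length of a NUL-terminated string. */
size_t asm_strlen(const char *s)
{
#if defined(__x86_64__) && defined(__LP64__)
    size_t count;
    const char *p = s;
    /* rcx starts at all ones and is decremented once per byte scanned,
       including the terminating NUL; rdi (p) advances as it goes. */
    __asm__("repne scasb"
            : "=c"(count), "+D"(p)
            : "m"(*(const char (*)[]) p), "0"((size_t)-1), "a"(0)
            : "cc");
    return ~count - 1;
#else
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
#endif
}
```

The "m" input replaces a "memory" clobber: it tells GCC that the bytes at p are read, without forcing every other cached value to be written back.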

Here is an example of a PowerPC vector scale implemented in assembly, complete with vector and condition code clobbers, and some initialized offset registers that are unchanged by the asm.

void
dscal (size_t n, double *x, double alpha)
{
  asm ("/* lots of asm here */"
       : "+m" (*(double (*)[n]) x), "+&r" (n), "+b" (x)
       : "d" (alpha), "b" (32), "b" (48), "b" (64),
         "b" (80), "b" (96), "b" (112)
       : "cr0",
         "vs32","vs33","vs34","vs35","vs36","vs37","vs38","vs39",
         "vs40","vs41","vs42","vs43","vs44","vs45","vs46","vs47");
}

Rather than allocating fixed registers via clobbers to provide scratch registers for an asm statement, an alternative is to define a variable and make it an early-clobber output as with a2 and a3 in the example below. This gives the compiler register allocator more freedom. You can also define a variable and make it an output tied to an input as with a0 and a1, tied respectively to ap and lda. Of course, with tied outputs your asm can’t use the input value after modifying the output register since they are one and the same register. What’s more, if you omit the early-clobber on the output, it is possible that GCC might allocate the same register to another of the inputs if GCC could prove they had the same value on entry to the asm. This is why a1 has an early-clobber. Its tied input, lda might conceivably be known to have the value 16 and without an early-clobber share the same register as %11. On the other hand, ap can’t be the same as any of the other inputs, so an early-clobber on a0 is not needed. It is also not desirable in this case. An early-clobber on a0 would cause GCC to allocate a separate register for the "m" (*(const double (*)[]) ap) input. Note that tying an input to an output is the way to set up an initialized temporary register modified by an asm statement. An input not tied to an output is assumed by GCC to be unchanged, for example "b" (16) below sets up %11 to 16, and GCC might use that register in following code if the value 16 happened to be needed. You can even use a normal asm output for a scratch if all inputs that might share the same register are consumed before the scratch is used. The VSX registers clobbered by the asm statement could have used this technique except for GCC’s limit on the number of asm parameters.

static void
dgemv_kernel_4x4 (long n, const double *ap, long lda,
                  const double *x, double *y, double alpha)
{
  double *a0;
  double *a1;
  double *a2;
  double *a3;

  __asm__
    (
     /* lots of asm here */
     "#n=%1 ap=%8=%12 lda=%13 x=%7=%10 y=%0=%2 alpha=%9 o16=%11\n"
     "#a0=%3 a1=%4 a2=%5 a3=%6"
     :
       "+m" (*(double (*)[n]) y),
       "+&r" (n),	// 1
       "+b" (y),	// 2
       "=b" (a0),	// 3
       "=&b" (a1),	// 4
       "=&b" (a2),	// 5
       "=&b" (a3)	// 6
     :
       "m" (*(const double (*)[n]) x),
       "m" (*(const double (*)[]) ap),
       "d" (alpha),	// 9
       "r" (x),		// 10
       "b" (16),	// 11
       "3" (ap),	// 12
       "4" (lda)	// 13
     :
       "cr0",
       "vs32","vs33","vs34","vs35","vs36","vs37",
       "vs40","vs41","vs42","vs43","vs44","vs45","vs46","vs47"
     );
}
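
The PowerPC kernel above is hard to try out on other machines, so here is a much smaller x86 sketch of the same two ideas. The function names twice_plus and sum_1_to_n are hypothetical and are not part of the GCC manual; the plain C fallback is an assumption added so the code still builds on non-x86 targets. In twice_plus, scratch is an early-clobber output used as a compiler-chosen temporary, while result is a normal output that is only written after both inputs have been consumed. In sum_1_to_n, counter is an output tied to the input n, which gives the asm an initialized register that it is allowed to modify.

```c
#include <stdint.h>

/* Hypothetical helper: returns 2*a + b. */
static inline uint32_t twice_plus(uint32_t a, uint32_t b)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    uint32_t result;
    uint32_t scratch;
    __asm__ ("movl %2, %1\n\t"   /* scratch = a; b is still unread here */
             "addl %2, %1\n\t"   /* scratch += a */
             "addl %3, %1\n\t"   /* scratch += b; all inputs consumed */
             "movl %1, %0"       /* result may safely reuse an input register */
             : "=r" (result), "=&r" (scratch)
             : "r" (a), "r" (b)
             : "cc");
    return result;
#else
    return 2u * a + b;           /* portable fallback (assumption) */
#endif
}

/* Hypothetical helper: returns n + (n-1) + ... + 1, or 0 when n is 0. */
static inline uint32_t sum_1_to_n(uint32_t n)
{
    uint32_t sum = 0;
    uint32_t counter = 0;
    if (n == 0)
        return 0;
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __asm__ ("1:\n\t"
             "addl %1, %0\n\t"   /* sum += counter */
             "decl %1\n\t"       /* counter--, modifies the tied register */
             "jnz 1b"
             : "+r" (sum), "=r" (counter)
             : "1" (n)           /* n starts out in counter's register */
             : "cc");
#else
    for (counter = n; counter != 0; --counter)
        sum += counter;
#endif
    (void) counter;
    return sum;
}
```

Because counter is declared as an output, GCC knows the register no longer holds n after the asm; declaring n as a plain "r" input and decrementing it would silently break that assumption.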

6.45.2.7 Goto Labels

asm goto allows assembly code to jump to one or more C labels. The GotoLabels section in an asm goto statement contains a comma-separated list of all C labels to which the assembler code may jump. GCC assumes that asm execution falls through to the next statement (if this is not the case, consider using the __builtin_unreachable intrinsic after the asm statement). Optimization of asm goto may be improved by using the hot and cold label attributes (see Label Attributes).

An asm goto statement cannot have outputs. This is due to an internal restriction of the compiler: control transfer instructions cannot have outputs. If the assembler code does modify anything, use the "memory" clobber to force the optimizers to flush all register values to memory and reload them if necessary after the asm statement.

Also note that an asm goto statement is always implicitly considered volatile.

To reference a label in the assembler template, prefix it with ‘%l’ (lowercase ‘L’) followed by its (zero-based) position in GotoLabels plus the number of input operands. For example, if the asm has three inputs and references two labels, refer to the first label as ‘%l3’ and the second as ‘%l4’.

Alternately, you can reference labels using the actual C label name enclosed in brackets. For example, to reference a label named carry, you can use ‘%l[carry]’. The label must still be listed in the GotoLabels section when using this approach.

Here is an example of asm goto for i386:

asm goto (
    "btl %1, %0\n\t"
    "jc %l2"
    : /* No outputs. */
    : "r" (p1), "r" (p2) 
    : "cc" 
    : carry);

return 0;

carry:
return 1;
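
The fragment above can be wrapped into a complete function. This is a minimal sketch: the name bit_is_set and the C fallback for non-x86 targets are assumptions, and it uses the named form ‘%l[carry]’ instead of the positional ‘%l2’.

```c
/* Hypothetical helper: returns 1 if bit number `bit` (taken modulo 32)
   of `value` is set, otherwise 0. */
static inline int bit_is_set(unsigned int value, unsigned int bit)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __asm__ goto ("btl %1, %0\n\t"     /* copy the selected bit into CF */
                  "jc %l[carry]"       /* jump to the C label if CF is set */
                  : /* No outputs. */
                  : "r" (value), "r" (bit)
                  : "cc"
                  : carry);
    return 0;                          /* fall-through path: bit was clear */
carry:
    return 1;
#else
    return (int) ((value >> (bit & 31u)) & 1u);   /* portable fallback */
#endif
}
```

With two inputs and one label, ‘%l[carry]’ and ‘%l2’ refer to the same operand; the named form simply stays correct if an input is added later.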

The following example shows an asm goto that uses a memory clobber.

int frob(int x)
{
  int y;
  asm goto ("frob %%r5, %1; jc %l[error]; mov (%2), %%r5"
            : /* No outputs. */
            : "r"(x), "r"(&y)
            : "r5", "memory" 
            : error);
  return y;
error:
  return -1;
}
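
Since frob is not a real instruction, the example above cannot be assembled. The following sketch shows the same pattern with real x86 instructions; the function name store_if_nonzero and the non-x86 fallback are assumptions. The asm writes through a pointer, which GCC cannot see because asm goto has no outputs here, so the "memory" clobber tells the compiler that memory contents may have changed.

```c
/* Hypothetical helper: stores x to *out and returns 0 when x is nonzero;
   leaves *out untouched and returns -1 when x is zero. */
static inline int store_if_nonzero(int x, int *out)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __asm__ goto ("testl %0, %0\n\t"    /* set ZF if x == 0 */
                  "jz %l[zero]\n\t"     /* skip the store for zero */
                  "movl %0, (%1)"       /* *out = x, invisible to GCC */
                  : /* No outputs. */
                  : "r" (x), "r" (out)
                  : "cc", "memory"
                  : zero);
    return 0;
zero:
    return -1;
#else
    if (x == 0)
        return -1;
    *out = x;
    return 0;
#endif
}
```

Without the "memory" clobber, the compiler would be free to assume that the object pointed to by out still holds its old value after the asm.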

6.45.2.8 x86 Operand Modifiers

References to input, output, and goto operands in the assembler template of extended asm statements can use modifiers to affect the way the operands are formatted in the code output to the assembler. For example, the following code uses the ‘h’ and ‘b’ modifiers for x86:

uint16_t  num;
asm volatile ("xchg %h0, %b0" : "+a" (num) );

These modifiers generate this assembler code:

xchg %ah, %al
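
As a small sketch, the same statement can be placed in a function that swaps the two bytes of a 16-bit value. The name swap_bytes16 and the shift-based fallback for non-x86 targets are assumptions.

```c
#include <stdint.h>

/* Hypothetical helper: exchanges the high and low byte of num. */
static inline uint16_t swap_bytes16(uint16_t num)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    /* "+a" pins num to the a register, so %h0 prints %ah and %b0 prints %al. */
    __asm__ ("xchg %h0, %b0" : "+a" (num));
    return num;
#else
    return (uint16_t) ((num << 8) | (num >> 8));   /* portable fallback */
#endif
}
```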

The rest of this discussion uses the following code for illustrative purposes.

int main()
{
   int iInt = 1;

top:

   asm volatile goto ("some assembler instructions here"
   : /* No outputs. */
   : "q" (iInt), "X" (sizeof(unsigned char) + 1), "i" (42)
   : /* No clobbers. */
   : top);
}

With no modifiers, this is what the output from the operands would be for the ‘att’ and ‘intel’ dialects of assembler:

Operand	‘att’	‘intel’
`%0`	`%eax`	`eax`
`%1`	`$2`	`2`
`%3`	`$.L3`	`OFFSET FLAT:.L3`

The table below shows the list of supported modifiers and their effects.

Modifier	Description	Operand	‘att’	‘intel’
`A`	Print an absolute memory reference.	`%A0`	`*%rax`	`rax`
`b`	Print the QImode name of the register.	`%b0`	`%al`	`al`
`c`	Require a constant operand and print the constant expression with no punctuation.	`%c1`	`2`	`2`
`E`	Print the address in Double Integer (DImode) mode (8 bytes) when the target is 64-bit. Otherwise mode is unspecified (VOIDmode).	`%E1`	`%(rax)`	`[rax]`
`h`	Print the QImode name for a “high” register.	`%h0`	`%ah`	`ah`
`H`	Add 8 bytes to an offsettable memory reference. Useful when accessing the high 8 bytes of SSE values. For a memref in (%rax), it generates 8(%rax).	`%H0`	`8(%rax)`	`8[rax]`
`k`	Print the SImode name of the register.	`%k0`	`%eax`	`eax`
`l`	Print the label name with no punctuation.	`%l3`	`.L3`	`.L3`
`p`	Print raw symbol name (without syntax-specific prefixes).	`%p2`	`42`	`42`
`P`	If used for a function, print the PLT suffix and generate PIC code. For example, emit `foo@PLT` instead of `foo` for the function foo(). If used for a constant, drop all syntax-specific prefixes and issue the bare constant. See `p` above.
`q`	Print the DImode name of the register.	`%q0`	`%rax`	`rax`
`w`	Print the HImode name of the register.	`%w0`	`%ax`	`ax`
`z`	Print the opcode suffix for the size of the current integer operand (one of `b`/`w`/`l`/`q`).	`%z0`	`l`

V is a special modifier which prints the name of the full integer register without %.
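
A few of these modifiers can be tried with short helpers. This is a sketch assuming the default ‘att’ dialect; the function names low_byte, low_word, and add_42 and the C fallbacks are hypothetical. The ‘b’ and ‘w’ modifiers pick the 8-bit and 16-bit names of the register GCC chose, and ‘c’ prints the constant 42 without the ‘$’ that the ‘att’ dialect would normally add, which is why the template supplies it explicitly.

```c
#include <stdint.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#define HAVE_X86_ASM 1
#else
#define HAVE_X86_ASM 0
#endif

/* Hypothetical helper: zero-extends the low 8 bits of v. */
static inline uint32_t low_byte(uint32_t v)
{
#if HAVE_X86_ASM
    uint32_t out;
    /* "q" guarantees a register that has a byte-sized name for %b1. */
    __asm__ ("movzbl %b1, %0" : "=r" (out) : "q" (v));
    return out;
#else
    return v & 0xffu;
#endif
}

/* Hypothetical helper: zero-extends the low 16 bits of v. */
static inline uint32_t low_word(uint32_t v)
{
#if HAVE_X86_ASM
    uint32_t out;
    __asm__ ("movzwl %w1, %0" : "=r" (out) : "r" (v));
    return out;
#else
    return v & 0xffffu;
#endif
}

/* Hypothetical helper: adds the immediate 42 to v. */
static inline uint32_t add_42(uint32_t v)
{
#if HAVE_X86_ASM
    /* %c1 prints a bare 42, so the template writes the '$' itself. */
    __asm__ ("addl $%c1, %0" : "+r" (v) : "i" (42) : "cc");
    return v;
#else
    return v + 42u;
#endif
}
```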

6.45.2.9 x86 Floating-Point `asm` Operands

On x86 targets, there are several rules on the usage of stack-like registers in the operands of an asm. These rules apply only to the operands that are stack-like registers:

Given a set of input registers that die in an asm, it is necessary to know which are implicitly popped by the asm, and which must be explicitly popped by GCC.
An input register that is implicitly popped by the asm must be explicitly clobbered, unless it is constrained to match an output operand.
For any input register that is implicitly popped by an asm, it is necessary to know how to adjust the stack to compensate for the pop. If any non-popped input is closer to the top of the reg-stack than the implicitly popped register, it would not be possible to know what the stack looked like: it is not clear how the rest of the stack “slides up”.
All implicitly popped input registers must be closer to the top of the reg-stack than any input that is not implicitly popped.

It is possible that if an input dies in an asm, the compiler might use the input register for an output reload. Consider this example:
```
asm ("foo" : "=t" (a) : "f" (b));
```
This code says that input b is not popped by the asm, and that the asm pushes a result onto the reg-stack, i.e., the stack is one deeper after the asm than it was before. But, it is possible that reload may think that it can use the same register for both the input and the output.

To prevent this from happening, if any input operand uses the ‘f’ constraint, all output register constraints must use the ‘&’ early-clobber modifier.

The example above is correctly written as:
```
asm ("foo" : "=&t" (a) : "f" (b));
```
Some operands need to be in particular places on the stack. All output operands fall in this category—GCC has no other way to know which registers the outputs appear in unless you indicate this in the constraints.
Output operands must specifically indicate which register an output appears in after an asm. ‘=f’ is not allowed: the operand constraints must select a class with a single register.
Output operands may not be “inserted” between existing stack registers. Since no 387 opcode uses a read/write operand, all output operands are dead before the asm, and are pushed by the asm. It makes no sense to push anywhere but the top of the reg-stack.
Output operands must start at the top of the reg-stack: output operands may not “skip” a register.
Some asm statements may need extra stack space for internal calculations. This can be guaranteed by clobbering stack registers unrelated to the inputs and outputs.

This asm takes one input, which is internally popped, and produces two outputs.

asm ("fsincos" : "=t" (cos), "=u" (sin) : "0" (inp));

This asm takes two inputs, which are popped by the fyl2xp1 opcode, and replaces them with one output. The st(1) clobber is necessary for the compiler to know that fyl2xp1 pops both inputs.

asm ("fyl2xp1" : "=t" (result) : "0" (x), "u" (y) : "st(1)");
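
The same two constraint patterns can be exercised with simpler opcodes whose results are easy to check. This is a sketch only: the names x87_abs and x87_add, the choice of fabs and faddp, and the C fallbacks are assumptions, not examples from the GCC manual. fabs replaces the top of the stack in place, so its single input is tied to the ‘=t’ output. faddp consumes both st(0) and st(1) and leaves one result on top, so st(1) is listed as a clobber in the same way as for fyl2xp1.

```c
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#define HAVE_X87_ASM 1
#else
#define HAVE_X87_ASM 0
#endif

/* Hypothetical helper: absolute value computed on the x87 stack. */
static inline double x87_abs(double x)
{
#if HAVE_X87_ASM
    double r;
    /* One input in st(0), tied to the one output in st(0). */
    __asm__ ("fabs" : "=t" (r) : "0" (x));
    return r;
#else
    return x < 0 ? -x : x;       /* portable fallback (assumption) */
#endif
}

/* Hypothetical helper: x + y computed with faddp. */
static inline double x87_add(double x, double y)
{
#if HAVE_X87_ASM
    double r;
    /* faddp adds st(0) into st(1) and pops, so both inputs disappear and
       the sum ends up on top; the st(1) clobber reports the extra pop. */
    __asm__ ("faddp" : "=t" (r) : "0" (x), "u" (y) : "st(1)");
    return r;
#else
    return x + y;                /* portable fallback (assumption) */
#endif
}
```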

https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html
