Chapter 3

3.1 A historical perspective

3.2 Program encodings

gcc -Og -o p p1.c p2.c

The -Og option instructs the compiler to apply a level
of optimization that yields machine code that follows
the overall structure of the original C code. Invoking
higher levels of optimization can generate code that is
so heavily transformed that the relationship between the
generated machine code and the original source code is
difficult to understand. In practice, higher levels of
optimization (e.g., specified with the option -O1 or
-O2) are considered a better choice in terms of the
resulting program performance.

3.2.1 Machine-level code

First, the format and behavior of a machine-level
program is defined by the "instruction set
architecture", or "ISA".

Second, the memory addresses used by a machine-level
program are virtual addresses. The actual implementation
of the memory system involves a combination of multiple
hardware memories and operating system software.

Several parts of the processor state that are normally
hidden from the C programmer are visible:

+ The "program counter" (PC, called %rip in x86-64)
indicates the address in memory of the next
instruction to be executed.
+ The "integer register file" contains 16 named
locations storing 64-bit values. These registers can
hold addresses (corresponding to C pointers) or
integer data.
+ The "condition code registers" hold status
information about the most recently executed
arithmetic or logical instruction. They are used to
implement conditional changes in the control or data
flow, such as is required to implement if and while
statements.
+ A set of "vector registers" can each hold one or
more integer or floating-point values.

Machine code views the memory as simply a large
byte-addressable array. The program memory contains the
executable machine code for the program, some
information required by the operating system, a
run-time stack for managing procedure calls and
returns, and blocks of memory allocated by the user.

As mentioned earlier, the program memory is addressed
using virtual addresses. In current implementations of
these machines, the upper 16 bits must be set to zero,
and so an address can potentially specify a byte over
a range of 2^48, or 256 terabytes. The operating system
manages this virtual address space, translating virtual
addresses into the physical addresses of values in the
actual processor memory.
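A quick check of the 48-bit arithmetic follows, as a minimal C sketch. It assumes, as these notes do, that valid user addresses simply have their upper 16 bits clear. Real x86-64 hardware more precisely requires bits 63 to 48 to be copies of bit 47 ("canonical" form), which is why kernel addresses have those bits set. The helper name upper16_clear is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Number of distinct byte addresses expressible with 48 bits: 2^48. */
static const uint64_t address_range = (uint64_t)1 << 48;

/* Hypothetical helper: true when the upper 16 bits of an address are 0,
   the user-space case described in the notes. */
int upper16_clear(uint64_t addr) {
    return (addr >> 48) == 0;
}
```

Since 2^48 = 2^8 * 2^40, the range is 256 * 2^40 bytes, i.e. 256 terabytes.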

3.2.2 Code Examples

gcc -Og -S mstore.c    # generate the assembly file mstore.s
gcc -Og -c mstore.c    # generate the object-code file mstore.o
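For reference, here is a version of the mstore.c file these commands compile, following the CS:APP example. In the book, mult2 lives in a separate file (main.c). It is defined here only so that the sketch links on its own.

```c
#include <assert.h>

long mult2(long, long);

/* mstore.c: multiply x by y via mult2 and store the product at dest. */
void multstore(long x, long y, long *dest) {
    long t = mult2(x, y);
    *dest = t;
}

/* Normally in another file; included so this sketch is self-contained. */
long mult2(long a, long b) {
    long s = a * b;
    return s;
}
```

With mult2 moved back into its own file, gcc -Og -S mstore.c produces an mstore.s in which multstore calls mult2.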

To inspect the contents of machine-code files, a class
of programs known as 'disassemblers' can be invaluable.
On Linux systems, the program 'objdump' can serve this
role given the -d command-line flag:

objdump -d mstore.o

Generating the actual executable code requires running
a linker on the set of object-code files,one of which
must contain a function 'main'.

GCC can generate code in Intel format for the multstore
function using the following command line:

gcc -Og -S -masm=intel mstore.c

3.3 Data Formats

A 'word' refers to a 16-bit data type.
A 'double word' refers to a 32-bit data type.
A 'quad word' refers to a 64-bit data type.
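The size names map directly onto the one-letter suffixes of AT&T instructions such as movb, movw, movl, and movq. Below is a small C sketch of that mapping. The typedef names and suffix_for_size are hypothetical, and the fixed-width types are used so that the sizes hold on any platform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int16_t word_t;   /* word: 16 bits        */
typedef int32_t dword_t;  /* double word: 32 bits */
typedef int64_t qword_t;  /* quad word: 64 bits   */

/* Map an operand size in bytes to its instruction suffix
   (movb, movw, movl, movq); '?' for sizes with no suffix. */
char suffix_for_size(size_t bytes) {
    switch (bytes) {
    case 1: return 'b';
    case 2: return 'w';
    case 4: return 'l';
    case 8: return 'q';
    default: return '?';
    }
}
```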

3.4 Accessing Information

An x86-64 central processing unit (CPU) contains a set of
16 'general-purpose' registers storing 64-bit values.

These registers are used to store integer data as well as
pointers.

64-bit   32-bit   16-bit   8-bit    Role
(r: register, e: extended register)
---------------------------------------------------------
%rax     %eax     %ax      %al      // return value
---------------------------------------------------------
%rbx     %ebx     %bx      %bl      // callee saved
---------------------------------------------------------
%rcx     %ecx     %cx      %cl      // 4th argument
---------------------------------------------------------
%rdx     %edx     %dx      %dl      // 3rd argument
---------------------------------------------------------
%rsi     %esi     %si      %sil     // 2nd argument
---------------------------------------------------------
%rdi     %edi     %di      %dil     // 1st argument
---------------------------------------------------------
%rbp     %ebp     %bp      %bpl     // callee saved
---------------------------------------------------------
%rsp     %esp     %sp      %spl     // stack pointer
---------------------------------------------------------
%r8      %r8d     %r8w     %r8b     // 5th argument
---------------------------------------------------------
%r9      %r9d     %r9w     %r9b     // 6th argument
---------------------------------------------------------
%r10     %r10d    %r10w    %r10b    // caller saved
---------------------------------------------------------
%r11     %r11d    %r11w    %r11b    // caller saved
---------------------------------------------------------
%r12     %r12d    %r12w    %r12b    // callee saved
---------------------------------------------------------
%r13     %r13d    %r13w    %r13b    // callee saved
---------------------------------------------------------
%r14     %r14d    %r14w    %r14b    // callee saved
---------------------------------------------------------
%r15     %r15d    %r15w    %r15b    // callee saved
---------------------------------------------------------

3.4.1 Operand Specifiers

Source values can be given as constants or read from
registers or memory. Results can be stored in either
registers or memory. There are three kinds of operands:

+ Immediate: a constant value, written with a '$'
followed by an integer in standard C notation (e.g.,
$-577 or $0x1F).
+ Register: the contents of a register.
+ Memory reference: a memory location accessed
according to a computed address, called the
effective address. The most general form is
Imm(Rb,Ri,s): an immediate offset Imm, a base
register Rb, an index register Ri, and a scale
factor s, where s must be 1, 2, 4, or 8. Both the
base and index must be 64-bit registers. The
effective address is then computed as:

Imm + R[Rb] + R[Ri] * s
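The address formula can be written out directly in C. In this minimal sketch, register contents are passed in as plain numbers, and the function name effective_address is hypothetical. Unsigned 64-bit arithmetic is used so that the sum wraps modulo 2^64, as the hardware does. An omitted part of the operand (for example, no index register) simply contributes 0.

```c
#include <assert.h>
#include <stdint.h>

/* Effective address of the operand Imm(Rb,Ri,s):
   Imm + R[Rb] + R[Ri] * s, computed modulo 2^64. */
uint64_t effective_address(int64_t imm, uint64_t rb, uint64_t ri, unsigned s) {
    assert(s == 1 || s == 2 || s == 4 || s == 8);  /* only legal scale factors */
    return (uint64_t)imm + rb + ri * s;
}
```

For example, 8(%rdx,%rcx,4) with %rdx = 0x100 and %rcx = 3 refers to address 0x8 + 0x100 + 12 = 0x114.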

3.4.2 Data Movement Instructions

-------------------------------------------------------
mov{b,w,l,q} S,D D <= S move source to destination
-------------------------------------------------------
movabsq I,R R <= I move absolute quad word
-------------------------------------------------------

x86-64 imposes the restriction that 'a move instruction
cannot have both operands refer to memory locations'.
Copying a value from one memory location to another
therefore requires two instructions:

1. load the source value into a register;
2. write this register value to the destination.

Note: when movl has a register as its destination, it
also sets the high-order 4 bytes of that register to 0.
More generally, any instruction that generates a 32-bit
value for a register also sets the high-order portion
of the register to 0.
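The asymmetry between 32-bit writes and 8- or 16-bit writes is easy to miss, so here is a toy C model of what each destination size does to the full 64-bit register. The function names are hypothetical. The test values follow the familiar pattern of loading 0x0011223344556677 and then writing -1 into %al, %ax, or %eax.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit destination (e.g., movl ..., %eax): the upper 32 bits become 0. */
uint64_t write_low32(uint64_t reg, uint32_t value) {
    (void)reg;                      /* old contents do not survive */
    return (uint64_t)value;
}

/* 16-bit destination (e.g., movw ..., %ax): the other 48 bits are kept. */
uint64_t write_low16(uint64_t reg, uint16_t value) {
    return (reg & ~(uint64_t)0xFFFF) | value;
}

/* 8-bit destination (e.g., movb ..., %al): the other 56 bits are kept. */
uint64_t write_low8(uint64_t reg, uint8_t value) {
    return (reg & ~(uint64_t)0xFF) | value;
}
```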

Copying a smaller source value to a larger destination:
all of these instructions copy data from a source, which
can be either a register or a location in memory, to a
'register' destination.

The 'movz' class fills out the remaining bytes of the
destination with zeros.

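------------------------------------------------------
movz S,R    R <= zeroExtend(S)

movzbw      move zero-extended byte to word
movzbl      move zero-extended byte to double word
movzbq      move zero-extended byte to quad word

movzwl      move zero-extended word to double word
movzwq      move zero-extended word to quad word
------------------------------------------------------
note: there is no movzlq; zero-extending a double word
to a quad word is done with movl to a 32-bit register,
which clears the upper 4 bytes.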


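'movs' class: sign-extending data movement
------------------------------------------------------
movs S,R    R <= signExtend(S)

movsbw      move sign-extended byte to word
movsbl      move sign-extended byte to double word
movsbq      move sign-extended byte to quad word

movswl      move sign-extended word to double word
movswq      move sign-extended word to quad word

movslq      move sign-extended double word to quad word

cltq        %rax <= signExtend(%eax)
------------------------------------------------------
note: cltq has exactly the same effect as
movslq %eax,%rax, but a more compact encoding.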
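Here is a C sketch contrasting the two classes for the byte-to-quad-word case (movzbq vs. movsbq). The function names are hypothetical. The xor/subtract trick sign-extends without relying on implementation-defined casts: it maps 0x00..0x7F to 0..127 and 0x80..0xFF to -128..-1.

```c
#include <assert.h>
#include <stdint.h>

/* movzbq: copy the byte and fill the upper 56 bits with zeros. */
uint64_t zero_extend8(uint8_t b) {
    return (uint64_t)b;
}

/* movsbq: copy the byte and fill the upper 56 bits with copies of bit 7. */
uint64_t sign_extend8(uint8_t b) {
    int64_t v = ((int64_t)b ^ 0x80) - 0x80;
    return (uint64_t)v;
}
```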

'memory references in x86-64 are always given with'
'quad word registers,such as %rax,even if the '
'operand is a byte,single word,or double word.'

3.4.3 Data Movement Example
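This section is empty in the notes, so here is a data-movement example in the style of CS:APP's exchange function. It reads through a pointer, writes through it, and returns the old value. The assembly in the comment is what gcc -Og typically produces, with xp in %rdi and y in %rsi. Treat it as indicative, since the exact output depends on the compiler version.

```c
#include <assert.h>

/* Typical gcc -Og output:
       movq (%rdi), %rax    # x = *xp (also the return value)
       movq %rsi, (%rdi)    # *xp = y
       ret                                                   */
long exchange(long *xp, long y) {
    long x = *xp;
    *xp = y;
    return x;
}
```

The pointer dereference *xp becomes a memory operand (%rdi), while the local x lives only in %rax.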

3.4.4 Pushing and popping stack data

-----------------------------------------------------
pushq S R[%rsp] <= R[%rsp] - 8; //push quad word
M[R[%rsp]] <= S
-----------------------------------------------------
popq D D <= M[R[%rsp]]; //pop quad word
R[%rsp] <= R[%rsp] + 8
-----------------------------------------------------

pushq %rbp is equivalent to:

subq $8,%rsp
movq %rbp,(%rsp)

popq %rax is equivalent to:

movq (%rsp),%rax
addq $8,%rsp

movq 8(%rsp),%rdx    // copy the second quad word from
                     // the stack to %rdx
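The push and pop definitions can be modeled with a byte array standing in for memory. This is a hypothetical toy: 64 bytes of "memory" and an rsp variable that starts past the end, because the x86-64 stack grows toward lower addresses. It shows that after two pushes, the older value sits at 8(%rsp).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { MEM_SIZE = 64 };
static uint8_t mem[MEM_SIZE];       /* toy memory */
static uint64_t rsp = MEM_SIZE;     /* empty stack: rsp just past the end */

static void store_quad(uint64_t addr, uint64_t v) { memcpy(&mem[addr], &v, 8); }
static uint64_t load_quad(uint64_t addr) { uint64_t v; memcpy(&v, &mem[addr], 8); return v; }

/* pushq S:  R[%rsp] <- R[%rsp] - 8;  M[R[%rsp]] <- S */
void pushq(uint64_t s) {
    assert(rsp >= 8);
    rsp -= 8;
    store_quad(rsp, s);
}

/* popq D:  D <- M[R[%rsp]];  R[%rsp] <- R[%rsp] + 8 */
uint64_t popq(void) {
    assert(rsp + 8 <= MEM_SIZE);
    uint64_t d = load_quad(rsp);
    rsp += 8;
    return d;
}
```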

3.5 Arithmetic and logical operations

-----------------------------------------------------
leaq S,D    D <= &S         load effective address
-----------------------------------------------------
inc D       D <= D + 1      increment
dec D       D <= D - 1      decrement
neg D       D <= -D         negate
not D       D <= ~D         complement
-----------------------------------------------------
add S,D     D <= D + S      add
sub S,D     D <= D - S      subtract
imul S,D    D <= D * S      multiply
xor S,D     D <= D ^ S      exclusive-or
or S,D      D <= D | S      or
and S,D     D <= D & S      and
-----------------------------------------------------
sal k,D     D <= D << k     left shift
shl k,D     D <= D << k     left shift (same as sal)
sar k,D     D <= D >>A k    arithmetic right shift
shr k,D     D <= D >>L k    logical right shift
-----------------------------------------------------

3.5.1 load effective address

With 'leaq', instead of reading from the designated
location, the instruction copies the effective address
to the destination.

'leaq' can be used to generate pointers for later memory
references, and it can also be used to compactly
describe common arithmetic operations:

leaq 7(%rdx,%rdx,4),%rax    // %rdx contains value x

will set register %rax to 5x + 7.
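Arithmetic that fits the pattern Imm + Rb + Ri*s is a natural candidate for leaq. The scale function below follows the CS:APP example. The assembly in the comment is the kind of code GCC often emits, with x in %rdi, y in %rsi, and z in %rdx, but it is not guaranteed. five_x_plus_7 is a hypothetical C equivalent of the note's example.

```c
#include <assert.h>

/* Often compiled to something like:
       leaq (%rdi,%rsi,4), %rax    # x + 4*y
       leaq (%rdx,%rdx,2), %rdx    # z + 2*z = 3*z
       leaq (%rax,%rdx,4), %rax    # (x + 4*y) + 4*(3*z)          */
long scale(long x, long y, long z) {
    long t = x + 4 * y + 12 * z;
    return t;
}

/* leaq 7(%rdx,%rdx,4), %rax computes 7 + x + 4*x = 5x + 7. */
long five_x_plus_7(long x) {
    return 7 + x + x * 4;
}
```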


3.5.2 unary and binary operations

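Unary operations have a single operand serving as both
source and destination. This operand can be either a
register or a memory location.

Binary operations use the second operand as both a
source and the destination (e.g., subq %rax,%rdx sets
%rdx to %rdx - %rax). As with the mov instructions, the
two operands of a binary operation cannot both be
memory locations.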

3.5.3 shift operations (page 195)

The shift amount is given first and the value to
shift is given second.

The different shift instructions can specify the
shift amount either as an 'immediate' value or with
the single-byte register '%cl'.

With x86-64, a shift instruction operating on data
values that are w bits long determines the shift
amount from the low-order m bits of register %cl,
where 2^m = w. The higher-order bits are ignored.
(Per Intel's documentation, the count is masked to
6 bits for 64-bit operands and to 5 bits otherwise,
so the 2^m = w rule holds exactly for 32- and 64-bit
data.)
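The count masking can be modeled in C for the 32- and 64-bit cases. The function names are hypothetical. The explicit mask is what makes these models well defined, because C itself leaves a shift by the full operand width or more undefined.

```c
#include <assert.h>
#include <stdint.h>

/* salq with the count in %cl: only the low 6 bits of %cl are used. */
uint64_t salq_cl(uint64_t value, uint8_t cl) {
    return value << (cl & 63);
}

/* sall with the count in %cl: only the low 5 bits of %cl are used. */
uint32_t sall_cl(uint32_t value, uint8_t cl) {
    return value << (cl & 31);
}

/* shrq (logical right shift): zeros are shifted in from the left. */
uint64_t shrq_cl(uint64_t value, uint8_t cl) {
    return value >> (cl & 63);
}
```

For example, with %cl = 0xFF, salq shifts by 63 and sall shifts by 31.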

3.5.5 Special arithmetic operations

Intel refers to a 16-byte quantity as an 'oct word',
extending the names word (2 bytes), double word
(4 bytes), and quad word (8 bytes).

-----------------------------------------------------
imulq S   R[%rdx]:R[%rax] <= S * R[%rax]    signed full multiply
mulq S    R[%rdx]:R[%rax] <= S * R[%rax]    unsigned full multiply

cqto      R[%rdx]:R[%rax] <= signExtend(R[%rax])
                                            convert to oct word

idivq S   R[%rdx] <= R[%rdx]:R[%rax] mod S  signed divide
          R[%rax] <= R[%rdx]:R[%rax] / S

divq S    R[%rdx] <= R[%rdx]:R[%rax] mod S  unsigned divide
          R[%rax] <= R[%rdx]:R[%rax] / S
-----------------------------------------------------

The imulq instruction has two different forms. One
serves as a 'two-operand' multiply instruction,
generating a 64-bit product from two 64-bit operands.
The second serves as a 'one-operand' multiply that
computes the full 128-bit product.

'cqto' takes no operands: it implicitly reads the sign
bit from %rax and copies it across all of %rdx, i.e., it
sign-extends %rax into %rdx:%rax.
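The special arithmetic instructions correspond to C code like the following. The sketch assumes GCC or Clang, because unsigned __int128 is a compiler extension, not standard C. The function names are hypothetical. C's / and % truncate toward zero, which matches the quotient and remainder that cqto + idivq leave in %rax and %rdx.

```c
#include <assert.h>
#include <stdint.h>

typedef unsigned __int128 uint128_t;   /* GCC/Clang extension */

/* Like mulq: the full 128-bit product, split into %rdx (hi) and %rax (lo). */
void umul_full(uint64_t x, uint64_t y, uint64_t *hi, uint64_t *lo) {
    uint128_t p = (uint128_t)x * y;
    *hi = (uint64_t)(p >> 64);
    *lo = (uint64_t)p;
}

/* Like cqto: %rdx becomes 64 copies of the sign bit of %rax. */
uint64_t cqto_rdx(int64_t rax) {
    return rax < 0 ? UINT64_MAX : 0;
}

/* Signed quotient and remainder, as computed by cqto + idivq. */
void remdiv(long x, long y, long *qp, long *rp) {
    *qp = x / y;
    *rp = x % y;
}
```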

3.6 Control

Machine code provides two basic low-level mechanisms
for implementing conditional behavior: it tests data
values and then alters either the control flow or the
data flow based on the results of these tests.

3.6.1 condition codes

The CPU maintains a set of single-bit condition code
registers describing attributes of the most recent
arithmetic or logical operation. These registers can
then be tested to perform conditional branches:

+ CF: Carry flag. The most recent operation generated
a carry out of the most significant bit. Used to
detect 'overflow for unsigned operations'.
+ ZF: Zero flag. The most recent operation yielded '0'.
+ SF: Sign flag. The most recent operation yielded a
negative value.
+ OF: Overflow flag. The most recent operation caused
a two's-complement overflow, either 'negative or
positive'.

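For example, suppose an ADD instruction computes the
equivalent of the C assignment t = a + b, where a, b,
and t are integers. The flags are then set as: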

CF (unsigned)t < (unsigned) a Unsigned overflow
ZF (t==0) zero
SF (t < 0) negative
OF (a<0 == b<0)&&(t<0 != a<0) signed overflow
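The four flag expressions can be evaluated in C for a 64-bit addition. This sketch uses uint64_t throughout so that overflow wraps instead of being undefined behavior, as it would be for signed C integers. The sign tests become bit-63 tests. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { int cf, zf, sf, of; } flags_t;

/* Condition codes after t = a + b on 64-bit operands. */
flags_t add_flags(uint64_t a, uint64_t b) {
    uint64_t t = a + b;
    flags_t f;
    f.cf = t < a;                 /* unsigned overflow (carry out) */
    f.zf = t == 0;                /* zero result */
    f.sf = (int)(t >> 63);        /* negative result */
    /* signed overflow: a and b share a sign that t does not have */
    f.of = (int)((((a ^ t) & (b ^ t)) >> 63) & 1);
    return f;
}
```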

The 'leaq' instruction does 'not alter' any condition
codes,since it is intended to be used in address com-
putation.

For 'logical operations', such as XOR, the carry and
overflow flags are set to 'zero'.

CF : 0
OF : 0

For the 'shift operation',the carry flag is set to the
'last bit shifted out',while the overflow flag is set
to 'zero'.

CF : last bit shifted out
OF : 0

The 'INC' and 'DEC' instructions set the overflow and
zero flags, but leave the carry flag unchanged.

CF : unchanged
OF : set according to the result
ZF : set according to the result

Comparison and test instructions.These instructions
set the condition codes without updating any other
registers.
-----------------------------------------------------
CMP s1,s2 s2 - s1 compare

cmpb compare byte
cmpw compare word
cmpl compare double word
cmpq compare quad word

TEST s1,s2 s1 & s2 test

testb test byte
testw       test word
testl       test double word
testq test quad word
-----------------------------------------------------

testq %rax,%rax //test zero,negative,or positive


3.6.2 Accessing the condition codes

There are three common ways of using the condition
codes:
1. we can set a single byte to 0 or 1 depending on
some combination of the condition codes,
2. we can conditionally jump to some other part of
the program, or
3. we can conditionally transfer data.

A SET instruction has either one of the low-order
single-byte register elements or a single-byte memory
location as its destination, setting this byte to '1'
or '0'.
------------------------------------------------------
sete D     setz D     D <- ZF              equal/zero
setne D    setnz D    D <- ~ZF             not equal/not zero

sets D                D <- SF              negative
setns D               D <- ~SF             nonnegative

setg D     setnle D   D <- ~(SF^OF)&~ZF    greater (signed >)
setge D    setnl D    D <- ~(SF^OF)        greater or equal (signed >=)
setl D     setnge D   D <- SF^OF           less (signed <)
setle D    setng D    D <- (SF^OF)|ZF      less or equal (signed <=)

seta D     setnbe D   D <- ~CF&~ZF         above (unsigned >)
setae D    setnb D    D <- ~CF             above or equal (unsigned >=)
setb D     setnae D   D <- CF              below (unsigned <)
setbe D    setna D    D <- CF|ZF           below or equal (unsigned <=)
------------------------------------------------------
for signed integers:   greater, equal, less
for unsigned integers: above, equal, below
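To see why setl uses SF^OF while setb uses CF, here is a C model of cmpq S1,S2. It computes the flags of S2 - S1 and then applies a few of the SET formulas. It assumes 64-bit operands, and all names are hypothetical. The same bit pattern (-1 vs. 1) compares as "less" when signed but "above" when unsigned.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { int cf, zf, sf, of; } cmp_flags_t;

/* Flags from cmpq S1, S2, which computes S2 - S1 and discards the result. */
cmp_flags_t cmp_flags(uint64_t s1, uint64_t s2) {
    uint64_t t = s2 - s1;
    cmp_flags_t f;
    f.cf = s2 < s1;               /* borrow: unsigned s2 < s1 */
    f.zf = t == 0;
    f.sf = (int)(t >> 63);
    /* signed overflow: s2 and s1 differ in sign, and t differs in sign from s2 */
    f.of = (int)((((s2 ^ s1) & (s2 ^ t)) >> 63) & 1);
    return f;
}

int setl(cmp_flags_t f) { return f.sf ^ f.of; }             /* signed <   */
int setg(cmp_flags_t f) { return !(f.sf ^ f.of) && !f.zf; } /* signed >   */
int setb(cmp_flags_t f) { return f.cf; }                    /* unsigned < */
int seta(cmp_flags_t f) { return !f.cf && !f.zf; }          /* unsigned > */
```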

3.6.3 Jump Instructions

These jump destinations are generally indicated in
assembly code by a 'label'.

The jmp instruction jumps unconditionally. It can be
either a direct jump or an indirect jump.

Direct jumps are written in assembly code by giving a
label as the jump target.

Indirect jumps are written using '*' followed by an
operand specifier, either a register or one of the
memory operand formats:

jmp *%rax // uses the value in %rax
jmp *(%rax) //reads the jump target from memory.

------------------------------------------------------
Instruction    Synonym   Jump condition   Description
------------------------------------------------------
jmp Label                1                Direct jump
jmp *Operand             1                Indirect jump

je Label       jz        ZF               Equal/zero
jne Label      jnz       ~ZF              Not equal/not zero

js Label                 SF               Negative
jns Label                ~SF              Nonnegative

jg Label       jnle      ~(SF^OF)&~ZF     Greater (signed >)
jge Label      jnl       ~(SF^OF)         Greater or equal (signed >=)
jl Label       jnge      SF^OF            Less (signed <)
jle Label      jng       (SF^OF)|ZF       Less or equal (signed <=)

ja Label       jnbe      ~CF&~ZF          Above (unsigned >)
jae Label      jnb       ~CF              Above or equal (unsigned >=)
jb Label       jnae      CF               Below (unsigned <)
jbe Label      jna       CF|ZF            Below or equal (unsigned <=)
------------------------------------------------------
note: conditional jumps can only be direct.

3.6.4 Jump instruction encodings

There are several different encodings for jumps. The
most commonly used ones are PC relative: they encode
the difference between the address of the target
instruction and the address of the instruction
immediately following the jump. These offsets can be
encoded using 1, 2, or 4 bytes.

A second encoding method is to give an 'absolute'
address, using 4 bytes to directly specify the target.
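Here is the PC-relative rule as a C sketch. The function name and the addresses in the test are hypothetical. The key point is that the offset is added to the address of the next instruction, not to the address of the jump itself. So a 2-byte jump at 0x8 with offset 3 lands at 0xa + 3 = 0xd, and a 2-byte jump at 0x15 with offset -8 lands at 0x17 - 8 = 0xf.

```c
#include <assert.h>
#include <stdint.h>

/* Target of a PC-relative jump located at jump_addr and occupying
   jump_len bytes, with a signed encoded offset. */
uint64_t jump_target(uint64_t jump_addr, unsigned jump_len, int32_t offset) {
    uint64_t next_pc = jump_addr + jump_len;   /* value of %rip after fetch */
    return next_pc + (uint64_t)(int64_t)offset;
}
```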

3.6.5 Implementing Conditional Branches with Conditional Control

For the C code:

if(test-expr)
then-statement
else
else-statement

the assembly code typically adheres to the following
form (presented in C syntax, as goto code):

t = test-expr;
if (!t)
goto false;
then-statement
goto done;
false:
else-statement
done:
....

An alternate rule for translating an if statement into
goto code is as follows:

t = test-expr;
if(t)
goto true;
else-statement
goto done;
true:
then-statement
done:
...
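The first template can be applied by hand to a concrete function, in the style of CS:APP's absdiff example. Here test-expr is x < y, then-statement is y - x, and else-statement is x - y. The label is named false_branch rather than false, because false is a keyword in C23 and a macro with <stdbool.h>.

```c
#include <assert.h>

/* Plain C version. */
long absdiff(long x, long y) {
    long result;
    if (x < y)
        result = y - x;
    else
        result = x - y;
    return result;
}

/* The same logic in the "if (!t) goto false" form. */
long gotodiff(long x, long y) {
    long result;
    if (!(x < y))
        goto false_branch;
    result = y - x;            /* then-statement */
    goto done;
false_branch:
    result = x - y;            /* else-statement */
done:
    return result;
}
```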

3.6.6 Implementing Conditional Branches with Conditional Moves

An alternate strategy is through a conditional trans-
fer of data. This approach computes both outcomes of
a conditional operation and then selects one based on
whether or not the condition holds.

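Estimating the average time of a branch that is
mispredicted with probability p:

T(avg) = T(ok) + p * T(mp)

T(ok): time when the branch is predicted correctly,
T(mp): misprediction penalty, p: probability of a
misprediction. For a random branch pattern, p = 0.5,
giving T(ran) = T(ok) + 0.5 * T(mp).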
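The formula can also be run backwards to estimate the penalty from two timings, one for a predictable branch and one for a random branch (p = 0.5). The numbers in the test are hypothetical measurements chosen to keep the arithmetic exact: T(ok) = 8 cycles and T(ran) = 17.5 cycles give T(mp) = 2 * (17.5 - 8) = 19 cycles.

```c
#include <assert.h>

/* T(avg) = T(ok) + p * T(mp) */
double expected_time(double t_ok, double t_mp, double p) {
    return t_ok + p * t_mp;
}

/* Solve for T(mp) given T(ran), the time measured with p = 0.5. */
double mispredict_penalty(double t_ok, double t_ran) {
    return 2.0 * (t_ran - t_ok);
}
```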

The assembler can infer the operand length of a
conditional move instruction from the name of the
destination register, and so the same instruction name
can be used for all operand lengths.

Note: single-byte conditional moves are not supported;
the operands can be 16, 32, or 64 bits long.

----------------------------------------------------
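instruction    Synonym    Condition        Description
----------------------------------------------------
cmove S,R      cmovz      ZF               equal/zero
cmovne S,R     cmovnz     ~ZF              not equal/not zero

cmovs S,R                 SF               negative
cmovns S,R                ~SF              nonnegative

cmovg S,R      cmovnle    ~(SF^OF)&~ZF     greater (signed >)
cmovge S,R     cmovnl     ~(SF^OF)         greater or equal (signed >=)
cmovl S,R      cmovnge    SF^OF            less (signed <)
cmovle S,R     cmovng     (SF^OF)|ZF       less or equal (signed <=)

cmova S,R      cmovnbe    ~CF&~ZF          above (unsigned >)
cmovae S,R     cmovnb     ~CF              above or equal (unsigned >=)
cmovb S,R      cmovnae    CF               below (unsigned <)
cmovbe S,R     cmovna     CF|ZF            below or equal (unsigned <=)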
----------------------------------------------------
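The conditional-move strategy can be written out in C, following the style of CS:APP's cmovdiff. Both outcomes are computed first, and the final if only selects between them. A compiler may turn that selection into a single cmovge, but nothing in C forces it to.

```c
#include <assert.h>

long cmovdiff(long x, long y) {
    long rval = y - x;        /* outcome if x < y  */
    long eval = x - y;        /* outcome otherwise */
    long ntest = x >= y;
    if (ntest)                /* candidate for one cmov instruction */
        rval = eval;
    return rval;
}
```

Because both expressions are always evaluated, this form only works when neither has side effects or can fault. For example, xp ? *xp : 0 cannot be compiled this way, since *xp would be read even when xp is null.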

3.6.7 loops

A do-while loop can be translated as:

loop:
body-statement
t = test-expr;
if(t)
goto loop;

A while loop can be translated with the jump-to-middle form:

goto test;
loop:
body-statement
test:
t = test-expr;
if(t)
goto loop;

Alternatively, a while loop can be transformed into a
do-while loop guarded by an initial test:

t = test-expr;
if(!t)
goto done;
do
body-statement
while(test-expr);
done:
---------------------------
so it can also be translated like this (guarded do):

t = test-expr;
if (!t)
goto done;
loop:
body-statement
t= test-expr;
if(t)
goto loop;
done:
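Here the loop templates are applied to a factorial function, in the style of CS:APP's fact_do example. The goto versions are hand translations written for illustration, not compiler output. Note that the do-while version assumes n >= 1, because its body always runs at least once.

```c
#include <assert.h>

/* do-while factorial (assumes n >= 1). */
long fact_do(long n) {
    long result = 1;
    do {
        result *= n;
        n = n - 1;
    } while (n > 1);
    return result;
}

/* The same loop as "loop: body; if (t) goto loop;". */
long fact_do_goto(long n) {
    long result = 1;
loop:
    result *= n;
    n = n - 1;
    if (n > 1)
        goto loop;
    return result;
}

/* while (n > 1) factorial in jump-to-middle form. */
long fact_while_jm_goto(long n) {
    long result = 1;
    goto test;
loop:
    result *= n;
    n = n - 1;
test:
    if (n > 1)
        goto loop;
    return result;
}
```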

For loops

The behavior of a for loop of the form

for (init-expr; test-expr; update-expr)
body-statement

is identical to that of the following while loop:

init-expr;
while(test-expr) {
body-statement
update-expr;
}

Depending on the optimization level, GCC generates code
using one of the two translation strategies for while
loops:

1. jump-to-middle strategy yields the goto code:

init-expr;
goto test;

loop:

body-statement
update-expr;

test:

t = test-expr;
if(t)
goto loop;

2. guarded-do strategy yields:

init-expr;
t = test-expr;
if(!t)
goto done;

loop:

body-statement
update-expr;
t = test-expr;
if(t)
goto loop;

done:
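Here is the guarded-do strategy applied by hand to a for-loop factorial. The mapping is init-expr: i = 2, test-expr: i <= n, update-expr: i++. The goto version is an illustrative hand translation, not actual GCC output.

```c
#include <assert.h>

long fact_for(long n) {
    long result = 1;
    for (long i = 2; i <= n; i++)
        result *= i;
    return result;
}

/* Guarded-do translation of fact_for. */
long fact_for_guarded_goto(long n) {
    long i = 2;               /* init-expr */
    long result = 1;
    if (!(i <= n))            /* initial guard */
        goto done;
loop:
    result *= i;              /* body-statement */
    i++;                      /* update-expr */
    if (i <= n)               /* test-expr */
        goto loop;
done:
    return result;
}
```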

3.6.8 switch statements

A jump table is an array where entry i is the address
of a code segment implementing the action the program
should take when the switch index equals i.

Jump tables are used when there are a number of cases
(e.g., four or more) and they span a small range of
values.
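A jump table can be imitated portably with an array of function pointers. This is a hypothetical analogue, not what GCC emits: GCC stores code addresses in a read-only table and uses an indirect jmp. Entry i handles switch index i, and a single unsigned comparison sends both negative (wrapped) and too-large indices to the default case.

```c
#include <assert.h>
#include <stddef.h>

typedef long (*case_fn)(long);

static long case_0(long x) { return x + 1; }
static long case_1(long x) { return x * 2; }
static long case_2(long x) { return x - 3; }
static long case_3(long x) { return x * x; }
static long case_default(long x) { (void)x; return 0; }

static const case_fn jump_table[] = { case_0, case_1, case_2, case_3 };

long dispatch(unsigned long index, long x) {
    if (index >= sizeof jump_table / sizeof jump_table[0])
        return case_default(x);           /* out of range: default */
    return jump_table[index](x);          /* indirect call through table */
}

/* The equivalent switch statement, for comparison. */
long switch_version(long index, long x) {
    switch (index) {
    case 0: return x + 1;
    case 1: return x * 2;
    case 2: return x - 3;
    case 3: return x * x;
    default: return 0;
    }
}
```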

3.7 Procedures

Procedures are a key abstraction in software. They
provide a way to package code that implements some
functionality with a designated set of arguments and an
optional return value. This function can then be invoked
from different points in a program.

Procedures come in many guises in different programming
languages (functions, methods, subroutines, handlers, and
so on), but they share a general set of features.

Suppose procedure P calls procedure Q, and Q then executes
and returns back to P. These actions involve one or more
of the following mechanisms:

+ Passing control

The program counter must be set to the starting
address for Q and set to the instruction in P
following the call to Q upon return.

+ Passing data

P must be able to provide one or more parameters to
Q, and Q must be able to return a value back to P.

+ Allocating and deallocating memory

Q may need to allocate space for local variables when
it begins and then free that storage before it returns.

3.7.1 The run-time stack

When an x86-64 procedure requires storage beyond what
it can hold in registers, it allocates space on the
stack. This region is referred to as the procedure's
'stack frame'.

The frame for the currently executing procedure is
always at the top of the stack. Within that space, it
can save the values of registers, allocate space for
local variables, and set up arguments for the
procedures it calls. The stack frames for most
procedures are of fixed size, allocated at the
beginning of the procedure.

3.7.2 control transfer

When passing control from function P to function Q:

The instruction 'call' pushes an address 'A' onto the
stack and sets the PC (program counter, %rip) to the
beginning of Q. The pushed address 'A' is referred to
as the return address and is computed as the address
of the instruction immediately following the call
instruction.

The counterpart instruction 'ret' pops an address A
off the stack and sets the PC to A.

----------------------------------------
call Label       Procedure call
call *Operand    Procedure call

ret              Return from call
----------------------------------------
Disassemblers such as objdump may show these as 'callq'
and 'retq'; the 'q' suffix only emphasizes that these
are the x86-64 versions.

Like jumps, a call can be either direct or indirect.
In assembly code, the target of a direct call is
given as a label, while the target of an indirect
call is given by '*' followed by an operand
specifier.

3.7.3 Data Transfer

Procedure calls may involve passing data as arguments,
and returning from a procedure may also involve
returning a value. With x86-64, most of this data
passing occurs via registers.

When procedure P calls procedure Q, the code for P
must first copy the arguments into the proper
registers. When Q returns back to P, the code for P
can access the returned value in register %rax.

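For x86-64, up to six integral (integer and pointer)
arguments can be passed via registers:

------------------------------------------------------
Operand size             Argument number
(bits)         1      2      3      4      5      6
------------------------------------------------------
64           %rdi   %rsi   %rdx   %rcx   %r8    %r9
32           %edi   %esi   %edx   %ecx   %r8d   %r9d
16           %di    %si    %dx    %cx    %r8w   %r9w
8            %dil   %sil   %dl    %cl    %r8b   %r9b
------------------------------------------------------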

When a function has more than six integral arguments,
the other ones are passed on the stack, with argument
7 at the top of the stack.

note: 'when passing parameters on the stack, all data'
'sizes are rounded up to be multiples of eight'
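A function with eight integral arguments shows the register and stack split. It is modeled on CS:APP's proc example. Under the System V x86-64 convention, the first six arguments arrive in %rdi, %rsi, %edx, %rcx, %r8w, and %r9. a4 and a4p are arguments 7 and 8 and are passed on the stack. Each of them occupies an 8-byte slot, even though a4 is a single char.

```c
#include <assert.h>

void proc(long a1, long *a1p,      /* %rdi, %rsi           */
          int a2, int *a2p,        /* %edx, %rcx           */
          short a3, short *a3p,    /* %r8w, %r9            */
          char a4, char *a4p) {    /* stack: args 7 and 8  */
    *a1p += a1;
    *a2p += a2;
    *a3p += a3;
    *a4p += a4;
}
```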


With the arguments in place, the program can then
execute a call instruction to transfer control to Q.

3.7.4 Local storage on the stack

Local data must be stored in memory in cases such as these:

+ There are not enough registers to hold all of the
local data.

+ The address operator '&' is applied to a local
variable, and hence we must be able to generate an
address for it.

+ Some of the local variables are arrays or struc-
tures and hence must be accessed by array or
structure references.
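The second case (applying & to a local) is illustrated by CS:APP's swap_add/caller pair. Because caller passes &arg1 and &arg2, it typically keeps both variables in its stack frame so that they have addresses, at least at -Og. An optimizing compiler may inline swap_add and avoid this.

```c
#include <assert.h>

long swap_add(long *xp, long *yp) {
    long x = *xp;
    long y = *yp;
    *xp = y;
    *yp = x;
    return x + y;
}

long caller(void) {
    long arg1 = 534;            /* address taken, so stored on the stack */
    long arg2 = 1057;
    long sum = swap_add(&arg1, &arg2);
    long diff = arg1 - arg2;    /* values were swapped: 1057 - 534 */
    return sum * diff;
}
```

Here sum = 1591 and diff = 523, so caller returns 832093.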

3.7.5 Local storage in registers

The set of program registers acts as a single resource
shared by all of the procedures. x86-64 adopts a
uniform set of conventions for register usage that
must be respected by all procedures.

By convention:

registers %rbx, %rbp, and %r12 - %r15 are classified
as 'callee-saved' registers. When procedure P calls
Q, Q must preserve the values of these registers,
ensuring that they have the same values when Q
returns to P as they did when Q was called.

Procedure Q can preserve a register value by either
not changing it at all or by pushing the original
value on the stack, altering it, and then popping the
old value from the stack before returning.

All other registers, except for the stack pointer
%rsp, are classified as 'caller-saved' registers.

3.7.6 recursive procedures

Each procedure call has its own private space on the
stack, and so the local variables of the multiple
outstanding calls do not interfere with one another.

3.8 Array Allocation and Access

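For an array with elements of L bytes starting at
address x, element i is stored at address:

x + L * i    // x is the start address, L is the element size

If %rdx holds the start address x of an int array and
%rcx holds i, then the instruction

movl (%rdx,%rcx,4),%eax

reads the element at address x + 4i into %eax.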

3.8.2 pointer arithmetic

If p is a pointer to a type of size L bytes and has
value xp, then:
p + i = xp + L * i
so:
A[i] = *(A+i)
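Both formulas can be checked in C by measuring byte distances with char pointers. In this sketch, int32_t is used so that L = 4 on every platform, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte offset of element i from the array start: should equal L * i. */
ptrdiff_t element_offset(const int32_t *a, size_t i) {
    return (const char *)&a[i] - (const char *)a;
}

/* A[i] written as *(A + i): C scales the pointer addition by L. */
int32_t get(const int32_t *a, size_t i) {
    return *(a + i);
}
```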

3.8.3 Nested Arrays

&D[i][j] = xD + L(C * i + j)    // xD: start, L: element size, C: number of columns

e.g. int a[3][4];
then the address of a[2][3] is xa + 4(2 * 4 + 3) = xa + 44
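The row-major formula can be checked against the compiler's own addressing. In this sketch, ROWS, COLS, and the helper names are hypothetical. L is taken as sizeof(int), which is 4 on x86-64, giving the 44 bytes computed above.

```c
#include <assert.h>
#include <stddef.h>

enum { ROWS = 3, COLS = 4 };

/* Byte offset of d[i][j] from the start of a ROWS x COLS int array. */
ptrdiff_t nested_offset(int (*d)[COLS], size_t i, size_t j) {
    return (char *)&d[i][j] - (char *)&d[0][0];
}

/* The formula L * (C * i + j) with L = sizeof(int) and C = COLS. */
ptrdiff_t formula_offset(size_t i, size_t j) {
    return (ptrdiff_t)(sizeof(int) * (COLS * i + j));
}
```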

3.8.5 Variable-Size Arrays

int var_ele(long n, int A[n][n], long i, long j) {
    return A[i][j];
}

The parameter n must precede the parameter A[n][n].

3.9 Heterogeneous Data Structures

3.9.1 structures
3.9.2 unions
3.9.3 Data Alignment

The x86-64 hardware will work correctly regardless
of the alignment of data. However, Intel recommends
that data be aligned to improve memory system
performance. Their alignment rule is based on the
principle that any primitive object of K (1, 2, 4, 8)
bytes must have an address that is a multiple of K.

The compiler places directives in the assembly code
indicating the desired alignment for global data.

.align 8

This ensures that the data following it will start
with an address that is a multiple of 8.

For code involving structures, the compiler may need
to insert gaps in the field allocation to ensure that
each structure element satisfies its alignment
requirement.

Furthermore, the compiler must ensure that any pointer
p of type struct S1 * (for struct S1 { int i; char c;
int j; }) satisfies a 4-byte alignment, the
requirement of its int fields.

In addition, the compiler may need to add padding to
the end of the structure so that each element in an
array of structures will satisfy its alignment re-
quirement.
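Both kinds of padding can be observed with offsetof and sizeof. The sketch assumes the CS:APP definitions of struct S1 and struct S2 and a 4-byte int (as on x86-64). S1 needs 3 bytes of padding after c, so that j starts at offset 8. S2 needs 3 bytes of padding at the end, so that every element of an S2 array keeps i and j 4-byte aligned.

```c
#include <assert.h>
#include <stddef.h>

struct S1 { int i; char c; int j; };   /* padding between c and j */
struct S2 { int i; int j; char c; };   /* padding after c         */
```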

Some SSE instructions that implement multimedia
operations operate on 16-byte blocks of data, and the
instructions that transfer data between the SSE unit
and memory require the memory addresses to be multiples
of 16. This requirement has the following two
consequences:

+ The starting address for any block generated by a
memory allocation function (alloca, malloc, calloc,
or realloc) must be a multiple of 16.

+ The stack frame for most functions must be aligned
on a 16-byte boundary (this requirement has a
number of exceptions).

3.10 Combining control and data in machine-level programs

3.10.1 understanding pointers
3.10.2 Life in the real world: Using the GDB debugger

----------------------------------------------------
Command (page 280)          Effect
----------------------------------------------------
Starting and stopping

quit                        exit gdb
run                         run program (give command-line arguments here)
kill                        stop program

Breakpoints

break multstore             set breakpoint at entry to function multstore
break *0x400540             set breakpoint at address 0x400540
delete 1                    delete breakpoint 1
delete                      delete all breakpoints

Execution

stepi                       execute one instruction
stepi 4                     execute four instructions
nexti                       like stepi, but proceed through function calls
continue                    resume execution
finish                      run until current function returns

Examining code

disas                       disassemble current function
disas multstore             disassemble function multstore
disas 0x400544              disassemble function around address 0x400544
disas 0x400540,0x40054d     disassemble code within the address range
print /x $rip               print program counter in hex

Examining data

print $rax                  print contents of %rax in decimal
print /x $rax               print contents of %rax in hex
print /t $rax               print contents of %rax in binary
print 0x100                 print decimal representation of 0x100
print /x 555                print hex representation of 555
print /x ($rsp+8)           print the value %rsp + 8 in hex
print *(long *) ($rsp+8)    print long integer at address %rsp + 8
print *(long *) 0x7fff8     print long integer at address 0x7fff8
x/2g 0x7fff8                examine two (8-byte) words starting at 0x7fff8
x/20b multstore             examine first 20 bytes of function multstore

Useful information

info frame                  information about current stack frame
info registers              values of all the registers
help                        get information about gdb
----------------------------------------------------

3.10.3 out-of-bounds memory references and buffer
overflow

A particularly common source of state corruption is
known as buffer overflow.

The idea of stack randomization is to make the position
of the stack vary from one run of a program to another.
This is implemented by allocating a random amount of
space between 0 and n bytes on the stack at the start of
a program.

More recently, AMD introduced an NX (for "no-execute")
bit into the memory protection for its 64-bit
processors, separating the read and execute access
modes. With this feature, the stack can be marked as
being readable and writable, but not executable.

The techniques we have outlined:

+ stack randomization,
+ stack corruption detection (stack protection), and
+ limiting which portions of memory can hold
executable code

are three of the most common mechanisms used to minimize
the vulnerability of programs to buffer overflow attacks.

3.10.5 Supporting variable-size stack frames

To manage a variable-size stack frame, x86-64 code uses
register '%rbp' to serve as a 'frame pointer' (also
called a base pointer).

The 'leave' instruction takes no arguments and is
equivalent to:

movq %rbp,%rsp
popq %rbp

This instruction combination has the effect of
deallocating the entire stack frame.
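A function whose frame size is known only at run time looks like the one below, which is modeled on CS:APP's vframe example. The array p has n elements, so its stack space depends on the argument. GCC typically sets up %rbp as a frame pointer for such a function and ends it with leave; ret. The exact code varies with the compiler version. The sketch requires VLA support (C99, or GCC's default mode) and n >= 1.

```c
#include <assert.h>

long vframe(long n, long idx, long *q) {
    long i;
    long *p[n];               /* variable-length array: frame size set at run time */
    p[0] = &i;
    for (i = 1; i < n; i++)
        p[i] = q;
    return *p[idx];
}
```

Because p[0] points at i, vframe(3, 0, q) returns the final value of i, which is 3.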

3.11 Floating-Point Code

Floating-point code raises several questions about how
programs operating on floating-point data are mapped
onto the machine, including:

+ how floating-point values are stored and accessed,
+ the instructions that operate on floating-point data,
+ the conventions used for passing floating-point values
as arguments to functions and returning them as results,
+ the conventions for how registers are preserved
during function calls.

The AVX floating-point architecture allows data to be
stored in 16 YMM registers, named %ymm0 - %ymm15.

Each YMM register is 256 bits(32 bytes) long.

When operating on scalar data, only the low-order 32
bits (for float) or 64 bits (for double) are used.

The assembly code refers to the registers by their
SSE XMM register names %xmm0 - %xmm15, where each XMM
register is the low-order 128 bits(16 bytes) of the
corresponding YMM register.

%xmm0 - %xmm7 pass floating-point arguments; all XMM
registers, including %xmm8 - %xmm15, are caller saved.

3.11.1 Floating-Point Movement and Conversion Operations

-----------------------------------------------------
Instruction  Source  Dest   Description
-----------------------------------------------------
vmovss       M32     X      move single precision
vmovss       X       M32    move single precision

vmovsd       M64     X      move double precision
vmovsd       X       M64    move double precision

vmovaps      X       X      move aligned, packed single precision
vmovapd      X       X      move aligned, packed double precision
-----------------------------------------------------
X: XMM register   M32: 32-bit memory range   M64: 64-bit memory range

Those that reference memory are scalar instructions,
meaning that they operate on individual, rather than
packed, data values.

These instructions will work correctly regardless of
the alignment of data.

GCC uses the scalar movement operations only to
transfer data from memory to an XMM register or from
an XMM register to memory.

For transferring data between two XMM registers, it
uses vmovaps for single-precision and vmovapd for
double-precision data.


convert floating-point data to integers
-----------------------------------------------------
Instruction    Source   Dest   Description
-----------------------------------------------------
vcvttss2si     X/M32    R32    convert with truncation,
                               single precision to int
vcvttsd2si     X/M64    R32    convert with truncation,
                               double precision to int

vcvttss2siq    X/M32    R64    convert with truncation,
                               single precision to quad word int
vcvttsd2siq    X/M64    R64    convert with truncation,
                               double precision to quad word int
-----------------------------------------------------
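In C, these instructions correspond to float-to-integer casts, which truncate toward zero. On x86-64, GCC usually implements them with vcvttss2si or vcvttsd2siq. The extra 't' in the name means truncate. The function names are hypothetical, and the inputs stay in range because out-of-range conversions are undefined behavior in C.

```c
#include <assert.h>

int float_to_int(float f) { return (int)f; }       /* e.g., vcvttss2si  */
long double_to_long(double d) { return (long)d; }  /* e.g., vcvttsd2siq */
```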

 

convert from integer to floating-point
-----------------------------------------------------
Instruction   Source 1   Source 2   Dest   Description
-----------------------------------------------------
vcvtsi2ss     M32/R32    X          X      convert int to single precision
vcvtsi2sd     M32/R32    X          X      convert int to double precision

vcvtsi2ssq    M64/R64    X          X      convert quad word int to single precision
vcvtsi2sdq    M64/R64    X          X      convert quad word int to double precision
-----------------------------------------------------

The first operand is read from memory or from a
general-purpose register. For our purposes, we can ignore
the second operand, since its value only affects the
upper bytes of the result. The destination must be an
XMM register. In common usage, both the second source
and the destination operands are identical, as in the
instruction:

vcvtsi2sdq %rax, %xmm1, %xmm1

This instruction reads a long integer from register
%rax, converts it to data type double,and stores the
result in the lower bytes of XMM register %xmm1.

Finally, for converting between two different
floating-point formats:

vunpcklps %xmm0, %xmm0, %xmm0
vcvtps2pd %xmm0, %xmm0

The vunpcklps instruction is normally used to
interleave the values in two XMM registers and store
them in a third (details on page 298). If the original
register held values [x3,x2,x1,x0], then the instruction
will update the register to hold values [x1,x1,x0,x0].

convert from single-precision to double-precision:

The vcvtps2pd instruction expands the two low-order
single-precision values in the source XMM register to
be the two double-precision values in the destination
XMM register. Applying this to the result of the
preceding vunpcklps instruction would give values
[dx0,dx0], where dx0 is the result of converting x0
to double precision.

Convert from double to single precision:

vmovddup %xmm0, %xmm0      // replicate first element
vcvtpd2psx %xmm0, %xmm0    // convert two elements to single precision

3.11.2 Floating-Point Code in Procedures

+ Up to eight floating-point arguments can be passed
in XMM registers %xmm0 to %xmm7; additional ones are
passed on the stack.
+ A function that returns a floating-point value does
so in register %xmm0.
+ All XMM registers are caller saved.
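When a function mixes argument types, floating-point arguments and integer arguments are assigned to registers independently, each in its own order. The sketch below is in the style of the CS:APP funct example. Under the System V convention, a goes in %xmm0, x in %xmm1, b in %xmm2, and i in %edi. The double result comes back in %xmm0.

```c
#include <assert.h>

/* a: %xmm0, x: %xmm1, b: %xmm2, i: %edi; result in %xmm0 */
double funct(double a, float x, double b, int i) {
    return a * x - b / i;
}
```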

3.11.3 Floating-point arithmetic operations

Each has either one (S1) or two (S1, S2) source operands
and a destination operand D. The first source operand
can be either an XMM register or a memory location.
The second source operand and the destination operand
must be XMM registers.

Each operation has an instruction for single precision
and an instruction for double precision. The result
is stored in the destination register.

Scalar floating-point arithmetic operations
--------------------------------------------------
Single    Double    Effect              Description
--------------------------------------------------
vaddss    vaddsd    D <- S2 + S1        floating-point add
vsubss    vsubsd    D <- S2 - S1        subtract
vmulss    vmulsd    D <- S2 * S1        multiply
vdivss    vdivsd    D <- S2 / S1        divide
vmaxss    vmaxsd    D <- max(S2, S1)    maximum
vminss    vminsd    D <- min(S2, S1)    minimum
vsqrtss   vsqrtsd   D <- sqrt(S1)       square root
--------------------------------------------------

3.11.4 Defining and Using Floating-Point Constants

AVX floating-point operations cannot have immediate
values as operands. Instead, the compiler must allocate
and initialize storage for any constant values. The
code then reads the values from memory.
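Take CS:APP's cel2fahr function as an example. Its constants 1.8 and 32.0 cannot be immediates, so GCC emits them into a read-only data section as pairs of .long values and loads them through a label. That label name is compiler-generated. The double_bits helper (hypothetical) shows the 64-bit pattern the compiler stores for 1.8: 0x3FFCCCCCCCCCCCCD, i.e. the decimal words 3435973837 (low) and 1073532108 (high).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

double cel2fahr(double temp) {
    return 1.8 * temp + 32.0;   /* both constants are loaded from memory */
}

/* The IEEE 754 bit pattern of a double, as stored in the data section. */
uint64_t double_bits(double d) {
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return bits;
}
```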

3.11.5 Using Bitwise Operations in Floating-Point Code

These operations all act on packed data, meaning that
they update the entire destination XMM register,
applying the bitwise operation to all the data in the
two source registers.

--------------------------------------------------
single double effect Description
--------------------------------------------------
vxorps    vxorpd    D <- S2 ^ S1    bitwise exclusive-or
--------------------------------------------------
vandps    vandpd    D <- S2 & S1    bitwise and
--------------------------------------------------
note: these operate on all 128 bits in an XMM register
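A common use of these bitwise operations is sign manipulation with a constant mask. This scalar C sketch mirrors what vandpd and vxorpd do in one 64-bit lane: ANDing with 0x7FFF...FF clears the sign bit (absolute value), and XORing with 0x8000...00 flips it (negation). The helper names are hypothetical, and the real instructions apply the mask to the whole register.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static uint64_t to_bits(double d)     { uint64_t b; memcpy(&b, &d, sizeof b); return b; }
static double   from_bits(uint64_t b) { double d; memcpy(&d, &b, sizeof d); return d; }

double abs_via_and(double x) {          /* like vandpd with 0x7FFFFFFFFFFFFFFF */
    return from_bits(to_bits(x) & 0x7FFFFFFFFFFFFFFFULL);
}

double neg_via_xor(double x) {          /* like vxorpd with 0x8000000000000000 */
    return from_bits(to_bits(x) ^ 0x8000000000000000ULL);
}
```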

3.11.6 Floating-point comparison operations

Instruction     Based on    Description
--------------------------------------------------
ucomiss S1,S2   S2 - S1     compare single precision
ucomisd S1,S2   S2 - S1     compare double precision
--------------------------------------------------
note: argument S2 must be in an XMM register, while S1
can be either in an XMM register or in memory.


The condition codes are set as follows:

Ordering S2:S1    CF   ZF   PF
--------------------------------
Unordered         1    1    1
S2 < S1           1    0    0
S2 = S1           0    1    0
S2 > S1           0    0    0
--------------------------------
PF is set only in the unordered case (a NaN operand).

The unordered case occurs when either operand is NaN.
The jp (for "jump on parity") instruction is used to
conditionally jump when a floating-point comparison
yields an unordered result. Otherwise, CF and ZF are
set as for an unsigned comparison, so instructions
such as ja and jb are used to conditionally jump on
the outcome.
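The NaN case is easiest to see from the C side, in the style of CS:APP's find_range example. Every ordered comparison with NaN is false, so a NaN input falls through all three tests to OTHER. In compiled code, detecting that unordered outcome is what ucomiss followed by jp is for. The quiet_nan helper (hypothetical) builds a NaN from its bit pattern to avoid depending on <math.h>.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef enum { NEG, ZERO, POS, OTHER } range_t;

range_t find_range(float x) {
    range_t result;
    if (x < 0)
        result = NEG;
    else if (x == 0)
        result = ZERO;
    else if (x > 0)
        result = POS;
    else
        result = OTHER;     /* reached only for NaN */
    return result;
}

/* A quiet NaN (bit pattern 0x7FC00000) for testing. */
float quiet_nan(void) {
    uint32_t bits = 0x7FC00000u;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```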
