# Wed 27 Dec 18:57:00 GMT 2017
------------------------------------
Part II Running Programs on a System
------------------------------------
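Chapter 7 Linking
7.1 Compiler Drivers
The compiler driver invokes the language preprocessor,
compiler, assembler, and linker as needed on behalf of the
user, producing the .i, .s, and .o files and the executable.
When the program is run, the shell calls an operating
system function called the loader, which copies the code and
data of the executable file into memory and then transfers
control to the beginning of the program.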
7.2 Static Linking
The linker has two main tasks:
+ Symbol resolution.
+ Relocation.
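7.3 Object Files
Object files come in three forms:
+ Relocatable object file.
+ Executable object file.
+ Shared object file: a special type of relocatable object
file that can be loaded into memory and linked dynamically,
at either load time or run time.
Unix and Linux: ELF (Executable and Linkable Format)
Windows: PE (Portable Executable)
Mac OS X: Mach-O format.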
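7.4 Relocatable Object Files
ELF header
.text      machine code of the compiled program
.rodata    read-only data, such as jump tables
.data      initialized global and static C variables
.bss       uninitialized global and static C variables
.symtab    information about the functions and global
           variables defined and referenced in the program
.rel.text  locations in .text to be modified at link time;
           any instruction that calls an external function
           or references a global variable needs to be
           modified
.rel.data  In general, any initialized global variable
           whose initial value is the address of a global
           variable or externally defined function will
           need to be modified.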
.debug
.line
.strtab
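7.5 Symbols and Symbol Tables
The symbol table of a module m contains information about
the symbols that m defines and references:
1. Global symbols that are defined by module m and can be
referenced by other modules, i.e. nonstatic C functions
and global variables.
2. Global symbols that are referenced by module m but
defined by some other module, i.e. external symbols. They
correspond to nonstatic C functions and global variables
defined in other modules.
3. Local symbols that are defined and referenced only by
module m, i.e. C functions and global variables with the
static attribute. These symbols are visible only within
module m.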
There are three special pseudosections that don't have
entries in the 'section header table':
+ ABS is for symbols that should not be relocated.
+ UNDEF is for undefined symbols: symbols that are
referenced in this object module but defined else-
where.
+ COMMON is for uninitialized data objects that are
not yet allocated.
Note: the pseudosections exist only in relocatable
object files.
7.6 Symbol Resolution
The linker resolves symbol references by associating
each reference with exactly one symbol definition from the
symbol tables of its input relocatable object files.
7.6.1 Duplicate Symbol Names
At compile time, the compiler exports each global
symbol to the assembler as either 'strong' or 'weak'.
Functions and initialized global variables get strong
symbols. Uninitialized global variables get weak symbols.
Linux linkers then use the following rules for
dealing with duplicate symbol names:
Rule 1. Multiple strong symbols with the same name are
not allowed.
Rule 2. Given a strong symbol and multiple weak symbols
with the same name, choose the strong symbol.
Rule 3. Given multiple weak symbols with the same name,
choose any of the weak symbols.
7.6.2 Static Libraries
Static libraries are stored on disk in a particular
file format known as an 'archive'.
An archive is a collection of concatenated relocatable
object files, with a header that describes the size and
location of each member object file.
Archive filenames are denoted with the .a suffix.
To create a static library of some functions:
ar rcs libsome.a some.o any.o
7.7 Relocation
Relocation consists of two steps:
1. Relocating sections and symbol definitions.
2. Relocating symbol references within sections.
7.7.1 Relocation Entries
Relocation entries for code and data are placed in
.rel.text and .rel.data respectively.
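7.8 Executable Object Files
For any segment s, the linker must choose a starting
address, vaddr, such that:
'vaddr mod align = off mod align'
where off is the offset of the segment's first section in
the object file and align is the alignment specified in
the program header.
This alignment requirement is an optimization that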
enables segments in the object file to be transferred
efficiently to memory when the program executes.
7.9 Loading Executable Object Files
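Every Linux program has a run-time memory image. On Linux
x86-64 systems, the code segment always starts at address
0x400000, followed by the data segment.
The run-time heap follows the data segment and grows
upward via calls to the malloc library. The region after the
heap is reserved for shared modules. The user stack always
starts at the largest legal user address, 2^48 - 1, and
grows toward smaller memory addresses.
The region above the stack, starting at address 2^48, is
reserved for the kernel, which is the memory-resident part
of the operating system.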
7.10 Dynamic Linking with Shared Libraries
A shared library is an object module that, at either
run time or load time, can be loaded at an arbitrary
memory address and linked with a program in memory.
This process is known as 'dynamic linking' and is
performed by a program called a 'dynamic linker'.
Shared libraries are also referred to as shared
objects, and on Linux systems they are indicated by the
'.so' suffix. Microsoft operating systems make heavy use
of shared libraries, which they refer to as 'DLLs'
(dynamic link libraries).
Shared libraries are shared in two ways:
1. There is exactly one .so file for a particular
library. The code and data in this .so file are
shared by all of the executable object files that
reference the library.
2. A single copy of the .text section of a shared
library in memory can be shared by different
running processes.
To build a shared library libvector.so :
$: gcc -shared -fpic -o libvector.so addvec.c multvec.c
Once we have created the library, we would then link it
into our program:
$: gcc -o prog21 main2.c ./libvector.so
When the loader loads and runs prog21, it notices that
prog21 contains a '.interp' section, which holds the path
name of the dynamic linker. The dynamic linker is itself a
shared object (e.g. ld-linux.so on Linux systems).
The dynamic linker then finishes the linking task by:
+ relocating the text and data of libc.so into some
memory segment
+ relocating the text and data of libvector.so into
another memory segment
+ relocating any references in prog21 to symbols
defined by libc.so and libvector.so
Finally, the dynamic linker passes control to the
application(prog21).
From this point on, the locations of the shared
libraries are fixed and do not change during execution
of the program.
Linux systems provide a simple interface to the
dynamic linker that allows application programs to load
and link shared libraries at run time.
#include <dlfcn.h>
void *dlopen(const char *filename, int flag);
returns: pointer to handle if OK, NULL on error
#include <dlfcn.h>
void *dlsym(void *handle, const char *symbol);
returns: pointer to symbol if OK, NULL on error
#include <dlfcn.h>
int dlclose(void *handle);
returns: 0 if OK, nonzero on error
#include <dlfcn.h>
char *dlerror(void);
returns: error message if previous call to dlopen,
dlsym, or dlclose failed;
NULL if previous call was OK
$: gcc -rdynamic -o a.out dll.c -ldl
note: -rdynamic: makes the global symbols defined in dll.c
available to the dynamic linker for symbol
resolution.
-ldl: links against libdl (libdl.so).
7.12 Position-Independent Code (PIC)
Users generate PIC by passing the -fpic option to gcc.
Shared libraries must always be compiled with this option.
PIC Data References
No matter where we load an object module (including
shared object modules) in memory, the data segment is
always the same distance from the code segment.
The compiler creates a table called the GOT (global
offset table) at the beginning of the data segment. The GOT
contains an 8-byte entry for each global data object
(procedure or global variable) that is referenced by the
object module.
PIC Function Calls
7.13 Library Interpositioning
7.13.1 Compile-time Interpositioning
gcc -DCOMPILETIME -c mymalloc.c
gcc -I. -o intc int.c mymalloc.o
7.13.2 Link-time interpositioning
The Linux static linker supports link-time interposi-
tioning with the --wrap f flag. This flag tells the linker
to resolve references to symbol f as __wrap_f, and to
resolve references to symbol __real_f as f.
gcc -DLINKTIME -c mymalloc.c
gcc -c int.c
gcc -Wl,--wrap,malloc -Wl,--wrap,free -o int1 int.o
mymalloc.o
7.13.3 Run-time Interpositioning
This mechanism is based on the dynamic linker's
LD_PRELOAD environment variable.
gcc -DRUNTIME -shared -fpic -o mymalloc.so mymalloc.c
-ldl
gcc -o intr int.c
linux> LD_PRELOAD="./mymalloc.so" ./intr # execute
7.14 Tools for Manipulating Object Files
The GNU binutils package:
+ ar
+ strings: lists all of the printable strings in an object
file.
+ strip
+ nm: lists the symbols defined in the symbol table of an
object file.
+ size: lists the names and sizes of the sections in an
object file.
+ readelf
+ objdump
The ldd program for manipulating shared libraries.