Reference article: juejin.im/post/5e12ce…
In day-to-day development we press Command + B/R thousands of times, yet few of us pay attention to what Xcode actually does for us during that process.
In fact, the process breaks down into four steps: preprocessing, compilation, assembly, and linking. (Excerpted from 《程序员的自我修养:链接、装载与库》.)
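Xcode drives all four of these steps for us. Once the executable has been built, it still has to be linked against its libraries and loaded when it launches; on Apple's operating systems that job is done by dyld.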
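dyld (the dynamic link editor) is Apple's dynamic linker. It is responsible for linking and loading programs and is an important part of Apple's operating systems. dyld is open source, and its code can be found on Apple's open source site, OpenSource.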
With the source downloaded, we can analyze how dyld loads a program.
First, create a new iOS project, implement an empty +load method in the ViewController's .m file, and set a breakpoint in that method.
The function call stack shows that the first call is in dyld's start function. Its source is as follows:
uintptr_t start(const struct macho_header* appsMachHeader, int argc, const char* argv[],
intptr_t slide, const struct macho_header* dyldsMachHeader,
uintptr_t* startGlue)
{
// if kernel had to slide dyld, we need to fix up load sensitive locations
// we have to do this before using any global variables
slide = slideOfMainExecutable(dyldsMachHeader);
bool shouldRebase = slide != 0;
#if __has_feature(ptrauth_calls)
shouldRebase = true;
#endif
if ( shouldRebase ) {
rebaseDyld(dyldsMachHeader, slide);
}
// allow dyld to use mach messaging
mach_init();
// kernel sets up env pointer to be just past end of agv array
const char** envp = &argv[argc+1];
// kernel sets up apple pointer to be just past end of envp array
const char** apple = envp;
while(*apple != NULL) { ++apple; }
++apple;
// set up random value for stack canary
__guard_setup(apple);
#if DYLD_INITIALIZER_SUPPORT
// run all C++ initializers inside dyld
runDyldInitializers(dyldsMachHeader, slide, argc, argv, envp, apple);
#endif
// now that we are done bootstrapping dyld, call dyld's main
uintptr_t appsSlide = slideOfMainExecutable(appsMachHeader);
return dyld::_main(appsMachHeader, appsSlide, argc, argv, envp, apple, startGlue);
}
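The pointer arithmetic that start() uses to find envp and apple relies on the kernel placing argv, envp, and apple back to back, each terminated by a NULL entry. The following is a minimal sketch of that walk over a hand-built vector. The names LaunchVectors and locateLaunchVectors are hypothetical and are not part of dyld.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper type, not a dyld type.
struct LaunchVectors {
    const char** envp;   // first environment string
    const char** apple;  // first "apple" string
};

// Mirrors the walk in start(): envp begins one slot past argv's NULL
// terminator, and apple begins one slot past envp's NULL terminator.
inline LaunchVectors locateLaunchVectors(int argc, const char* argv[]) {
    const char** envp = &argv[argc + 1];
    const char** apple = envp;
    while (*apple != nullptr) {
        ++apple;          // skip every environment string
    }
    ++apple;              // step over envp's NULL terminator
    return {envp, apple};
}
```

With two arguments and two environment strings, envp lands on index 3 of the combined vector and apple on index 6.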
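The start() function does the following: it rebases dyld if the kernel slid it, initializes Mach messaging, locates the envp and apple pointers, sets up the stack canary, and finally calls dyld::_main(). The rebase step looks like this: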
slide = slideOfMainExecutable(dyldsMachHeader);
bool shouldRebase = slide != 0;
#if __has_feature(ptrauth_calls)
shouldRebase = true;
#endif
if ( shouldRebase ) {
rebaseDyld(dyldsMachHeader, slide);
}
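To make the idea of a slide concrete, here is a small sketch under simplifying assumptions. It treats an image as a table of pointer-sized slots, some of which hold absolute addresses computed for the preferred load address, and fixes them up by adding the slide. FakeImage, computeSlide, and rebaseSketch are hypothetical names; the real rebaseDyld() reads the fix-up locations from the Mach-O file rather than from a simple index list.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an image that needs rebasing.
struct FakeImage {
    std::vector<std::uintptr_t> slots;        // pointer-sized storage
    std::vector<std::size_t>    rebaseSlots;  // indices of slots that hold addresses
};

// slide = address where the image was actually loaded - preferred address
inline std::intptr_t computeSlide(std::uintptr_t actualBase, std::uintptr_t preferredBase) {
    return static_cast<std::intptr_t>(actualBase - preferredBase);
}

// Add the slide to every slot that stores an absolute address.
inline void rebaseSketch(FakeImage& image, std::intptr_t slide) {
    if (slide == 0) {
        return;  // loaded at the preferred address, nothing to fix
    }
    for (std::size_t index : image.rebaseSlots) {
        image.slots[index] += static_cast<std::uintptr_t>(slide);
    }
}
```

Slots that are not listed in rebaseSlots (plain data, for example) are left untouched, which is why the fix-up locations have to be recorded somewhere in the file.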
Next, step into dyld::_main(). Its code is as follows:
// Entry point for dyld. The kernel loads dyld and jumps to __dyld_start which
// sets up some registers and call this function.
//
// Returns address of main() in target program which __dyld_start jumps to
//
uintptr_t
_main(const macho_header* mainExecutableMH, uintptr_t mainExecutableSlide,
int argc, const char* argv[], const char* envp[], const char* apple[],
uintptr_t* startGlue)
{
}
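dyld::_main() is long, so only its name and parameters are shown here. It mainly does the following:
- setContext: set up the link context, gLinkContext
- Load the shared cache
- reloadAllImages
- Load any inserted libraries
- Link the main program and the inserted libraries
- Initialize the main program: initializeMainExecutable()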
CRSetCrashLogMessage("dyld: launch started");
setContext(mainExecutableMH, argc, argv, envp, apple);
At the "launch started" point in _main() we find the call to setContext(). Stepping into it shows that this step sets up the context gLinkContext, a variable of type LinkContext.
configureProcessRestrictions(mainExecutableMH);
checkEnvironmentVariables(envp);
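Next, dyld configures process restrictions and checks the environment variables. These two steps affect whether certain libraries will be loaded.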
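Why load a shared cache, and what is it? During development we constantly use system frameworks such as UIKit and Foundation. Where do they live, and how are they loaded? If every app loaded its own copy at launch, it would waste both time and memory. Apple solves this with the shared cache: system dynamic libraries are mapped into a region of memory shared between processes, so once one process has mapped them, other apps that use the same libraries do not have to load them again.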
Inside mapSharedCache(), the loadDyldCache() function contains this logic:
bool loadDyldCache(const SharedCacheOptions& options, SharedCacheLoadInfo* results) {
results->loadAddress = 0;
results->slide = 0;
results->errorMessage = nullptr;
#if TARGET_IPHONE_SIMULATOR
// simulator only supports mmap()ing cache privately into process
return mapCachePrivate(options, results);
#else
if ( options.forcePrivate ) {
// mmap cache into this process only
return mapCachePrivate(options, results);
}
else {
// fast path: when cache is already mapped into shared region
bool hasError = false;
if ( reuseExistingCache(options, results) ) {
hasError = (results->errorMessage != nullptr);
} else {
// slow path: this is first process to load cache
hasError = mapCacheSystemWide(options, results);
}
return hasError;
}
#endif
}
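The three outcomes in loadDyldCache() (private mapping, reuse of an existing mapping, and the first system-wide mapping) can be pictured with a small state sketch. FakeSharedRegion, CachePath, and loadCacheSketch are hypothetical names used only for illustration; the real functions perform the actual memory mapping and report errors through results->errorMessage.

```cpp
#include <cassert>

// Which path a process took to get the cache (illustrative only).
enum class CachePath { Private, Reused, MappedSystemWide };

// Hypothetical stand-in for the system-wide shared region.
struct FakeSharedRegion {
    bool cacheMapped = false;
};

inline CachePath loadCacheSketch(FakeSharedRegion& region, bool forcePrivate) {
    if (forcePrivate) {
        return CachePath::Private;         // map into this process only
    }
    if (region.cacheMapped) {
        return CachePath::Reused;          // fast path: already in the shared region
    }
    region.cacheMapped = true;             // slow path: first process maps it
    return CachePath::MappedSystemWide;
}
```

Only the first process pays for the slow path; every later process that does not force a private mapping takes the fast path.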
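Before loading the shared cache, dyld checks whether the shared cache should be disabled. As the code below shows, the shared cache cannot be disabled on iOS.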
static void checkSharedRegionDisable(const dyld3::MachOLoaded* mainExecutableMH, uintptr_t mainExecutableSlide) {
#if __MAC_OS_X_VERSION_MIN_REQUIRED
// if main executable has segments that overlap the shared region,
// then disable using the shared region
if ( mainExecutableMH->intersectsRange(SHARED_REGION_BASE, SHARED_REGION_SIZE) ) {
gLinkContext.sharedRegionMode = ImageLoader::kDontUseSharedRegion;
if ( gLinkContext.verboseMapping )
dyld::warn("disabling shared region because main executable overlaps\n");
}
#if __i386__
if ( !gLinkContext.allowEnvVarsPath ) {
// <rdar://problem/15280847> use private or no shared region for suid processes
gLinkContext.sharedRegionMode = ImageLoader::kUsePrivateSharedRegion;
}
#endif
#endif
// iOS cannot run without shared region
}
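One type of load command in a Mach-O file is LC_LOAD_DYLIB, which identifies a dynamic library that the program depends on. Before dealing with those libraries, dyld creates an image loader for the main executable, which the kernel has already mapped: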
// The kernel maps in main executable before dyld gets control. We need to
// make an ImageLoader* for the already mapped in main executable.
static ImageLoaderMachO* instantiateFromLoadedImage(const macho_header* mh, uintptr_t slide, const char* path) {
// try mach-o loader
if ( isCompatibleMachO((const uint8_t*)mh, path) ) {
ImageLoader* image = ImageLoaderMachO::instantiateMainExecutable(mh, slide, path, gLinkContext);
addImage(image);
return (ImageLoaderMachO*)image;
}
throw "main executable not a known format";
}
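instantiateFromLoadedImage() first calls isCompatibleMachO() to decide whether the Mach-O file is compatible, mainly by checking that header fields such as the magic number, cputype, and cpusubtype are correct. If they are, it calls instantiateMainExecutable() to create the image.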
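A compatibility check of this kind boils down to comparing a few header fields. The sketch below is a simplified assumption, not dyld's implementation: FakeMachHeader keeps only three fields of the real header, and isCompatibleSketch requires an exact cpusubtype match, whereas the real check is more lenient about subtypes. The constant values are the ones defined in the Mach-O headers.

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-in for the first fields of a 64-bit Mach-O header.
struct FakeMachHeader {
    std::uint32_t magic;
    std::int32_t  cputype;
    std::int32_t  cpusubtype;
};

constexpr std::uint32_t kMagic64      = 0xfeedfacf;  // MH_MAGIC_64
constexpr std::int32_t  kCpuTypeArm64 = 0x0100000c;  // CPU_TYPE_ARM64
constexpr std::int32_t  kCpuTypeX8664 = 0x01000007;  // CPU_TYPE_X86_64

// A candidate is accepted only if it matches the running main executable.
inline bool isCompatibleSketch(const FakeMachHeader& candidate,
                               const FakeMachHeader& running) {
    return candidate.magic == running.magic
        && candidate.cputype == running.cputype
        && candidate.cpusubtype == running.cpusubtype;
}
```

An arm64 process would therefore reject an x86_64 image even though both use the same 64-bit magic number.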
// create image for main executable
ImageLoader* ImageLoaderMachO::instantiateMainExecutable(const macho_header* mh, uintptr_t slide, const char* path, const LinkContext& context)
{
//dyld::log("ImageLoader=%ld, ImageLoaderMachO=%ld, ImageLoaderMachOClassic=%ld, ImageLoaderMachOCompressed=%ld\n",
// sizeof(ImageLoader), sizeof(ImageLoaderMachO), sizeof(ImageLoaderMachOClassic), sizeof(ImageLoaderMachOCompressed));
bool compressed;
unsigned int segCount;
unsigned int libCount;
const linkedit_data_command* codeSigCmd;
const encryption_info_command* encryptCmd;
sniffLoadCommands(mh, path, false, &compressed, &segCount, &libCount, context, &codeSigCmd, &encryptCmd);
// instantiate concrete class based on content of load commands
if ( compressed )
return ImageLoaderMachOCompressed::instantiateMainExecutable(mh, slide, path, segCount, libCount, context);
else
#if SUPPORT_CLASSIC_MACHO
return ImageLoaderMachOClassic::instantiateMainExecutable(mh, slide, path, segCount, libCount, context);
#else
throw "missing LC_DYLD_INFO load command";
#endif
}
This function declares several uninitialized variables: compressed, segCount, libCount, codeSigCmd, and encryptCmd. Their addresses are passed as arguments, and sniffLoadCommands() fills them in. Its implementation is as follows:
// determine if this mach-o file has classic or compressed LINKEDIT and number of segments it has
void ImageLoaderMachO::sniffLoadCommands(const macho_header* mh, const char* path, bool inCache, bool* compressed,
unsigned int* segCount, unsigned int* libCount, const LinkContext& context,
const linkedit_data_command** codeSigCmd,
const encryption_info_command** encryptCmd)
{
*compressed = false;
*segCount = 0;
*libCount = 0;
*codeSigCmd = NULL;
*encryptCmd = NULL;
...... (some code omitted)
switch (cmd->cmd) {
case LC_DYLD_INFO:
case LC_DYLD_INFO_ONLY:
if ( cmd->cmdsize != sizeof(dyld_info_command) )
throw "malformed mach-o image: LC_DYLD_INFO size wrong";
dyldInfoCmd = (struct dyld_info_command*)cmd;
*compressed = true;
break;
case LC_SEGMENT_COMMAND:
segCmd = (struct macho_segment_command*)cmd;
......
// ignore zero-sized segments
if ( segCmd->vmsize != 0 ) *segCount += 1;
......
break;
case LC_LOAD_DYLIB:
case LC_LOAD_WEAK_DYLIB:
case LC_REEXPORT_DYLIB:
case LC_LOAD_UPWARD_DYLIB:
*libCount += 1;
// fall thru
case LC_CODE_SIGNATURE:
......
if ( *codeSigCmd != NULL )
throw "malformed mach-o image: multiple LC_CODE_SIGNATURE load commands";
*codeSigCmd = (struct linkedit_data_command*)cmd;
break;
case LC_ENCRYPTION_INFO:
......
if ( *encryptCmd != NULL )
throw "malformed mach-o image: multiple LC_ENCRYPTION_INFO load commands";
*encryptCmd = (encryption_info_command*)cmd;
break;
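Because the function is long, only part of it is shown. Even so, we can see that it reads the load commands of the Mach-O file and assigns values to the variables declared earlier. Their meanings are:
- compressed: whether the Mach-O file uses compressed LINKEDIT information; set to true when an LC_DYLD_INFO or LC_DYLD_INFO_ONLY load command is present
- segCount: the number of segments in the Mach-O file
- libCount: the number of dynamic libraries the Mach-O file depends on
- codeSigCmd: code signature information
- encryptCmd: encryption information, such as cryptid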
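The counting logic can be sketched over a simplified list of load commands. FakeLoadCommand, SniffResult, and sniffSketch are hypothetical names: the real function walks variable-sized commands inside the mapped file and validates their sizes, which this sketch skips. The command values are the ones defined in the Mach-O headers.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr std::uint32_t kReqDyld      = 0x80000000;       // LC_REQ_DYLD
constexpr std::uint32_t kLoadDylib    = 0x0c;             // LC_LOAD_DYLIB
constexpr std::uint32_t kLoadWeak     = 0x18 | kReqDyld;  // LC_LOAD_WEAK_DYLIB
constexpr std::uint32_t kSegment64    = 0x19;             // LC_SEGMENT_64
constexpr std::uint32_t kDyldInfo     = 0x22;             // LC_DYLD_INFO
constexpr std::uint32_t kDyldInfoOnly = 0x22 | kReqDyld;  // LC_DYLD_INFO_ONLY

// Simplified load command: the command type plus the one field we inspect.
struct FakeLoadCommand {
    std::uint32_t cmd;
    std::uint64_t vmsize;  // only meaningful for segment commands
};

struct SniffResult {
    bool     compressed = false;
    unsigned segCount   = 0;
    unsigned libCount   = 0;
};

inline SniffResult sniffSketch(const std::vector<FakeLoadCommand>& commands) {
    SniffResult result;
    for (const FakeLoadCommand& command : commands) {
        switch (command.cmd) {
            case kDyldInfo:
            case kDyldInfoOnly:
                result.compressed = true;
                break;
            case kSegment64:
                if (command.vmsize != 0) ++result.segCount;  // ignore zero-sized segments
                break;
            case kLoadDylib:
            case kLoadWeak:
                ++result.libCount;
                break;
            default:
                break;  // other commands do not affect these counters
        }
    }
    return result;
}
```

The caller then uses compressed to choose between the compressed and classic image loader classes, and segCount and libCount to size the new image.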
if ( sEnv.DYLD_INSERT_LIBRARIES != NULL ) {
for (const char* const* lib = sEnv.DYLD_INSERT_LIBRARIES; *lib != NULL; ++lib)
loadInsertedDylib(*lib);
}
// record count of inserted libraries so that a flat search will look at
// inserted libraries, then main, then others.
sInsertedDylibCount = sAllImages.size()-1;
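DYLD_INSERT_LIBRARIES determines whether inserted libraries are loaded. If inserting libraries is allowed and the list is set, the for loop calls loadInsertedDylib() for each entry; otherwise dyld skips this step and moves on to linking.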
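As a rough illustration of the bookkeeping, the sketch below splits a colon-separated DYLD_INSERT_LIBRARIES value and builds an image list with the main executable at index 0 and the inserted libraries right after it, which is why the inserted count is the list size minus one. splitInsertList, ImageList, and buildImageList are hypothetical names; dyld itself stores ImageLoader objects rather than path strings.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Split a colon-separated list of paths, skipping empty entries.
inline std::vector<std::string> splitInsertList(const std::string& value) {
    std::vector<std::string> paths;
    std::string current;
    for (char ch : value) {
        if (ch == ':') {
            if (!current.empty()) paths.push_back(current);
            current.clear();
        } else {
            current.push_back(ch);
        }
    }
    if (!current.empty()) paths.push_back(current);
    return paths;
}

// Hypothetical stand-in for the global image list.
struct ImageList {
    std::vector<std::string> images;   // index 0 is the main executable
    std::size_t insertedCount;         // number of inserted libraries
};

inline ImageList buildImageList(const std::string& mainPath, const std::string& insertValue) {
    ImageList list;
    list.images.push_back(mainPath);
    for (const std::string& path : splitInsertList(insertValue)) {
        list.images.push_back(path);
    }
    list.insertedCount = list.images.size() - 1;
    return list;
}
```

This ordering lets a flat symbol search visit the inserted libraries before the other dependencies, matching the comment in the excerpt above.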
// link main executable
gLinkContext.linkingMainExecutable = true;
#if SUPPORT_ACCELERATE_TABLES
if ( mainExcutableAlreadyRebased ) {
// previous link() on main executable has already adjusted its internal pointers for ASLR
// work around that by rebasing by inverse amount
sMainExecutable->rebase(gLinkContext, -mainExecutableSlide);
}
#endif
link(sMainExecutable, sEnv.DYLD_BIND_AT_LAUNCH, true, ImageLoader::RPathChain(NULL, NULL), -1);
sMainExecutable->setNeverUnloadRecursive();
if ( sMainExecutable->forceFlat() ) {
gLinkContext.bindFlat = true;
gLinkContext.prebindUsage = ImageLoader::kUseNoPrebinding;
}
// link any inserted libraries
// do this after linking main executable so that any dylibs pulled in by inserted
// dylibs (e.g. libSystem) will not be in front of dylibs the program uses
if ( sInsertedDylibCount > 0 ) {
for(unsigned int i=0; i < sInsertedDylibCount; ++i) {
ImageLoader* image = sAllImages[i+1];
link(image, sEnv.DYLD_BIND_AT_LAUNCH, true, ImageLoader::RPathChain(NULL, NULL), -1);
image->setNeverUnloadRecursive();
}
// only INSERTED libraries can interpose
// register interposing info after all inserted libraries are bound so chaining works
for(unsigned int i=0; i < sInsertedDylibCount; ++i) {
ImageLoader* image = sAllImages[i+1];
image->registerInterposing(gLinkContext);
}
}
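link() links the main program and the inserted libraries. After linking, dyld also performs recursiveBind() and weak binding with weakBind(). At this point dyld has finished setContext, loading the shared cache, reloadAllImages, loading the inserted libraries, and linking the main program and inserted libraries. Next it initializes the main program.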
This step corresponds exactly to initializeMainExecutable(), step 6 in the function call stack from the beginning of the article.
void initializeMainExecutable() {
// record that we've reached this step
gLinkContext.startedInitializingMainExecutable = true;
// run initialzers for any inserted dylibs
ImageLoader::InitializerTimingList initializerTimes[allImagesCount()];
initializerTimes[0].count = 0;
const size_t rootCount = sImageRoots.size();
if ( rootCount > 1 ) {
for(size_t i=1; i < rootCount; ++i) {
sImageRoots[i]->runInitializers(gLinkContext, initializerTimes[0]);
}
}
// run initializers for main executable and everything it brings up
sMainExecutable->runInitializers(gLinkContext, initializerTimes[0]);
// register cxa_atexit() handler to run static terminators in all loaded images when this process exits
if ( gLibSystemHelpers != NULL )
(*gLibSystemHelpers->cxa_atexit)(&runAllStaticTerminators, NULL, NULL);
// dump info if requested
if ( sEnv.DYLD_PRINT_STATISTICS )
ImageLoader::printStatistics((unsigned int)allImagesCount(), initializerTimes[0]);
if ( sEnv.DYLD_PRINT_STATISTICS_DETAILS )
ImageLoaderMachO::printStatisticsDetails((unsigned int)allImagesCount(), initializerTimes[0]);
}
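In the code we can see runInitializers(); according to the comment, it runs the initializers of the main program. It corresponds to step 5 in the call stack, and from this step on the functions belong to ImageLoader rather than dyld. Here is its code: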
void ImageLoader::runInitializers(const LinkContext& context, InitializerTimingList& timingInfo)
{
uint64_t t1 = mach_absolute_time();
mach_port_t thisThread = mach_thread_self();
ImageLoader::UninitedUpwards up;
up.count = 1;
up.images[0] = this;
processInitializers(context, thisThread, timingInfo, up);
context.notifyBatch(dyld_image_state_initialized, false);
mach_port_deallocate(mach_task_self(), thisThread);
uint64_t t2 = mach_absolute_time();
fgTotalInitTime += (t2 - t1);
}
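Inside this function we find processInitializers(), step 4 in the call stack. Stepping into it leads to recursiveInitialization(), step 3 in the call stack. Xcode cannot jump into that function directly, but it can be found by searching within the same file.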
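The general idea behind a recursive initialization pass is that an image's dependencies are initialized before the image itself. The sketch below shows that ordering on a tiny dependency graph. It is an illustration under that assumption only: InitNode and recursiveInitSketch are hypothetical names, and the real function also handles locking, notifications, and upward dependencies.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a loaded image and the images it depends on.
struct InitNode {
    std::string            name;
    std::vector<InitNode*> dependencies;
    bool                   initialized;

    explicit InitNode(std::string n, std::vector<InitNode*> deps = {})
        : name(std::move(n)), dependencies(std::move(deps)), initialized(false) {}
};

// Depth-first: initialize lower-level images first, then this one.
inline void recursiveInitSketch(InitNode& image, std::vector<std::string>& order) {
    if (image.initialized) {
        return;                    // each image is initialized once
    }
    image.initialized = true;      // mark early so dependency cycles terminate
    for (InitNode* dependency : image.dependencies) {
        recursiveInitSketch(*dependency, order);
    }
    order.push_back(image.name);   // "run initializers" for this image
}
```

Starting from the app image, the recorded order ends with the app itself, after every library it depends on.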
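This article summarized dyld's loading flow. Turning source code into an executable generally goes through preprocessing, compilation, assembly, and linking. dyld is Apple's dynamic linker: at launch it loads the executable and links it with the dynamic libraries it depends on. Its main steps are: setContext, loading the shared cache, reloadAllImages, loading inserted libraries, linking the main program and inserted libraries, and initializing the main program.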
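This is my first exploration of dyld's internals, and many details remain unexplored. Corrections and criticism are welcome; I will keep improving this article and continue exploring the lower levels.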