Exploring iOS Internals: Anatomy of the Class Structure (cache_t)

Date: 2020-03-05


Introduction:

In the previous article we explored the underlying structure of an iOS class together. Let's first recap its definition:

// This definition is found in objc-runtime-new.h
struct objc_class : objc_object {
    // Class ISA;
    Class superclass;           // 8
    cache_t cache;             // formerly cache pointer and vtable 16
    class_data_bits_t bits;    // class_rw_t * plus custom rr/alloc flags 8
    // Many methods follow below; we ignore them for now
};

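We have already covered several important members of the class and looked closely at the internal structure of class_data_bits_t bits. There is one more member, cache_t, so let's take a look at it. As the name suggests, it is a cache. So what does it cache?
The answer is: methods.
Underneath, it is implemented as a hash table for storing and reading entries. It caches methods that have already been called so that the next call can read the implementation straight from the cache, which speeds up method lookup. Let's go through it in detail.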

1. Definition of `cache_t` in the source

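Recall the class structure definition from the introduction. ISA and superclass each take 8 bytes, so cache_t sits 16 bytes past the start address of the class. Next, let's look at the definition of cache_t: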

struct cache_t {
    struct bucket_t *_buckets; // 8 bytes: a pointer, which takes 8 bytes on a 64-bit system
    mask_t _mask;              // 4 bytes: mask_t is a uint32_t
    mask_t _occupied;          // 4 bytes: same as above
};

Where:

_mask: the hash table length minus 1
_occupied: the number of methods currently cached
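To make the byte counts concrete, here is a minimal sketch that mirrors the two layouts with stand-in types. The names fake_objc_class, fake_cache_t, fake_bucket_t, cacheOffset and cacheSize are hypothetical and exist only for this illustration; they are not the runtime's own definitions, and the sketch assumes the compiler adds no padding beyond natural alignment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Stand-in types for illustration only; not the runtime's definitions.
using mask_t = std::uint32_t;

struct fake_bucket_t;  // opaque here; only a pointer to it is stored

struct fake_cache_t {
    fake_bucket_t *_buckets;  // pointer to the bucket array
    mask_t _mask;             // table length - 1
    mask_t _occupied;         // number of cached methods
};

struct fake_objc_class {
    void *isa;                // inherited from objc_object in the real runtime
    void *superclass;
    fake_cache_t cache;
    std::uintptr_t bits;      // stands in for class_data_bits_t
};

// offsetof is only well defined for standard-layout types.
static_assert(std::is_standard_layout<fake_objc_class>::value,
              "offsetof needs a standard-layout type");

// Offset of the cache member from the start of the class object.
constexpr std::size_t cacheOffset() { return offsetof(fake_objc_class, cache); }

// Total size of the cache member.
constexpr std::size_t cacheSize() { return sizeof(fake_cache_t); }
```

On a 64-bit build, where a pointer takes 8 bytes, both cacheOffset() and cacheSize() evaluate to 16, which matches the byte counts in the comments above.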

_buckets is an array in which every element is a bucket_t. Here is the definition of bucket_t in the source:

struct bucket_t {  
private:  
    // IMP-first is better for arm64e ptrauth and no worse for arm64. 
    // SEL-first is better for armv7* and i386 and x86_64. 
#if __arm64__ 
    MethodCacheIMP _imp;  
    cache_key_t _key;  
#else 
    cache_key_t _key;  
    MethodCacheIMP _imp;  
#endif 
public:  
    inline cache_key_t key() const { return _key; }  
    inline IMP imp() const { return (IMP)_imp; }  
    inline void setKey(cache_key_t newKey) { _key = newKey; }  
    inline void setImp(IMP newImp) { _imp = newImp; }  

    void set(cache_key_t newKey, IMP newImp);  
};  

From the source we can see that bucket_t holds two members, _imp and _key.

_key: the method's SEL, used as the key
_imp: the memory address of the function implementation

2. What `cache_t` is for

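In the introduction we said that cache_t caches methods. But why cache them at all? Can't they just be called directly? To answer that, let's first review the method lookup flow.
Normally a method call takes the NORMAL path, that is, an ordinary lookup. Suppose the instance method eat of a person class is called as [person eat]. The system's lookup flow is: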

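obj -> isa -> obj's Class object -> method_array_t methods -> traverse this list; call the method if found, otherwise continue
obj's Class object -> superclass -> method_array_t methods -> traverse the superclass's method list; call the method if found, otherwise repeat this step
Call if found, otherwise repeat the flow ...
Up to the root class NSObject -> isa -> NSObject's Class object -> method_array_t methods
Only when nothing is found does the runtime go through further checks, throw an exception, and so on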
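The uncached walk above can be sketched as a toy model. ToyClass, ToyImp, slowLookup, eatImp and descriptionImp are hypothetical names invented for this example, and a plain vector of (name, function) pairs stands in for method_array_t; the real runtime structures are more involved.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Toy model for illustration; none of these names come from the runtime.
using ToyImp = int (*)();

struct ToyClass {
    const ToyClass *superclass;                           // nullptr for the root class
    std::vector<std::pair<std::string, ToyImp>> methods;  // (selector name, implementation)
};

// Walk the class and then each superclass, scanning every method list linearly.
// Returns nullptr when no class in the chain implements the selector.
inline ToyImp slowLookup(const ToyClass *cls, const std::string &sel) {
    for (const ToyClass *c = cls; c != nullptr; c = c->superclass) {
        for (const auto &entry : c->methods) {
            if (entry.first == sel) return entry.second;
        }
    }
    return nullptr;
}

// Two sample implementations used to populate the toy classes.
inline int eatImp() { return 1; }
inline int descriptionImp() { return 2; }
```

Every call pays for the full walk: a method defined only on the root class costs one list scan per class in the chain before it is found.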

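Look how complex and tedious that is. Apple's engineers were clever: they put a cache box in every class. Whenever a method is called, its SEL and IMP are saved, so on the next call the implementation address can be fetched from the cache quickly using only the SEL, which greatly improves efficiency.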

3. The caching flow of `cache_t`

The source has this comment about the flow:

* Cache readers (PC-checked by collecting_in_critical())
 * objc_msgSend*
 * cache_getImp
 *
 * Cache writers (hold cacheUpdateLock while reading or writing; not PC-checked)
 * cache_fill         (acquires lock)
 * cache_expand       (only called from cache_fill)
 * cache_create       (only called from cache_expand)
 * bcopy               (only called from instrumented cache_expand)
 * flush_caches        (acquires lock)
 * cache_flush        (only called from cache_fill and flush_caches)
 * cache_collect_free (only called from cache_expand and cache_flush)

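Reading the cache is simple: the readers are objc_msgSend and cache_getImp, which fetch the function's address. So we focus on the write path. Many functions take part in writing, but the entry point is cache_fill: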

void cache_fill(Class cls, SEL sel, IMP imp, id receiver) {
#if !DEBUG_TASK_THREADS
   mutex_locker_t lock(cacheUpdateLock);
   cache_fill_nolock(cls, sel, imp, receiver);
#else
   _collecting_in_critical();
   return;
#endif
}

Inside, cache_fill calls cache_fill_nolock:

static void cache_fill_nolock(Class cls, SEL sel, IMP imp, id receiver) {
    cacheUpdateLock.assertLocked();

    // Never cache before +initialize is done
    if (!cls->isInitialized()) return;

    // Make sure the entry wasn't added to the cache by some other thread 
    // before we grabbed the cacheUpdateLock.
    if (cache_getImp(cls, sel)) return;

    cache_t *cache = getCache(cls);
    cache_key_t key = getKey(sel);

    // Use the cache as-is if it is less than 3/4 full
    mask_t newOccupied = cache->occupied() + 1;
    mask_t capacity = cache->capacity();
    if (cache->isConstantEmptyCache()) {
        // Cache is read-only. Replace it.
        cache->reallocate(capacity, capacity ?: INIT_CACHE_SIZE);
    }
    else if (newOccupied <= capacity / 4 * 3) {
        // Cache is less than 3/4 full. Use it as-is.
    }
    else {
        // Cache is too full. Expand it.
        cache->expand();
    }

    // Scan for the first unused slot and insert there.
    // There is guaranteed to be an empty slot because the 
    // minimum size is 4 and we resized at 3/4 full.
    bucket_t *bucket = cache->find(key, receiver);
    if (bucket->key() == 0) cache->incrementOccupied();
    bucket->set(key, imp);
}

With this much code you can tell it is a core function. It does a lot internally, so let's go through it line by line.

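First it checks whether cls, the class, has been initialized, and returns immediately if not. Then it checks whether cache_getImp(cls, sel) returns a value. This guards against multithreaded calls: another thread may have called the same method, so the function checks whether that thread has already written the entry and returns if so.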

// Never cache before +initialize is done
    if (!cls->isInitialized()) return;

    // Make sure the entry wasn't added to the cache by some other thread 
    // before we grabbed the cacheUpdateLock.
    if (cache_getImp(cls, sel)) return;
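The second check is the usual "look again once you hold the lock" pattern. Below is a minimal sketch of the idea, assuming a std::mutex in place of cacheUpdateLock and a std::unordered_map in place of the bucket array. LockedToyCache and its members are hypothetical names, not runtime APIs.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical toy cache; std::unordered_map stands in for the bucket array.
struct LockedToyCache {
    std::mutex updateLock;
    std::unordered_map<std::string, int> entries;
    int writes = 0;  // counts how many times an entry was actually inserted

    // Mirrors the two early returns: skip classes that are not initialized yet,
    // and skip entries that were already added before we obtained the lock.
    void fill(bool classInitialized, const std::string &sel, int imp) {
        std::lock_guard<std::mutex> guard(updateLock);
        if (!classInitialized) return;
        if (entries.count(sel) != 0) return;
        entries.emplace(sel, imp);
        ++writes;
    }
};
```

Calling fill twice with the same selector performs only one write, which is the situation the comment in the source describes when two threads race to cache the same method.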

Next, getCache retrieves the class's cache, using a memory offset from the class internally, and getKey generates a key from sel.

cache_t *cache = getCache(cls);
cache_key_t key = getKey(sel);

It then defines newOccupied as the old occupied count plus 1 and reads capacity, the cache's capacity, from cache_t:

mask_t newOccupied = cache->occupied() + 1;
mask_t capacity = cache->capacity();

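Then come the comparisons:

1: If the cache is empty, call cache->reallocate().
2: If the new occupied count is less than or equal to 3/4 of the current capacity, do nothing.
3: Otherwise, if the new occupied count is greater than 3/4 of the current capacity, expand with cache->expand().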

if (cache->isConstantEmptyCache()) {
     // Cache is read-only. Replace it.
     cache->reallocate(capacity, capacity ?: INIT_CACHE_SIZE);
 }
 else if (newOccupied <= capacity / 4 * 3) {
     // Cache is less than 3/4 full. Use it as-is.
 }
 else {
     // Cache is too full. Expand it.
     cache->expand();
 }
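To see where the 3/4 threshold falls for concrete numbers, the three branches can be isolated into a small pure function. This is only a sketch: FillAction and decide are hypothetical names, and the isEmptyPlaceholder flag stands in for isConstantEmptyCache().

```cpp
#include <cassert>
#include <cstdint>

using mask_t = std::uint32_t;

// Hypothetical labels for the three branches of the capacity check.
enum class FillAction { Allocate, UseAsIs, Expand };

// Decide what to do before inserting one more entry.
// `isEmptyPlaceholder` plays the role of isConstantEmptyCache().
inline FillAction decide(bool isEmptyPlaceholder, mask_t occupied, mask_t capacity) {
    const mask_t newOccupied = occupied + 1;
    if (isEmptyPlaceholder) return FillAction::Allocate;
    // Integer arithmetic, as in the source: divide by 4 first, then multiply by 3.
    if (newOccupied <= capacity / 4 * 3) return FillAction::UseAsIs;
    return FillAction::Expand;
}
```

With a capacity of 4 the threshold is 3, so the third insert still fits and the fourth triggers expansion. With a capacity of 8 the threshold is 6.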

Here cache->reallocate(capacity, capacity ?: INIT_CACHE_SIZE) regenerates the buckets. Let's look at its implementation:

void cache_t::reallocate(mask_t oldCapacity, mask_t newCapacity)
{
    bool freeOld = canBeFreed();

    bucket_t *oldBuckets = buckets();
    bucket_t *newBuckets = allocateBuckets(newCapacity);

    // Cache's old contents are not propagated. 
    // This is thought to save cache memory at the cost of extra cache fills.
    // fixme re-measure this

    assert(newCapacity > 0);
    assert((uintptr_t)(mask_t)(newCapacity-1) == newCapacity-1);

    setBucketsAndMask(newBuckets, newCapacity - 1);
    
    // The block below discards the old buckets and frees their memory
    if (freeOld) {
        cache_collect_free(oldBuckets, oldCapacity);
        cache_collect(false);
    }
}

The function creates a new set of buckets sized by newCapacity, replaces the old buckets with it, and finally frees the memory held by the old buckets.

Next, let's look at cache->expand():

void cache_t::expand()
{
    cacheUpdateLock.assertLocked();
    
    uint32_t oldCapacity = capacity();
    uint32_t newCapacity = oldCapacity ? oldCapacity*2 : INIT_CACHE_SIZE;
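    // When we reach this expansion path, _mask already has a value, and
    // oldCapacity is _mask + 1. The new capacity requested is oldCapacity * 2.
    // Doubling is considered the most economical growth step here; it also keeps
    // the capacity a power of two, so capacity - 1 stays a valid mask.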

    if ((uint32_t)(mask_t)newCapacity != newCapacity) {
        // mask overflow - can't grow further
        // fixme this wastes one bit of mask
        newCapacity = oldCapacity;
     }
    reallocate(oldCapacity, newCapacity);
}

To sum up, expanding the cache means allocating new storage with twice the original capacity.
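The capacity calculation in expand(), including its overflow guard, can be tried out in isolation. In this sketch the mask type is a template parameter so that the guard can be exercised with a narrow 16-bit type. nextCapacity and kInitCacheSize are hypothetical names, and the initial size of 4 is an assumption based on the "minimum size is 4" comment in cache_fill_nolock.

```cpp
#include <cassert>
#include <cstdint>

// Assumed initial capacity; the runtime comment says the minimum size is 4.
constexpr std::uint32_t kInitCacheSize = 4;

// Compute the next capacity the way expand() does. MaskT is the type the
// mask is stored in; a doubled capacity that does not survive the round trip
// through MaskT is rejected and the old capacity is kept.
template <typename MaskT>
std::uint32_t nextCapacity(std::uint32_t oldCapacity) {
    std::uint32_t newCapacity = oldCapacity ? oldCapacity * 2 : kInitCacheSize;
    if (static_cast<std::uint32_t>(static_cast<MaskT>(newCapacity)) != newCapacity) {
        // The doubled value does not fit in the mask type: stop growing.
        newCapacity = oldCapacity;
    }
    return newCapacity;
}
```

With a 32-bit mask the sequence runs 4, 8, 16 and so on. With a 16-bit mask, which is used here only to demonstrate the guard, a capacity of 32768 can no longer double and stays where it is.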

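This raises a question: when capacity runs out, why destroy and rebuild? Doesn't that throw away the earlier cache? Why not keep the methods that were already cached?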

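Apple's engineers may have reasoned that keeping the earlier entries would mean moving the old cache into the newly allocated space. Because an entry's index depends on the mask, every old entry would have to be re-inserted under the new mask. A cache exists to save time, and doing this would cost time instead, so destroying and rebuilding is quicker. The comment in reallocate adds that dropping the old contents is thought to save cache memory at the cost of extra cache fills.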

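Now that we understand the expand and rebuild functions, back to the main line. The buckets are ready, so the next step is storing. First, cache->find(key, receiver) looks for a suitable bucket. Here is how it searches: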

bucket_t * cache_t::find(cache_key_t k, id receiver)
{
    assert(k != 0);

    bucket_t *b = buckets();
    mask_t m = mask();
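    // cache_hash computes [begin = k & m], the index for key k; begin records where the search starts
    mask_t begin = cache_hash(k, m);
    
    // Copy begin into i, which is used to step through the indexes
    mask_t i = begin;
    do {
        if (b[i].key() == 0  ||  b[i].key() == k) {
            // Read the bucket at index i. If its key equals k, the lookup succeeded, so return that bucket_t.
            // If its key is 0, no method has been cached at index i yet; return that bucket_t as well, which ends the search.
            return &b[i];
        }
    } while ((i = cache_next(i, m)) != begin);
    // The loop above looks for the matching bucket in the buckets list of our cache_t.
    // hack
    // Reaching this point means neither a bucket_t with this key nor an empty bucket_t was found. The lookup failed, so bad_cache is called below.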
    Class cls = (Class)((uintptr_t)this - offsetof(objc_class, cache));
    cache_t::bad_cache(receiver, (SEL)k, cls);
}

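We know that Buckets is really an array used as a hash table. The algorithm that computes an index from a key is called the hash function. Here index = @selector(XXXX) & mask. From the properties of the & operation we get index <= mask, and since mask = hash table length - 1, it follows that 0 <= index <= hash table length - 1, which covers exactly the index range of the hash table.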
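The masking and probing can be sketched as a small standalone table. ToyBucket, toyHash, toyNext and toyFind are hypothetical names. The probing order in toyNext (step forward by one and wrap around) is an assumption, since the article does not show cache_next and the real function may differ by architecture. Unlike the runtime, which calls bad_cache, this sketch returns nullptr when the table is full.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using mask_t = std::uint32_t;
using cache_key_t = std::uintptr_t;

struct ToyBucket {
    cache_key_t key = 0;     // 0 marks an empty slot
    std::uintptr_t imp = 0;
};

// Start index: the key masked down to the table size (mask = length - 1).
inline mask_t toyHash(cache_key_t key, mask_t mask) {
    return static_cast<mask_t>(key & mask);
}

// Assumed probing order: step to the next slot and wrap around.
inline mask_t toyNext(mask_t i, mask_t mask) {
    return (i + 1) & mask;
}

// Return the slot holding `key`, or the first empty slot on its probe path.
// Returns nullptr if the table is full and does not contain the key.
inline ToyBucket *toyFind(std::vector<ToyBucket> &table, cache_key_t key) {
    assert(key != 0);
    // The masking trick only works when the length is a power of two.
    assert(!table.empty() && (table.size() & (table.size() - 1)) == 0);
    const mask_t mask = static_cast<mask_t>(table.size() - 1);
    const mask_t begin = toyHash(key, mask);
    mask_t i = begin;
    do {
        if (table[i].key == 0 || table[i].key == key) return &table[i];
    } while ((i = toyNext(i, mask)) != begin);
    return nullptr;
}
```

In a table of length 4 the mask is 3, so the keys 0x1235 and 0x2229 both hash to index 1. The second one collides and, under the assumed probing order, lands in index 2.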

After this call we have a suitable bucket. Next comes the check if (bucket->key() == 0) cache->incrementOccupied(). If it is true, the bucket has not been used before, so the occupied count is incremented by one.

Finally, set(key, imp) fills the bucket:

bucket->set(key, imp);

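Let's summarize the overall flow of cache_t:

1: When an object receives a message through objc_msgSend, the runtime first follows the obj's isa pointer to its class object cls.
2: Inside obj's cls, it first queries the cache cache_t for the implementation of the message's method. If found, that function is called directly.
3: If the previous step found nothing, it searches the method list of cls with a binary search or a linear traversal.
4: If the function is found, the next step is to fill cache_t:

(1) Run the guard checks and prepare some temporary variables.
(2) Before each caching operation, check the cache capacity. If the number of cached methods would exceed the threshold (3/4 of the capacity), first expand the cache to twice its size, discarding all previously cached methods, then store the current method in the new, larger cache.
(3) Use the hash function to find a suitable bucket in the Buckets array.
(4) Once found, check whether the bucket has been occupied before. If not, increment Occupied by one.
(5) Store the method in the bucket.

5: Call the method.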
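Putting the fill steps together, here is a compact toy model of the write path, mainly to show that entries cached before an expansion are gone afterwards. ToyMethodCache and all of its members are hypothetical names. The initial capacity of 4 and the forward linear probing are assumptions, and locking and the +initialize check are left out.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy model of the write path; not the runtime's implementation.
class ToyMethodCache {
public:
    // Returns the cached imp for `key`, or 0 on a miss.
    std::uintptr_t lookup(std::uintptr_t key) const {
        if (buckets_.empty()) return 0;
        const std::uint32_t mask = capacity() - 1;
        std::uint32_t i = static_cast<std::uint32_t>(key & mask);
        for (std::uint32_t n = 0; n < capacity(); ++n, i = (i + 1) & mask) {
            if (buckets_[i].key == key) return buckets_[i].imp;
            if (buckets_[i].key == 0) return 0;  // empty slot ends the probe
        }
        return 0;
    }

    // Steps (1)-(5): guard, check capacity, find a bucket, count it, store.
    void fill(std::uintptr_t key, std::uintptr_t imp) {
        assert(key != 0 && imp != 0);
        if (lookup(key) != 0) return;                 // already cached
        const std::uint32_t newOccupied = occupied_ + 1;
        if (buckets_.empty()) {
            reallocate(kInitialCapacity);             // first allocation
        } else if (newOccupied > capacity() / 4 * 3) {
            reallocate(capacity() * 2);               // old entries are dropped
        }
        const std::uint32_t mask = capacity() - 1;
        std::uint32_t i = static_cast<std::uint32_t>(key & mask);
        while (buckets_[i].key != 0) i = (i + 1) & mask;  // an empty slot always exists
        buckets_[i] = Bucket{key, imp};
        ++occupied_;
    }

    std::uint32_t capacity() const { return static_cast<std::uint32_t>(buckets_.size()); }
    std::uint32_t occupied() const { return occupied_; }

private:
    struct Bucket { std::uintptr_t key; std::uintptr_t imp; };
    static constexpr std::uint32_t kInitialCapacity = 4;  // assumed initial size

    // Replace the table with an empty one; nothing is copied over.
    void reallocate(std::uint32_t newCapacity) {
        buckets_.assign(newCapacity, Bucket{0, 0});
        occupied_ = 0;
    }

    std::vector<Bucket> buckets_;
    std::uint32_t occupied_ = 0;
};
```

After three fills the model holds three entries in a table of 4. The fourth fill crosses the 3/4 threshold, so the table is rebuilt with capacity 8 and contains only the newest entry; looking up any of the first three keys misses until they are filled again.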