Read the fucking source code!
--By Lu Xun
A picture is worth a thousand words.
--By Gorky
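This article describes memory compaction, the kernel's technique for defragmenting memory.
Memory fragmentation comes in two forms: internal fragmentation and external fragmentation.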
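memory compaction obtains contiguous free pages by migrating movable pages that are currently in use to another location. To deal with fragmentation, the kernel defines migrate_type to describe how a page can be migrated:
MIGRATE_UNMOVABLE: cannot be moved; corresponds to pages allocated by the kernel;
MIGRATE_MOVABLE: can be moved; corresponds to memory or files allocated from user space;
MIGRATE_RECLAIMABLE: cannot be moved, but can be reclaimed.
First, an overview figure of memory compaction: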
The figure above shows the operation in terms of struct page; in terms of physical memory, the operation looks like the following figure:
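An earlier article introduced pageblock. In the figure, the zone is scanned upward and downward in units of pageblock. The pageblock size is defined as follows (for the case where huge pages are not used) and matches the largest block size managed by the Buddy System: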
    /* If huge pages are not used, group by MAX_ORDER_NR_PAGES */
    #define pageblock_order         (MAX_ORDER-1)
    #define pageblock_nr_pages      (1UL << pageblock_order)
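To make the two macros concrete, here is a minimal sketch of the arithmetic they imply. The values of TOY_PAGE_SHIFT and TOY_MAX_ORDER are assumptions chosen for illustration (4 KiB pages, MAX_ORDER of 11), and the three helper functions are hypothetical names, not kernel API.

```c
#include <assert.h>

/* Assumed values for illustration only; a real kernel derives them from its config. */
#define TOY_PAGE_SHIFT      12                    /* assumption: 4 KiB pages */
#define TOY_MAX_ORDER       11                    /* assumption: stands in for MAX_ORDER */
#define pageblock_order     (TOY_MAX_ORDER - 1)
#define pageblock_nr_pages  (1UL << pageblock_order)

/* Hypothetical helper: size of one pageblock in bytes. */
unsigned long pageblock_bytes(void)
{
    return pageblock_nr_pages << TOY_PAGE_SHIFT;
}

/* Hypothetical helper: first page frame number of the pageblock containing pfn. */
unsigned long block_start_pfn(unsigned long pfn)
{
    return pfn & ~(pageblock_nr_pages - 1);
}

/* Hypothetical helper: first page frame number of the following pageblock. */
unsigned long block_end_pfn(unsigned long pfn)
{
    unsigned long end = block_start_pfn(pfn) + pageblock_nr_pages;

    assert(end > pfn);  /* pfn always lies inside [start, end) */
    return end;
}
```

Under these assumptions a pageblock holds 1024 pages (4 MiB), and page frame 5000 belongs to the block covering frames 4096 to 5119.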
With that first impression in place, let's dig further.
compact_priority
    /*
     * Determines how hard direct compaction should try to succeed.
     * Lower value means higher priority, analogically to reclaim priority.
     */
    enum compact_priority {
        COMPACT_PRIO_SYNC_FULL,
        MIN_COMPACT_PRIORITY = COMPACT_PRIO_SYNC_FULL,
        COMPACT_PRIO_SYNC_LIGHT,
        MIN_COMPACT_COSTLY_PRIORITY = COMPACT_PRIO_SYNC_LIGHT,
        DEF_COMPACT_PRIORITY = COMPACT_PRIO_SYNC_LIGHT,
        COMPACT_PRIO_ASYNC,
        INIT_COMPACT_PRIORITY = COMPACT_PRIO_ASYNC
    };
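The comment in the enum says that a lower value means a higher priority, so "trying harder" means decrementing the value. The sketch below illustrates that idea only. The function escalate_priority and the constant TOY_COSTLY_ORDER are hypothetical, and the rule that costly orders stop at MIN_COMPACT_COSTLY_PRIORITY is my reading of the constant's name rather than something the enum itself states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local copy of the enum so the sketch compiles on its own. */
enum compact_priority {
    COMPACT_PRIO_SYNC_FULL,
    MIN_COMPACT_PRIORITY = COMPACT_PRIO_SYNC_FULL,
    COMPACT_PRIO_SYNC_LIGHT,
    MIN_COMPACT_COSTLY_PRIORITY = COMPACT_PRIO_SYNC_LIGHT,
    DEF_COMPACT_PRIORITY = COMPACT_PRIO_SYNC_LIGHT,
    COMPACT_PRIO_ASYNC,
    INIT_COMPACT_PRIORITY = COMPACT_PRIO_ASYNC
};

#define TOY_COSTLY_ORDER 3  /* assumption: stands in for PAGE_ALLOC_COSTLY_ORDER */

/*
 * Hypothetical retry helper: move *prio one step towards the highest
 * priority allowed for this order. Returns true if the priority changed,
 * false if it was already at the limit.
 */
bool escalate_priority(enum compact_priority *prio, unsigned int order)
{
    enum compact_priority limit = (order > TOY_COSTLY_ORDER)
        ? MIN_COMPACT_COSTLY_PRIORITY
        : MIN_COMPACT_PRIORITY;

    assert(prio != NULL);
    if (*prio > limit) {
        *prio = (enum compact_priority)(*prio - 1);  /* lower value = higher priority */
        return true;
    }
    return false;
}
```

Starting from INIT_COMPACT_PRIORITY, a small-order request can escalate twice in this model (to COMPACT_PRIO_SYNC_LIGHT, then to COMPACT_PRIO_SYNC_FULL), while a costly request stops after the first step.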
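This enum describes the different ways in which memory compaction can run:
COMPACT_PRIO_SYNC_FULL/MIN_COMPACT_PRIORITY: highest priority; compaction and migration are both done synchronously;
COMPACT_PRIO_SYNC_LIGHT/MIN_COMPACT_COSTLY_PRIORITY/DEF_COMPACT_PRIORITY: medium priority; compaction is handled synchronously and migration asynchronously;
COMPACT_PRIO_ASYNC/INIT_COMPACT_PRIORITY: lowest priority; compaction and migration are both handled asynchronously.
compact_result
This enum describes the return values of the compaction functions: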
    /* Return values for compact_zone() and try_to_compact_pages() */
    /* When adding new states, please adjust include/trace/events/compaction.h */
    enum compact_result {
        /* For more detailed tracepoint output - internal to compaction */
        COMPACT_NOT_SUITABLE_ZONE,
        /*
         * compaction didn't start as it was not possible or direct reclaim
         * was more suitable
         */
        COMPACT_SKIPPED,
        /* compaction didn't start as it was deferred due to past failures */
        COMPACT_DEFERRED,

        /* compaction not active last round */
        COMPACT_INACTIVE = COMPACT_DEFERRED,

        /* For more detailed tracepoint output - internal to compaction */
        COMPACT_NO_SUITABLE_PAGE,
        /* compaction should continue to another pageblock */
        COMPACT_CONTINUE,

        /*
         * The full zone was compacted scanned but wasn't successfull to compact
         * suitable pages.
         */
        COMPACT_COMPLETE,
        /*
         * direct compaction has scanned part of the zone but wasn't successfull
         * to compact suitable pages.
         */
        COMPACT_PARTIAL_SKIPPED,

        /* compaction terminated prematurely due to lock contentions */
        COMPACT_CONTENDED,

        /*
         * direct compaction terminated after concluding that the allocation
         * should now succeed
         */
        COMPACT_SUCCESS,
    };
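A caller usually cares less about the exact value than about which group it falls into. The sketch below groups the results according to the comments in the enum: success, a full scan that failed, and cases where compaction backed off before finishing. The three helper names are hypothetical and the grouping is my reading of those comments, so treat it as an illustration rather than kernel behaviour.

```c
#include <assert.h>
#include <stdbool.h>

/* Local copy of the enum so the sketch compiles on its own. */
enum compact_result {
    COMPACT_NOT_SUITABLE_ZONE,
    COMPACT_SKIPPED,
    COMPACT_DEFERRED,
    COMPACT_INACTIVE = COMPACT_DEFERRED,
    COMPACT_NO_SUITABLE_PAGE,
    COMPACT_CONTINUE,
    COMPACT_COMPLETE,
    COMPACT_PARTIAL_SKIPPED,
    COMPACT_CONTENDED,
    COMPACT_SUCCESS,
};

/* Hypothetical: compaction concluded that the allocation should now succeed. */
bool result_is_success(enum compact_result r)
{
    return r == COMPACT_SUCCESS;
}

/* Hypothetical: the whole zone was scanned without producing a suitable page. */
bool result_is_failure(enum compact_result r)
{
    return r == COMPACT_COMPLETE;
}

/* Hypothetical: compaction did not start, or stopped early, so a retry may help. */
bool result_is_backoff(enum compact_result r)
{
    assert(r <= COMPACT_SUCCESS);
    switch (r) {
    case COMPACT_SKIPPED:
    case COMPACT_DEFERRED:          /* same value as COMPACT_INACTIVE */
    case COMPACT_PARTIAL_SKIPPED:
    case COMPACT_CONTENDED:
        return true;
    default:
        return false;
    }
}
```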
migrate_mode
This enum describes the modes used during migration, mainly distinguishing synchronous from asynchronous handling.
    /*
     * MIGRATE_ASYNC means never block
     * MIGRATE_SYNC_LIGHT in the current implementation means to allow blocking
     *  on most operations but not ->writepage as the potential stall time
     *  is too significant
     * MIGRATE_SYNC will block when migrating pages
     * MIGRATE_SYNC_NO_COPY will block when migrating pages but will not copy pages
     *  with the CPU. Instead, page copy happens outside the migratepage()
     *  callback and is likely using a DMA engine. See migrate_vma() and HMM
     *  (mm/hmm.c) for users of this mode.
     */
    enum migrate_mode {
        MIGRATE_ASYNC,
        MIGRATE_SYNC_LIGHT,
        MIGRATE_SYNC,
        MIGRATE_SYNC_NO_COPY,
    };
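The comment above can be read as a small table of permissions per mode. The sketch below encodes that reading as three predicates. The function names are hypothetical, and treating MIGRATE_SYNC_NO_COPY like MIGRATE_SYNC for ->writepage is an assumption drawn from the phrase "will block when migrating pages".

```c
#include <assert.h>
#include <stdbool.h>

/* Local copy of the enum so the sketch compiles on its own. */
enum migrate_mode {
    MIGRATE_ASYNC,
    MIGRATE_SYNC_LIGHT,
    MIGRATE_SYNC,
    MIGRATE_SYNC_NO_COPY,
};

/* Hypothetical: may migration in this mode block at all? */
bool mode_may_block(enum migrate_mode mode)
{
    assert(mode <= MIGRATE_SYNC_NO_COPY);
    return mode != MIGRATE_ASYNC;
}

/* Hypothetical: may this mode wait for ->writepage (assumed for both full sync modes)? */
bool mode_may_wait_for_writepage(enum migrate_mode mode)
{
    return mode == MIGRATE_SYNC || mode == MIGRATE_SYNC_NO_COPY;
}

/* Hypothetical: is the page content copied by the CPU in this mode? */
bool mode_copies_with_cpu(enum migrate_mode mode)
{
    return mode != MIGRATE_SYNC_NO_COPY;
}
```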
compact_control
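The compact_control structure is used while compaction runs. It maintains the two scanners, which correspond to freepages and migratepages, and in the end the pages on migratepages are copied into the pages on freepages. The field comments are detailed enough that they need no further explanation.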
    /*
     * compact_control is used to track pages being migrated and the free pages
     * they are being migrated to during memory compaction. The free_pfn starts
     * at the end of a zone and migrate_pfn begins at the start. Movable pages
     * are moved to the end of a zone during a compaction run and the run
     * completes when free_pfn <= migrate_pfn
     */
    struct compact_control {
        struct list_head freepages;     /* List of free pages to migrate to */
        struct list_head migratepages;  /* List of pages being migrated */
        struct zone *zone;
        unsigned long nr_freepages;     /* Number of isolated free pages */
        unsigned long nr_migratepages;  /* Number of pages to migrate */
        unsigned long total_migrate_scanned;
        unsigned long total_free_scanned;
        unsigned long free_pfn;         /* isolate_freepages search base */
        unsigned long migrate_pfn;      /* isolate_migratepages search base */
        unsigned long last_migrated_pfn;/* Not yet flushed page being freed */
        const gfp_t gfp_mask;           /* gfp mask of a direct compactor */
        int order;                      /* order a direct compactor needs */
        int migratetype;                /* migratetype of direct compactor */
        const unsigned int alloc_flags; /* alloc flags of a direct compactor */
        const int classzone_idx;        /* zone index of a direct compactor */
        enum migrate_mode mode;         /* Async or sync migration mode */
        bool ignore_skip_hint;          /* Scan blocks even if marked skip */
        bool ignore_block_suitable;     /* Scan blocks considered unsuitable */
        bool direct_compaction;         /* False from kcompactd or /proc/... */
        bool whole_zone;                /* Whole zone should/has been scanned */
        bool contended;                 /* Signal lock or sched contention */
        bool finishing_block;           /* Finishing current pageblock */
    };
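The comment at the top of the structure describes two scanners that start at opposite ends of the zone and stop when they meet. The toy model below illustrates just that idea on an array of page states. It is a deliberate simplification: the kernel isolates pages in batches onto the two lists and works per pageblock, whereas this sketch moves one page at a time, and toy_state and toy_compact are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Toy page states; these are not kernel types. */
enum toy_state { TOY_FREE, TOY_MOVABLE, TOY_UNMOVABLE };

/*
 * Toy compaction run over zone[0..nr_pages). The migration scanner walks up
 * from the start looking for movable pages, the free scanner walks down from
 * the end looking for free pages, and each movable page found is moved into
 * a free slot near the end. Returns the number of pages moved.
 */
size_t toy_compact(enum toy_state *zone, size_t nr_pages)
{
    size_t migrate_pfn = 0;        /* next page the migration scanner looks at */
    size_t free_pfn = nr_pages;    /* one past the next free-scanner candidate */
    size_t moved = 0;

    assert(zone != NULL || nr_pages == 0);

    while (migrate_pfn < free_pfn) {
        if (zone[migrate_pfn] != TOY_MOVABLE) {
            migrate_pfn++;         /* nothing to migrate here, keep scanning up */
            continue;
        }
        /* Free scanner: walk down until a free page or the migration scanner. */
        while (free_pfn > migrate_pfn + 1 && zone[free_pfn - 1] != TOY_FREE)
            free_pfn--;
        if (free_pfn <= migrate_pfn + 1)
            break;                 /* the scanners have met: the run is complete */

        zone[free_pfn - 1] = TOY_MOVABLE;  /* "migrate" the page to the free slot */
        zone[migrate_pfn] = TOY_FREE;
        free_pfn--;
        migrate_pfn++;
        moved++;
    }
    return moved;
}
```

For a zone laid out as movable, free, movable, free, free, movable, free, unmovable, this model moves two pages and leaves four contiguous free pages at the start of the zone, where before the longest free run was two pages.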
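Looking at the data structures alone gives a scattered picture, so let's turn to the overall flow.
In the kernel, memory compaction is triggered in three ways:
direct compaction, performed in the allocation path when a request cannot be satisfied;
the kcompactd daemon thread, woken in the background to perform compaction;
manually, by running echo 1 > /proc/sys/vm/compact_memory.
The figure: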
To try it out in practice, run cat /proc/pagetypeinfo; the output is shown in the figure below:
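The manual trigger mentioned above is just a write of "1" to a file, so it can also be done from a program. The sketch below shows one way to do that. The function trigger_compaction is a hypothetical helper that takes the path as a parameter, so it can be pointed at /proc/sys/vm/compact_memory on a real system or at an ordinary file when experimenting.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: write "1" to the file at path.
 * Returns 0 on success and -1 if the file cannot be opened or written.
 */
int trigger_compaction(const char *path)
{
    FILE *f;
    int ok;

    assert(path != NULL);
    f = fopen(path, "w");
    if (f == NULL)
        return -1;

    ok = (fputs("1\n", f) >= 0);
    if (fclose(f) != 0)     /* buffered write errors may only surface on close */
        ok = 0;
    return ok ? 0 : -1;
}
```

Calling trigger_compaction("/proc/sys/vm/compact_memory") is expected to need root privileges and a kernel built with compaction support; comparing /proc/pagetypeinfo before and after is one way to observe the effect.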
compact processing
This processing is fairly complex; the figure below shows the rough flow:
The sections below look at each sub-module in more detail.
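compaction_suitable
This function decides whether compaction should be carried out. Three conditions have to be satisfied; the one that concerns order is this: when the requested order is greater than PAGE_ALLOC_COSTLY_ORDER, the fragmentation index fragindex is computed and the decision is made from its value.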
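To give a feel for what a fragmentation index looks like, here is a minimal sketch modeled on my recollection of the kernel's __fragmentation_index() helper. The struct free_info, the function names, and the threshold parameter are assumptions for illustration. The index is scaled by 1000: values near 0 suggest a failure caused by lack of memory, values near 1000 suggest a failure caused by fragmentation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy summary of a zone's free lists; field names are assumptions. */
struct free_info {
    unsigned long free_pages;           /* total number of free pages */
    unsigned long free_blocks_total;    /* number of free blocks of any order */
    unsigned long free_blocks_suitable; /* free blocks large enough for the request */
};

/* Fragmentation index for a request of the given order, scaled by 1000. */
int toy_fragindex(unsigned int order, const struct free_info *info)
{
    unsigned long long requested = 1ULL << order;

    assert(info != NULL);
    if (info->free_blocks_total == 0)
        return 0;                       /* no free memory at all */
    if (info->free_blocks_suitable != 0)
        return -1000;                   /* the request would succeed; index is meaningless */

    return 1000 - (int)((1000ULL + info->free_pages * 1000ULL / requested)
                        / info->free_blocks_total);
}

/*
 * Hypothetical decision: an index between 0 and the threshold points at a
 * lack of memory rather than fragmentation, so compaction is not worthwhile.
 */
bool fragindex_allows_compaction(int fragindex, int threshold)
{
    return !(fragindex >= 0 && fragindex <= threshold);
}
```

For example, 64 free pages that exist only as 64 single-page blocks give an index of 922 for an order-4 request, which points at fragmentation. With a threshold of 500 (an assumed value), compaction would be considered worthwhile in that case.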
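isolate_migratepages
In isolate_migratepages, the migration scanner walks the zone one pageblock at a time looking for movable pages, and adds the movable pages it finds to the migratepages list in struct compact_control, as shown in the figure below:
isolate_freepages
The logic of isolate_freepages is similar to that of isolate_migratepages: it also isolates pages, which end up on the cc->freepages list.
Once the free scanner and the migration scanner have finished scanning, it is time to migrate the pages on the two lists.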
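migrate_pages
It calls compaction_alloc to take a free page from the cc->freepages list;
It calls __unmap_and_move to move the movable page into the free page.
__unmap_and_move involves reverse mapping, the page cache and more, which are left for a later, deeper look. The function does two key things: 1) it calls try_to_unmap to remove the old mappings from the process page tables, so that a later access is mapped to the new physical address; 2) it calls move_to_new_page to move the old page to the new physical page, where the copy_page function in the assembly file arch/arm64/lib/copy_page.S performs the actual copy.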
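The two steps above (take a target page from the free list, then unmap, copy, and remap) can be sketched in user space. Everything in the block below is a toy model with hypothetical names: a mapping is a single pointer, the free list is an array used as a stack, and the remap happens immediately, which glosses over reverse mapping and the page cache entirely.

```c
#include <assert.h>
#include <string.h>

#define TOY_PAGE_SIZE 64    /* assumption: tiny pages keep the example small */

struct toy_page {
    unsigned char data[TOY_PAGE_SIZE];
};

/* Toy "page table entry": one virtual slot pointing at a physical page. */
struct toy_mapping {
    struct toy_page *page;  /* NULL while unmapped */
};

/* Toy free list: an array of free pages used as a stack. */
struct toy_freelist {
    struct toy_page **pages;
    size_t count;
};

/* Step 1: take one free page off the list, or NULL if it is empty. */
static struct toy_page *toy_alloc_target(struct toy_freelist *fl)
{
    if (fl->count == 0)
        return NULL;
    return fl->pages[--fl->count];
}

/* Step 2: unmap the old page, copy its contents, point the mapping at the new page. */
int toy_unmap_and_move(struct toy_mapping *map, struct toy_freelist *fl)
{
    struct toy_page *old_page, *new_page;

    assert(map != NULL && fl != NULL && map->page != NULL);
    new_page = toy_alloc_target(fl);
    if (new_page == NULL)
        return -1;                      /* no migration target available */

    old_page = map->page;
    map->page = NULL;                   /* drop the old mapping */
    memcpy(new_page->data, old_page->data, TOY_PAGE_SIZE);  /* copy the contents */
    map->page = new_page;               /* simplification: remap right away */
    return 0;
}
```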
compact_finished
The compact_finished function checks whether compaction has completed.
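compaction_deferred/compaction_defer_reset/defer_compaction
These three functions deal with deferring compaction, and they are called from try_to_compact_pages. Compaction is considered successful when the free pages, minus the requested pages, remain above the watermark and the requested migrate type or one of its fallback types has at least one sufficiently large free block. When it has not succeeded, compaction may need to be deferred for a number of attempts. The related fields in struct zone are as follows:
    struct zone {
        ...
        /*
         * On compaction failure, 1<<compact_defer_shift compactions
         * are skipped before trying again. The number attempted since
         * last failure is tracked with compact_considered.
         */
        unsigned int compact_considered;    /* number of deferrals so far */
        unsigned int compact_defer_shift;   /* (1 << compact_defer_shift) = number of attempts to skip; the shift is at most 6 */
        int compact_order_failed;           /* requested order at which compaction failed */
        ...
    };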
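The interplay of the three fields is easier to see in a small model. The sketch below is written from my recollection of how the three kernel functions behave and uses a stand-in struct toy_zone with the same three fields; the toy_ names and TOY_MAX_DEFER_SHIFT are hypothetical, with the cap of 6 taken from the comment above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_DEFER_SHIFT 6   /* cap on the shift, as noted above */

/* Stand-in for the three deferral fields of struct zone. */
struct toy_zone {
    unsigned int compact_considered;
    unsigned int compact_defer_shift;
    int compact_order_failed;
};

/* Compaction failed: double the number of attempts to skip, up to the cap. */
void toy_defer_compaction(struct toy_zone *zone, int order)
{
    assert(zone != NULL);
    zone->compact_considered = 0;
    if (zone->compact_defer_shift < TOY_MAX_DEFER_SHIFT)
        zone->compact_defer_shift++;
    if (order < zone->compact_order_failed)
        zone->compact_order_failed = order;
}

/* Returns true if this attempt should be skipped. */
bool toy_compaction_deferred(struct toy_zone *zone, int order)
{
    unsigned int defer_limit = 1U << zone->compact_defer_shift;

    if (order < zone->compact_order_failed)
        return false;           /* smaller orders have not been seen to fail */

    if (++zone->compact_considered > defer_limit)
        zone->compact_considered = defer_limit;     /* avoid overflow */

    return zone->compact_considered < defer_limit;
}

/* Compaction at this order worked: forget past failures at or below it. */
void toy_defer_reset(struct toy_zone *zone, int order, bool alloc_success)
{
    if (alloc_success) {
        zone->compact_considered = 0;
        zone->compact_defer_shift = 0;
    }
    if (order >= zone->compact_order_failed)
        zone->compact_order_failed = order + 1;
}
```

In this model, after one failure the next attempt is skipped and the one after it is allowed; each further failure doubles the number of attempts considered before compaction is tried again, up to 64.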