Linux Kernel Page Swap-Out Explained

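The kswapd thread is mainly responsible for swapping pages out periodically. The rest of this article walks through how kswapd is implemented.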

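When kswapd is initialized, it first sets the page_cluster variable according to the amount of physical memory. This value controls how many pages are read ahead in one go; the code further down uses it as an exponent (1 << page_cluster pages).
(For example, where a single page would otherwise be read, a read-ahead of 3 reads three pages at once; by the principle of locality of reference this tends to improve speed.)
kswapd is a kernel thread that shares the kernel's address space; it is created with kernel_thread().
kswapd first calls inactive_shortage() to check whether the system as a whole is short of physical pages.
The floor for available physical pages is the sum of freepages.high (the target number of free pages) and inactive_target (the target number of inactive pages).
The pages actually available fall into three groups:
Free pages (allocatable immediately, drawn from every zone), counted by nr_free_pages().
Inactive clean pages (in essence allocatable, but they still hold content, for example in the swap cache; keeping more of them reduces reads from the swap device and so improves speed), counted by nr_inactive_clean_pages().
Inactive dirty pages (which must be written to the swap device before they can be allocated), recorded in nr_inactive_dirty_pages.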

int inactive_shortage(void)
{
    int shortage = 0;

    // What the system should maintain: freepages.high plus inactive_target.
    // What it actually has comes from the three counters below;
    // a positive result means the target is not met.
    shortage += freepages.high;
    shortage += inactive_target;
    shortage -= nr_free_pages();
    shortage -= nr_inactive_clean_pages();
    shortage -= nr_inactive_dirty_pages;

    if (shortage > 0)
        return shortage;

    return 0;
}

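The arithmetic above is easy to check in user space. The following is only a sketch: `struct vm_counters` and `inactive_shortage_sketch()` are hypothetical names invented for this illustration, and the counters are plain integers rather than the kernel's globals and helper functions.

```c
#include <stdio.h>

/* Hypothetical snapshot of the values inactive_shortage() reads. */
struct vm_counters {
    int freepages_high;     /* stands in for freepages.high */
    int inactive_target;    /* stands in for inactive_target */
    int nr_free;            /* stands in for nr_free_pages() */
    int nr_inactive_clean;  /* stands in for nr_inactive_clean_pages() */
    int nr_inactive_dirty;  /* stands in for nr_inactive_dirty_pages */
};

/* Target minus what is actually available, never below zero. */
int inactive_shortage_sketch(const struct vm_counters *c)
{
    int shortage = c->freepages_high + c->inactive_target;

    shortage -= c->nr_free;
    shortage -= c->nr_inactive_clean;
    shortage -= c->nr_inactive_dirty;

    return shortage > 0 ? shortage : 0;
}
```

With a target of 256 + 512 pages and 100 + 200 + 300 pages available, the function reports a shortage of 168 pages; once the available pages exceed the target it reports 0.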
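Even when the condition above is satisfied (that is, the actual page count is above the floor), kswapd still calls free_shortage() to check whether any individual zone is severely short of pages.
free_shortage() first compares the globally freeable pages with the global target and returns the difference if they fall short. Otherwise it checks, zone by zone, whether free plus inactive clean pages reach the zone's minimum, and returns the accumulated deficit.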

/*
 * Check if there are zones with a severe shortage of free pages,
 * or if all zones have a minor shortage.
 */
int free_shortage(void)
{
    pg_data_t *pgdat = pgdat_list; // memory node
    int sum = 0;
    int freeable = nr_free_pages() + nr_inactive_clean_pages(); // actually freeable
    int freetarget = freepages.high + inactive_target / 3;      // target

    // Actual below target: return the difference, i.e. how much must be reclaimed.
    /* Are we low on free pages globally? */
    if (freeable < freetarget)
        return freetarget - freeable;

    /* If not, are we very low on any particular zone? */
    do {
        int i;
        for(i = 0; i < MAX_NR_ZONES; i++) {
            zone_t *zone = pgdat->node_zones+ i; // the zone
            // Are free + inactive clean pages below the zone minimum?
            if (zone->size && (zone->inactive_clean_pages +
                    zone->free_pages < zone->pages_min+1)) {
                /* + 1 to have overlap with alloc_pages() !! */
                sum += zone->pages_min + 1;
                sum -= zone->free_pages;
                sum -= zone->inactive_clean_pages;
            }
        }
        pgdat = pgdat->node_next;
    } while (pgdat);

    return sum;
}

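The two-stage check (global first, then per zone) can be sketched outside the kernel as follows. This is an illustration only: `struct zone_sketch` and `free_shortage_sketch()` are hypothetical names, a flat array replaces the node list, and the global numbers are passed in instead of being computed.

```c
#include <stddef.h>

/* Hypothetical stand-in for the zone_t fields used by free_shortage(). */
struct zone_sketch {
    int size;                  /* 0 means the zone is empty and is skipped */
    int free_pages;
    int inactive_clean_pages;
    int pages_min;
};

int free_shortage_sketch(int freeable, int freetarget,
                         const struct zone_sketch *zones, size_t nr_zones)
{
    int sum = 0;
    size_t i;

    /* Stage 1: a global shortage is reported immediately. */
    if (freeable < freetarget)
        return freetarget - freeable;

    /* Stage 2: add up the deficit of every zone below pages_min + 1. */
    for (i = 0; i < nr_zones; i++) {
        int have = zones[i].free_pages + zones[i].inactive_clean_pages;

        if (zones[i].size && have < zones[i].pages_min + 1)
            sum += zones[i].pages_min + 1 - have;
    }
    return sum;
}
```

A zone with 5 free and 2 inactive clean pages against a pages_min of 20 contributes 21 - 7 = 14 to the result, even when the global figures look healthy.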
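If neither check reports a shortage, kswapd calls refill_inactive_scan(), which tries to turn some active pages (those with no user mappings) into inactive dirty pages.
The function scans part of the active list, depending on priority; the whole list is scanned only when priority is 0. For each page it checks whether the page was referenced recently: if so the page's age is increased, otherwise it is decreased
(a page is considered for the inactive list only once its age reaches 0). It then checks whether the age is 0 and whether any user process still maps the page. (A page's count is set to 1 when it is allocated,
incremented when the page is used as a read/write buffer, and incremented each time a process maps it, so the test has to take into account whether the page is a buffer page.) If the age is 0 and there is no user mapping,
deactivate_page_nolock() is called: it sets the page's age to 0, clears the referenced flag, and moves the page from the active list to the inactive dirty list.
A page that is still active is moved to the other end of the active list.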

/**
 * refill_inactive_scan - scan the active list and find pages to deactivate
 * @priority: the priority at which to scan
 * @oneshot: exit after deactivating one page
 *
 * This function will scan a portion of the active list to find
 * unused pages, those pages will then be moved to the inactive list.
 */
// Scans part of the list depending on priority; everything only when priority is 0.
int refill_inactive_scan(unsigned int priority, int oneshot)
{
    struct list_head * page_lru;
    struct page * page;
    int maxscan, page_active = 0; // maxscan limits the number of pages scanned
    int ret = 0;

    /* Take the lock while messing with the list... */
    spin_lock(&pagemap_lru_lock);
    maxscan = nr_active_pages >> priority;
    while (maxscan-- > 0 && (page_lru = active_list.prev) != &active_list) {
        page = list_entry(page_lru, struct page, lru);

        /* Wrong page on list?! (list corruption, should not happen) */
        if (!PageActive(page)) { // a scanned page must be on the active list
            printk("VM: refill_inactive, wrong page on list.\n");
            list_del(page_lru);
            nr_active_pages--;
            continue;
        }

        /*
         * Do aging on the pages: a referenced page gets older (age up),
         * an unreferenced one ages down. An age of 0 means the page
         * has not been touched for a long time.
         */
        if (PageTestandClearReferenced(page)) {
            age_page_up_nolock(page);
            page_active = 1;
        } else {
            age_page_down_ageonly(page);
            /*
             * Since we don't hold a reference on the page
             * ourselves, we have to do our test a bit more
             * strict than deactivate_page(). This is needed
             * since otherwise the system could hang shuffling
             * unfreeable pages from the active list to the
             * inactive_dirty list and back again...
             *
             * SUBTLE: we can have buffer pages with count 1.
             */
            // A count above 1 (above 2 for a page with buffers) means user
            // mappings remain, so the page must not be deactivated.
            if (page->age == 0 && page_count(page) <=
                        (page->buffers ? 2 : 1)) {
                deactivate_page_nolock(page);
                page_active = 0;
            } else {
                page_active = 1;
            }
        }
        /*
         * If the page is still on the active list, move it
         * to the other end of the list. Otherwise it was
         * deactivated by age_page_down and we exit successfully.
         */
        if (page_active || PageActive(page)) {
            list_del(page_lru); // still active: move to the other end of the active list
            list_add(page_lru, &active_list);
        } else {
            ret = 1;
            if (oneshot) // with oneshot set, stop after deactivating one page
                break;
        }
    }
    spin_unlock(&pagemap_lru_lock);

    return ret;
}

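The aging rule applied to each page can be illustrated with a small stand-alone sketch. Everything here is an assumption made for the example: `struct page_sketch` and `scan_one_page()` are hypothetical names, and the step of 3, the cap of 64, and the halving on age-down are illustrative values, not something the listing above shows.

```c
#include <stdbool.h>

#define AGE_ADV_SKETCH 3   /* assumed increment for a referenced page */
#define AGE_MAX_SKETCH 64  /* assumed upper bound on age */

struct page_sketch {
    int age;
    int count;          /* reference count */
    bool has_buffers;   /* page->buffers != NULL */
    bool referenced;    /* the referenced flag */
};

/* Returns true when the page qualifies for deactivation. */
bool scan_one_page(struct page_sketch *p)
{
    if (p->referenced) {
        /* Referenced since the last scan: clear the flag and age up. */
        p->referenced = false;
        p->age += AGE_ADV_SKETCH;
        if (p->age > AGE_MAX_SKETCH)
            p->age = AGE_MAX_SKETCH;
        return false;
    }

    /* Not referenced: age down (halving is an assumption here). */
    p->age /= 2;

    /* Deactivate only at age 0 and only without extra users. */
    return p->age == 0 && p->count <= (p->has_buffers ? 2 : 1);
}
```

The count threshold mirrors the test in refill_inactive_scan(): a page with buffers may hold one extra reference and still be deactivated, while any further reference keeps it on the active list.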
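Everything above is what kswapd does when both the system-wide check and the per-zone check find enough pages. kswapd runs an endless loop: after completing these steps it checks again for a global or per-zone shortage, and if there is none it calls interruptible_sleep_on_timeout() to sleep so that the kernel can schedule other processes. After the timeout (expressed in terms of HZ, the timer tick rate), kswapd wakes up and repeats the cycle.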
 
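2. If the system or any zone is found to be short of pages, do_try_to_free_pages() is called to try to free some pages.
 1. If free pages are short, or the number of inactive dirty pages exceeds the number of free pages plus inactive clean pages, page_launder() is called to try to clean inactive dirty pages so that they become immediately allocatable.
If pages are still short after page_launder(), the dentry and inode caches are shrunk (in general the pages freed this way are not released at once but kept on the LRU lists as a reserve) and refill_inactive() is called. Otherwise, if pages are no longer short, only kmem_cache_reap() is called to reclaim part of the slab cache.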
 

static int do_try_to_free_pages(unsigned int gfp_mask, int user)
{
    int ret = 0;

    /*
     * If free pages are short, or there are more inactive dirty pages
     * than free plus inactive clean pages, call page_launder() to clean
     * inactive dirty pages so that they become immediately allocatable.
     */
    if (free_shortage() || nr_inactive_dirty_pages > nr_free_pages() +
            nr_inactive_clean_pages())
        ret += page_launder(gfp_mask, user);

    /*
     * If memory is still short:
     * If needed, we move pages from the active list
     * to the inactive list. We also "eat" pages from
     * the inode and dentry cache whenever we do this.
     */
    // Pages released from the dentry and inode caches are not freed at once;
    // they are kept on the LRU lists as a reserve.
    if (free_shortage() || inactive_shortage()) {
        shrink_dcache_memory(6, gfp_mask); // shrink the dentry cache
        shrink_icache_memory(6, gfp_mask); // shrink the inode cache
        ret += refill_inactive(gfp_mask, user); // user: whether a process is waiting
    } else {
        /*
         * Otherwise reclaim slab cache.
         */
        kmem_cache_reap(gfp_mask);
        ret = 1;
    }

    return ret;
}

 
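That is the overall flow. Next comes page_launder(), which is called from do_try_to_free_pages().
Its job is to clean pages in the inactive dirty state.
It takes each page from the inactive dirty list and checks whether it has been referenced recently (a page on the dirty list can still be accessed, so the check is needed); a page that has been referenced is moved to the active list.
If the page is still dirty and this is the first pass, it is moved to the other end of the list and the loop continues. On the second pass (which only happens if free pages are still short) the dirty bit is cleared and the writepage function supplied by the page's address_space is called to write the page out to the swap device.
If the page is no longer dirty but has buffers attached, it is first taken off the inactive dirty list and try_to_free_buffers() is called.
If that fails, the page goes back onto the inactive dirty list. If it succeeds and the page has no mapping (it was only in the buffer cache), the page counts as freed. If the page has a mapping and other users still hold references, it is moved to the active list; otherwise the page was mapped once but has no users left, and it is moved to the inactive clean list. (Note: the page only reaches the free list once the final page_cache_release() brings its count down to 0.) If a page was freed and the system is no longer short, the loop is left and laundering ends.
Otherwise, a page that is clean and has a mapping is moved to the inactive clean list.
After one full pass, if pages are still short, a second pass is made.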

int page_launder(int gfp_mask, int sync)
{
    int launder_loop, maxscan, cleaned_pages, maxlaunder;
    int can_get_io_locks;
    struct list_head * page_lru;
    struct page * page;

    /*
     * We can only grab the IO locks (eg. for flushing dirty
     * buffers to disk) if __GFP_IO is set.
     */
    can_get_io_locks = gfp_mask & __GFP_IO;

    launder_loop = 0;
    maxlaunder = 0;
    cleaned_pages = 0;

dirty_page_rescan:
    spin_lock(&pagemap_lru_lock);
    maxscan = nr_inactive_dirty_pages; // bounds the scan so no page is handled twice per pass
    // Scan the inactive dirty list.
    while ((page_lru = inactive_dirty_list.prev) != &inactive_dirty_list &&
                maxscan-- > 0) {
        page = list_entry(page_lru, struct page, lru);

        /* Wrong page on list?! (list corruption, should not happen) */
        if (!PageInactiveDirty(page)) { // the inactive-dirty flag must be set
            printk("VM: page_launder, wrong page on list.\n");
            list_del(page_lru); // remove from the list
            nr_inactive_dirty_pages--;
            page->zone->inactive_dirty_pages--;
            continue;
        }

        /*
         * Page is or was in use? Move it to the active list.
         * (A page on the dirty list may still have been accessed.)
         */
        if (PageTestandClearReferenced(page) || page->age > 0 ||
                (!page->buffers && page_count(page) > 1) ||
                page_ramdisk(page)) {
            del_page_from_inactive_dirty_list(page); // off the inactive dirty list
            add_page_to_active_list(page);           // onto the active list
            continue;
        }

        /*
         * The page is locked. IO in progress?
         * Move it to the back of the list.
         */
        if (TryLockPage(page)) {
            list_del(page_lru);
            list_add(page_lru, &inactive_dirty_list);
            continue;
        }

        /*
         * Dirty swap-cache page? Write it out if
         * last copy..
         */
        if (PageDirty(page)) { // the page is dirty
            int (*writepage)(struct page *) = page->mapping->a_ops->writepage;
            int result;

            if (!writepage) // no writepage function provided: move to the active list
                goto page_active;

            /* First time through? Move it to the back of the list */
            if (!launder_loop) {
                list_del(page_lru);
                list_add(page_lru, &inactive_dirty_list);
                UnlockPage(page);
                continue;
            }

            /* OK, do a physical asynchronous write to swap. */
            ClearPageDirty(page); // clear the dirty bit so the page is not written twice
            page_cache_get(page); // count++: kswapd itself now uses the page
                                  // while writing it out to the swap device
            spin_unlock(&pagemap_lru_lock);

            result = writepage(page);
            page_cache_release(page); // count--: the write has been issued

            /* And re-start the thing.. */
            spin_lock(&pagemap_lru_lock);
            if (result != 1) // writepage accepted the page: go on to the next one
                continue;
            /* writepage refused to do anything */
            set_page_dirty(page); // mark the page dirty again
            goto page_active;
        }

        /*
         * The page is not dirty but has buffers attached
         * (it is used for buffered file I/O).
         */
        if (page->buffers) {
            int wait, clearedbuf;
            int freed_page = 0;
            /*
             * Since we might be doing disk IO, we have to
             * drop the spinlock and take an extra reference
             * on the page so it doesn't go away from under us.
             */
            del_page_from_inactive_dirty_list(page); // off the inactive dirty list
            page_cache_get(page); // count++: kswapd is working on the page
            spin_unlock(&pagemap_lru_lock);

            /* Will we do (asynchronous) IO? */
            if (launder_loop && maxlaunder == 0 && sync)
                wait = 2; /* Synchronous IO */
            else if (launder_loop && maxlaunder-- > 0)
                wait = 1; /* Async IO */
            else
                wait = 0; /* No IO */

            /* Try to free the page buffers. */
            clearedbuf = try_to_free_buffers(page, wait);

            /*
             * Re-take the spinlock. Note that we cannot
             * unlock the page yet since we're still
             * accessing the page_struct here...
             */
            spin_lock(&pagemap_lru_lock);

            /* The buffers were not freed: back onto the inactive dirty list. */
            if (!clearedbuf) {
                add_page_to_inactive_dirty_list(page);

            /*
             * The page was only in the buffer cache, not in any file's
             * inode->i_mapping. Such pages hold superblocks, inode bitmaps
             * and the like and belong to no file, so a page has been freed.
             */
            } else if (!page->mapping) {
                atomic_dec(&buffermem_pages);
                freed_page = 1;
                cleaned_pages++;

            /*
             * The page is still in some file's inode->i_mapping and has
             * more users besides the cache and us (for example several
             * processes mapping the file): move it to the active list.
             */
            } else if (page_count(page) > 2) {
                add_page_to_active_list(page);

            /*
             * OK, we "created" a freeable page: it is still in an
             * inode->i_mapping but no user is accessing it any more.
             */
            } else /* page->mapping && page_count(page) == 2 */ {
                add_page_to_inactive_clean_list(page);
                cleaned_pages++;
            }

            /*
             * Unlock the page and drop the extra reference.
             * We can only do it here because we are accessing
             * the page struct above.
             */
            UnlockPage(page);
            page_cache_release(page); // drop our reference; at count 0 the page is freed

            /*
             * If we're freeing buffer cache pages, stop when
             * we've got enough free memory.
             */
            if (freed_page && !free_shortage())
                break;
            continue;
        } else if (page->mapping && !PageDirty(page)) {
            // The page is no longer dirty and belongs to an address_space.
            /*
             * If a page had an extra reference in
             * deactivate_page(), we will find it here.
             * Now the page is really freeable, so we
             * move it to the inactive_clean list.
             */
            del_page_from_inactive_dirty_list(page); // move to the inactive clean list
            add_page_to_inactive_clean_list(page);
            UnlockPage(page);
            cleaned_pages++;
        } else {
page_active:
            /*
             * OK, we don't know what to do with the page.
             * It's no use keeping it here, so we move it to
             * the active list.
             */
            del_page_from_inactive_dirty_list(page);
            add_page_to_active_list(page);
            UnlockPage(page);
        }
    }
    spin_unlock(&pagemap_lru_lock);

    /*
     * If we don't have enough free pages, we loop back once
     * to queue the dirty pages for writeout. When we were called
     * by a user process (that /needs/ a free page) and we didn't
     * free anything yet, we wait synchronously on the writeout of
     * MAX_SYNC_LAUNDER pages.
     *
     * We also wake up bdflush, since bdflush should, under most
     * loads, flush out the dirty pages before we have to wait on
     * IO.
     */
    // If memory is still short, scan a second time.
    if (can_get_io_locks && !launder_loop && free_shortage()) {
        launder_loop = 1;
        /* If we cleaned pages, never do synchronous IO. */
        if (cleaned_pages)
            sync = 0;
        /* We only do a few "out of order" flushes. */
        maxlaunder = MAX_LAUNDER;
        /* Kflushd takes care of the rest. */
        wakeup_bdflush(0);
        goto dirty_page_rescan;
    }

    /* Return the number of pages moved to the inactive_clean list. */
    return cleaned_pages;
}

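The per-page branching in page_launder() is easier to follow once the locking and I/O are stripped away. The sketch below only classifies a page; `struct dirty_page_sketch`, `enum launder_action`, and `classify_page()` are hypothetical names for this illustration, and the outcome of try_to_free_buffers() is deliberately left out.

```c
#include <stdbool.h>

enum launder_action {
    LAUNDER_TO_ACTIVE,         /* move to the active list */
    LAUNDER_REQUEUE,           /* move to the other end of the dirty list */
    LAUNDER_WRITE_OUT,         /* clear dirty bit and call writepage */
    LAUNDER_FREE_BUFFERS,      /* call try_to_free_buffers() */
    LAUNDER_TO_INACTIVE_CLEAN  /* move to the inactive clean list */
};

struct dirty_page_sketch {
    bool referenced;
    int age;
    int count;
    bool has_buffers;
    bool ramdisk;
    bool locked;
    bool dirty;
    bool has_writepage;
    bool has_mapping;
};

/* Mirrors the order of the tests in the scan loop. */
enum launder_action classify_page(const struct dirty_page_sketch *p,
                                  bool second_pass)
{
    if (p->referenced || p->age > 0 ||
        (!p->has_buffers && p->count > 1) || p->ramdisk)
        return LAUNDER_TO_ACTIVE;
    if (p->locked)
        return LAUNDER_REQUEUE;
    if (p->dirty) {
        if (!p->has_writepage)
            return LAUNDER_TO_ACTIVE;
        /* Dirty pages are only written on the second pass. */
        return second_pass ? LAUNDER_WRITE_OUT : LAUNDER_REQUEUE;
    }
    if (p->has_buffers)
        return LAUNDER_FREE_BUFFERS;
    if (p->has_mapping)
        return LAUNDER_TO_INACTIVE_CLEAN;
    return LAUNDER_TO_ACTIVE;
}
```

The key point the sketch brings out is that a dirty page is never written on the first pass: it is only requeued, and the write happens if a second pass turns out to be necessary.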
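If pages are still short after page_launder(), shrink_dcache_memory() and shrink_icache_memory() are called
to shrink the dentry cache and the inode cache respectively, and refill_inactive() is called to reclaim further. Otherwise, if
pages are plentiful, only kmem_cache_reap() is called to reclaim slab cache.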
 
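Next, the refill_inactive() function.
It first works out how many pages the system still needs, then reclaims slab cache, and then runs a do-while loop that starts at the lowest priority, 6, and steps up the effort toward 0.
Inside the loop it calls refill_inactive_scan() (analysed above) to try to move some active pages to the inactive dirty list,
then calls shrink_dcache_memory() and shrink_icache_memory() to shrink the dentry cache and the inode cache,
then calls swap_out() repeatedly, bounded by count; each call tries to pick a process, scan its page tables, and find pages that can be made inactive. After the loop, refill_inactive_scan() is called repeatedly once more, again bounded by count, and the function ends.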

/*
 * We need to make the locks finer granularity, but right
 * now we need this so that we can do page allocations
 * without holding the kernel lock etc.
 *
 * We want to try to free "count" pages, and we want to
 * cluster them so that we get good swap-out behaviour.
 *
 * OTOH, if we're a user process (and not kswapd), we
 * really care about latency. In that case we don't try
 * to free too many pages.
 */
static int refill_inactive(unsigned int gfp_mask, int user)
{
    int priority, count, start_count, made_progress;

    count = inactive_shortage() + free_shortage(); // number of pages needed
    if (user)
        count = (1 << page_cluster);
    start_count = count;

    /* Always trim SLAB caches when memory gets low. */
    kmem_cache_reap(gfp_mask);

    priority = 6; // start at the lowest priority, 6
    do {
        made_progress = 0;

        // On every pass, check whether a reschedule has been requested for
        // the current task (for example by an interrupt handler) and yield if so.
        if (current->need_resched) {
            __set_current_state(TASK_RUNNING);
            schedule();
        }

        // Scan the active list for pages that can be made inactive.
        while (refill_inactive_scan(priority, 1)) {
            made_progress = 1;
            if (--count <= 0)
                goto done;
        }

        /*
         * don't be too light against the d/i cache since
         * refill_inactive() almost never fail when there's
         * really plenty of memory free.
         */
        shrink_dcache_memory(priority, gfp_mask);
        shrink_icache_memory(priority, gfp_mask);

        /*
         * Then, try to page stuff out..
         * (pick a process, scan its page tables, and find
         * pages that can be made inactive)
         */
        while (swap_out(priority, gfp_mask)) {
            made_progress = 1;
            if (--count <= 0)
                goto done;
        }

        /*
         * If we either have enough free memory, or if
         * page_launder() will be able to make enough
         * free memory, then stop.
         */
        if (!inactive_shortage() || !free_shortage())
            goto done;

        /*
         * Only switch to a lower "priority" if we
         * didn't make any useful progress in the
         * last loop.
         */
        if (!made_progress)
            priority--;
    } while (priority >= 0);

    /* Always end on a refill_inactive.., may sleep... */
    while (refill_inactive_scan(0, 1)) {
        if (--count <= 0)
            goto done;
    }

done:
    return (count < start_count);
}

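How priority translates into effort can be shown with two tiny helpers. This is a sketch under stated assumptions: `scan_budget()` and `next_priority()` are hypothetical names; the first reproduces the `nr_active_pages >> priority` bound from refill_inactive_scan(), the second the rule that priority only drops after a pass without progress.

```c
/* Upper bound on pages examined by one scan at the given priority. */
unsigned int scan_budget(unsigned int nr_active_pages, unsigned int priority)
{
    return nr_active_pages >> priority;
}

/* Priority drops by one only when the last pass made no progress. */
int next_priority(int priority, int made_progress)
{
    return made_progress ? priority : priority - 1;
}
```

With 6400 active pages, a scan at priority 6 looks at no more than 100 pages, at priority 3 no more than 800, and at priority 0 the whole list. Because priority stays put while passes keep making progress, the more expensive full scans are reached only when lighter ones have stopped helping.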
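Now for the implementation of swap_out().
counter is computed from the number of threads in the system and the priority swap_out() was called with; it is the loop count. Each iteration picks the most suitable process, best, from all processes, breaks some of its page mappings, and moves those pages toward the inactive state. The selection rule combines "take from the rich" (the process with the most resident pages goes first) with "take turns" (a process is not picked again until the others have had their turn).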
 

static int swap_out(unsigned int priority, int gfp_mask)
{
    int counter; // loop count
    int __ret = 0;

    /*
     * We make one or two passes through the task list, indexed by
     * assign = {0, 1}:
     * Pass 1: select the swappable task with maximal RSS that has
     * not yet been swapped out.
     * Pass 2: re-assign rss swap_cnt values, then select as above.
     *
     * With this approach, there's no need to remember the last task
     * swapped out. If the swap-out fails, we clear swap_cnt so the
     * task won't be selected again until all others have been tried.
     *
     * Think of swap_cnt as a "shadow rss" - it tells us which process
     * we want to page out (always try largest first).
     */
    // Derived from the number of threads and the priority of this call.
    counter = (nr_threads << SWAP_SHIFT) >> priority;
    if (counter < 1)
        counter = 1;

    for (; counter >= 0; counter--) {
        struct list_head *p;
        unsigned long max_cnt = 0;
        struct mm_struct *best = NULL;
        int assign = 0;
        int found_task = 0;
    select:
        spin_lock(&mmlist_lock);
        p = init_mm.mmlist.next;
        for (; p != &init_mm.mmlist; p = p->next) {
            struct mm_struct *mm = list_entry(p, struct mm_struct, mmlist);
            if (mm->rss <= 0)
                continue;
            found_task++;
            /* Refresh swap_cnt? */
            /*
             * When no mm with a non-zero swap_cnt is found, assign is
             * set to 1 and the list is scanned again. On that second
             * scan swap_cnt is refreshed from rss, so selection starts
             * over from the "richest" process. swap_cnt provides the
             * "take turns" behaviour, rss the "take from the rich" one.
             */
            if (assign == 1) {
                mm->swap_cnt = (mm->rss >> SWAP_SHIFT); // pages not yet examined in this round
                if (mm->swap_cnt < SWAP_MIN)
                    mm->swap_cnt = SWAP_MIN;
            }
            if (mm->swap_cnt > max_cnt) {
                max_cnt = mm->swap_cnt;
                best = mm;
            }
        } // on leaving the loop, best is the mm with the largest swap_cnt

        /* Make sure it doesn't disappear */
        if (best)
            atomic_inc(&best->mm_users);
        spin_unlock(&mmlist_lock);

        /*
         * We have dropped the tasklist_lock, but we
         * know that "mm" still exists: we are running
         * with the big kernel lock, and exit_mm()
         * cannot race with us.
         */
        if (!best) {
            // First time here: every mm has swap_cnt == 0. With assign
            // already set this branch is not taken a second time.
            if (!assign && found_task > 0) {
                assign = 1; // second pass
                goto select;
            }
            break;
        } else { // a victim was found: call swap_out_mm() on it
            __ret = swap_out_mm(best, gfp_mask);
            mmput(best);
            break;
        }
    }

    return __ret;
}

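The two-pass victim selection can be tried out on a plain array. The sketch below is an illustration, not the kernel's code: `struct mm_sketch` and `pick_victim()` are hypothetical names, locking and reference counting are omitted, and the values 5 and 8 for the shift and the minimum are assumptions chosen for the example.

```c
#include <stddef.h>

#define SWAP_SHIFT_SKETCH 5  /* assumed value, for illustration */
#define SWAP_MIN_SKETCH   8  /* assumed value, for illustration */

struct mm_sketch {
    long rss;                /* resident pages */
    unsigned long swap_cnt;  /* "shadow rss": pages left to examine this round */
};

/* Returns the mm with the largest swap_cnt, refreshing all counts once
 * from rss if every candidate has run out. Returns NULL if no mm has
 * resident pages. */
struct mm_sketch *pick_victim(struct mm_sketch *mms, size_t n)
{
    int assign;

    for (assign = 0; assign <= 1; assign++) {
        struct mm_sketch *best = NULL;
        unsigned long max_cnt = 0;
        size_t found = 0, i;

        for (i = 0; i < n; i++) {
            struct mm_sketch *mm = &mms[i];

            if (mm->rss <= 0)
                continue;
            found++;
            if (assign) {
                /* Second pass: start a new round from rss. */
                mm->swap_cnt = (unsigned long)mm->rss >> SWAP_SHIFT_SKETCH;
                if (mm->swap_cnt < SWAP_MIN_SKETCH)
                    mm->swap_cnt = SWAP_MIN_SKETCH;
            }
            if (mm->swap_cnt > max_cnt) {
                max_cnt = mm->swap_cnt;
                best = mm;
            }
        }
        if (best || found == 0)
            return best;
    }
    return NULL;
}
```

Starting from two processes with rss 3200 and 640 and all counts at zero, the first call refreshes the counts to 100 and 20 and picks the larger process. Once that process's count has been used up, the next call picks the smaller one without refreshing, which is the "take turns" half of the rule.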
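swap_out_mm() leads to the call chain swap_out_vma() > swap_out_pgd() > swap_out_pmd() > try_to_swap_out(). The last step, try_to_swap_out() (whose page_table argument points to a page table entry, not to a page table), is the crucial one, so the analysis here concentrates on it.
It starts by checking that the page to be swapped out is valid and whether it has been accessed; if it has, its age is increased. Even a page that is not on the active list and has not been accessed recently cannot be swapped out at once: it is kept under observation until its
page->age reaches 0. Once page->age is 0 and the tests above have passed, the page table entry is cleared. Next it checks whether the page is already in the swap cache. If it is, and the entry shows that the page was written recently, the page is marked dirty; the entry is then pointed at the swap entry, the page is moved to an inactive list, and the reference to it is dropped.
If the page is clean and not in the swap cache, the mapping is simply removed rather than temporarily broken. If the page is dirty, comes from an mmap file mapping, and is not in the swap cache, it is marked dirty and moved to the dirty page list of that file mapping.
If the page is dirty, does not belong to a file mapping, and is not in the swap cache, it has not been accessed for a long time, and a page on the swap device must first be allocated for it so that its contents can be written there.
add_to_swap_cache() also links the page into the swapper_space queue and the active page list.
That completes the scan of one process's address space.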

/*
 * The swap-out functions return 1 if they successfully
 * threw something out, and we got a free page. It returns
 * zero if it couldn't do anything, and any other value
 * indicates it decreased rss, but the page was shared.
 *
 * NOTE! If it sleeps, it *must* return 1 to make sure we
 * don't continue with the swap-out. Otherwise we may be
 * using a process that no longer actually exists (it might
 * have died while we slept).
 */
static int try_to_swap_out(struct mm_struct * mm, struct vm_area_struct* vma, unsigned long address, pte_t * page_table, int gfp_mask)
{
    pte_t pte;
    swp_entry_t entry;
    struct page * page;
    int onlist;

    pte = *page_table; // fetch the page table entry
    if (!pte_present(pte)) // is the page in physical memory?
        goto out_failed;

    page = pte_page(pte); // the page itself
    if ((!VALID_PAGE(page)) || PageReserved(page)) // invalid page, or one that must not be swapped out
        goto out_failed;

    if (!mm->swap_cnt)
        return 1;
    // One more page is being examined, so swap_cnt is decremented.
    mm->swap_cnt--;

    onlist = PageActive(page); // is the page on the active list?
    /* Don't look at this pte if it's been accessed recently. */
    if (ptep_test_and_clear_young(page_table)) { // accessed recently ("young")?
        age_page_up(page); // extend the observation period
        goto out_failed;
    }
    // Even a page that is not on the active list and was not accessed
    // recently is not swapped out at once; it is observed until
    // page->age reaches 0.
    if (!onlist)
        age_page_down_ageonly(page);

    /*
     * If the page is in active use by us, or if the page
     * is in active use by others, don't unmap it or
     * (worse) start unneeded IO.
     */
    if (page->age > 0)
        goto out_failed;

    if (TryLockPage(page))
        goto out_failed;

    /* From this point on, the odds are that we're going to
     * nuke this pte, so read and clear the pte. This hook
     * is needed on CPUs which update the accessed and dirty
     * bits in hardware.
     */
    // Clear the page table entry (this removes the mapping).
    pte = ptep_get_and_clear(page_table);
    flush_tlb_page(vma, address);

    /*
     * Is the page already in the swap cache? If so, then
     * we can just drop our reference to it without doing
     * any IO - it's already up-to-date on disk.
     *
     * Return 0, as we didn't actually free any real
     * memory, and we should just continue our scan.
     */
    if (PageSwapCache(page)) { // already in the swap cache?
        entry.val = page->index;
        if (pte_dirty(pte))
            set_page_dirty(page); // mark the page dirty
set_swap_pte:
        swap_duplicate(entry); // validate the entry and take a reference on it
        set_pte(page_table, swp_entry_to_pte(entry)); // the pte now holds the swap entry
drop_pte:
        UnlockPage(page);
        mm->rss--; // the physical page is unmapped, so rss drops
        deactivate_page(page); // move from the active list to an inactive list
        page_cache_release(page); // drop the reference held by this mapping
out_failed:
        return 0;
    }

    /*
     * Is it a clean page? Then it must be recoverable
     * by just paging it in again, and we can just drop
     * it..
     *
     * However, this won't actually free any real
     * memory, as the page will just be in the page cache
     * somewhere, and as such we should just continue
     * our scan.
     *
     * Basically, this just makes it possible for us to do
     * some real work in the future in "refill_inactive()".
     */
    flush_cache_page(vma, address);
    if (!pte_dirty(pte))
        goto drop_pte;

    /*
     * Ok, it's really dirty. That means that
     * we should either create a new swap cache
     * entry for it, or we should write it back
     * to its own backing store.
     */
    if (page->mapping) {
        set_page_dirty(page);
        goto drop_pte;
    }

    /*
     * This is a dirty, swappable page. First of all,
     * get a suitable swap entry for it, and make sure
     * we have the swap cache set up to associate the
     * page with that swap entry.
     */
    entry = get_swap_page();
    if (!entry.val)
        goto out_unlock_restore; /* No swap space left */

    /* Add it to the swap cache and mark it dirty */
    add_to_swap_cache(page, entry);
    set_page_dirty(page);
    goto set_swap_pte;

out_unlock_restore:
    set_pte(page_table, pte);
    UnlockPage(page);
    return 0;
}

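The outcomes of try_to_swap_out() can be summarised as a decision function over the state of the page table entry and the page. The sketch below is a simplification made for illustration: `struct pte_sketch`, `struct swap_page_sketch`, `enum swap_out_result`, and `classify_pte()` are hypothetical names, and the aging side effects, the swap_cnt bookkeeping, and the "no swap space left" path are left out.

```c
#include <stdbool.h>

enum swap_out_result {
    SWAPOUT_SKIP,              /* leave the mapping alone */
    SWAPOUT_REUSE_SWAP_ENTRY,  /* already in swap cache: point pte at its entry */
    SWAPOUT_DROP_CLEAN,        /* clean page: just drop the mapping */
    SWAPOUT_MARK_FILE_DIRTY,   /* dirty file-backed page: mark dirty, drop mapping */
    SWAPOUT_ALLOC_SWAP_ENTRY   /* dirty anonymous page: needs a new swap entry */
};

struct pte_sketch {
    bool present;  /* pte_present() */
    bool young;    /* accessed bit set */
    bool dirty;    /* pte_dirty() */
};

struct swap_page_sketch {
    bool valid;          /* valid and not reserved */
    int age;             /* page->age after aging */
    bool locked;         /* TryLockPage() would fail */
    bool in_swap_cache;  /* PageSwapCache() */
    bool has_mapping;    /* page->mapping != NULL */
};

enum swap_out_result classify_pte(const struct pte_sketch *pte,
                                  const struct swap_page_sketch *page)
{
    if (!pte->present || !page->valid)
        return SWAPOUT_SKIP;
    if (pte->young)        /* recently accessed: the kernel ages it up */
        return SWAPOUT_SKIP;
    if (page->age > 0)     /* still under observation */
        return SWAPOUT_SKIP;
    if (page->locked)
        return SWAPOUT_SKIP;
    if (page->in_swap_cache)
        return SWAPOUT_REUSE_SWAP_ENTRY;
    if (!pte->dirty)
        return SWAPOUT_DROP_CLEAN;
    if (page->has_mapping)
        return SWAPOUT_MARK_FILE_DIRTY;
    return SWAPOUT_ALLOC_SWAP_ENTRY;
}
```

Only the last case allocates swap space; every other path either leaves the mapping in place or drops it without any I/O being started at this point.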