I. Background
We received an alert for an application service; after logging in to the server to investigate, we found the process was gone.
II. Problem analysis
1. Possible reasons the process was killed:
(1) The machine rebooted
uptime showed the machine had not rebooted.
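For reference, two standard commands for this check:

# uptime
# last reboot | head -3

uptime reports how long the machine has been up; last reboot lists recent reboot records from wtmp.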
(2) The program hit a bug and exited on its own
Its error log showed nothing abnormal.
(3) Something else killed it
Since the program is fairly memory-hungry, we suspected the system had OOM-killed it. Checking the messages log confirmed that an OOM kill had indeed happened:
Jul 27 13:29:54 kernel: Out of memory: Kill process 17982 (java) score 77 or sacrifice child
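A search along these lines surfaces such records (the log path is the RHEL/CentOS convention used here; Debian-family systems log to /var/log/syslog instead):

# grep -i 'out of memory' /var/log/messages
# dmesg | grep -i 'killed process'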
2. Reading the detailed OOM output to pin down the cause
[511250.458988] mysqld invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0 [511250.458993] mysqld cpuset=/ mems_allowed=0 [511250.458996] CPU: 7 PID: 30063 Comm: mysqld Not tainted 3.10.0-514.21.2.el7.x86_64 #1 [511250.458997] Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014 [511250.458999] ffff88056236bec0 0000000040f4df68 ffff88044b76b910 ffffffff81687073 [511250.459002] ffff88044b76b9a0 ffffffff8168201e ffffffff810eb0dc ffff88081ae80c20 [511250.459004] ffff88081ae80c38 ffff88044b76b9f8 ffff88056236bec0 0000000000000000 [511250.459007] Call Trace: [511250.459015] [<ffffffff81687073>] dump_stack+0x19/0x1b [511250.459020] [<ffffffff8168201e>] dump_header+0x8e/0x225 [511250.459026] [<ffffffff810eb0dc>] ? ktime_get_ts64+0x4c/0xf0 [511250.459033] [<ffffffff81184cfe>] oom_kill_process+0x24e/0x3c0 [511250.459035] [<ffffffff8118479d>] ? oom_unkillable_task+0xcd/0x120 [511250.459038] [<ffffffff81184846>] ? find_lock_task_mm+0x56/0xc0 [511250.459042] [<ffffffff81093c0e>] ? has_capability_noaudit+0x1e/0x30 [511250.459045] [<ffffffff81185536>] out_of_memory+0x4b6/0x4f0 [511250.459047] [<ffffffff81682b27>] __alloc_pages_slowpath+0x5d7/0x725 [511250.459051] [<ffffffff8118b645>] __alloc_pages_nodemask+0x405/0x420 [511250.459055] [<ffffffff811cf94a>] alloc_pages_current+0xaa/0x170 [511250.459058] [<ffffffff81180bd7>] __page_cache_alloc+0x97/0xb0 [511250.459060] [<ffffffff81183750>] filemap_fault+0x170/0x410 [511250.459078] [<ffffffffa01b5016>] ext4_filemap_fault+0x36/0x50 [ext4] [511250.459082] [<ffffffff811ac84c>] __do_fault+0x4c/0xc0 [511250.459084] [<ffffffff811acce3>] do_read_fault.isra.42+0x43/0x130 [511250.459087] [<ffffffff811b1471>] handle_mm_fault+0x6b1/0x1040 [511250.459091] [<ffffffff810f55c0>] ? futex_wake+0x80/0x160 [511250.459096] [<ffffffff81692c04>] __do_page_fault+0x154/0x450 [511250.459098] [<ffffffff81692fe6>] trace_do_page_fault+0x56/0x150 [511250.459101] [<ffffffff8169268b>] do_async_page_fault+0x1b/0xd0 [511250.459103] [<ffffffff8168f178>] async_page_fault+0x28/0x30 [511250.459104] Mem-Info: [511250.459109] active_anon:7922627 inactive_anon:1653 isolated_anon:0 active_file:1675 inactive_file:2820 isolated_file:0 unevictable:0 dirty:11 writeback:2 unstable:0 slab_reclaimable:61817 slab_unreclaimable:25990 mapped:3607 shmem:4602 pagetables:42625 bounce:0 free:50021 free_pcp:149 free_cma:0 [511250.459112] Node 0 DMA free:15892kB min:32kB low:40kB high:48kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15908kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:16kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes [511250.459117] lowmem_reserve[]: 0 2814 31994 31994 [511250.459120] Node 0 DMA32 free:119704kB min:5940kB low:7424kB high:8908kB active_anon:2678512kB inactive_anon:276kB active_file:124kB inactive_file:132kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3129216kB managed:2883436kB mlocked:0kB dirty:0kB writeback:0kB mapped:1100kB shmem:1632kB slab_reclaimable:48796kB slab_unreclaimable:9340kB kernel_stack:5248kB pagetables:11424kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:32902 all_unreclaimable? 
yes [511250.459124] lowmem_reserve[]: 0 0 29180 29180 [511250.459127] Node 0 Normal free:63896kB min:61608kB low:77008kB high:92412kB active_anon:29011996kB inactive_anon:6336kB active_file:6576kB inactive_file:11148kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:30408704kB managed:29881068kB mlocked:0kB dirty:44kB writeback:8kB mapped:13328kB shmem:16776kB slab_reclaimable:198472kB slab_unreclaimable:94604kB kernel_stack:53472kB pagetables:159076kB unstable:0kB bounce:0kB free_pcp:656kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:924 all_unreclaimable? no [511250.459131] lowmem_reserve[]: 0 0 0 0 [511250.459134] Node 0 DMA: 1*4kB (U) 0*8kB 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15892kB [511250.459144] Node 0 DMA32: 9372*4kB (UEM) 2427*8kB (UEM) 1179*16kB (UEM) 369*32kB (UEM) 104*64kB (EM) 31*128kB (EM) 14*256kB (UEM) 9*512kB (UEM) 7*1024kB (UEM) 3*2048kB (M) 0*4096kB = 119704kB [511250.459154] Node 0 Normal: 1540*4kB (UE) 6148*8kB (UE) 503*16kB (UE) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 63392kB [511250.459162] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [511250.459163] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [511250.459164] 9275 total pagecache pages [511250.459166] 0 pages in swap cache [511250.459167] Swap cache stats: add 0, delete 0, find 0/0 [511250.459168] Free swap = 0kB [511250.459168] Total swap = 0kB [511250.459169] 8388478 pages RAM [511250.459170] 0 pages HighMem/MovableOnly [511250.459171] 193375 pages reserved [511250.459172] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name [511250.459178] [ 444] 0 444 30482 118 63 0 0 systemd-journal [511250.459180] [ 476] 0 476 14365 114 28 0 -1000 auditd [511250.459182] [ 508] 0 508 5315 75 14 0 0 irqbalance [511250.459184] [ 509] 998 509 132421 1908 50 0 0 polkitd [511250.459186] [ 510] 0 510 6686 196 17 0 0 systemd-logind [511250.459188] [ 514] 81 514 6672 148 16 0 -900 dbus-daemon [511250.459189] [ 592] 0 592 6972 52 18 0 0 atd [511250.459191] [ 595] 0 595 31969 188 17 0 0 crond [511250.459193] [ 607] 0 607 28020 44 11 0 0 agetty [511250.459195] [ 1036] 0 1036 138798 3179 89 0 0 tuned [511250.459197] [ 1037] 0 1037 174118 357 182 0 0 rsyslogd [511250.459198] [ 1089] 38 1089 7865 174 19 0 0 ntpd [511250.459200] [ 4714] 0 4714 26866 243 54 0 -1000 sshd [511250.459202] [ 6624] 0 6624 920 100 4 0 0 aliyun-service [511250.459204] [19284] 0 19284 8386 171 21 0 0 AliYunDunUpdate [511250.459206] [19335] 0 19335 34887 1367 64 0 0 AliYunDun [511250.459208] [21657] 26 21657 59097 1539 52 0 -1000 postgres [511250.459210] [21658] 26 21658 48503 264 43 0 0 postgres [511250.459212] [21660] 26 21660 59124 2338 52 0 0 postgres [511250.459213] [21661] 26 21661 59097 332 48 0 0 postgres [511250.459215] [21662] 26 21662 59097 537 47 0 0 postgres [511250.459217] [21663] 26 21663 59328 513 50 0 0 postgres [511250.459218] [21664] 26 21664 49067 317 44 0 0 postgres [511250.459220] [ 7276] 0 7276 32471 164 16 0 0 screen [511250.459222] [ 7277] 0 7277 29357 123 13 0 0 bash [511250.459223] [ 7388] 0 7388 4303 1880 12 0 0 sagent [511250.459225] [ 7747] 0 7747 32504 200 16 0 0 screen [511250.459226] [ 7748] 0 7748 29357 122 14 0 0 bash [511250.459228] [ 7781] 0 7781 8051 4108 20 0 0 tagent [511250.459230] [ 9897] 0 9897 3062553 270245 774 0 0 java [511250.459231] [ 9937] 26 9937 59406 657 53 0 0 postgres [511250.459233] [ 9940] 26 9940 60212 2570 
57 0 0 postgres [511250.459235] [ 9997] 26 9997 60098 2346 56 0 0 postgres [511250.459236] [10076] 26 10076 59574 964 54 0 0 postgres [511250.459238] [10077] 26 10077 59618 1006 54 0 0 postgres [511250.459239] [10078] 26 10078 59617 1005 54 0 0 postgres [511250.459241] [11611] 0 11611 60826 4190 73 0 0 python [511250.459243] [11619] 0 11619 348938 6222 118 0 0 python [511250.459245] [12396] 26 12396 60086 2078 56 0 0 postgres [511250.459246] [12499] 1001 12499 1448783 99046 328 0 0 java [511250.459248] [12600] 1003 12600 2226317 312995 847 0 0 java [511250.459249] [29241] 0 29241 78180 1320 101 0 0 php-fpm [511250.459251] [29242] 1004 29242 135239 2687 108 0 0 php-fpm [511250.459253] [29243] 1004 29243 134924 2408 108 0 0 php-fpm [511250.459255] [29244] 1004 29244 135371 2707 108 0 0 php-fpm [511250.459256] [29245] 1004 29245 143755 11294 125 0 0 php-fpm [511250.459258] [29246] 1004 29246 135367 2706 108 0 0 php-fpm [511250.459260] [29826] 27 29826 28792 86 13 0 0 mysqld_safe [511250.459261] [30051] 27 30051 322930 39761 133 0 0 mysqld [511250.459263] [30234] 0 30234 11365 125 22 0 -1000 systemd-udevd [511250.459264] [11182] 0 11182 82780 5702 114 0 0 salt-minion [511250.459266] [11193] 0 11193 171406 8289 144 0 0 salt-minion [511250.459268] [11195] 0 11195 101432 5712 110 0 0 salt-minion [511250.459269] [29678] 1004 29678 140301 7833 118 0 0 php-fpm [511250.459271] [29998] 1004 29998 134983 2404 108 0 0 php-fpm [511250.459273] [11833] 0 11833 69721 2098 58 0 0 python2.7 [511250.459275] [32113] 26 32113 60131 2012 56 0 0 postgres [511250.459276] [ 1017] 1004 1017 135410 2748 108 0 0 php-fpm [511250.459278] [11915] 1004 11915 144263 11778 126 0 0 php-fpm [511250.459280] [ 5999] 0 5999 8115 3139 20 0 0 tagent [511250.459281] [21572] 1004 21572 134919 2379 108 0 0 php-fpm [511250.459283] [21752] 1004 21752 143751 11276 125 0 0 php-fpm [511250.459285] [ 2977] 1004 2977 134920 2406 107 0 0 php-fpm [511250.459286] [ 9217] 0 9217 330989 183882 550 0 0 python2.7 [511250.459288] [ 2008] 1004 2008 135816 3328 109 0 0 php-fpm [511250.459290] [25089] 1000 25089 2800777 187701 710 0 0 java [511250.459291] [25405] 1000 25405 1335611 105668 366 0 0 java [511250.459293] [26033] 1000 26033 1680746 96082 367 0 0 java [511250.459295] [26112] 1000 26112 1148121 61227 230 0 0 java [511250.459296] [14446] 0 14446 31082 540 56 0 0 nginx [511250.459298] [14447] 1004 14447 31278 739 58 0 0 nginx [511250.459299] [14448] 1004 14448 31278 725 58 0 0 nginx [511250.459301] [14449] 1004 14449 31278 714 58 0 0 nginx [511250.459303] [14450] 1004 14450 31278 715 58 0 0 nginx [511250.459304] [14451] 1004 14451 31245 705 58 0 0 nginx [511250.459306] [14452] 1004 14452 31245 696 58 0 0 nginx [511250.459307] [14453] 1004 14453 31278 712 58 0 0 nginx [511250.459309] [14454] 1004 14454 31245 728 58 0 0 nginx [511250.459310] [14455] 1004 14455 31278 730 58 0 0 nginx [511250.459312] [14456] 1004 14456 31278 718 58 0 0 nginx [511250.459314] [14457] 1004 14457 31245 707 58 0 0 nginx [511250.459315] [14458] 1004 14458 31278 722 58 0 0 nginx [511250.459317] [14459] 1004 14459 31278 717 58 0 0 nginx [511250.459318] [14460] 1004 14460 31245 688 58 0 0 nginx [511250.459320] [14462] 1004 14462 31278 712 58 0 0 nginx [511250.459321] [14463] 1004 14463 31278 736 58 0 0 nginx [511250.459323] [14571] 0 14571 3222105 119555 906 0 0 python [511250.459325] [13969] 0 13969 134928 8719 143 0 0 salt-master [511250.459326] [13982] 0 13982 78554 5647 100 0 0 salt-master [511250.459328] [13985] 0 13985 116150 8034 134 0 0 salt-master [511250.459330] 
[13989] 0 13989 151040 38826 238 0 0 salt-master [511250.459331] [13990] 0 13990 103527 12904 148 0 0 salt-master [511250.459333] [14067] 0 14067 280592 9651 151 0 0 salt-master [511250.459334] [14072] 0 14072 135099 9889 141 0 0 salt-master [511250.459336] [14220] 0 14220 134928 8828 135 0 0 salt-master [511250.459338] [14221] 0 14221 1941362 9675 332 0 0 salt-master [511250.459339] [14228] 0 14228 175360 9657 148 0 0 salt-master [511250.459341] [14268] 0 14268 175362 9655 148 0 0 salt-master [511250.459343] [14314] 0 14314 175361 9662 148 0 0 salt-master [511250.459344] [14327] 0 14327 175363 9663 148 0 0 salt-master [511250.459346] [14329] 0 14329 175363 9666 148 0 0 salt-master [511250.459347] [14330] 0 14330 175364 9666 148 0 0 salt-master [511250.459349] [14331] 0 14331 175365 9666 148 0 0 salt-master [511250.459350] [14334] 0 14334 175366 9670 148 0 0 salt-master [511250.459352] [14338] 0 14338 175366 9669 148 0 0 salt-master [511250.459354] [14340] 0 14340 175366 9674 148 0 0 salt-master [511250.459355] [14345] 0 14345 175367 9679 148 0 0 salt-master [511250.459357] [14349] 0 14349 175367 9675 148 0 0 salt-master [511250.459358] [14350] 0 14350 175367 9671 148 0 0 salt-master [511250.459360] [14354] 0 14354 175368 9672 148 0 0 salt-master [511250.459362] [14357] 0 14357 175369 9678 148 0 0 salt-master [511250.459363] [14358] 0 14358 175369 9673 148 0 0 salt-master [511250.459365] [14362] 0 14362 175369 9677 148 0 0 salt-master [511250.459366] [14364] 0 14364 175370 9680 148 0 0 salt-master [511250.459368] [14365] 0 14365 175371 9681 148 0 0 salt-master [511250.459369] [14368] 0 14368 175371 9676 148 0 0 salt-master [511250.459371] [14370] 0 14370 175371 9674 148 0 0 salt-master [511250.459372] [14372] 0 14372 175372 9682 148 0 0 salt-master [511250.459374] [14376] 0 14376 175373 9682 148 0 0 salt-master [511250.459375] [14377] 0 14377 175374 9676 148 0 0 salt-master [511250.459377] [14378] 0 14378 175374 9689 148 0 0 salt-master [511250.459379] [14380] 0 14380 175650 9716 149 0 0 salt-master [511250.459381] [14384] 0 14384 175375 9690 148 0 0 salt-master [511250.459382] [14385] 0 14385 175375 9685 148 0 0 salt-master [511250.459384] [14401] 0 14401 175376 9687 148 0 0 salt-master [511250.459385] [14404] 0 14404 175377 9685 148 0 0 salt-master [511250.459387] [14413] 0 14413 175377 9685 148 0 0 salt-master [511250.459388] [14420] 0 14420 175377 9687 148 0 0 salt-master [511250.459390] [14421] 0 14421 175378 9686 148 0 0 salt-master [511250.459392] [14424] 0 14424 175380 9693 148 0 0 salt-master [511250.459393] [14428] 0 14428 175380 9689 148 0 0 salt-master [511250.459395] [14435] 0 14435 175382 9698 148 0 0 salt-master [511250.459396] [14437] 0 14437 175382 9694 148 0 0 salt-master [511250.459398] [14439] 0 14439 175383 9692 148 0 0 salt-master [511250.459399] [14442] 0 14442 175384 9694 148 0 0 salt-master [511250.459401] [14445] 0 14445 175385 9692 148 0 0 salt-master [511250.459403] [14465] 0 14465 175385 9695 148 0 0 salt-master [511250.459404] [14473] 0 14473 175385 9695 148 0 0 salt-master [511250.459406] [14486] 0 14486 175386 9697 148 0 0 salt-master [511250.459407] [14489] 0 14489 175386 9699 148 0 0 salt-master [511250.459409] [14503] 0 14503 175386 9699 148 0 0 salt-master [511250.459410] [14513] 0 14513 175387 9700 148 0 0 salt-master [511250.459412] [14520] 0 14520 175388 9704 148 0 0 salt-master [511250.459414] [14523] 0 14523 175389 9700 148 0 0 salt-master [511250.459415] [14525] 0 14525 175389 9703 148 0 0 salt-master [511250.459417] [14527] 0 14527 175390 9710 148 
0 0 salt-master [511250.459419] [14533] 0 14533 175390 9705 148 0 0 salt-master [511250.459420] [14539] 0 14539 175390 9709 148 0 0 salt-master [511250.459422] [14590] 0 14590 175391 9713 148 0 0 salt-master [511250.459423] [14598] 0 14598 175390 9705 148 0 0 salt-master [511250.459425] [14613] 0 14613 175391 9705 148 0 0 salt-master [511250.459426] [14624] 0 14624 175392 9713 148 0 0 salt-master [511250.459428] [14630] 0 14630 175392 9707 148 0 0 salt-master [511250.459429] [14634] 0 14634 175393 9707 148 0 0 salt-master [511250.459431] [14652] 0 14652 175393 9709 148 0 0 salt-master [511250.459433] [14677] 0 14677 175394 9708 148 0 0 salt-master [511250.459434] [14679] 0 14679 175394 9711 148 0 0 salt-master [511250.459436] [14709] 0 14709 175395 9713 148 0 0 salt-master [511250.459438] [14718] 0 14718 175396 9710 148 0 0 salt-master [511250.459439] [14723] 0 14723 175396 9710 148 0 0 salt-master [511250.459441] [14746] 0 14746 175396 9716 148 0 0 salt-master [511250.459443] [14752] 0 14752 175461 9717 148 0 0 salt-master [511250.459444] [14791] 0 14791 175398 9715 148 0 0 salt-master [511250.459446] [14799] 0 14799 175397 9720 148 0 0 salt-master [511250.459447] [14804] 0 14804 175472 9721 148 0 0 salt-master [511250.459449] [14835] 0 14835 175462 9729 148 0 0 salt-master [511250.459450] [14840] 0 14840 175463 9735 148 0 0 salt-master [511250.459452] [14864] 0 14864 175463 9727 148 0 0 salt-master [511250.459453] [14882] 0 14882 175464 9731 148 0 0 salt-master [511250.459455] [14893] 0 14893 175465 9731 148 0 0 salt-master [511250.459456] [14899] 0 14899 175465 9720 148 0 0 salt-master [511250.459458] [14906] 0 14906 175466 9721 148 0 0 salt-master [511250.459460] [14910] 0 14910 175402 9723 148 0 0 salt-master [511250.459461] [14984] 0 14984 175466 9725 148 0 0 salt-master [511250.459463] [14988] 0 14988 175467 9735 148 0 0 salt-master [511250.459464] [14992] 0 14992 175468 9734 148 0 0 salt-master [511250.459466] [15072] 0 15072 175468 9735 148 0 0 salt-master [511250.459467] [15101] 0 15101 175468 9731 148 0 0 salt-master [511250.459469] [15129] 0 15129 175469 9733 148 0 0 salt-master [511250.459470] [15143] 0 15143 175469 9737 148 0 0 salt-master [511250.459472] [15168] 0 15168 175470 9740 148 0 0 salt-master [511250.459474] [15181] 0 15181 175474 9744 148 0 0 salt-master [511250.459475] [15219] 0 15219 175474 9734 148 0 0 salt-master [511250.459477] [15223] 0 15223 175477 9753 148 0 0 salt-master [511250.459479] [15259] 0 15259 175475 9734 148 0 0 salt-master [511250.459481] [15266] 0 15266 175476 9735 148 0 0 salt-master [511250.459482] [15322] 0 15322 175476 9736 148 0 0 salt-master [511250.459493] [15350] 0 15350 175476 9745 148 0 0 salt-master [511250.459495] [15366] 0 15366 175477 9743 148 0 0 salt-master [511250.459497] [15380] 0 15380 175506 9745 148 0 0 salt-master [511250.459498] [15399] 0 15399 175754 9769 149 0 0 salt-master [511250.459500] [15407] 0 15407 175479 9747 148 0 0 salt-master [511250.459501] [15447] 0 15447 175479 9742 148 0 0 salt-master [511250.459503] [15450] 0 15450 175479 9751 148 0 0 salt-master [511250.459504] [15454] 0 15454 175481 9747 148 0 0 salt-master [511250.459506] [15462] 0 15462 175480 9748 148 0 0 salt-master [511250.459508] [23316] 1000 23316 3085650 27853 144 0 0 java [511250.459509] [23319] 1000 23319 3085650 27289 144 0 0 java [511250.459511] [23348] 1000 23348 3085650 27778 142 0 0 java [511250.459512] [23351] 1000 23351 3085650 26840 141 0 0 java [511250.459514] [23373] 1000 23373 3085650 27380 143 0 0 java [511250.459515] [23406] 1000 
23406 3085650 26933 143 0 0 java [511250.459517] [23425] 1000 23425 3085650 27371 142 0 0 java [511250.459518] [23445] 1000 23445 3085650 27861 141 0 0 java [511250.459520] [23476] 1000 23476 3085650 27716 143 0 0 java [511250.459522] [23497] 1000 23497 3085650 27902 144 0 0 java [511250.459523] [23690] 1000 23690 2049475 328916 865 0 0 java [511250.459525] [23691] 1000 23691 2082756 356868 894 0 0 java [511250.459527] [23693] 1000 23693 2027460 612751 1357 0 0 java [511250.459528] [23712] 1000 23712 2027460 610571 1348 0 0 java [511250.459529] [23754] 1000 23754 2049474 337457 886 0 0 java [511250.459531] [23785] 1000 23785 2049474 330831 864 0 0 java [511250.459533] [23805] 1000 23805 2027460 615907 1366 0 0 java [511250.459534] [23828] 1000 23828 2027460 610191 1346 0 0 java [511250.459536] [23855] 1000 23855 2629446 589971 1351 0 0 java [511250.459537] [23860] 1000 23860 2328022 144465 519 0 0 java [511250.459539] [13536] 1004 13536 134981 2523 108 0 0 php-fpm [511250.459540] [ 1813] 0 1813 1481817 46140 246 0 0 java [511250.459542] [ 3187] 0 3187 1481817 53461 253 0 0 java [511250.459544] [ 2993] 26 2993 59779 1712 55 0 0 postgres [511250.459546] [ 3059] 1000 3059 3085528 16411 141 0 0 java [511250.459547] [ 3146] 1000 3146 2027460 211779 628 0 0 java [511250.459549] [17982] 996 17982 4950828 635077 1629 0 0 java [511250.459551] [16433] 0 16433 37607 360 74 0 0 sshd [511250.459553] [16436] 0 16436 29390 141 13 0 0 bash [511250.459554] [16466] 0 16466 29390 136 14 0 0 bash [511250.459556] [22511] 0 22511 36968 433 72 0 0 sshd [511250.459558] [22515] 0 22515 19016 257 40 0 0 ssh [511250.459560] [22519] 0 22519 19107 350 39 0 0 ssh [511250.459562] [22522] 0 22522 19016 259 38 0 0 ssh [511250.459563] [24770] 0 24770 38342 657 30 0 0 vim [511250.459565] [24781] 0 24781 45009 303 41 0 0 crond [511250.459566] [24784] 0 24784 91360 8641 134 0 0 python [511250.459568] [24932] 0 24932 28791 45 13 0 0 sh [511250.459570] [24933] 0 24933 93538 7284 104 0 0 ansible-playboo [511250.459571] [24942] 0 24942 94424 7584 103 0 0 ansible-playboo [511250.459573] [24943] 0 24943 96455 9707 107 0 0 ansible-playboo [511250.459574] [24944] 0 24944 94436 7599 103 0 0 ansible-playboo [511250.459576] [24945] 0 24945 16336 70 33 0 0 ssh [511250.459578] [24946] 0 24946 16336 71 33 0 0 ssh [511250.459579] [24947] 0 24947 16336 69 30 0 0 ssh [511250.459581] Out of memory: Kill process 17982 (java) score 77 or sacrifice child [511250.459642] Killed process 17982 (java) total-vm:19803312kB, anon-rss:2540308kB, file-rss:0kB, shmem-rss:0kB
(1) mysqld invoked the oom-killer, meaning mysqld asked for more memory than the system could supply. The kernel parameter /proc/sys/vm/min_free_kbytes sets the free-memory floor: when free memory (excluding buffers and cache) drops below the watermarks derived from it, the kernel starts the kswapd thread to reclaim memory. That the oom-killer still fired means memory had genuinely run out, or the OOM path was hit before or during reclaim.
(2) The output below shows the state of the three zones when the allocation failed; none of them could satisfy it (DMA and DMA32 are all_unreclaimable, and Normal's free memory sits barely above its min watermark):
[511250.459112] Node 0 DMA free:15892kB min:32kB low:40kB high:48kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15908kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:16kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes [511250.459117] lowmem_reserve[]: 0 2814 31994 31994 [511250.459120] Node 0 DMA32 free:119704kB min:5940kB low:7424kB high:8908kB active_anon:2678512kB inactive_anon:276kB active_file:124kB inactive_file:132kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3129216kB managed:2883436kB mlocked:0kB dirty:0kB writeback:0kB mapped:1100kB shmem:1632kB slab_reclaimable:48796kB slab_unreclaimable:9340kB kernel_stack:5248kB pagetables:11424kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:32902 all_unreclaimable? yes [511250.459124] lowmem_reserve[]: 0 0 29180 29180 [511250.459127] Node 0 Normal free:63896kB min:61608kB low:77008kB high:92412kB active_anon:29011996kB inactive_anon:6336kB active_file:6576kB inactive_file:11148kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:30408704kB managed:29881068kB mlocked:0kB dirty:44kB writeback:8kB mapped:13328kB shmem:16776kB slab_reclaimable:198472kB slab_unreclaimable:94604kB kernel_stack:53472kB pagetables:159076kB unstable:0kB bounce:0kB free_pcp:656kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:924 all_unreclaimable? no [511250.459131] lowmem_reserve[]: 0 0 0 0 [511250.459134] Node 0 DMA: 1*4kB (U) 0*8kB 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15892kB [511250.459144] Node 0 DMA32: 9372*4kB (UEM) 2427*8kB (UEM) 1179*16kB (UEM) 369*32kB (UEM) 104*64kB (EM) 31*128kB (EM) 14*256kB (UEM) 9*512kB (UEM) 7*1024kB (UEM) 3*2048kB (M) 0*4096kB = 119704kB [511250.459154] Node 0 Normal: 1540*4kB (UE) 6148*8kB (UE) 503*16kB (UE) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 63392kB [511250.459162] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [511250.459163] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [511250.459164] 9275 total pagecache pages
(3) Details of the killed process
The following output confirms that the killed process was PID 17982:
[511250.459581] Out of memory: Kill process 17982 (java) score 77 or sacrifice child [511250.459642] Killed process 17982 (java) total-vm:19803312kB, anon-rss:2540308kB, file-rss:0kB, shmem-rss:0kB
The process table entry below shows PID 17982 holding 635077 resident pages, i.e. 635077 * 4096 bytes ≈ 2.4 GB of physical memory (consistent with the anon-rss:2540308kB reported above):
[511250.459172] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
[511250.459549] [17982] 996 17982 4950828 635077 1629 0 0 java
The columns mean:
pid: process ID.
uid: user ID.
tgid: thread group ID.
total_vm: virtual memory used (in 4 kB pages)
rss: resident memory used (in 4 kB pages)
nr_ptes: page table entries
swapents: swap entries
oom_score_adj: usually 0; a lower value means the process is less likely to be killed when the OOM killer is invoked.
(4) Summing the rss of every process in the dump (rss is the physical memory a program actually uses, in 4 kB pages)
Adding up the rss values in the OOM output gives about 32 GB, so the system really had exhausted its memory when the oom-killer fired. Of that, the java processes account for roughly 26 GB, by far the largest share. (A quick way to do the summing is sketched below.)
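As a rough illustration of the summing, assume the task table from the dump has been saved to a file, oom_tasks.txt (a hypothetical name), with the syslog prefix stripped so each line starts with the bracketed pid. Since the pid field's width varies ([ 444] vs [12499]), it is safer to index fields from the end of the line; rss is the 5th field from the last:

# awk '/^\[ *[0-9]+\]/ { sum += $(NF-4) } END { printf "total rss: %.1f GB\n", sum * 4096 / 1024^3 }' oom_tasks.txt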
III. Fix
What we did:
1. Capped the java process's max heap and reduced the java program's worker count, cutting its memory use
2. Noticed the system had no swap enabled and added 8 GB of swap (both steps are sketched below)
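A sketch of both steps; the heap size, swap file path, and sizes here are illustrative, not the exact values we used:

# java -Xmx4g ...        (cap the JVM heap; pair with fewer workers in the app's own config)
# dd if=/dev/zero of=/swapfile bs=1M count=8192
# chmod 600 /swapfile
# mkswap /swapfile
# swapon /swapfile
# echo '/swapfile swap swap defaults 0 0' >> /etc/fstab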
Another option (not recommended) is to forbid memory overcommit:
# echo "2" > /proc/sys/vm/overcommit_memory
# echo "80" > /proc/sys/vm/overcommit_ratio
IV. Digging deeper
1. overcommit_memory (/proc/sys/vm/overcommit_memory)
Linux allows memory overcommit: the kernel grants whatever you ask for, betting that processes won't actually use it all. But what if they do? Then you get something like a bank run: there isn't enough cash (memory) on hand. Linux handles the crisis with the OOM killer (OOM = out of memory): pick a process, kill it to free some memory, and keep killing if that isn't enough. Alternatively, the kernel parameter vm.panic_on_oom can make the system reboot automatically on OOM. Both mechanisms are risky: a reboot can interrupt service, and so can killing processes. That is why, since 2.6, Linux lets you forbid memory overcommit via the kernel parameter vm.overcommit_memory.
(1) vm.overcommit_memory accepts three values:
0 – Heuristic overcommit handling. The default. Overcommit is allowed, but blatant overcommit is refused, e.g. a single malloc requesting more than the system's total memory. "Heuristic" means the kernel uses a rule of thumb to guess whether the request is reasonable and rejects it if it decides not.
1 – Always overcommit. Every request is granted; the kernel does no overcommit handling. This raises the risk of memory overload but can improve performance for memory-hungry workloads.
2 – Don't overcommit. Requests are refused once the total committed memory would reach the available swap plus the fraction of physical RAM given by overcommit_ratio. This is the best setting if you want to minimize the risk of overcommit.
(2) The heuristic overcommit algorithm is implemented in the kernel's __vm_enough_memory(); roughly, it works like this:
a single request may not exceed [free memory + free swap + page cache + the reclaimable part of SLAB], otherwise it fails. (A rough way to eyeball that ceiling is sketched below.)
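The following one-liner approximates that ceiling from /proc/meminfo; it is only an approximation of what __vm_enough_memory computes (for one thing, Cached here also counts shmem, which the kernel excludes):

# awk '/^MemFree|^SwapFree|^Cached|^SReclaimable/ { sum += $2 } END { print sum " kB" }' /proc/meminfo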
(3) With overcommit forbidden (vm.overcommit_memory=2), what exactly counts as overcommit? The kernel maintains a threshold; once the total committed memory exceeds it, the system is overcommitting. The threshold is visible in /proc/meminfo:
# grep -i commit /proc/meminfo
CommitLimit:     5967744 kB
Committed_AS:    5363236 kB
CommitLimit is the overcommit threshold: if the total requested memory exceeds CommitLimit, the system is overcommitted.
How is the threshold computed? It is neither the physical memory size nor the free memory size; it is set indirectly through the kernel parameter vm.overcommit_ratio (or vm.overcommit_kbytes):
CommitLimit = (Physical RAM * vm.overcommit_ratio / 100) + Swap
Notes:
vm.overcommit_ratio defaults to 50, i.e. 50% of physical memory. If you would rather not use a ratio, you can specify the size in kilobytes directly via the companion parameter vm.overcommit_kbytes.
If huge pages are used, they must be subtracted from physical memory, and the formula becomes:
CommitLimit = ([total RAM] – [total huge TLB RAM]) * vm.overcommit_ratio / 100 + swap
See https://access.redhat.com/solutions/665023
Committed_AS in /proc/meminfo is the total memory all processes have requested so far (requested, not necessarily yet backed by pages). If Committed_AS exceeds CommitLimit, overcommit has occurred, and the bigger the excess, the worse. Put another way, Committed_AS is how much physical memory would be needed to absolutely guarantee no OOM (out of memory). (The check below recomputes CommitLimit from the formula.)
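To check the formula against a live system (ignoring the hugepages term for brevity):

# awk -v r=$(cat /proc/sys/vm/overcommit_ratio) '/^MemTotal/ { m = $2 } /^SwapTotal/ { s = $2 } END { printf "computed CommitLimit: %d kB\n", m * r / 100 + s }' /proc/meminfo
# grep -i commit /proc/meminfo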
(4) "sar -r" is a common tool for checking memory usage. Two columns of its output relate to overcommit, kbcommit and %commit:
kbcommit corresponds to Committed_AS in /proc/meminfo;
%commit is not computed with CommitLimit as the denominator but as Committed_AS / (MemTotal + SwapTotal), i.e. the requested memory as a percentage of physical memory plus swap.
# sar -r
05:00:01 PM kbmemfree kbmemused %memused kbbuffers kbcached kbcommit %commit kbactive kbinact kbdirty
05:10:01 PM    160576   3648460    95.78         0  1846212  4939368   62.74  1390292 1854880       4
2. panic_on_oom (/proc/sys/vm/panic_on_oom)
Decides what the system does when an OOM occurs. Three values are accepted:
0 - the default: invoke the oom killer on OOM
1 - if the OOM happens under a cpuset, memory policy, or memcg constraint, don't panic but run the OOM killer; in all other cases trigger a kernel panic, i.e. the system reboots
2 - on any OOM, trigger a kernel panic directly, i.e. the system reboots
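For example, to inspect and change it (persist via /etc/sysctl.conf as with any sysctl):

# cat /proc/sys/vm/panic_on_oom
# sysctl -w vm.panic_on_oom=0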
3. oom_adj, oom_score_adj, and oom_score
Strictly speaking, these parameters are all per process, so they live under /proc/<pid>/. Given that we choose to kill a process on OOM, a natural question arises: which one? The kernel's algorithm is quite simple: score every process (oom_score; note this file is read-only) and pick the highest. How is the score computed? See the kernel's oom_badness function:
unsigned long oom_badness(struct task_struct *p, struct mem_cgroup *memcg,
                          const nodemask_t *nodemask, unsigned long totalpages)
{
        ......
        adj = (long)p->signal->oom_score_adj;
        if (adj == OOM_SCORE_ADJ_MIN) {                                   /* (1) */
                task_unlock(p);
                return 0;                                                 /* (2) */
        }

        points = get_mm_rss(p->mm) + get_mm_counter(p->mm, MM_SWAPENTS) +
                 atomic_long_read(&p->mm->nr_ptes) + mm_nr_pmds(p->mm);   /* (3) */
        task_unlock(p);

        if (has_capability_noaudit(p, CAP_SYS_ADMIN))                     /* (4) */
                points -= (points * 3) / 100;

        adj *= totalpages / 1000;                                         /* (5) */
        points += adj;

        return points > 0 ? points : 1;
}
(1) A task's score (oom_score) has two components: a system score, based mainly on the task's memory usage, and a user score, oom_score_adj; the task's final score combines the two. If the user sets the task's oom_score_adj to OOM_SCORE_ADJ_MIN (-1000), that effectively forbids the OOM killer from killing the process.
(2) Returning 0 here tells the OOM killer this is a "good process", not to be killed. As the last line of the function shows, the lowest score otherwise computed is 1.
(3) As mentioned, the system score just looks at physical memory consumption, in three parts: RSS, memory held on a swap file or swap device, and memory held by page tables.
(4) Root processes get a 3% memory-usage privilege, so that amount is subtracted here.
(5) Users can tune oom_score via oom_score_adj, whose range is -1000 to 1000. 0 means no adjustment; a negative value subtracts a discount from the raw score; a positive value penalizes the task, raising its oom_score. The adjustment is scaled against the memory allocatable for this allocation (with no allocation constraint, that is all usable memory in the system; where cpusets are in play, it is the cpuset's actual quota). oom_badness receives this ceiling as its totalpages parameter, and the final points value is adjusted by oom_score_adj in units of totalpages / 1000. For example, oom_score_adj = -500 is a 50% discount against the totalpages base: the task is scored as though it used half the allocatable-memory ceiling less than it actually does.
With oom_score_adj and oom_score understood, the dust settles. oom_adj is an older interface with a similar function, kept for compatibility; when you write to it, the kernel actually converts the value into oom_score_adj. Interested readers can look into it on their own; we won't go into further detail.
One more note:
any process spawned by an adjusted process inherits that process's oom_score. For example, if the sshd process is shielded from the oom_killer, all processes started from SSH sessions are shielded as well. This can affect the OOM killer's ability to rescue the system when an OOM occurs.
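For example, to exempt a process from the OOM killer (PID 1234 is illustrative); per the note above, children it spawns are exempted as well:

# echo -1000 > /proc/1234/oom_score_adj    (OOM_SCORE_ADJ_MIN: never kill)
# cat /proc/1234/oom_score                 (read-only view of its current score)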
4. min_free_kbytes (/proc/sys/vm/min_free_kbytes)
First, the official description:
This is used to force the Linux VM to keep a minimum number of kilobytes free. The VM uses this number to compute a watermark[WMARK_MIN] value for each lowmem zone in the system. Each lowmem zone gets a number of reserved free pages based proportionally on its size.
Some minimal amount of memory is needed to satisfy PF_MEMALLOC allocations; if you set this to lower than 1024KB, your system will become subtly broken, and prone to deadlock under high loads.
Setting this too high will OOM your machine instantly.
The description is already clear; the key points:
(1) It is the floor on the free memory the system keeps in reserve.
At boot a default is computed from the memory size:
min_free_kbytes = sqrt(lowmem_kbytes * 16) = 4 * sqrt(lowmem_kbytes) (where lowmem_kbytes can be taken as the system's memory size)
The computed value is also clamped: minimum 128K, maximum 64M.
So min_free_kbytes does not grow linearly with memory size; the code comments give the reason, "because network bandwidth does not increase linearly with machine size". With more memory there is no need to reserve linearly more; enough to cover emergencies suffices. (A worked computation follows.)
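As a worked example, for a machine whose lowmem is roughly 32 GB, the default comes out to about 23 MB, well under the 64 MB cap:

# awk 'BEGIN { lowmem_kb = 32 * 1024 * 1024; v = 4 * sqrt(lowmem_kb); if (v < 128) v = 128; if (v > 65536) v = 65536; printf "%d kB\n", v }'
23170 kB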
(2) min_free_kbytes's main use is computing the three watermarks that drive memory reclaim, watermark[min/low/high]
1) watermark[high] > watermark[low] > watermark[min], with one set per zone
2) When a zone's free memory drops below watermark[low], the kernel thread kswapd (one per NUMA node) starts reclaiming, and stops once the zone's free memory reaches watermark[high]. If allocations arrive faster than reclaim and free memory falls to watermark[min], the kernel performs direct reclaim: it reclaims inside the allocating process's own context and satisfies the request with the freed pages. This blocks the application, adds response latency, and may end in a system OOM. Memory below watermark[min] is the system's own reserve for special uses and is not handed out to ordinary user-space requests.
3) How the three watermarks are computed:
watermark[min] = min_free_kbytes converted to pages, call it min_free_pages (since each zone has its own watermarks, the effective per-zone min_free_pages is proportional to the zone's share of total memory)
watermark[low] = watermark[min] * 5 / 4
watermark[high] = watermark[min] * 3 / 2
So the buffer between levels is high - low = low - min = per_zone_min_free_pages * 1/4. And since min_free_kbytes = 4 * sqrt(lowmem_kbytes), the buffer also grows as the square root of memory size.
4) Each zone's watermarks can be read from /proc/zoneinfo
For example:
Node 0, zone      DMA
  pages free     3960
        min      65
        low      81
        high     97
(note that 81 = 65 * 5/4 and 97 = 65 * 3/2, matching the formulas above)
(3) Effects of the size of min_free_kbytes
The larger min_free_kbytes, the higher the watermark lines and the larger the buffers between them. kswapd then starts reclaiming earlier and reclaims more (it does not stop until watermark[high]), which makes the system hold back more free memory and to some extent reduces what applications can use. In the extreme, with min_free_kbytes set close to total memory, so little is left for applications that OOMs can become frequent.
If min_free_kbytes is set too small, the reserve is too thin. kswapd itself does a little memory allocation while reclaiming (it runs with the PF_MEMALLOC flag set, which permits it to use the reserve); likewise, a process chosen and killed by the OOM killer may need memory while exiting and may also draw on the reserve. Letting these two use reserved memory keeps the system out of deadlock.
5. lowmem and highmem
We won't define lowmem and highmem in detail here; this article explains them well:
http://ilinuxkernel.com/?p=1013
The link covers the details; here are just the conclusions:
(1) The highmem concept is only needed when physical memory exceeds the kernel's address-space range.
Under x86, Linux by default splits the 4 GB process virtual address space 3:1: user space (0-3G) is mapped through page tables, while kernel space (3-4G) is linearly mapped at the top of every process's address space. So an x86 machine with more than 1 GB of physical memory needs the highmem concept.
(2) The kernel cannot directly access physical memory above 1 GB (that memory cannot be mapped into the kernel's address space). When it must, it sets up a temporary mapping that brings the high physical addresses into a range the kernel can reach.
(3) Once lowmem is exhausted, an OOM can occur even if plenty of physical memory remains. On Linux 2.6, a process is then killed according to its score (we won't reopen the OOM topic here).
(4) Under x86_64, the kernel's usable address space is far larger than physical memory, so the highmem problem above does not arise; system memory can be treated as entirely lowmem.
6. lowmem_reserve_ratio (/proc/sys/vm/lowmem_reserve_ratio)
The official description:
For some specialised workloads on highmem machines it is dangerous for the kernel to allow process memory to be allocated from the "lowmem" zone. This is because that memory could then be pinned via the mlock() system call, or by unavailability of swapspace.
And on large highmem machines this lack of reclaimable lowmem memory can be fatal.
So the Linux page allocator has a mechanism which prevents allocations which _could_ use highmem from using too much lowmem. This means that a certain amount of lowmem is defended from the possibility of being captured into pinned user memory.
The `lowmem_reserve_ratio' tunable determines how aggressive the kernel is in defending these lower zones.
If you have a machine which uses highmem or ISA DMA and your applications are using mlock(), or if you are running with no swap then you probably should change the lowmem_reserve_ratio setting.
(1) Purpose
Beyond the per-zone reserve that min_free_kbytes provides, lowmem_reserve_ratio maintains defensive reserves between zones, mainly to keep higher zones from over-consuming a lower zone's memory when they run out.
For instance, a typical single-node machine today has three zones: DMA, DMA32 and NORMAL. DMA and DMA32 are low zones and small; on a 96 GB machine the two together hold only about 1 GB. NORMAL is comparatively the high zone (there is generally no HIGH zone nowadays) and large (>90 GB). Low memory has special uses, e.g. DMA transfers can only be served from the DMA zone, so the goal is to use high memory whenever possible and avoid touching low memory, while also preventing a high-memory shortfall from grabbing the scarce low memory.
(2) How it is computed
# cat /proc/sys/vm/lowmem_reserve_ratio
256 256 32
The kernel uses the ratio array above to compute each zone's reserved page counts (the protection array), visible in /proc/zoneinfo:
Node 0, zone      DMA
  pages free     1355
        min      3
        low      3
        high     4
        :
        :
    numa_other   0
        protection: (0, 2004, 2004, 2004)
                    ^^^^^^^^^^^^^^^^^^^^^
  pagesets
    cpu: 0 pcp: 0
        :
When memory is allocated, these reserved page counts are added to the watermark to decide whether the request can be satisfied now, or free memory is deemed too low and reclaim must start.
For example, if a request classed as Normal (index = 2) tries to allocate from the DMA zone, and the check in force is watermark[low], the kernel computes page_free = 1355 while watermark + protection[2] = 3 + 2004 = 2007 > page_free, so it refuses the allocation for lack of free memory. If the request itself comes from the DMA zone, protection[0] = 0 is used instead and the request succeeds.
zone[i]'s protection[j] is computed as follows:
(i < j): zone[i]->protection[j] = (total sums of present_pages from zone[i+1] to zone[j] on the node) / lowmem_reserve_ratio[i];
(i = j): (should not be protected. = 0;
(i > j): (not necessary, but looks 0)
The default lowmem_reserve_ratio[i] values are:
256 (if zone[i] is the DMA or DMA32 zone)
32 (otherwise).
As the rule above shows, the reserve is the reciprocal of the ratio: 256 means 1/256, i.e. 0.39% of the higher zone's size. To reserve more pages, set a smaller value; the minimum is 1 (1/1 -> 100%). (The effective reserves can be inspected as sketched below.)
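The effective reserves can be read back at any time, and writing a new ratio array recomputes them immediately:

# grep protection /proc/zoneinfo
# cat /proc/sys/vm/lowmem_reserve_ratio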
(3) A worked example together with min_free_kbytes (the watermarks)
Below is a log printed when a memory request failed on a production server (96 GB):
[38905.295014] java: page allocation failure. order:1, mode:0x20, zone 2 [38905.295020] Pid: 25174, comm: java Not tainted 2.6.32-220.23.1.tb750.el5.x86_64 #1 ... [38905.295348] active_anon:5730961 inactive_anon:216708 isolated_anon:0 [38905.295349] active_file:2251981 inactive_file:15562505 isolated_file:0 [38905.295350] unevictable:1256 dirty:790255 writeback:0 unstable:0 [38905.295351] free:113095 slab_reclaimable:577285 slab_unreclaimable:31941 [38905.295352] mapped:7816 shmem:4 pagetables:13911 bounce:0 [38905.295355] Node 0 DMA free:15796kB min:4kB low:4kB high:4kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15332kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes [38905.295365] lowmem_reserve[]: 0 1951 96891 96891 [38905.295369] Node 0 DMA32 free:380032kB min:800kB low:1000kB high:1200kB active_anon:46056kB inactive_anon:10876kB active_file:15968kB inactive_file:129772kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1998016kB mlocked:0kB dirty:20416kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:11716kB slab_unreclaimable:160kB kernel_stack:176kB pagetables:112kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:576 all_unreclaimable? no [38905.295379] lowmem_reserve[]: 0 0 94940 94940 [38905.295383] Node 0 Normal free:56552kB min:39032kB low:48788kB high:58548kB active_anon:22877788kB inactive_anon:855956kB active_file:8991956kB inactive_file:62120248kB unevictable:5024kB isolated(anon):0kB isolated(file):0kB present:97218560kB mlocked:5024kB dirty:3140604kB writeback:0kB mapped:31264kB shmem:16kB slab_reclaimable:2297424kB slab_unreclaimable:127604kB kernel_stack:12528kB pagetables:55532kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no [38905.295393] lowmem_reserve[]: 0 0 0 0 [38905.295396] Node 0 DMA: 1*4kB 2*8kB 0*16kB 1*32kB 2*64kB 0*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15796kB [38905.295405] Node 0 DMA32: 130*4kB 65*8kB 75*16kB 72*32kB 95*64kB 22*128kB 10*256kB 7*512kB 4*1024kB 2*2048kB 86*4096kB = 380032kB [38905.295414] Node 0 Normal: 12544*4kB 68*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 1*4096kB = 54816kB [38905.295423] 17816926 total pagecache pages
1) The first log line, "order:1, mode:0x20", shows this is a GFP_ATOMIC request with order = 1 (2^1 = 2 contiguous pages)
2) First allocation attempt
In __alloc_pages_nodemask(), get_page_from_freelist() is called first with the flags ALLOC_WMARK_LOW | ALLOC_CPUSET; it runs the zone_watermark_ok() check on each zone against the watermark[low] threshold passed in.
zone_watermark_ok() also takes z->lowmem_reserve[] into account, which keeps a Normal-class request from landing in a lower zone. For DMA32, for instance:
free pages = 380032 KB = 95008 pages < low (1000 KB = 250 pages) + lowmem_reserve[normal] (94940) = 95190
so DMA32 fails the watermark check, and the DMA zone's memory is likewise unusable.
As for Normal's own memory: free pages = 56552 KB = 14138 pages, and lowmem_reserve (0) is no obstacle, but the request is order 1, so after subtracting the 12544 order-0 pages only 14138 - 12544 = 1594 remain, which is below low / 2 = (48788 KB = 12197 pages) / 2 = 6098 pages.
So the first attempt fails, and allocation falls through to __alloc_pages_slowpath() for a more aggressive try.
3) Second allocation attempt
__alloc_pages_slowpath() first calls gfp_to_alloc_flags() to set harder flag bits; for the original GFP_ATOMIC these are ALLOC_WMARK_MIN | ALLOC_HARDER | ALLOC_HIGH. Note, however, that ALLOC_NO_WATERMARKS is not set. That flag skips the zone watermark check entirely, the highest-priority class of request, allowed to dip into all the reserve memory, but its condition is (!in_interrupt() && ((p->flags & PF_MEMALLOC) || unlikely(test_thread_flag(TIF_MEMDIE)))), i.e. not in interrupt context, and either a reclaiming task (such as kswapd) or an exiting one.
With the new flags it re-enters get_page_from_freelist() for a second attempt. Even with ALLOC_HARDER and ALLOC_HIGH, unfortunately, the zone_watermark_ok check still fails in all 3 zones. For DMA32:
free pages = 380032 KB = 95008 pages
Because ALLOC_HIGH is set, the watermark[min] obtained is halved: min = min/2 = 800K / 2 = 400K = 100 pages
Because ALLOC_HARDER is also set, min is cut by another quarter: min = 3 * min / 4 = 100 pages * 3 / 4 = 75 pages
Even so, min (75 pages) + lowmem_reserve[normal] (94940) = 95015 still exceeds the free pages, so the allocation is refused; DMA fails the same way, and in Normal there are too few contiguous 8K blocks free to satisfy the request.
After this second failure, since ALLOC_NO_WATERMARKS is absent there is no __alloc_pages_high_priority highest-priority attempt, and because a GFP_ATOMIC allocation can neither block on reclaim nor enter the OOM path, the request simply ends in failure.
When you hit this situation, you can raise min_free_kbytes appropriately so that kswapd starts reclaiming earlier and the system always keeps more free memory on hand; optionally, you can also raise lowmem_reserve_ratio moderately (the reserve is its reciprocal, so raising it shrinks the inter-zone protection), letting allocations borrow DMA32/DMA memory in an emergency when memory is tight (mainly in the Normal zone), taking care not to erode the low zones' protection too far. Both knobs are sketched below.
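A sketch of the two knobs; the values are illustrative for a 96 GB machine, not a recommendation:

# sysctl -w vm.min_free_kbytes=1048576                      (reserve ~1 GB so kswapd starts earlier)
# echo '256 512 32' > /proc/sys/vm/lowmem_reserve_ratio     (halve DMA32's protection so Normal can borrow more)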
References:
http://kernel.taobao.org/index.php?title=Kernel_Documents/mm_sysctl