On October 20, 2018, a VM on one of our hosts triggered an OOM and was killed by the kernel. At the time the host still had plenty of free memory. The relevant entries from the messages log are below.
Note: the order field in the log indicates how much memory was requested. order=0 means a request of 2 to the power of 0 pages, i.e. a single 4 KB page.
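As a quick sanity check, the size implied by an order value is 2^order pages times the page size. A minimal shell sketch (assuming the usual 4 KB page size, which getconf can confirm):

# bytes requested = 2^order * PAGE_SIZE
order=0
page_size=$(getconf PAGESIZE)          # typically 4096 on x86_64
echo $(( (1 << order) * page_size ))   # order=0 -> 4096 bytes (4 KB)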
Oct 20 00:43:07 kernel: qemu-kvm invoked oom-killer: gfp_mask=0x280da, order=0, oom_score_adj=0
Oct 20 00:43:07 kernel: qemu-kvm cpuset=emulator mems_allowed=1
Oct 20 00:43:07 kernel: CPU: 7 PID: 1194284 Comm: qemu-kvm Tainted: G OE ------------ 3.10.0-327.el7.x86_64 #1
Oct 20 00:43:07 kernel: Hardware name: Dell Inc. PowerEdge R730/0WCJNT, BIOS 2.5.5 08/16/2017
Oct 20 00:43:07 kernel: ffff882e328f0b80 000000008b0f4108 ffff882f6f367b00 ffffffff816351f1
Oct 20 00:43:07 kernel: ffff882f6f367b90 ffffffff81630191 ffff882e32a91980 0000000000000001
Oct 20 00:43:07 kernel: 000000000000420f 0000000000000010 ffffffff8197d740 00000000b922b922
Oct 20 00:43:07 kernel: Call Trace:
Oct 20 00:43:07 kernel: [<ffffffff816351f1>] dump_stack+0x19/0x1b
Oct 20 00:43:07 kernel: [<ffffffff81630191>] dump_header+0x8e/0x214
Oct 20 00:43:07 kernel: [<ffffffff8116cdee>] oom_kill_process+0x24e/0x3b0
Oct 20 00:43:07 kernel: [<ffffffff8116c956>] ? find_lock_task_mm+0x56/0xc0
Oct 20 00:43:07 kernel: [<ffffffff8116d616>] out_of_memory+0x4b6/0x4f0
Oct 20 00:43:07 kernel: [<ffffffff811737f5>] __alloc_pages_nodemask+0xa95/0xb90
Oct 20 00:43:07 kernel: [<ffffffff811b78ca>] alloc_pages_vma+0x9a/0x140
Oct 20 00:43:07 kernel: [<ffffffff81197655>] handle_mm_fault+0xb85/0xf50
Oct 20 00:43:07 kernel: [<ffffffff8122bb37>] ? eventfd_ctx_read+0x67/0x210
Oct 20 00:43:07 kernel: [<ffffffff81640e22>] __do_page_fault+0x152/0x420
Oct 20 00:43:07 kernel: [<ffffffff81641113>] do_page_fault+0x23/0x80
Oct 20 00:43:07 kernel: [<ffffffff8163d408>] page_fault+0x28/0x30
Oct 20 00:43:07 kernel: Mem-Info:
Oct 20 00:43:07 kernel: active_anon:87309259 inactive_anon:444334 isolated_anon:0#012 active_file:101827 inactive_file:1066463 isolated_file:0#012 unevictable:0 dirty:16777 writeback:0 unstable:0#012 free:8521193 slab_reclaimable:179558 slab_unreclaimable:138991#012 mapped:14804 shmem:1180357 pagetables:195678 bounce:0#012 free_cma:0
Oct 20 00:43:07 kernel: Node 1 Normal free:44244kB min:45096kB low:56368kB high:67644kB active_anon:194740280kB inactive_anon:795780kB active_file:80kB inactive_file:100kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:201326592kB managed:198168156kB mlocked:0kB dirty:4kB writeback:0kB mapped:2500kB shmem:2177236kB slab_reclaimable:158548kB slab_unreclaimable:199088kB kernel_stack:109552kB pagetables:478460kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:301 all_unreclaimable? yes
Oct 20 00:43:07 kernel: lowmem_reserve[]: 0 0 0 0
Oct 20 00:43:07 kernel: Node 1 Normal: 10147*4kB (UEM) 22*8kB (UE) 3*16kB (U) 11*32kB (UR) 8*64kB (R) 6*128kB (R) 2*256kB (R) 1*512kB (R) 1*1024kB (R) 0*2048kB 0*4096kB = 44492kB
Oct 20 00:43:07 kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Oct 20 00:43:07 kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Oct 20 00:43:07 kernel: Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Oct 20 00:43:07 kernel: Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Oct 20 00:43:07 kernel: 2349178 total pagecache pages
Oct 20 00:43:07 kernel: 0 pages in swap cache
Oct 20 00:43:07 kernel: Swap cache stats: add 0, delete 0, find 0/0
Oct 20 00:43:07 kernel: Free swap = 0kB
Oct 20 00:43:07 kernel: Total swap = 0kB
Oct 20 00:43:07 kernel: 100639322 pages RAM
Oct 20 00:43:07 kernel: 0 pages HighMem/MovableOnly
Oct 20 00:43:07 kernel: 1646159 pages reserved
Oct 20 00:43:07 kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Oct 20 00:43:07 kernel: Out of memory: Kill process 1409878 (qemu-kvm) score 666 or sacrifice child
Oct 20 00:43:07 kernel: Killed process 1409878 (qemu-kvm) total-vm:136850144kB, anon-rss:133909332kB, file-rss:4724kB
Oct 20 00:43:30 libvirtd: 2018-10-19 16:43:30.303+0000: 81546: error : qemuMonitorIO:705 : internal error: End of file from qemu monitor
Oct 20 00:43:30 systemd-machined: Machine qemu-7-c2683281-6cbd-4100-ba91-e221ed06ee60 terminated.
Oct 20 00:43:30 kvm: 6 guests now active
The log above omits the detailed meminfo output and the per-process memory usage table.
The log shows that Node 1 Normal had only about 44 MB free, which is what triggered the OOM, even though node 0 still had plenty of unused memory at the time. The process that triggered the OOM was qemu-kvm with pid 1194284. By searching the logs we traced the problem to the VM 25913bd0-d869-4310-ab53-8df6855dd258, and this VM's domain XML turned out to contain the following NUMA memory configuration:
<numatune>
  <memory mode='strict' placement='auto'/>
</numatune>
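The same numatune settings can also be pulled straight out of the live domain definition; a minimal sketch using the UUID from this incident:

# Show the <numatune> block of the problem VM's domain XML
virsh dumpxml 25913bd0-d869-4310-ab53-8df6855dd258 | grep -A 2 '<numatune>'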
The information reported by the virsh client was:
virsh # numatune 25913bd0-d869-4310-ab53-8df6855dd258
numa_mode : strict
numa_nodeset : 1
It turns out that when mode is strict and placement is auto, a NUMA node that is considered suitable is worked out and applied to the VM. This VM's memory was therefore confined to node 1, and once node 1's memory was exhausted the OOM killer fired.
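The pinning can also be confirmed directly on the host; a minimal sketch using the qemu-kvm pid from the log (1194284):

# Which NUMA nodes the process may allocate from (matches mems_allowed=1 in the OOM log)
grep -E 'Cpus_allowed_list|Mems_allowed_list' /proc/1194284/status
# Per-node breakdown of the process's resident memory
numastat -p 1194284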
See the official documentation for what each memory mode means:
strict: if memory cannot be allocated on the target node, the allocation fails. If a NUMA nodeset is specified but no memory mode is defined, the policy defaults to strict.
interleave: memory pages are allocated across the specified set of nodes in a round-robin fashion.
preferred: memory is allocated from a single preferred node; if that node does not have enough memory, it is allocated from other nodes.
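The same three policies can be exercised for an ordinary process with numactl, which makes their behaviour easy to reason about; a minimal sketch (memory_hog stands in for any memory-hungry workload):

# strict equivalent: only allocate on node 1; allocations fail/OOM once node 1 is full
numactl --membind=1 ./memory_hog
# interleave: spread pages round-robin across nodes 0 and 1
numactl --interleave=0,1 ./memory_hog
# preferred: favour node 1, but fall back to other nodes once it is full
numactl --preferred=1 ./memory_hog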
Important note: if memory is overcommitted in strict mode and the guest does not have enough swap space, the kernel will kill some guest processes to reclaim memory. Red Hat therefore officially recommends using preferred with a single nodeset (for example, nodeset='0') to avoid this situation.
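To follow that recommendation on an existing domain, the policy can be changed through virsh; a minimal sketch (the domain name is a placeholder, and --config only takes effect on the next start):

# Persistently switch a domain to preferred mode on node 0
virsh numatune mydomain --mode preferred --nodeset 0 --config
# Verify the setting
virsh numatune mydomain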
We took a new host, created a VM on it, modified the VM's numatune configuration, and tested how the strict and preferred modes behaved under the three configurations below.
interleave spreads allocations across nodes, so its performance is bound to be worse than the other two modes, and since what we mainly wanted to test was whether strict and preferred trigger an OOM when a single node's memory is completely used up, interleave was left out of the tests.
Configuration 1: mode strict, placement auto
<numatune>
  <memory mode='strict' placement='auto'/>
</numatune>
Configuration 2: mode preferred, placement auto
<numatune>
  <memory mode='preferred' placement='auto'/>
</numatune>
Configuration 3: mode strict, nodeset 0-1
<numatune>
  <memory mode='strict' nodeset='0-1'/>
</numatune>
We used memholder (a tool from the ssplatform2-tools rpm package) to fill up the memory of a single NUMA node on the host (the exact command was numactl -i 0 memholder 64000 &), then ran memholder inside the VM as well, and watched how the VM's growing memory footprint was distributed across the NUMA nodes.
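memholder is an internal tool; if it is not available, the node-filling step can be approximated with standard utilities. A minimal sketch, assuming stress-ng is installed and node 1 holds roughly 64 GB:

# Bind the allocator's memory to node 1 and hold it until the node is nearly full
numactl --membind=1 stress-ng --vm 1 --vm-bytes 60G --vm-keep &
# Watch per-node free memory on the host while it runs
numactl --hardware | grep free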
The information from the virsh client side is below: placement is auto, yet the qemu-kvm process has still settled on one specific node.
virsh # numatune 638abba7-bba8-498b-88d6-ddc70f2cef18
numa_mode : strict
numa_nodeset : 1
The VM's initial memory usage:
# numastat -c qemu-kvm
Per-node process memory usage (in MBs)
PID Node 0 Node 1 Total
--------------- ------ ------ -----
1332894 (qemu-kv 0 693 694
1764062 (qemu-kv 0 366 366
--------------- ------ ------ -----
Total 1 1060 1060
Host memory usage after filling node 1 with memholder:
numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30
node 0 size: 64326 MB
node 0 free: 58476 MB
node 1 cpus: 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31
node 1 size: 64496 MB
node 1 free: 64 MB
node distances:
node 0 1
0: 10 21
1: 21 10
After memholder started consuming memory inside the VM, the VM's memory usage was:
numastat -c qemu-kvm
Per-node process memory usage (in MBs)
PID Node 0 Node 1 Total
--------------- ------ ------ -----
1332894 (qemu-kv 6 685 692
1764062 (qemu-kv 7 4670 4677
--------------- ------ ------ -----
Total 13 5355 5368
Host memory usage:
numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30
node 0 size: 64326 MB
node 0 free: 58650 MB
node 1 cpus: 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31
node 1 size: 64496 MB
node 1 free: 52181 MB
node distances:
node 0 1
0: 10 21
1: 21 10
At this point we found that the qemu-kvm process had already triggered an OOM: the memholder process that was hogging memory on the host had been killed by the kernel, which is why host memory shows up as free again in the output above.
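A quick way to spot such events on the host, a minimal sketch (log path as on this CentOS/RHEL host):

# Look for OOM-killer activity in the kernel ring buffer and in syslog
dmesg -T | grep -Ei 'oom-killer|Killed process'
grep -i 'Killed process' /var/log/messages | tail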
The messages log reads:
Nov 13 21:07:07 kernel: qemu-kvm invoked oom-killer: gfp_mask=0x24201ca, order=0, oom_score_adj=0
Nov 13 21:07:07 kernel: qemu-kvm cpuset=emulator mems_allowed=1
Nov 13 21:07:07 kernel: CPU: 28 PID: 1332894 Comm: qemu-kvm Not tainted 4.4.36-1.el7.elrepo.x86_64 #1
Nov 13 21:07:07 kernel: Mem-Info:
Nov 13 21:07:07 kernel: active_anon:1986423 inactive_anon:403229 isolated_anon:0#012 active_file:116773 inactive_file:577075 isolated_file:0#012 unevictable:14364416 dirty:142 writeback:0 unstable:0#012 slab_reclaimable:61182 slab_unreclaimable:296489#012 mapped:14400991 shmem:15542531 pagetables:35749 bounce:0#012 free:14983912 free_pcp:0 free_cma:0
Nov 13 21:07:07 kernel: Node 1 Normal free:44952kB min:45120kB low:56400kB high:67680kB active_anon:5485032kB inactive_anon:1571408kB active_file:308kB inactive_file:0kB unevictable:57286820kB isolated(anon):0kB isolated(file):0kB present:67108864kB managed:66044484kB mlocked:57286820kB dirty:48kB writeback:0kB mapped:57330444kB shmem:61948048kB slab_reclaimable:143752kB slab_unreclaimable:1107004kB kernel_stack:16592kB pagetables:129312kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:2248 all_unreclaimable? yes
Nov 13 21:07:07 kernel: lowmem_reserve[]: 0 0 0 0
Nov 13 21:07:07 kernel: Node 1 Normal: 1018*4kB (UME) 312*8kB (UE) 155*16kB (UE) 34*32kB (UE) 293*64kB (UM) 53*128kB (U) 5*256kB (U) 1*512kB (U) 1*1024kB (E) 2*2048kB (UM) 2*4096kB (M) = 50776kB
Nov 13 21:07:07 kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Nov 13 21:07:07 kernel: Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Nov 13 21:07:07 kernel: 16236582 total pagecache pages
Nov 13 21:07:07 kernel: 0 pages in swap cache
Nov 13 21:07:07 kernel: Swap cache stats: add 0, delete 0, find 0/0
Nov 13 21:07:07 kernel: Free swap = 0kB
Nov 13 21:07:07 kernel: Total swap = 0kB
Nov 13 21:07:07 kernel: 33530456 pages RAM
Nov 13 21:07:07 kernel: 0 pages HighMem/MovableOnly
Nov 13 21:07:07 kernel: 551723 pages reserved
Nov 13 21:07:07 kernel: 0 pages hwpoisoned
The process we were testing was 1764062, but the process that actually triggered the OOM was 1332894. The VM behind that process also uses configuration 1 (strict with placement auto), and the nodeset reported by the virsh client is likewise 1:
virsh # numatune c11a155a-95b0-4593-9ce5-f2a42dc0ccca
numa_mode : strict
numa_nodeset : 1
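To map a qemu-kvm pid back to the domain it belongs to, the -uuid argument on its command line is the easiest handle; a minimal sketch using the pid from this test:

# Print the domain UUID embedded in the command line of pid 1332894
tr '\0' ' ' < /proc/1332894/cmdline | grep -o -- '-uuid [0-9a-f-]*'
# Cross-check against the UUIDs of running domains
virsh list --uuid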
After switching the test VM to configuration 2 (mode preferred, placement auto), the virsh client reported the following numatune:
virsh # numatune 638abba7-bba8-498b-88d6-ddc70f2cef18
numa_mode : preferred
numa_nodeset : 1
The VM's initial memory usage:
[@ ~]# numastat -c qemu-kvm
Per-node process memory usage (in MBs)
PID Node 0 Node 1 Total
--------------- ------ ------ -----
1332894 (qemu-kv 6 691 698
1897916 (qemu-kv 17 677 694
--------------- ------ ------ -----
Total 24 1368 1392
Host memory usage after filling node 1 with memholder:
[@ ~]# numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30
node 0 size: 64326 MB
node 0 free: 58403 MB
node 1 cpus: 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31
node 1 size: 64496 MB
node 1 free: 56 MB
node distances:
node 0 1
0: 10 21
1: 21 10
After memholder started consuming memory inside the VM, the VM's memory usage was:
[@ ~]# numastat -c qemu-kvm
Per-node process memory usage (in MBs)
PID Node 0 Node 1 Total
--------------- ------ ------ -----
1332894 (qemu-kv 7 690 697
1897916 (qemu-kv 4012 682 4695
--------------- ------ ------ -----
Total 4019 1372 5391
Host memory usage:
[@ ~]# numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30
node 0 size: 64326 MB
node 0 free: 54395 MB
node 1 cpus: 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31
node 1 size: 64496 MB
node 1 free: 55 MB
node distances:
node 0 1
0: 10 21
1: 21 10
From this behaviour, even though the preferred node is node 1, once node 1 runs out of memory the process simply allocates from node 0, and no OOM is triggered.
For configuration 3 (mode strict, nodeset 0-1), process 1308480 is the qemu-kvm process of the VM under test. The VM's initial memory usage:
[@ ~]# numastat -c qemu-kvm
Per-node process memory usage (in MBs)
PID Node 0 Node 1 Total
--------------- ------ ------ -----
1308480 (qemu-kv 141 584 725
1332894 (qemu-kv 0 707 708
--------------- ------ ------ -----
Total 141 1291 1432
Host memory usage:
[@ ~]# numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30
node 0 size: 64326 MB
node 0 free: 58241 MB
node 1 cpus: 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31
node 1 size: 64496 MB
node 1 free: 131 MB
node distances:
node 0 1
0: 10 21
1: 21 10
After memholder started consuming memory inside the VM, the VM's memory usage was:
[@ ~]# numastat -c qemu-kvm
Per-node process memory usage (in MBs)
PID Node 0 Node 1 Total
--------------- ------ ------ -----
1308480 (qemu-kv 4017 682 4699
1332894 (qemu-kv 7 681 688
--------------- ------ ------ -----
Total 4024 1363 5387
Host memory usage:
[@ ~]# numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30
node 0 size: 64326 MB
node 0 free: 54410 MB
node 1 cpus: 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31
node 1 size: 64496 MB
node 1 free: 55 MB
node distances:
node 0 1
0: 10 21
1: 21 10
From these tests, neither the second nor the third configuration leads to an OOM caused by memory usage being concentrated on one of the two NUMA nodes; which of the two performs better still needs follow-up testing.
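For that follow-up, one simple approach would be to run an identical memory benchmark inside the guest under each configuration and compare; a minimal sketch, assuming sysbench is available in the guest (sizes are placeholders):

# Inside the guest: same benchmark under each numatune configuration
sysbench memory --memory-block-size=1M --memory-total-size=32G run
# On the host: watch where the guest's pages actually land during the run
numastat -c qemu-kvm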