A Deep Dive into How Docker Works with Cgroup Resource Limits - Docker in Production

This blog series focuses on demystifying core big-data and container-cloud technologies and offers full-stack consulting on big data and cloud-native platforms; please keep following it. Feel free to get in touch for any academic exchange. For more content, follow the 《數據雲技術社區》 (Data Cloud Technology Community) public account.

1 Linux Cgroup (Old Wine in a New Bottle)

  • The primary job of a Linux Cgroup is to put an upper bound on the resources a group of processes may use, including CPU, memory, disk I/O, and network. On Linux, the interface Cgroup exposes to users is a filesystem: everything is organized as files and directories under /sys/fs/cgroup.
  • Resource limits are managed through directories. For example, creating a hello directory under /sys/fs/cgroup/cpu automatically populates it with a set of default CPU control files (a minimal manual walk-through follows this list).
  • Docker first creates a directory named docker under /sys/fs/cgroup/cpu.
  • Inside that docker directory it then creates one subdirectory per container, named after the container ID.
  • A container's CPU limits are applied by writing to the files inside its container-ID subdirectory.
  • Every process inside the container is bound by the container's resource settings.
  • Other resources such as memory and network are organized the same way as CPU.
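  • A minimal, hand-rolled sketch of the same mechanism without Docker, assuming a cgroup v1 hierarchy mounted at /sys/fs/cgroup and a root shell (the hello group name and the 512 value are just examples):
mkdir /sys/fs/cgroup/cpu/hello                    # the kernel auto-creates the default CPU control files
ls /sys/fs/cgroup/cpu/hello                       # cpu.shares, cpu.cfs_period_us, cpu.cfs_quota_us, tasks, ...
echo 512 > /sys/fs/cgroup/cpu/hello/cpu.shares    # set a relative CPU weight for the group
echo $$ > /sys/fs/cgroup/cpu/hello/tasks          # move the current shell into the group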

2 How Docker Containers Tie into Cgroup

  • Initial scenario: no Docker container has been started.
  • Because no container is running, there is no per-container resource-limit directory under /sys/fs/cgroup/cpu/docker.
  • Likewise, there is no per-container resource-limit directory under /sys/fs/cgroup/memory/docker.
  • When a container is started, the directory d65aa14f8c929631f83c267b5575b07771b148e2f200ba1756236104169ce917 is created under /sys/fs/cgroup/cpu/docker:
docker run -itd --rm --cpu-shares 512 progrium/stress --cpu 1  --timeout 1000s
  • The following shows the corresponding state once the container is running; a quick verification sketch follows.
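  • A quick way to confirm this, assuming the same host and the container ID shown above, is to read the freshly created cgroup directory directly:
ls /sys/fs/cgroup/cpu/docker/
# d65aa14f8c929631f83c267b5575b07771b148e2f200ba1756236104169ce917  ...
cat /sys/fs/cgroup/cpu/docker/d65aa14f8c929631f83c267b5575b07771b148e2f200ba1756236104169ce917/cpu.shares
# 512   <- the value passed via --cpu-shares 512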

3 Stress-Testing Docker Containers under Cgroup Limits

3.1 CPU Shares Test

  • Run three containers with --cpu-shares set to 512, 512, and 1024 respectively. The three containers are entitled to CPU time in a 1:1:2 ratio; check CPU utilization with ctop or top. Ideally, their CPU usage should approach 25%, 25%, and 50%.
docker run -itd --rm --cpu-shares 512 progrium/stress --cpu 1  --timeout 1000s
docker run -itd --rm --cpu-shares 512 progrium/stress --cpu 1  --timeout 100s
docker run -itd --rm --cpu-shares 1024 progrium/stress --cpu 1  --timeout 100s
  • Start the three stress containers.
  • Observe that the CPU usage ratio is roughly 1:1:2.
  • Under /sys/fs/cgroup/cpu/docker there are three corresponding directories: 4ba04effda39be626d3bd1945b90a43e4ff99471a3296e05616c58a1c11ba873, d65aa14f8c929631f83c267b5575b07771b148e2f200ba1756236104169ce917, and fe2f33bad9d15e42b7b2941528394e256ea7e119f3f5a563029c032a30519e5e.
  • The cpu.shares files in those directories hold the values 512, 512, and 1024 (see the sketch below).
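  • A minimal sketch for reading the three cpu.shares values in one go, using the container IDs listed above:
for id in 4ba04effda39be626d3bd1945b90a43e4ff99471a3296e05616c58a1c11ba873 \
          d65aa14f8c929631f83c267b5575b07771b148e2f200ba1756236104169ce917 \
          fe2f33bad9d15e42b7b2941528394e256ea7e119f3f5a563029c032a30519e5e; do
    cat /sys/fs/cgroup/cpu/docker/$id/cpu.shares    # expected: 512, 512, 1024
done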

3.2 Memory Usage Test

  • Run two stress containers to test memory usage. Each container spawns 4 VM workers: in the first container each worker allocates 128 MB, and in the second each of the 4 workers allocates 256 MB:
docker stop $(docker ps -q) && docker rm $(docker ps -aq)    # clean up any leftover containers first

docker run --rm -it progrium/stress --cpu 2 --io 1 --vm 2 --vm-bytes 128M --timeout 60s   # optional mixed cpu/io/vm foreground warm-up run
docker run -itd --rm  progrium/stress --vm 4 --vm-bytes 128M  --timeout 100s
docker run -itd --rm  progrium/stress --vm 4 --vm-bytes 256M  --timeout 100s

[root@worker3 local]# docker run -itd --rm progrium/stress --vm 4 --vm-bytes 128M --timeout 100s
888b0a8b4afdd92e241d0446c63d940cb486559f86263a070577a3860e0f5356
[root@worker3 local]# docker run -itd --rm progrium/stress --vm 4 --vm-bytes 256M --timeout 100s
25cc7585895d2e5f9fee4a1723d5cc09464c426c9213854ce56e1d6bb3c1256d

top - 23:31:52 up  2:23,  3 users,  load average: 5.95, 3.93, 2.45
Tasks: 113 total,   9 running, 104 sleeping,   0 stopped,   0 zombie
%Cpu(s): 10.7 us, 89.3 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem :  1872956 total,   853736 free,   821600 used,   197620 buff/cache
KiB Swap:        0 total,        0 free,        0 used.   848116 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND                                                                                                 
30972 root      20   0  138380  75868    256 R 12.3  4.1   0:01.57 stress                                                                                                  
30975 root      20   0  138380  21092    256 R 12.3  1.1   0:01.57 stress                                                                                                  
31022 root      20   0  269452 158148    256 R 12.3  8.4   0:01.54 stress                                                                                                  
31023 root      20   0  269452 133648    256 R 12.3  7.1   0:01.54 stress                                                                                                  
31025 root      20   0  269452 135696    256 R 12.3  7.2   0:01.54 stress                                                                                                  
30973 root      20   0  138380  71772    256 R 12.0  3.8   0:01.57 stress                                                                                                  
30974 root      20   0  138380  28556    256 R 12.0  1.5   0:01.57 stress                                                                                                  
31024 root      20   0  269452  31076    256 R 12.0  1.7   0:01.54 stress  


[root@worker3 docker]# ls
16a19075bbc6f7525bbcef670fcf920223d6a54396bad4393110fdf9c6afd57c  memory.kmem.max_usage_in_bytes      memory.memsw.failcnt             memory.stat
cgroup.clone_children                                             memory.kmem.slabinfo                memory.memsw.limit_in_bytes      memory.swappiness
cgroup.event_control                                              memory.kmem.tcp.failcnt             memory.memsw.max_usage_in_bytes  memory.usage_in_bytes
cgroup.procs                                                      memory.kmem.tcp.limit_in_bytes      memory.memsw.usage_in_bytes      memory.use_hierarchy
ecb8d7dac939ff1c45713928a406721f078fdce87965c04f63291b8ef4172717  memory.kmem.tcp.max_usage_in_bytes  memory.move_charge_at_immigrate  notify_on_release
memory.failcnt                                                    memory.kmem.tcp.usage_in_bytes      memory.numa_stat                 tasks
memory.force_empty                                                memory.kmem.usage_in_bytes          memory.oom_control
memory.kmem.failcnt                                               memory.limit_in_bytes               memory.pressure_level
memory.kmem.limit_in_bytes                                        memory.max_usage_in_bytes           memory.soft_limit_in_bytes
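  • To compare what the two containers actually charge to their memory cgroups, one can read memory.usage_in_bytes per container directory; a minimal sketch (run on the host while the containers are still alive):
for dir in /sys/fs/cgroup/memory/docker/*/; do
    echo "$dir: $(cat $dir/memory.usage_in_bytes) bytes"   # roughly 4 x 128M vs 4 x 256M, plus overhead
done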

4 Docker Container Resource Control Based on Cgroup

4.1 How Docker Processes Map to the tasks File

  • Run stress inside the container via docker exec, then inspect the tasks file:
[root@worker3 local]# docker run -tid --name stressbash --entrypoint bash progrium/stress
998683eae5b45f6b66fdf74806092ad06fdf7cef99067b6844d225a537081e33

[root@worker3 local]# docker exec -it stressbash stress --vm-bytes 128M --vm 4
stress: info: [28] dispatching hogs: 0 cpu, 0 io, 4 vm, 0 hdd

top - 23:51:43 up  2:43,  4 users,  load average: 4.16, 2.75, 2.90
Tasks: 110 total,   5 running, 105 sleeping,   0 stopped,   0 zombie
%Cpu(s):  5.6 us, 94.4 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem :  1872956 total,  1252252 free,   429780 used,   190924 buff/cache
KiB Swap:        0 total,        0 free,        0 used.  1244348 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND                                                                                                 
31933 root      20   0  138380  71760    188 R 25.3  3.8   0:30.33 stress                                                                                                  
31934 root      20   0  138380   4176    188 R 25.0  0.2   0:30.33 stress                                                                                                  
31935 root      20   0  138380  67664    188 R 25.0  3.6   0:30.33 stress                                                                                                  
31936 root      20   0  138380 122960    188 R 24.7  6.6   0:30.32 stress  

[root@worker3 ~]# docker top stressbash
UID                 PID                 PPID                C                   STIME               TTY                 TIME                CMD
root                31722               31711               0                   23:46               pts/3               00:00:00            bash
root                31922               31913               0                   23:49               pts/5               00:00:00            stress --vm-bytes 128M --vm 4
root                31933               31922               25                  23:49               pts/5               00:00:02            stress --vm-bytes 128M --vm 4
root                31934               31922               25                  23:49               pts/5               00:00:02            stress --vm-bytes 128M --vm 4
root                31935               31922               25                  23:49               pts/5               00:00:02            stress --vm-bytes 128M --vm 4
root                31936               31922               25                  23:49               pts/5               00:00:02            stress --vm-bytes 128M --vm 4

[root@worker3 998683eae5b45f6b66fdf74806092ad06fdf7cef99067b6844d225a537081e33]# pwd
/sys/fs/cgroup/memory/docker/998683eae5b45f6b66fdf74806092ad06fdf7cef99067b6844d225a537081e33

[root@worker3 998683eae5b45f6b66fdf74806092ad06fdf7cef99067b6844d225a537081e33]# cat tasks 
31722
31922
31933
31934
31935
31936
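  • The PIDs shown by docker top all appear in the container's tasks file, which shows that processes started through docker exec also join the container's cgroup. A quick cross-check, using one of the PIDs from the output above:
grep memory /proc/31933/cgroup
# expected output of the form:
# <n>:memory:/docker/998683eae5b45f6b66fdf74806092ad06fdf7cef99067b6844d225a537081e33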

4.2 Docker Container CPU Limits

[root@worker3 local]# docker run -itd --rm progrium/stress --cpu 1 --vm-bytes 200M
e7baea0c5c535234404ce18452242d108bc5dd41a7d5fc3d1b12e3052bf8c027

[root@worker3 e7baea0c5c535234404ce18452242d108bc5dd41a7d5fc3d1b12e3052bf8c027]# cat cpu.cfs_period_us
100000
[root@worker3 e7baea0c5c535234404ce18452242d108bc5dd41a7d5fc3d1b12e3052bf8c027]# cat cpu.cfs_quota_us
-1
[root@worker3 e7baea0c5c535234404ce18452242d108bc5dd41a7d5fc3d1b12e3052bf8c027]# pwd
/sys/fs/cgroup/cpu/docker/e7baea0c5c535234404ce18452242d108bc5dd41a7d5fc3d1b12e3052bf8c027
As you can see, no CPU limit has been applied to the stress container: cpu.cfs_quota_us is -1, meaning unlimited.

top - 23:57:59 up  2:50,  4 users,  load average: 2.04, 3.08, 3.08
Tasks: 104 total,   3 running, 101 sleeping,   0 stopped,   0 zombie
%Cpu(s): 99.7 us,  0.3 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem :  1872956 total,  1536080 free,   145972 used,   190904 buff/cache
KiB Swap:        0 total,        0 free,        0 used.  1528180 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND                                                                                                 
32032 root      20   0    7304    100      0 R 99.7  0.0   0:38.25 stress 
  • Now apply a CPU limit to the container with --cpu-period=100000 and --cpu-quota=60000, which caps it at 60000/100000 = 60% of one CPU:
docker run -itd  --cpu-period 100000 --cpu-quota 60000 --rm  progrium/stress --cpu 1 --vm-bytes 200M
db962ae56fbc087591dd96685ca94056c2e61bbc987ef638d35a94a290f00d33

[root@worker3 db962ae56fbc087591dd96685ca94056c2e61bbc987ef638d35a94a290f00d33]# ls
cgroup.clone_children  cgroup.procs  cpuacct.usage         cpu.cfs_period_us  cpu.rt_period_us   cpu.shares  notify_on_release
cgroup.event_control   cpuacct.stat  cpuacct.usage_percpu  cpu.cfs_quota_us   cpu.rt_runtime_us  cpu.stat    tasks
[root@worker3 db962ae56fbc087591dd96685ca94056c2e61bbc987ef638d35a94a290f00d33]# cat cpu.cfs_period_us 
100000
[root@worker3 db962ae56fbc087591dd96685ca94056c2e61bbc987ef638d35a94a290f00d33]# cat cpu.cfs_quota_us 
60000

top output:
top - 00:03:42 up  2:55,  4 users,  load average: 0.63, 1.43, 2.35
Tasks: 104 total,   2 running, 102 sleeping,   0 stopped,   0 zombie
%Cpu(s): 58.2 us,  0.0 sy,  0.0 ni, 41.8 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem :  1872956 total,  1533696 free,   148388 used,   190872 buff/cache
KiB Swap:        0 total,        0 free,        0 used.  1525792 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND                                                              
32164 root      20   0    7304     96      0 R 60.1  0.0   1:06.34 stress 
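  • The quota can also be adjusted while the container is running, either with docker update or by writing straight to the cgroup file; a minimal sketch using the container ID from above (the 30000 value, i.e. a 30% cap, is just an example):
docker update --cpu-quota 30000 db962ae56fbc087591dd96685ca94056c2e61bbc987ef638d35a94a290f00d33
# equivalently, write to the cgroup file directly:
echo 30000 > /sys/fs/cgroup/cpu/docker/db962ae56fbc087591dd96685ca94056c2e61bbc987ef638d35a94a290f00d33/cpu.cfs_quota_us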

4.3 Docker Container Memory Limits

  • Focus on three flags: --memory, --memory-swap, and --memory-swappiness. Note that --memory-swap is the total of memory plus swap, so --memory 1G --memory-swap 3G allows up to 2G of swap on top of 1G of RAM.
[root@worker3 ~]# free -h
              total        used        free      shared  buff/cache   available
Mem:           1.8G        145M        1.5G        9.6M        186M        1.5G
Swap:            0B          0B          0B

[root@worker3 local]# docker ps -a
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
[root@worker3 local]# docker run -itd --name stress --memory 1G --memory-swap 3G --memory-swappiness 20 --entrypoint bash progrium/stress
737f2b75ab731f649ee3e7194439d525051ec937a8532f64c524cf30c52e550f
[root@worker3 local]# free -h
              total        used        free      shared  buff/cache   available
Mem:           1.8G        145M        1.5G        9.6M        186M        1.5G
Swap:            0B          0B          0B
[root@worker3 local]# docker ps -a
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
737f2b75ab73        progrium/stress     "bash"              24 seconds ago      Up 24 seconds   


[root@worker3 docker]# pwd
/sys/fs/cgroup/memory/docker
[root@worker3 docker]# cd 737f2b75ab731f649ee3e7194439d525051ec937a8532f64c524cf30c52e550f/
[root@worker3 737f2b75ab731f649ee3e7194439d525051ec937a8532f64c524cf30c52e550f]# ls
cgroup.clone_children           memory.kmem.slabinfo                memory.memsw.failcnt             memory.soft_limit_in_bytes
cgroup.event_control            memory.kmem.tcp.failcnt             memory.memsw.limit_in_bytes      memory.stat
cgroup.procs                    memory.kmem.tcp.limit_in_bytes      memory.memsw.max_usage_in_bytes  memory.swappiness
memory.failcnt                  memory.kmem.tcp.max_usage_in_bytes  memory.memsw.usage_in_bytes      memory.usage_in_bytes
memory.force_empty              memory.kmem.tcp.usage_in_bytes      memory.move_charge_at_immigrate  memory.use_hierarchy
memory.kmem.failcnt             memory.kmem.usage_in_bytes          memory.numa_stat                 notify_on_release
memory.kmem.limit_in_bytes      memory.limit_in_bytes               memory.oom_control               tasks
memory.kmem.max_usage_in_bytes  memory.max_usage_in_bytes           memory.pressure_level
[root@worker3 737f2b75ab731f649ee3e7194439d525051ec937a8532f64c524cf30c52e550f]# cat memory.limit_in_bytes 
1073741824    corresponds to --memory=1G
[root@worker3 737f2b75ab731f649ee3e7194439d525051ec937a8532f64c524cf30c52e550f]# cat memory.memsw.limit_in_bytes 
3221225472    corresponds to --memory-swap=3G
[root@worker3 737f2b75ab731f649ee3e7194439d525051ec937a8532f64c524cf30c52e550f]# cat memory.swappiness 
20    corresponds to --memory-swappiness=20
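  • As a sanity check of the relationship between the two limits (memory.memsw.limit_in_bytes covers memory plus swap), the arithmetic can be done directly in the container's cgroup directory; a minimal sketch:
cd /sys/fs/cgroup/memory/docker/737f2b75ab731f649ee3e7194439d525051ec937a8532f64c524cf30c52e550f
echo $(( $(cat memory.memsw.limit_in_bytes) - $(cat memory.limit_in_bytes) ))
# 2147483648 -> 2G of swap allowed on top of the 1G memory limit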

4.4 Container OOM Kill Analysis

[root@worker3 local]# free -h
              total        used        free      shared  buff/cache   available
Mem:           1.8G        142M        1.5G        9.5M        191M        1.5G
Swap:            0B          0B          0B
[root@worker3 local]# docker run --rm -it progrium/stress --cpu 1 --vm 2 --vm-bytes 19.9999G
stress: info: [1] dispatching hogs: 1 cpu, 0 io, 2 vm, 0 hdd
stress: dbug: [1] using backoff sleep of 9000us
stress: dbug: [1] --> hogcpu worker 1 [5] forked
stress: dbug: [1] --> hogvm worker 2 [6] forked
stress: dbug: [1] using backoff sleep of 3000us
stress: dbug: [1] --> hogvm worker 1 [7] forked
stress: dbug: [7] allocating 20401094656 bytes ...
stress: FAIL: [7] (495) hogvm malloc failed: Cannot allocate memory
stress: FAIL: [1] (395) <-- worker 7 returned error 1
stress: WARN: [1] (397) now reaping child worker processes
stress: dbug: [1] <-- worker 5 reaped
stress: dbug: [1] <-- worker 6 reaped
stress: FAIL: [1] (452) failed run completed in 0s
  • When the container exhausts the host's memory and runs out of memory, it gets OOM-killed; in the run above the allocation already fails with "Cannot allocate memory" and stress aborts.
  • Set a memory limit on the container, do not use swap, and set --oom-kill-disable=true. After running for a while the container is never OOM-killed; the MEM % column of docker stats stays pinned at 100.00%.
docker run --rm -it --memory 5G --memory-swap 5G --oom-kill-disable=true progrium/stress --cpu 1 --vm 2 --vm-bytes 1G

With the --oom-kill-disable=true setting removed, the container very quickly triggers an OOM-killer event.
This shows that --oom-kill-disable only takes effect when a memory limit has been set on the container.
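  • Whether the kill switch is actually disabled can be read back from the container's memory cgroup; a minimal sketch (the container ID in the path is a placeholder):
cat /sys/fs/cgroup/memory/docker/<container-id>/memory.oom_control
# oom_kill_disable 1    <- set by --oom-kill-disable=true
# under_oom 1           <- the cgroup is at its limit; tasks are paused instead of being killed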

5 Summary

This article takes another in-depth look at how Docker and Cgroup resource limits work together; it took close to five hours to put together.

