Linux cgroups are an important kernel feature for placing resource limits on processes. Cgroups organize processes into groups and then apply unified resource monitoring and limiting to each group. Cgroups currently exist in two versions, v1 and v2; since the goal here is to later build a simple container called sdocker, this article only verifies the cpu and memory subsystems of v1.
The cgroup subsystems supported by the current system can be listed with the following command:
linux: # cat /proc/cgroups
#subsys_name    hierarchy   num_cgroups enabled
cpuset          11          1           1
cpu             2           78          1
cpuacct         2           78          1
blkio           3           78          1
memory          9           79          1
devices         4           78          1
freezer         8           1           1
net_cls         7           78          1
perf_event      5           1           1
net_prio        7           78          1
hugetlb         6           1           1
pids            10          86          1
linux: #
On some systems (Debian 8 / SUSE 12) the memory cgroup is not enabled, which can show up in several ways:
1. Creating a directory under /sys/fs/cgroup/memory fails with a read-only error;
2. docker info prints a warning about it;
3. Installing Kubernetes with kubeadm reports an error.
The fix is to add the following option in /etc/default/grub, regenerate the grub configuration (update-grub on Debian, grub2-mkconfig on SUSE), and then reboot:
linux: # cat /etc/default/grub | grep cgroup_enable
GRUB_CMDLINE_LINUX="cgroup_enable=memory"
linux: #
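After the reboot, a quick way to check that the change took effect (a minimal sketch; the test directory name is arbitrary) is to look at the kernel command line, the memory entry in /proc/cgroups, and whether a directory can now be created in the memory hierarchy:

linux: # grep -o cgroup_enable=memory /proc/cmdline    # parameter is on the kernel command line
linux: # grep memory /proc/cgroups                      # the "enabled" column should now be 1
linux: # mkdir /sys/fs/cgroup/memory/test && rmdir /sys/fs/cgroup/memory/test   # should no longer fail with a read-only error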
For the cpu subsystem, CPU utilization is limited mainly through the following two files:
/sys/fs/cgroup/cpu/cpu.cfs_quota_us
/sys/fs/cgroup/cpu/cpu.cfs_period_us
The ratio cpu.cfs_quota_us / cpu.cfs_period_us (the period defaults to 100000, i.e. 100ms) determines the share of CPU the group is allowed to use. Usage examples (excerpted from the kernel documentation listed in the references):
Examples
--------
1. Limit a group to 1 CPU worth of runtime.

    If period is 250ms and quota is also 250ms, the group will get
    1 CPU worth of runtime every 250ms.

    # echo 250000 > cpu.cfs_quota_us /* quota = 250ms */
    # echo 250000 > cpu.cfs_period_us /* period = 250ms */

2. Limit a group to 2 CPUs worth of runtime on a multi-CPU machine.

    With 500ms period and 1000ms quota, the group can get 2 CPUs worth of
    runtime every 500ms.

    # echo 1000000 > cpu.cfs_quota_us /* quota = 1000ms */
    # echo 500000 > cpu.cfs_period_us /* period = 500ms */

    The larger period here allows for increased burst capacity.

3. Limit a group to 20% of 1 CPU.

    With 50ms period, 10ms quota will be equivalent to 20% of 1 CPU.

    # echo 10000 > cpu.cfs_quota_us /* quota = 10ms */
    # echo 50000 > cpu.cfs_period_us /* period = 50ms */

    By using a small period here we are ensuring a consistent latency
    response at the expense of burst capacity.
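As a small illustration of the quota/period relationship, the script below (a minimal sketch; the script name and the cgroup path argument are placeholders) reads both files of a given cgroup and prints the implied CPU share:

#!/bin/bash
# cpu_share.sh -- print the CPU share implied by a cgroup's quota/period
# Usage: ./cpu_share.sh /sys/fs/cgroup/cpu/<group>
cg=${1:-/sys/fs/cgroup/cpu}

quota=$(cat "$cg/cpu.cfs_quota_us")    # -1 means "no limit"
period=$(cat "$cg/cpu.cfs_period_us")  # defaults to 100000 (100ms)

if [ "$quota" -lt 0 ]; then
    echo "$cg: no CPU limit set"
else
    # quota/period expressed as a percentage of one CPU
    echo "$cg: limited to $((100 * quota / period))% of one CPU"
fi

With the values used later in this article (quota 20000, period 100000) it would report a 20% limit.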
To test the cpu subsystem, take the following CPU-bound program; after it is started, top shows it using 100% CPU:
linux: # cat cpu.c
int main(void) {
    for (; ;);

    return 0;
}
linux: #
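To reproduce the numbers below, compile the program and run it in the background (a minimal sketch, assuming gcc is installed; the output name cpu is arbitrary):

linux: # gcc -o cpu cpu.c        # build the busy-loop program
linux: # ./cpu &                 # run in the background; in this article's run its PID was 4033
linux: # top -p 4033             # watch the process's CPU usage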
 PID  USER  PR  NI  VIRT  RES  SHR  S  %CPU    %MEM   TIME+    COMMAND
 4033 root  20   0  4052  708  632  R  100.00  0.002  1:33.02  cpu
By writing 20000 to cpu.cfs_quota_us and writing the program's PID into the tasks file, the program is restricted to 1/5 of a CPU:
linux: # mkdir /sys/fs/cgroup/cpu/sdocker
linux: # mkdir /sys/fs/cgroup/cpu/sdocker/4033
linux: # echo 20000 > /sys/fs/cgroup/cpu/sdocker/4033/cpu.cfs_quota_us
linux: # echo 4033 > /sys/fs/cgroup/cpu/sdocker/4033/tasks
The setting takes effect immediately; top shows the process's CPU usage hovering around 20%:
 PID  USER  PR  NI  VIRT  RES  SHR  S  %CPU    %MEM   TIME+    COMMAND
 4033 root  20   0  4052  708  632  R  20.202  0.002  1:57.80  cpu
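Besides top, the throttling can also be confirmed from the group's cpu.stat file, which CFS bandwidth control maintains (a minimal sketch; the field meanings are taken from sched-bwc.txt in the references):

# cpu.stat of the group reports, per sched-bwc.txt:
#   nr_periods     -- number of enforcement periods that have elapsed
#   nr_throttled   -- number of periods in which the group was throttled
#   throttled_time -- total time the group was throttled, in nanoseconds
linux: # cat /sys/fs/cgroup/cpu/sdocker/4033/cpu.stat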
Finally, kill the program and clean up the cgroup:
linux: # kill -9 4033
linux: # rmdir /sys/fs/cgroup/cpu/sdocker/4033/
The memory subsystem's control files live under /sys/fs/cgroup/memory; the main files and their meanings are:
cgroup.event_control            # interface for eventfd notifications
memory.usage_in_bytes           # current memory usage in bytes
memory.limit_in_bytes           # set/show the memory limit; when usage_in_bytes exceeds it and memory.swappiness allows swapping, the kernel first tries to move pages to swap, and only if that fails does memory.oom_control decide whether to trigger an OOM kill
memory.failcnt                  # number of times usage hit the limit; incremented whenever usage_in_bytes would exceed the limit
memory.max_usage_in_bytes       # historical peak memory usage
memory.soft_limit_in_bytes      # set/show the soft memory limit
memory.stat                     # detailed memory usage statistics for this cgroup
memory.use_hierarchy            # set/show whether child cgroups' usage is accounted into this cgroup
memory.force_empty              # trigger immediate reclaim of as much reclaimable memory as possible from this cgroup
memory.pressure_level           # set memory-pressure notification events, used together with cgroup.event_control
memory.swappiness               # set/show the cgroup's swappiness
memory.move_charge_at_immigrate # set whether a task's charged memory moves with it when it migrates to another cgroup
memory.oom_control              # set/show OOM control settings; 0 (the default) means the OOM killer is enabled
memory.numa_stat                # NUMA-related memory statistics
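As a small illustration of how these accounting files are typically read together, the script below (a minimal sketch; the script name and the cgroup path argument are placeholders) polls the current usage, peak usage, and failure count of a memory cgroup once a second:

#!/bin/bash
# mem_watch.sh -- periodically print the key accounting files of a memory cgroup
# Usage: ./mem_watch.sh /sys/fs/cgroup/memory/<group>
cg=${1:?usage: $0 <memory cgroup path>}

while true; do
    usage=$(cat "$cg/memory.usage_in_bytes")     # current usage
    peak=$(cat "$cg/memory.max_usage_in_bytes")  # historical peak
    fail=$(cat "$cg/memory.failcnt")             # times the limit was hit
    echo "usage=$usage peak=$peak failcnt=$fail"
    sleep 1
done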
To test the memory subsystem, the following test program keeps allocating memory in order to trigger the limit:
linux: # cat memory.cpp
#include <unistd.h>

#include <csignal>
#include <cstdlib>
#include <cstdio>
#include <cstring>
#include <vector>
using std::vector;

vector<int *> g_mem_pointer;

void sig_handler(int sig) {
    printf("\n%d handle\n", sig);
    for (auto p : g_mem_pointer) {
        free(p);
    }

    exit(-1);
}

int main(void) {
    unsigned total_mem = 0, chunk_size = 1024 * 1024;

    signal(SIGTERM, sig_handler);
    signal(SIGINT, sig_handler);

    int *p;
    while (1) {
        if (NULL == (p = (int *)malloc(chunk_size))) {
            printf("[-] malloc failed!\n");
            kill(getpid(), 15);
        }

        memset(p, 0xff, chunk_size);
        g_mem_pointer.push_back(p);
        total_mem += chunk_size;
        printf("[+] malloc size: %u\n", total_mem);
        sleep(10);
    }

    return 0;
}
linux: #
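The program uses C++11 features (range-based for), so it needs to be built as C++11 or later (a minimal sketch, assuming g++; the output name memory is arbitrary):

linux: # g++ -std=c++11 -o memory memory.cpp   # range-based for requires C++11 or later
linux: # ./memory &                            # allocates and touches 1 MiB every 10 seconds; PID in this article's run was 5239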
Write a 6 MB limit into memory.limit_in_bytes and the process's PID into the tasks file; after a while the process gets OOM-killed.
One observation from the test: only when the total memory the process reports having allocated is far above the configured limit does the value in memory.usage_in_bytes slowly approach memory.limit_in_bytes, and this happens even with memory.swappiness set to 0.
linux:~ # mkdir /sys/fs/cgroup/memory/sdocker
linux:~ # mkdir /sys/fs/cgroup/memory/sdocker/5239
linux:~ # echo 6m > /sys/fs/cgroup/memory/sdocker/5239/memory.limit_in_bytes
linux:~ # cat /sys/fs/cgroup/memory/sdocker/5239/memory.limit_in_bytes
32768
linux:~ # echo 5239 > /sys/fs/cgroup/memory/sdocker/5239/tasks
linux:~ # rmdir /sys/fs/cgroup/memory/sdocker/5239
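While the test is still running (i.e. before the final rmdir), it can be confirmed that the kill came from the cgroup's memory controller rather than the global OOM killer by checking memory.oom_control and the kernel log (a minimal sketch; the oom_control fields follow cgroup-v1/memory.txt, and the exact dmesg wording varies across kernel versions):

# oom_control reports oom_kill_disable and under_oom for the group
linux:~ # cat /sys/fs/cgroup/memory/sdocker/5239/memory.oom_control

# the kernel log records the cgroup-triggered kill
linux:~ # dmesg | grep -i "memory cgroup out of memory"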
References:
https://segmentfault.com/u/wuyangchun
https://www.kernel.org/doc/Documentation/cgroup-v1/memory.txt
https://www.kernel.org/doc/Documentation/scheduler/sched-bwc.txt