Linux Performance Optimization (15): CPU Binding

Date: 2021-01-19

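1. Isolated CPUs

1.1 Introduction to isolated CPUs

For CPU-intensive tasks that keep the CPU heavily loaded, setting CPU affinity is recommended. It improves execution efficiency by cutting down on context switches and cross-CPU migration and by raising the CPU cache hit rate.
By default, the Linux kernel scheduler may use any CPU core. If a particular task (process or thread) needs a CPU core to itself and no other task should run there, that CPU can be isolated so that other processes are kept off it.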

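1.2 Characteristics of isolated CPUs

Isolating a CPU effectively improves the real-time behavior of the tasks that run on it. The cost is that all other tasks have fewer CPUs available, so the machine's CPU resources need to be planned in advance.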

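1.3 Setting up isolated CPUs

The Linux kernel boot parameter isolcpus removes one or more CPUs from the SMP load-balancing scheduler. Designated processes are then placed on the isolated CPUs by setting their CPU affinity.
isolcpus=cpu_number[,cpu_number,...]
(1) Edit the grub configuration file
The default grub configuration file is /etc/default/grub. Add an isolcpus entry such as isolcpus=11,12,13,14,15 to the value of GRUB_CMDLINE_LINUX, separating the CPU numbers with commas. The kernel's cpu-list syntax also accepts ranges such as isolcpus=11-15. For example, to isolate CPUs 1 and 2:
GRUB_CMDLINE_LINUX="isolcpus=1,2 crashkernel=auto rd.lvm.lv=rhel/root rd.lvm.lv=rhel/swap rhgb quiet"
(2) Update grub
Regenerate the grub boot file /boot/grub/grub.cfg with whichever of the following commands the distribution provides, then reboot for the change to take effect. On RHEL-family systems the command is typically grub2-mkconfig, and the output path depends on whether the machine boots through BIOS or UEFI.

update-grub
update-grub2
grub-mkconfig -o /boot/grub/grub.cfg

Once the kernel has booted with the isolcpus parameter, its load balancer no longer schedules processes onto the listed CPU cores. Processes usually have to be bound to those cores explicitly with the taskset or cset command.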
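
One way to confirm after the reboot that the parameter took effect is to read the isolated CPU list back from sysfs. The following is a minimal sketch in C, assuming the running kernel exposes /sys/devices/system/cpu/isolated. The helper names parse_cpu_list and read_isolated_cpus are illustrative and not part of any library. The parser also shows how a list such as 1,2 or 11-15 maps onto a cpu_set_t.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse a kernel-style CPU list such as "1,2" or "11-15" into a cpu_set_t.
 * Returns the number of CPUs in the set, or -1 on malformed input. */
int parse_cpu_list(const char *list, cpu_set_t *set)
{
    assert(list != NULL && set != NULL);
    CPU_ZERO(set);
    const char *p = list;
    while (*p != '\0' && *p != '\n') {
        char *end;
        long first = strtol(p, &end, 10);
        if (end == p || first < 0)
            return -1;                      /* not a CPU number */
        long last = first;
        if (*end == '-') {                  /* range "first-last" */
            p = end + 1;
            last = strtol(p, &end, 10);
            if (end == p || last < first)
                return -1;
        }
        if (last >= CPU_SETSIZE)
            return -1;                      /* does not fit in cpu_set_t */
        for (long cpu = first; cpu <= last; cpu++)
            CPU_SET((int)cpu, set);
        if (*end == ',')
            end++;                          /* move on to the next entry */
        else if (*end != '\0' && *end != '\n')
            return -1;
        p = end;
    }
    return CPU_COUNT(set);
}

/* Read the CPUs isolated at boot from sysfs (path is an assumption about
 * the running kernel). Returns the CPU count, or -1 if the file is missing. */
int read_isolated_cpus(cpu_set_t *set)
{
    char buf[256];
    FILE *f = fopen("/sys/devices/system/cpu/isolated", "r");
    if (f == NULL)
        return -1;
    if (fgets(buf, sizeof buf, f) == NULL)
        buf[0] = '\0';                      /* empty file: nothing isolated */
    fclose(f);
    return parse_cpu_list(buf, set);
}
```

With isolcpus=1,2 on the kernel command line, read_isolated_cpus would be expected to return 2 with CPUs 1 and 2 set. On a system without isolated CPUs the file holds an empty line and the function returns 0.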

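2. Introduction to CPU Binding

2.1 CPU cores

Hyper-Threading uses dedicated hardware support to present one physical core as two logical cores, so that a single processor can exploit thread-level parallelism. This suits multithreaded operating systems and software, reduces CPU idle time, and improves CPU efficiency.
A physical CPU is a CPU package installed in a socket on the motherboard.
A physical CPU usually contains several physical cores. A logical CPU is the unit the operating system schedules on: without Hyper-Threading each physical core is one logical CPU, and with Intel Hyper-Threading (HT) each physical core appears as two logical CPUs.
cat /proc/cpuinfo|grep "physical id"|sort -u|wc -l
Show the number of physical CPUs
cat /proc/cpuinfo|grep "cpu cores"|uniq
Show the number of cores in each physical CPU
cat /proc/cpuinfo|grep "^processor"|wc -l
Show the number of logical CPUs
cat /proc/cpuinfo|grep "model name"|cut -f2 -d:|uniq
Show the CPU model name
ps -eo pid,args,psr
Show which logical CPU each process is running on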

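2.2 CPU binding

CPU binding means setting the CPU affinity of a process or thread so that it runs only on the CPUs whose bits are set in its affinity mask, which lets the application use the CPU more efficiently. If an application may run on several CPUs, the operating system can move it between them. Each move leaves the CPU cache cold, lowers the cache hit rate, and reduces CPU efficiency. CPU binding avoids part of this cache invalidation and improves system performance.
CPU affinity is a scheduler property that binds a process to one CPU or to a set of CPUs.
On an SMP (Symmetric Multi-Processing) architecture, the Linux scheduler honors the CPU affinity setting: the process runs on the CPUs it is bound to and on no others.
The Linux scheduler also has natural (soft) CPU affinity: it tries to keep a process on the same CPU, so processes normally do not migrate between processors often, and fewer migrations mean less overhead.
Because the author of a program knows it better than the scheduler does, CPU cores can be assigned by hand, for example to avoid piling work onto CPU0 or to keep a critical process from sharing a core with a crowd of other processes. Setting CPU affinity can therefore improve the performance of some programs.
Show the CPU each process is running on
ps -eo pid,cmd,psr
Show the CPU each thread of a process is running on
ps -To 'pid,lwp,psr,cmd' -p [PID]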

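2.3 Characteristics of CPU binding

Binding a process or thread to a CPU can raise the CPU cache hit rate noticeably, which reduces memory access cost and improves application performance. In my view the speedup matters more on a NUMA architecture and is likely to be smaller on SMP. The main reason is how the two allocate resources such as caches and buses: under NUMA each CPU has its own set of resources, while under SMP the cores still share them.
When each CPU core runs a separate process, the processes' resources are independent, so nothing has to be carried over between cores. When each core runs a thread, the threads sometimes share resources, and the shared data must be copied from one core to another, which adds overhead.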

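2.4 Binding processes with taskset

yum install util-linux
Install the taskset tool (part of util-linux)
taskset [options] -p [mask] pid
Show the CPU affinity of a process. The -p option selects an existing PID, and the mask is printed in hexadecimal by default. With -cp it is printed as a list of CPU numbers. For example, mask 3 is 0011 in binary, which -cp prints as 0,1: the process may run only on CPU 0 and CPU 1.
taskset -c -p pid
Show the CPU affinity of the given process as a CPU list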

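taskset -p mask pid
taskset -cp CPU_LIST PID

Set the CPU affinity of the given process. If the list consists of isolated CPUs, only the first CPU takes effect, because the scheduler does not load-balance across isolated CPUs.
Start a process on CPUs 11, 12, 13, 14 and 15

taskset -c 11,12,13,14,15 python xx.py
taskset -c 11-15 python xx.py

Isolated CPUs can still be used inside Docker containers. When creating a container, the --cpuset-cpus option restricts which CPUs the container may use, which gives the effect of isolated CPUs for Docker containers.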

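2.5 Binding processes with cset

cset set --cpu CPU_LIST CPUSET_NAME
Define a set of CPU cores. For isolated CPUs, only the first CPU core takes effect.
cset proc --move --pid=PID,...,PID --toset=CPUSET_NAME
Move several processes into the given CPU set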

3. Binding a Process to a CPU

3.1 System call API

#define _GNU_SOURCE
#include <sched.h>
int sched_setaffinity(pid_t pid, size_t cpusetsize, const cpu_set_t *mask);
int sched_getaffinity(pid_t pid, size_t cpusetsize, cpu_set_t *mask);

Parameters:
pid: process ID. A value of 0 refers to the calling thread.
cpusetsize: size in bytes of the data that mask points to, normally sizeof(cpu_set_t).
mask: the CPU mask.
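
A compact sketch of how the two calls and the CPU_* macros fit together is given here. It assumes glibc on Linux. The helper names first_allowed_cpu, pin_self_to and allowed_cpu_count are hypothetical, chosen for the example. Instead of a fixed CPU number, it picks a CPU from the mask the thread already has, so it also works inside a restricted cpuset or container.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>

/* Return the lowest-numbered CPU the calling thread may run on, or -1 on error. */
int first_allowed_cpu(void)
{
    cpu_set_t set;
    if (sched_getaffinity(0, sizeof(set), &set) == -1)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (CPU_ISSET(cpu, &set))
            return cpu;
    }
    return -1;
}

/* Restrict the calling thread (pid 0) to a single CPU.
 * Returns 0 on success, -1 on failure with errno set by sched_setaffinity. */
int pin_self_to(int cpu)
{
    assert(cpu >= 0 && cpu < CPU_SETSIZE);
    cpu_set_t set;
    CPU_ZERO(&set);        /* start from an empty mask */
    CPU_SET(cpu, &set);    /* allow exactly one CPU */
    return sched_setaffinity(0, sizeof(set), &set);
}

/* Count the CPUs in the calling thread's affinity mask, or -1 on error. */
int allowed_cpu_count(void)
{
    cpu_set_t set;
    if (sched_getaffinity(0, sizeof(set), &set) == -1)
        return -1;
    return CPU_COUNT(&set);
}
```

Calling pin_self_to(first_allowed_cpu()) and then allowed_cpu_count() should report a single CPU. Choosing a CPU outside the allowed mask makes sched_setaffinity fail with EINVAL, which is why the sketch queries the mask first.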

3.2 Implementation

#define _GNU_SOURCE   // must precede every #include so that <sched.h> exposes the CPU_* macros
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

#define THREAD_MAX_NUM 10  // number of threads to start
int CPU_NUM = 0;  // number of logical CPUs in the system
int CPU = 3;      // CPU to bind to; must exist on the machine

void* threadFun(void* arg)
{
    cpu_set_t mask;  // set of CPUs to bind to

    CPU_ZERO(&mask);
    // set CPU MASK
    CPU_SET(CPU, &mask);
    // set the CPU affinity of the calling thread (pid 0)
    if (sched_setaffinity(0, sizeof(mask), &mask) == -1)
    {
        printf("warning: could not set CPU affinity, continuing...\n");
    }
    cpu_set_t affinity;   // CPUs the thread is allowed to run on
    CPU_ZERO(&affinity);
    // read back the CPU affinity of the calling thread
    if (sched_getaffinity(0, sizeof(affinity), &affinity) == -1)
    {
        printf("warning: could not get CPU affinity, continuing...\n");
    }
    int i = 0;
    for (i = 0; i < CPU_NUM; i++)
    {
        if (CPU_ISSET(i, &affinity))  // is CPU i in the thread's affinity set?
        {
            printf("this thread %d is running processor : %d\n", *((int*)arg), i);
        }
    }

    return NULL;
}

int main(int argc, char* argv[])
{
    int tid[THREAD_MAX_NUM];
    pthread_t thread[THREAD_MAX_NUM];
    // get the number of logical CPUs
    CPU_NUM = (int)sysconf(_SC_NPROCESSORS_CONF);
    printf("System has %i processor(s). \n", CPU_NUM);
    int i = 0;
    for (i = 0; i < THREAD_MAX_NUM; i++)
    {
        tid[i] = i;
        pthread_create(&thread[i], NULL, threadFun, &tid[i]);
    }
    for (i = 0; i < THREAD_MAX_NUM; i++)
    {
        pthread_join(thread[i], NULL);
    }
    return 0;
}

Compile:
gcc -o test test.c -pthread
Output:

System has 4 processor(s). 
this thread 1 is running processor : 3
this thread 0 is running processor : 3
this thread 4 is running processor : 3
this thread 9 is running processor : 3
this thread 7 is running processor : 3
this thread 5 is running processor : 3
this thread 6 is running processor : 3
this thread 8 is running processor : 3
this thread 3 is running processor : 3
this thread 2 is running processor : 3

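3.3 Binding a process to a CPU with taskset

(1) Bind a running process to a CPU

taskset -pc CPU_NUMBER PID
taskset -p PID

The first command binds process PID to CPU_NUMBER; the second shows the resulting CPU affinity.
(2) Bind a process to a CPU at startup
taskset -c CPU_NUMBER PROGRAM &
Start PROGRAM in the background with its process bound to core CPU_NUMBER.
taskset -p PID
Show the CPU affinity of the process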

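4. Binding a Thread to a CPU

4.1 System call API

#define _GNU_SOURCE
#include <pthread.h>
int pthread_setaffinity_np(pthread_t thread, size_t cpusetsize, const cpu_set_t *cpuset);
int pthread_getaffinity_np(pthread_t thread, size_t cpusetsize, cpu_set_t *cpuset);

Parameters:
thread: the thread handle.
cpusetsize: size in bytes of the data that cpuset points to, normally sizeof(cpu_set_t).
cpuset: the CPU mask.
Both functions return 0 on success and an error number on failure.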

4.2 Implementation

#define _GNU_SOURCE   // must precede every #include so that <pthread.h> declares the *_np functions
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

#define THREAD_MAX_NUM 10  // number of threads to start
int CPU_NUM = 0;  // number of logical CPUs in the system
int CPU = 3;      // CPU to bind to; must exist on the machine

void* threadFun(void* arg)
{
    pthread_t thread = pthread_self();
    cpu_set_t mask;  // set of CPUs to bind to

    CPU_ZERO(&mask);
    // set CPU MASK
    CPU_SET(CPU, &mask);
    // Set the affinity from inside the thread, before it is read back.
    // Setting it from main() after pthread_create() would race with this function.
    // The pthread_*_np functions return an error number on failure, not -1.
    if (pthread_setaffinity_np(thread, sizeof(mask), &mask) != 0)
    {
        printf("warning: could not set CPU affinity, continuing...\n");
    }
    cpu_set_t affinity;   // CPUs the thread is allowed to run on
    CPU_ZERO(&affinity);
    // read back the CPU affinity of this thread
    if (pthread_getaffinity_np(thread, sizeof(affinity), &affinity) != 0)
    {
        printf("warning: could not get thread affinity, continuing...\n");
    }
    int i = 0;
    for (i = 0; i < CPU_NUM; i++)
    {
        if (CPU_ISSET(i, &affinity))  // is CPU i in the thread's affinity set?
        {
            printf("this thread %d is running processor : %d\n", *((int*)arg), i);
        }
    }

    return NULL;
}

int main(int argc, char* argv[])
{
    int tid[THREAD_MAX_NUM];
    pthread_t thread[THREAD_MAX_NUM];
    // get the number of logical CPUs
    CPU_NUM = (int)sysconf(_SC_NPROCESSORS_CONF);
    printf("System has %i processor(s). \n", CPU_NUM);

    int i = 0;
    for (i = 0; i < THREAD_MAX_NUM; i++)
    {
        tid[i] = i;
        pthread_create(&thread[i], NULL, threadFun, &tid[i]);
    }
    for (i = 0; i < THREAD_MAX_NUM; i++)
    {
        pthread_join(thread[i], NULL);
    }
    return 0;
}

Compile:
gcc -o test test.c -pthread
Output:

System has 4 processor(s). 
this thread 0 is running processor : 3
this thread 1 is running processor : 3
this thread 2 is running processor : 3
this thread 3 is running processor : 3
this thread 5 is running processor : 3
this thread 4 is running processor : 3
this thread 6 is running processor : 3
this thread 9 is running processor : 3
this thread 7 is running processor : 3
this thread 8 is running processor : 3
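
pthread_setaffinity_np acts on a thread that already exists, so a thread can run for a moment before it is pinned. When the creating thread should choose the CPU, glibc also offers pthread_attr_setaffinity_np, which stores the mask in the thread attributes so that the new thread starts out bound. The following is a hedged sketch of that variant, not part of the original program. The names report_cpu and run_pinned are invented for the example, and the CPU passed in is assumed to be one the process is allowed to use.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <sched.h>

/* Thread body: store the CPU it is executing on in the int that arg points to. */
static void *report_cpu(void *arg)
{
    *(int *)arg = sched_getcpu();
    return NULL;
}

/* Start a thread whose affinity is fixed to `cpu` before it first runs,
 * wait for it, and return the CPU it observed, or -1 on error. */
int run_pinned(int cpu)
{
    assert(cpu >= 0 && cpu < CPU_SETSIZE);
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);

    pthread_attr_t attr;
    if (pthread_attr_init(&attr) != 0)
        return -1;

    int seen = -1;
    pthread_t thread;
    /* pthread_* functions return an error number, not -1 with errno. */
    int rc = pthread_attr_setaffinity_np(&attr, sizeof(set), &set);
    if (rc == 0)
        rc = pthread_create(&thread, &attr, report_cpu, &seen);
    pthread_attr_destroy(&attr);
    if (rc != 0)
        return -1;      /* attribute setup or thread creation failed */

    pthread_join(thread, NULL);
    return seen;
}
```

Calling run_pinned with an allowed CPU number should return that same number, since the thread never runs anywhere else. If the CPU is outside the process's allowed set, pthread_create fails and the function returns -1.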