01 - runC: Introduction and Commands

Date: 2020-12-01



1. What is runC?

According to the official definition:

runC is a CLI tool for spawning and running containers according to the OCI (Open Container Initiative) specification.

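runC overview

OCI (Open Container Initiative), an industry-level standards body for containers, was launched as an effort by major industry players to keep the container ecosystem from becoming too tightly coupled to Docker; it was also a compromise on Docker's part.
runC is a lightweight, portable container runtime. It abstracts away the details of the underlying host (for portability) without requiring applications to be completely rewritten (for ubiquity), and without introducing excessive performance overhead (for scale).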

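runC features include:

Full support for Linux namespaces, including user namespaces;
Native support for all security features available in Linux: SELinux, AppArmor, seccomp, cgroups, capability dropping, pivot_root, uid/gid dropping, etc. If Linux can do it, runC can do it;
Native support for live migration, with the help of the CRIU team at Parallels;
Native support for Windows 10 containers, contributed directly by Microsoft engineers;
Planned native support for Arm, Power, and Sparc, with direct participation and support from Arm, Intel, Qualcomm, IBM, and the entire hardware manufacturer ecosystem;
Portable performance profiles, contributed by Google engineers based on their experience deploying containers in production;
A formally specified configuration format, governed by the Open Container Project under the auspices of the Linux Foundation. In other words: it's a real standard (the OCP has since been renamed OCI).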

2. runC commands

Overview of the runc commands. Later versions changed a few options slightly, but not in any way that matters here.

$ runc -h

VERSION:

   1.0.0-rc4+dev

commit: 3f2f8b84a77f73d38244dd690525642a72156c64

spec: 1.0.0

COMMANDS:

     checkpoint checkpoint a running container

     create create a container

     delete delete any resources held by the container often used with detached container

     events display container events such as OOM notifications, cpu, memory, and IO usage statistics

     exec execute new process inside the container

     init initialize the namespaces and launch the process (do not call it outside of runc)
     kill kill sends the specified signal (default: SIGTERM) to the container's init process

     list lists containers started by runc with the given root

     pause pause suspends all processes inside the container

     ps ps displays the processes running inside a container

     restore restore a container from a previous checkpoint

     resume resumes all processes that have been previously paused

     run create and run a container

     spec create a new specification file

     start executes the user defined process in a created container

     state output the state of a container

     update update container resource constraints

     help, h Shows a list of commands or help for one command

GLOBAL OPTIONS:

   --debug enable debug output for logging

   --log value set the log file path where internal debug information is written (default: "/dev/null")

   --log-format value set the format used by logs ('text' (default), or 'json') (default: "text")

   --root value root directory for storage of container state (this should be located in tmpfs) (default: "/run/runc")

   --criu value path to the criu binary used for checkpoint and restore (default: "criu")

   --systemd-cgroup enable systemd cgroup support, expects cgroupsPath to be of form "slice:prefix:name" for e.g. "system.slice:runc:434234"

   --help, -h show help

   --version, -v print the version

Source entry point

runc/main.go: every subcommand is registered here, which makes it the entry point.

...

...

func main() {

 app := cli.NewApp()

 app.Name = "runc"

 app.Usage = usage

 var v []string

 if version != "" {

  v = append(v, version)

 }

 if gitCommit != "" {

  v = append(v, "commit: "+gitCommit)

 }

 v = append(v, "spec: "+specs.Version)

 v = append(v, "go: "+runtime.Version())

 if seccomp.IsEnabled() {

  major, minor, micro := seccomp.Version()

  v = append(v, fmt.Sprintf("libseccomp: %d.%d.%d", major, minor, micro))

 }

 app.Version = strings.Join(v, "\n")

 xdgRuntimeDir := ""

 root := "/run/runc"

 if shouldHonorXDGRuntimeDir() {

  if runtimeDir := os.Getenv("XDG_RUNTIME_DIR"); runtimeDir != "" {

   root = runtimeDir + "/runc"

   xdgRuntimeDir = root

  }

 }

 app.Flags = []cli.Flag{

  cli.BoolFlag{

   Name: "debug",

   Usage: "enable debug output for logging",

  },

  cli.StringFlag{

   Name: "log",

   Value: "",

   Usage: "set the log file path where internal debug information is written",

  },

  cli.StringFlag{

   Name: "log-format",

   Value: "text",

   Usage: "set the format used by logs ('text' (default), or 'json')",

  },

  cli.StringFlag{

   Name: "root",

   Value: root,

   Usage: "root directory for storage of container state (this should be located in tmpfs)",

  },

  cli.StringFlag{

   Name: "criu",

   Value: "criu",

   Usage: "path to the criu binary used for checkpoint and restore",

  },

  cli.BoolFlag{

   Name: "systemd-cgroup",

   Usage: "enable systemd cgroup support, expects cgroupsPath to be of form \"slice:prefix:name\" for e.g. \"system.slice:runc:434234\"",

  },

  cli.StringFlag{

   Name: "rootless",

   Value: "auto",

   Usage: "ignore cgroup permission errors ('true', 'false', or 'auto')",

  },

 }

 // Entry points for all subcommands

 app.Commands = []cli.Command{

  checkpointCommand,

  createCommand,

  deleteCommand,

  eventsCommand,

  execCommand,

  initCommand,

  killCommand,

  listCommand,

  pauseCommand,

  psCommand,

  restoreCommand,

  resumeCommand,

  runCommand,

  specCommand,

  startCommand,

  stateCommand,

  updateCommand,

 }

 app.Before = func(context *cli.Context) error {

  if !context.IsSet("root") && xdgRuntimeDir != "" {

   // According to the XDG specification, we need to set anything in

   // XDG_RUNTIME_DIR to have a sticky bit if we don't want it to get

   // auto-pruned.

   if err := os.MkdirAll(root, 0700); err != nil {

    fmt.Fprintln(os.Stderr, "the path in $XDG_RUNTIME_DIR must be writable by the user")

    fatal(err)

   }

   if err := os.Chmod(root, 0700|os.ModeSticky); err != nil {

    fmt.Fprintln(os.Stderr, "you should check permission of the path in $XDG_RUNTIME_DIR")

    fatal(err)

   }

  }

  if err := reviseRootDir(context); err != nil {

   return err

  }

  return logs.ConfigureLogging(createLogConfig(context))

 }

 // If the command returns an error, cli takes upon itself to print

 // the error on cli.ErrWriter and exit.

 // Use our own writer here to ensure the log gets sent to the right location.

 cli.ErrWriter = &FatalWriter{cli.ErrWriter}

 if err := app.Run(os.Args); err != nil {

  fatal(err)

 }

}

...
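The root-directory choice in main() above decides where runc keeps container state, which is one reason an unprivileged runc list can look in a different place than sudo runc list. Below is a minimal standalone sketch of just that choice. stateRoot is a hypothetical helper, and its honorXDG parameter stands in for runc's internal shouldHonorXDGRuntimeDir(), whose exact conditions are not reproduced here.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// stateRoot is a hypothetical helper mirroring the root-directory choice in
// runc's main(): /run/runc by default, or $XDG_RUNTIME_DIR/runc when the
// caller decides XDG_RUNTIME_DIR should be honored (runc makes that decision
// in shouldHonorXDGRuntimeDir, which is not reproduced here).
func stateRoot(honorXDG bool, xdgRuntimeDir string) string {
	root := "/run/runc"
	if honorXDG && xdgRuntimeDir != "" {
		root = filepath.Join(xdgRuntimeDir, "runc")
	}
	return root
}

func main() {
	// As root: the default location.
	fmt.Println(stateRoot(false, "/run/user/1000"))
	// As an unprivileged user with XDG_RUNTIME_DIR set.
	fmt.Println(stateRoot(true, "/run/user/1000"))
}
```

filepath.Join is used instead of plain string concatenation, so a trailing slash in XDG_RUNTIME_DIR does not produce a double slash.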

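The subcommand source files all live in the same directory as main.go; each command will be covered in detail in later posts.

2.1 Preparation before using runC commands

Get an image with docker pull and export its filesystem: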

$ docker pull busybox

$ mkdir -p /tmp/mycontainer/rootfs

$ cd /tmp/mycontainer

$ docker export $(docker create busybox) | tar -C rootfs -xvf -

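The rootfs directory now contains the busybox image's filesystem. Next, generate a config.json file with runc:

# This command generates a config file according to the OCI spec; its source code is analyzed in a later post, so we won't go into it here.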

$ runc spec 

$ ls 

config.json rootfs

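Using the generated config.json as-is would make the following demo less smooth, so for simplicity we modify it slightly: change "terminal": true to false, and change "args": ["sh"] to "args": ["sleep", "3600"].
Why change terminal from true to false? If terminal is enabled when a container is created with create, runc requires --console-socket, the path to an AF_UNIX socket that will receive a file descriptor referencing the master end of the console's pseudoterminal. Without it, startup fails.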

{

"ociVersion": "1.0.0",

 "process": {

  "terminal":false,

  "user": {

   "uid": 0,

   "gid": 0

  },

  "args": [

   "/bin/sleep", "3600"

  ],

...

}

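Container states
creating: the container is being created with the create command.
created: the container has been created but is not yet running; this shows that the image files and configuration are valid and the container can run on the current platform.
running: the process inside the container is running and executing the user-specified task.
stopped: the container has finished running, failed with an error, or been killed with the kill command, and is now stopped. In this state much of the container's information is still kept on the platform; it has not been fully deleted.
paused: all processes in the container are suspended; use the resume command to resume their execution.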
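The lifecycle above can be summarized as a small transition table. This is a hypothetical model for illustration, not runc's internal state machine: the kill and delete rows follow the OCI runtime spec (kill applies to created or running containers, delete only to stopped ones), while pause and resume are runc extensions. The transient creating state is left out, and Deleted is a made-up marker for a container that no longer exists.

```go
package main

import "fmt"

// State values as printed in the "status" field of `runc state`.
type State string

const (
	Created State = "created"
	Running State = "running"
	Paused  State = "paused"
	Stopped State = "stopped"
	Deleted State = "deleted" // hypothetical marker: the container is gone
)

// transitions is a simplified model of which runc subcommand moves a
// container from one state to the next.
var transitions = map[State]map[string]State{
	Created: {"start": Running, "kill": Stopped},
	Running: {"pause": Paused, "kill": Stopped},
	Paused:  {"resume": Running},
	Stopped: {"delete": Deleted},
}

// next applies cmd to state s, or returns an error if the model forbids it.
func next(s State, cmd string) (State, error) {
	if to, ok := transitions[s][cmd]; ok {
		return to, nil
	}
	return s, fmt.Errorf("cannot %s a container in state %q", cmd, s)
}

func main() {
	s := Created
	for _, cmd := range []string{"start", "pause", "resume", "kill", "delete"} {
		var err error
		if s, err = next(s, cmd); err != nil {
			fmt.Println(err)
			return
		}
		fmt.Printf("after %-6s -> %s\n", cmd, s)
	}
}
```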

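3. runC command demos

PS: some results in the demos differ slightly from what was captured while writing, but this does not affect the experiment.

3.0 Change to the working directory

$ cd /tmp/mycontainer

3.1 runc create

Create a container:

$ sudo runc create demo1

$ sudo runc list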

Check the container's status with the state command; it is now created:

$ sudo runc state demo1

{

  "ociVersion": "1.0.0",

  "id": "demo1",

  "pid": 29314,

  "status": "created",

  "bundle": "/tmp/mycontainer/",

"rootfs": "/tmp/mycontainer/rootfs",

  "created": "2020-11-29T06:42:30.366937499Z",

  "owner": ""

}
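The JSON printed by runc state is easy to consume from Go. A minimal sketch that decodes it into a struct whose fields simply mirror the output above; ContainerState and parseState are hypothetical names for this post, not types exported by runc. In practice the input would be the stdout of runc state <id>.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// ContainerState mirrors the fields printed by `runc state` above.
type ContainerState struct {
	OCIVersion string    `json:"ociVersion"`
	ID         string    `json:"id"`
	Pid        int       `json:"pid"`
	Status     string    `json:"status"`
	Bundle     string    `json:"bundle"`
	Rootfs     string    `json:"rootfs"`
	Created    time.Time `json:"created"` // RFC 3339 with nanoseconds
	Owner      string    `json:"owner"`
}

func parseState(data []byte) (ContainerState, error) {
	var st ContainerState
	err := json.Unmarshal(data, &st)
	return st, err
}

func main() {
	// Output captured from `sudo runc state demo1` above.
	out := []byte(`{
  "ociVersion": "1.0.0",
  "id": "demo1",
  "pid": 29314,
  "status": "created",
  "bundle": "/tmp/mycontainer/",
  "rootfs": "/tmp/mycontainer/rootfs",
  "created": "2020-11-29T06:42:30.366937499Z",
  "owner": ""
}`)
	st, err := parseState(out)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s is %s (init pid %d), created %s\n",
		st.ID, st.Status, st.Pid, st.Created.Format(time.RFC3339))
}
```

Unknown fields are ignored by encoding/json, so the struct keeps working if a newer runc adds fields to this output.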

runc ps shows which processes are currently running in the container:

$ sudo runc ps demo1

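You can see that the container is currently running a process called init; the sleep command we want to run has not been executed yet. The init process initializes the container's whole runtime environment for us; it is covered in detail in the later source analysis.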

3.2 runc start

Start the container; this time it executes the command we defined:

$ sudo runc start demo1

$ sudo runc list

Check which process the container is running now: it is the sleep process we configured.

$ sudo runc ps demo1

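3.3 runc exec

This command enters the container and runs a command there. Its implementation differs from ps: ps only looks at the process state from the outside, similar to ps aux | grep <PID>, while exec uses an nsenter implemented in C on top of setns, much like the Linux nsenter tool, to enter the program's namespaces.
More details will follow in the later source analysis.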
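Joining a container's namespaces with setns(2) starts from the namespace files the kernel exposes under /proc/<pid>/ns/. The sketch below only builds those paths and reads their link targets; it does not call setns, which runc performs in C before the Go runtime starts. nsPaths is a hypothetical helper, and the list covers only six common namespace types.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
)

// Namespace types that setns(2) can join via /proc/<pid>/ns/<type>.
var nsTypes = []string{"ipc", "mnt", "net", "pid", "user", "uts"}

// nsPaths returns the namespace file for each type, e.g. /proc/29314/ns/mnt.
func nsPaths(pid int) map[string]string {
	paths := make(map[string]string, len(nsTypes))
	for _, t := range nsTypes {
		paths[t] = filepath.Join("/proc", strconv.Itoa(pid), "ns", t)
	}
	return paths
}

func main() {
	// Use our own PID so the example works without a running container;
	// for demo1 you would pass the "pid" field from `runc state demo1`.
	for t, p := range nsPaths(os.Getpid()) {
		target, err := os.Readlink(p)
		if err != nil {
			fmt.Println(t, err)
			continue
		}
		fmt.Println(t, "->", target)
	}
}
```

Two processes share a namespace exactly when the corresponding link targets (of the form mnt:[<inode>]) are equal, so comparing your shell's output with the output for the container's PID shows which namespaces differ.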
Example 1

$ sudo runc exec  demo1 ps

Example 2: allocate a TTY and enter sh

$ sudo runc exec  -t demo1 /bin/sh

# We are now inside demo1's process namespaces

/ $ ls /

bin dev etc home proc root sys tmp usr var

/ $ ls ~

/ $

/ $ ps aux

PID USER TIME COMMAND

    1 root 0:00 /bin/sleep 100000000

    9 root 0:00 /bin/sh

   15 root 0:00 ps aux

/ $

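3.4 runc pause/resume

Pause the container. The core mechanism is the cgroup freezer subsystem, which suspends the processes. The freezer is very useful for batch job management systems: it can start and stop tasks in batches in order to schedule machine resources.

* How the freezer is used will be covered in the later source analysis.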

$ sudo runc state demo1

{

  "ociVersion": "1.0.0",

  "id": "demo1",

  "pid": 29314,

# The current status is running

  "status": "running",   

  "bundle": "/data/docker_lab/bundle",

  "rootfs": "/data/docker_lab/bundle/rootfs",

  "created": "2020-11-29T06:42:30.366937499Z",

  "owner": ""

}

$ sudo runc pause demo1

$ sudo runc state demo1

{

  "ociVersion": "1.0.0",

  "id": "demo1",

  "pid": 29314,

# The current status is paused

  "status": "paused",

  "bundle": "/data/docker_lab/bundle",

  "rootfs": "/data/docker_lab/bundle/rootfs",

  "created": "2020-11-29T06:42:30.366937499Z",

  "owner": ""

}

$ sudo runc resume demo1

$ sudo runc state demo1

{

  "ociVersion": "1.0.0",

  "id": "demo1",

  "pid": 29314,

  "status": "running",

  "bundle": "/data/docker_lab/bundle",

  "rootfs": "/data/docker_lab/bundle/rootfs",

  "created": "2020-11-29T06:42:30.366937499Z",

  "owner": ""

}

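3.5 runc kill/delete

Stop the container process. kill is simple to implement: it looks up the container's init process ID and sends it the signal.
More details will follow in the later source analysis.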

# 15 is SIGTERM, which is also the default

$ sudo runc kill demo1 15

$ sudo runc state demo1

{

  "ociVersion": "1.0.0",

  "id": "demo1",

  "pid": 0,

  "status": "stopped",

  "bundle": "/data/docker_lab/bundle",

  "rootfs": "/data/docker_lab/bundle/rootfs",

  "created": "2020-11-29T06:42:30.366937499Z",

  "owner": ""

}

# Delete the container

$ sudo runc delete demo1

3.6 runc events

* The events command reports container events and resource usage statistics.

More details will follow in the later source analysis.

$ sudo runc events demo1

{"type":"stats","id":"demo1","data":{"cpu":{"usage":{"total":10140103,"percpu":[2003042,8137061],"kernel":0,"user":0},"throttling":{}},"memory":{"usage":{"limit":9223372036854771712,"usage":208896,"max":671744,"failcnt":0},"swap":{"limit":9223372036854771712,"usage":208896,"max":671744,"failcnt":0},"kernel":{"limit":9223372036854771712,"usage":172032,"max":176128,"failcnt":0},"kernelTCP":{"limit":9223372036854771712,"failcnt":0},"raw":{"active_anon":0,"active_file":0,"cache":0,"dirty":0,"hierarchical_memory_limit":9223372036854771712,"hierarchical_memsw_limit":9223372036854771712,"inactive_anon":0,"inactive_file":0,"mapped_file":0,"pgfault":66,"pgmajfault":0,"pgpgin":66,"pgpgout":43,"rss":98304,"rss_huge":0,"shmem":0,"swap":0,"total_active_anon":0,"total_active_file":0,"total_cache":0,"total_dirty":0,"total_inactive_anon":0,"total_inactive_file":0,"total_mapped_file":0,"total_pgfault":66,"total_pgmajfault":0,"total_pgpgin":66,"total_pgpgout":43,"total_rss":98304,"total_rss_huge":0,"total_shmem":0,"total_swap":0,"total_unevictable":0,"total_writeback":0,"unevictable":0,"writeback":0}},"pids":{"current":1},"blkio":{},"hugetlb":{"2MB":{"failcnt":0}}}}
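Each stats record printed by runc events is a single JSON object, so a program can decode just the fields it needs. A minimal sketch; Event and decodeEvent are hypothetical names, the struct covers only a handful of the fields shown above, and the nanosecond unit for the CPU total is an assumption based on cgroup v1's cpuacct.usage.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Event decodes a small subset of a `runc events` stats record.
type Event struct {
	Type string `json:"type"`
	ID   string `json:"id"`
	Data struct {
		CPU struct {
			Usage struct {
				Total uint64 `json:"total"` // assumed nanoseconds
			} `json:"usage"`
		} `json:"cpu"`
		Memory struct {
			Usage struct {
				Usage uint64 `json:"usage"` // bytes
				Max   uint64 `json:"max"`   // bytes
			} `json:"usage"`
		} `json:"memory"`
		Pids struct {
			Current uint64 `json:"current"`
		} `json:"pids"`
	} `json:"data"`
}

func decodeEvent(line []byte) (Event, error) {
	var ev Event
	err := json.Unmarshal(line, &ev)
	return ev, err
}

func main() {
	// An abbreviated copy of the demo1 stats line shown above.
	line := []byte(`{"type":"stats","id":"demo1","data":{"cpu":{"usage":{"total":10140103}},"memory":{"usage":{"usage":208896,"max":671744}},"pids":{"current":1}}}`)
	ev, err := decodeEvent(line)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s: cpu %.2f ms, memory %d KiB (max %d KiB), pids %d\n",
		ev.ID,
		float64(ev.Data.CPU.Usage.Total)/1e6,
		ev.Data.Memory.Usage.Usage/1024,
		ev.Data.Memory.Usage.Max/1024,
		ev.Data.Pids.Current)
}
```

To follow the live stream instead of a single line, wrap the command's stdout in a json.Decoder and call Decode in a loop.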

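3.7 runc update

The update command is mainly used to adjust cgroup parameters. First, look at which subsystems can be adjusted: mainly cpu, memory, and kernel memory (plus blkio, cpuset, and pids, as the help output shows):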

$ sudo runc update -h

Note: if data is to be read from a file or the standard input, all

other options are ignored.

   --blkio-weight value Specifies per cgroup weight, range is from 10 to 1000 (default: 0)

   --cpu-period value CPU CFS period to be used for hardcapping (in usecs). 0 to use system default

   --cpu-quota value CPU CFS hardcap limit (in usecs). Allowed cpu time in a given period

   --cpu-share value CPU shares (relative weight vs. other containers)

   --cpu-rt-period value CPU realtime period to be used for hardcapping (in usecs). 0 to use system default

   --cpu-rt-runtime value CPU realtime hardcap limit (in usecs). Allowed cpu time in a given period

   --cpuset-cpus value CPU(s) to use

   --cpuset-mems value Memory node(s) to use

   --kernel-memory value Kernel memory limit (in bytes)

   --kernel-memory-tcp value Kernel memory limit (in bytes) for tcp buffer

   --memory value Memory limit (in bytes)

   --memory-reservation value Memory reservation or soft_limit (in bytes)

   --memory-swap value Total memory usage (memory + swap); set '-1' to enable unlimited swap

   --pids-limit value Maximum number of pids allowed in the container (default: 0)

Limit demo1's memory to 100 MiB (104857600 bytes):

$ sudo runc update --memory 104857600 demo1

# Check whether demo1's cgroup was updated

$ cat /sys/fs/cgroup/memory/user.slice/demo1/memory.limit_in_bytes

104857600

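3.8 runc checkpoint/restore

A brief introduction to container live migration

CRIU freezes a running program and checkpoints it into a set of files; you can then use those files to restore the program, on the same or another host, to the point at which it was frozen (in plain terms: backing up and restoring a running program). CRIU is therefore commonly used for live migration of programs or containers, snapshots, remote debugging, and so on.

How it works

CRIU's functionality consists of two phases: checkpoint and restore. During checkpoint, CRIU mainly uses the ptrace mechanism to dynamically inject a piece of special code (the parasite code) into the dumpee process (the process being backed up) and run it; this code collects all of the dumpee's context information, which CRIU then stores as a series of image files grouped by function. During restore, CRIU parses the image files produced by the checkpoint to recreate the program's pre-checkpoint state, so the program continues running from that state.

Notes/References

My knowledge here is limited, so I can't go into more detail; runc's support for this feature also has problems, which come up in the demo below. Even in Docker, this feature requires the Docker version, Linux kernel, and CRIU version to be consistent; otherwise the generated files may differ and the restore will fail.
Reference: https://blog.csdn.net/weixin_...

Demo

Run checkpoint against a container started the normal way with create/start: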

$ runc checkpoint demo1

criu failed: type NOTIFY errno 0

log file: /run/runc/demo1/criu.work/dump.log

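# Inspect the dump log: roughly, CRIU cannot look up the file behind standard input (fd 0, /dev/pts/0). I'm not sure of the exact cause; a search turns up quite a few issues reporting the same problem.

$ cat /run/runc/demo1/criu.work/dump.log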

...

(00.021115) Error (criu/files-reg.c:1294): Can't lookup mount=26 for fd=0 path=/dev/pts/0

...

Next, I used the run command directly and kept the container running in the foreground:

$ cd /tmp/mycontainer
$ runc run demo1

Open another terminal; the container is listed as running:

$ runc list

Now run checkpoint:

# Saves the current running state of the process, like a snapshot, and exits the process

$ runc checkpoint demo1 

# The program has now stopped

$ runc list

# Check the current directory: there is a new checkpoint directory

$ ls ./

checkpoint config.json rootfs

# Lots of .img files, which store the program's runtime state for use during restore

$ ls ./checkpoint

cgroup.img files.img inventory.img mm-1.img pagemap-1.img route6-9.img tmpfs-dev-63.tar.gz.img

core-1.img fs-1.img ip6tables-9.img mountpoints-12.img pages-1.img route-9.img tmpfs-dev-65.tar.gz.img

descriptors.json ids-1.img ipcns-var-10.img netdev-9.img pipes-data.img seccomp.img tmpfs-dev-66.tar.gz.img

fdinfo-2.img ifaddr-9.img iptables-9.img netns-9.img pstree.img tmpfs-dev-61.tar.gz.img utsns-11.img

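Now restore:

# Must be run from this directory: by default runc restore reads the images from ./checkpoint and uses the current directory as the bundle

$ cd /tmp/mycontainer

# After restore, the container also runs in the foreground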

$ runc restore demo1

# In another terminal, check whether the container is running

$ runc list

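3.9 runc init

The init command initializes the container's runtime environment. As mentioned above, a container in the created state runs a /proc/self/exe init process; /proc/self/exe refers to the binary of the current process, i.e. runc, so this is effectively runc init.
init is an internal command. This process handles the main setup work: cgroups, namespaces, environment variables, chip parameter settings, network interface creation, and so on. A more detailed source walkthrough follows in later posts.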
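The /proc/self/exe init trick is a general Go pattern: a process re-executes its own binary with a special first argument so that the child can do setup work before running the user's program. A minimal sketch of the pattern, assuming Linux; it is not runc's code (runc also relies on C code in nsexec that runs before the Go runtime), and isInitInvocation and newInitCommand are hypothetical names.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// isInitInvocation reports whether we were started as "<binary> init".
func isInitInvocation(args []string) bool {
	return len(args) > 1 && args[1] == "init"
}

// newInitCommand builds a command that re-executes the current binary with
// "init" as its first argument. /proc/self/exe resolves to the executable of
// the process that opens it, so no PATH lookup is involved.
func newInitCommand() *exec.Cmd {
	cmd := exec.Command("/proc/self/exe", "init")
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	// A container runtime would set cmd.SysProcAttr.Cloneflags here to
	// place the child in new namespaces (this requires privileges).
	return cmd
}

func main() {
	if isInitInvocation(os.Args) {
		// Child side: cgroup, namespace and environment setup would happen
		// here before exec'ing the user's process.
		fmt.Println("init: running as pid", os.Getpid())
		return
	}
	// Parent side: spawn ourselves as "init" and wait for it.
	if err := newInitCommand().Run(); err != nil {
		fmt.Fprintln(os.Stderr, "init failed:", err)
		os.Exit(1)
	}
}
```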