Docker Architecture, Docker Usage

Date: 2019-11-11

Contents

0. Introduction - Why Docker Exists
1. Docker Overview
2. Docker Installation, Deployment, and Usage
3. Docker Security
4. Docker Internals
5. Docker Network Configuration
6. Dockerfile in Detail
7. Docker Volume

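0. Introduction - Why Docker Exists

0x1: Differences between virtual machines and LXC containers

1. Virtual machine (VM) technology
Each virtual machine has its own dedicated kernel and can run a complete, unmodified operating system

2. Linux container (LXC) technology
A container is a lightweight virtualization technique that creates an isolated, standardized runtime environment without requiring multiple kernel instances. Docker is a typical implementation built on LXC-style container technology

0x2: Advantages of LXC containers

1. Fast deployment (seconds): starting a container only requires forking a process and completing the user-space part of OS startup
Starting a virtual machine additionally requires executing BIOS and kernel code

2. Containers add almost no extra IO overhead
Without solid hardware virtualization support, a virtual machine introduces significant IO overhead

3. Containers have a small memory footprint: starting an idle container needs only a few tens of MB of memory
A virtual machine includes a complete kernel, so its memory overhead is much larger. In addition, building the container's filesystem on a Union FS reduces the memory consumed by the page cache

4. Small disk footprint: when building a container's filesystem, static files can be loaded from the host through bind mounts or a Union FS, which saves a large amount of disk space

0x3: Disadvantages of LXC containers

1. Resource isolation is weaker than in virtual machines
With virtual machines, the hypervisor makes resource isolation very thorough.
Container technology was still under active development at the time of writing, and its resource isolation falls short of a virtual machine's

2. A kernel change affects every container
With virtual machines, the hypervisor ensures that a kernel update affects only one application

3. Limited support for live migration
The CRIU project from OpenVZ provides preliminary checkpoint and restore support, but complete live migration will take more time. Live migration for virtual machines is comparatively mature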

1. Docker Overview

Docker is written in Go, hosted on GitHub, and released under the Apache 2.0 open-source license

Build, Ship and Run Any App, Anywhere
Docker - An open platform for distributed applications for developers and sysadmins.

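The goal of the Docker project is a lightweight operating-system-level virtualization solution. Docker is built on Linux container (LXC) and related technologies
On top of LXC, Docker adds a further layer of packaging so that users do not need to care about container management, which makes operation much simpler. Working with a Docker container is as easy as working with a fast, lightweight virtual machine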

0x1: Docker features

1. Build
Develop an app using Docker containers with any language and any toolchain.

2. Ship
Ship the "Dockerized" app and dependencies anywhere - to QA, teammates, or the cloud - without breaking anything.

3. Run
Scale to 1000s of nodes, move between data centers and clouds, update with zero downtime and more.

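0x2: Differences between Docker and virtual machines (VM)

The figure below compares Docker with traditional virtualization. Containers virtualize at the operating-system level and reuse the host's operating system directly, while the traditional approach virtualizes at the hardware level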

1. Virtual Machines

Each virtualized application includes

1. the application: which may be only 10s of MB
2. the necessary binaries and libraries
3. an entire guest operating system - which may weigh 10s of GB

2. Docker

The Docker Engine container

1. the application 
2. its dependencies(Bins/Libs)

It runs as an isolated process in userspace on the host operating system, sharing the kernel with other containers. Thus, it enjoys the resource isolation and allocation benefits of VMs but is much more portable and efficient.

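1. A container usually holds an application and its dependencies. Containers isolate processes, which run in isolated user space on the host operating system.
This is clearly different from traditional VMs

2. Traditional hardware virtualization (for example VMware, KVM, Xen, EC2) aims to create a complete virtual machine. Each virtualized application contains not only the application binaries but also the libraries it needs and a complete guest operating system

3. Because all containers share the same operating system (and possibly binaries and libraries), they are much smaller than VMs, so hosting 100 containers on one physical host is entirely feasible (the number of VMs per host is usually far more limited). In addition, because containers use the host operating system, restarting a container does not mean restarting an operating system, so containers are lighter and more efficient

4. Containers in Docker are more efficient. With traditional VMs, every application, every copy of an application, and every small change to an application requires creating a complete new VM
A new application on a container host contains only the application and its binaries/libraries, so there is no need to create a new guest operating system.

5. To run several copies of the application on one host, you do not even need to copy the shared binaries, and when you change the application you only need to copy the differences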

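0x3: Docker's core competitive advantages

1. Put simply, Docker is a container management tool. A container resembles a lighter-weight virtual machine, but one without its own operating system and devices (the operating system is shared). Container technology addresses several of the software industry's biggest problems
    1) Sharing applications
    2) Configuration management and maintenance (as well as application isolation, efficiency, and so on)
    3) On physical machines and in the cloud alike, containers are not only lighter than virtual machines but also much simpler to configure (no operating system or device configuration to consider)
2. Application authors do not need to think about operating system configuration; the application lives entirely inside the container
3. Docker containers run on almost any platform, including physical machines, virtual machines, public clouds, private clouds, personal computers, and servers. This compatibility lets users move an application directly from one platform to another.
4. Docker containers need no additional hypervisor; this is kernel-level virtualization, which allows higher performance and efficiency. In fact, the Linux kernel already supports virtualization in many respects (for example, namespaces)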

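0x4: Docker architecture

Docker runs in three roles

1. As a daemon that manages LXC containers on a Linux host
    1) Uses namespaces for permission control and isolation
    2) Uses cgroups for resource allocation
    3) Uses aufs to improve filesystem resource utilization
    aufs is a kind of UnionFS: it treats each change to the filesystem as a commit and stacks the commits layer by layer. Multiple containers can therefore share filesystem layers, which greatly reduces storage requirements and also speeds up container startup

2. As a CLI that talks to the daemon's REST API (docker run ...)
3. As a registry client for sharing what you have built (docker pull, docker commit)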

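0x5: Docker images

1. A Docker image is a read-only template
2. An image can contain, for example, a complete ubuntu operating system environment with only Apache or other applications the user needs installed
3. Images are used to create Docker containers
4. Docker provides a simple mechanism for creating new images or updating existing ones, and users can also download a ready-made image from someone else and use it directly

How images are implemented
How does Docker implement incremental modification and maintenance of images? Every image consists of many layers, and Docker uses a Union FS to combine these layers into a single image. A Union FS typically serves two purposes

1. Mounting multiple disks under the same directory without LVM or RAID
2. More commonly, joining a read-only branch with a writable branch. Live CDs use this approach to let users write on top of an unchanging image. Containers that Docker builds on AUFS rely on a similar principle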

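0x6: Docker containers

1. Docker uses containers to run applications
2. A container is a running instance created from an image. It can be created, started, stopped, and deleted. Each container is an isolated, secure platform
3. A container can be viewed as a minimal Linux environment (including root user privileges, process space, user space, network space, and so on) plus the application running inside it
4. Images are read-only; when a container starts, a writable layer is created as the topmost layer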

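0x7: Docker repositories

1. A repository is a central place for storing image files. "Repository" and "registry" (the server that hosts repositories) are often used interchangeably, but strictly speaking a registry usually hosts many repositories, each repository contains multiple images, and each image has a different tag
2. Repositories are divided into
    1) Public repositories: the largest public registry is Docker Hub, which holds a huge number of images for download. Public registries in China include Docker Pool, which offers more stable and faster access for users in mainland China
    2) Private repositories: users can also create a private repository inside their local network
After creating an image, a user can upload it to a public or private repository with the push command, and then simply pull it from the repository the next time it is needed on another machine
3. The concept of a Docker repository is similar to Git, and a registry can be understood as a hosting service like GitHub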

0x8: Studying the Docker source code

http://www.infoq.com/cn/articles/docker-source-code-analysis-part1
http://www.infoq.com/cn/articles/docker-source-code-analysis-part2
http://www.infoq.com/cn/articles/docker-source-code-analysis-part3

Relevant Link:

http://www.csdn.net/article/2014-06-20/2820325-cloud-Docker
http://www.infoq.com/cn/dockers/
http://yeasy.gitbooks.io/docker_practice/
https://docker.cn/p/
http://www.csdn.net/article/a/2014-06-18/15819053
https://www.docker.com/resources/usecases/
https://www.docker.com/
http://blog.csdn.net/u012601664/article/details/39547319
http://special.csdncms.csdn.net/BeDocker/
http://www.csdn.net/article/2014-02-01/2818306-Docker-Story
http://www.csdn.net/article/2014-07-02/2820497-what's-docker
http://dockerpool.com/static/books/docker_practice/introduction/why.html
https://github.com/docker/docker

2. Docker Installation, Deployment, and Usage

0x1: Docker Installation Based On Red Hat (64 bit)

https://docs.docker.com/installation/rhel/
https://fedoraproject.org/wiki/EPEL#How_can_I_use_these_extra_packages.3F
https://code.csdn.net/u010702509/docker_redhat

On CentOS 6, Docker can be installed from the EPEL repository with the following commands

//1. Install Docker
sudo yum install http://mirrors.yun-idc.com/epel/6/i386/epel-release-6-8.noarch.rpm
sudo yum install docker-io
//2. Log in to the Docker image registry
docker login
//3. Download the latest image
sudo docker pull centos:latest

0x2: Docker Installation Based On Ubuntu 12.04 (64 bit)

1. Install or upgrade the kernel
//Docker needs the corresponding support from the Linux kernel
sudo apt-get update
sudo apt-get install linux-image-generic-lts-raring linux-headers-generic-lts-raring
sudo reboot

2. Add the Docker repository key to your local keyring (first time only)
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 36A1D7869245C8950F966E92D8576A8BA88D21E9
sudo apt-get update
sudo apt-get install lxc-docker
//During installation you will see a warning that the package cannot be trusted; answer yes to continue

3. Docker also provides a simple install script, which you can fetch with curl and then run
curl -s https://get.docker.io/ubuntu/ | sudo sh

4. Download the ubuntu image and start it in a sandboxed container
sudo docker run -i -t ubuntu /bin/bash

0x3: Docker Installation Based On Ubuntu 13.04 (64 bit)

1. Make sure AUFS support is installed
sudo apt-get update
sudo apt-get install linux-image-extra-`uname -r`

2. The remaining steps are the same as for Ubuntu 12.04 (64 bit)

Relevant Link:

https://code.csdn.net/u010702509/docker_ubntu

0x4: Install Docker on Ubuntu 14.04 LTS

apt-get update
apt-get -y install docker.io

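0x5: Basic Docker commands

1. List the images on the machine (images)

docker images

The REPOSITORY column tells us which server an image comes from. A name without / is an official image, a name like username/repos_name is a user's public repository on Docker Hub, and a name like registry.example.com:5000/repos_name refers to a private registry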
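
The naming rule above can be sketched as a small shell function. This is only an illustration: classify_image is a hypothetical helper, and the heuristic it uses (a first path component that contains "." or ":", or equals localhost, is treated as a registry host) is an assumption about how image names are read, not code taken from Docker.

```shell
# classify_image NAME: print where an image name appears to come from.
# Hypothetical helper for illustration; not part of the docker CLI.
classify_image() {
    name=$1
    case "$name" in
        */*)
            first=${name%%/*}             # text before the first slash
            case "$first" in
                *.*|*:*|localhost) echo "private registry: $first" ;;
                *)                 echo "user repository: $first" ;;
            esac
            ;;
        *)
            echo "official image"
            ;;
    esac
}

classify_image centos:centos6
classify_image remnux/thug
classify_image dl.dockerpool.com:5000/mongo:latest
```

The three sample names are the ones used in the pull examples of this section.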

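2. Search the Docker index for an image (search)

docker search remnux
//The search covers official images and all users' public images. The part of the NAME column after the / is the repository name

3. Pull an image or repository from a Docker registry server (pull)

docker pull centos
//Pulls the latest image by default: centos:latest

You can also name a specific image explicitly
docker pull centos:centos6

You can also pull from a user's public repository (or from your own private repository)
docker pull remnux/thug

If you cannot reach Docker Hub, or want to fetch images from another private registry, use a form such as
docker pull dl.dockerpool.com:5000/mongo:latest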

4. Push an image or repository to a registry (push)

This is the counterpart of pull. You can push to public and private repositories on Docker Hub and to private registries, but not to a top-level repository

docker push seanlook/mongo
docker push registry.tp-link.net:5000/mongo:2014-10-27

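5. Create a container from an image, run a command, then stop

docker run first creates a writable container layer on top of the specified image and then starts it with the start command. A stopped container can be restarted and keeps its earlier modifications. The run command accepts many options

docker run --rm=true ubuntu echo "hello world"
hello world
//--rm=true stops the container when the command finishes and removes it from the filesystem
When docker run creates a container, the standard steps Docker performs in the background are
    1) Check whether the specified image exists locally, and download it from the public registry if it does not
    2) Create and start a container from the image
    3) Allocate a filesystem and mount a read-write layer on top of the read-only image layers
    4) Bridge a virtual interface from the host's configured bridge interface into the container
    5) Assign the container an IP address from the address pool
    6) Run the application specified by the user
    7) Terminate the container once the application finishes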

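6. Create a container from an image and enter interactive mode, with /bin/bash as the login shell

docker run -i -t --name mytest centos:centos6 /bin/bash
bash-4.1#
//--name sets the name of the started container; if it is omitted, Docker picks a name for us
//-t makes Docker allocate a pseudo-terminal (pseudo-tty) and bind it to the container's standard input, and -i keeps the container's standard input open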

7. Run a container in the background

docker run -d ubuntu /bin/sh -c "while true; do echo hello world; sleep 2; done"
ae60c4b642058fefcc61ada85a610914bed9f5df0e2aa147100eab85cea785dc
//-d detaches the started container and runs it in the background like a service, printing a CONTAINER ID. docker ps shows information about this container, docker logs ae60c4b64205 shows its output from outside the container, and docker attach ae60c4b64205 connects to the running terminal

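8. Map host ports and directories into a container

Mapping host ports to container ports is very useful. For example, if memcached runs in a container on port 11211, the host running the container can reach it at internal_ip:11211. If other hosts need to access memcached, use the -p option in the form -p <host_port:container_port>, which can be written in several ways
-p 11211:11211 the default form: binds port 11211 on all host interfaces (0.0.0.0) to port 11211 in the container
-p 127.0.0.1:11211:11211 binds port 11211 on the localhost interface only
-p 127.0.0.1::5000 binds a host port chosen by Docker on localhost to port 5000 in the container
-p 127.0.0.1:80:8080 binds port 80 on localhost to port 8080 in the container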
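
To make the structure of the -p argument explicit, here is a minimal sketch that splits it into its three parts. parse_publish is a hypothetical helper written for this article; it handles only the IPv4 forms listed above and is not Docker's own parser. An empty host port is reported as "auto", meaning Docker would choose one.

```shell
# parse_publish SPEC: split a -p argument into host IP, host port and
# container port. Illustrative sketch only; IPv4 forms only.
parse_publish() {
    spec=$1
    container_port=${spec##*:}            # text after the last colon
    rest=${spec%"$container_port"}        # everything before it
    rest=${rest%:}                        # drop the separating colon
    case "$rest" in
        *:*) host_ip=${rest%%:*}; host_port=${rest#*:} ;;
        *)   host_ip=0.0.0.0;     host_port=$rest ;;
    esac
    echo "host_ip=$host_ip host_port=${host_port:-auto} container_port=$container_port"
}

for spec in 11211:11211 127.0.0.1:11211:11211 127.0.0.1::5000 127.0.0.1:80:8080; do
    parse_publish "$spec"
done
```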

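9. Terminate a container

There are two ways a container terminates
    1) Use docker stop to terminate a running container
    2) When the application specified for the container exits, the container terminates automatically
//Terminated containers are listed by docker ps -a, and a terminated container can be restarted with docker start
With the -d option the container goes to the background after it starts; when you need to work inside it, attach to it with docker attach. Use docker rm to delete a terminated container; note that only stopped containers can be deleted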

10. Watch the container's output to confirm it is working

sudo docker attach --sig-proxy=false $CONTAINER_ID

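11. Restart a container in the terminated (stopped) state

docker start -a containerID starts a terminated container directly. The core of a container is the application it runs, and the only resources it needs are those required by that application; nothing else is allocated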

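12. Save a container as a new image (commit)

When building our own image we install tools and change configuration inside a container. Those changes live only in that container: without a commit they are lost when the container is removed, and containers created later from the image will not have them.
docker commit <container> [repo:tag]
docker commit accepts any existing container, running or stopped (docker ps -a lists both)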

13. By default every command is forwarded to the docker daemon through a protected Unix socket, so we must run the commands as root or through sudo

sudo docker help

Relevant Link:

https://code.csdn.net/u010702509/docker_puppet
https://code.csdn.net/u010702509/docker_shareimage
https://code.csdn.net/u010702509/docker_buildimage
https://code.csdn.net/u010702509/docker_basic
https://code.csdn.net/u010702509/docker/file/Docker.md
http://www.liquidweb.com/kb/how-to-install-docker-on-ubuntu-14-04-lts/
https://docs.docker.com/engine/installation/linux/ubuntulinux/
https://code.csdn.net/u010702509/docker_helloword
https://segmentfault.com/a/1190000000751601

0x6: JAVA Tomcat Running Environment Installation

http://blog.csdn.net/junjun16818/article/details/34845613#comments

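3. Docker Security

Three aspects matter most when evaluating Docker's security

1. The intrinsic container security provided by the kernel's namespaces and control groups
2. The attack resistance of the Docker program itself (especially the daemon)
3. The effect of kernel security hardening mechanisms on container security

0x1: Kernel namespaces

Docker containers are very similar to LXC containers and offer similar security properties. When docker run starts a container, Docker creates an independent set of namespaces and control groups for it in the background

1. Namespaces provide the most basic and most direct form of isolation: processes running in a container cannot see, let alone affect, processes running in another container or on the host
    1) Processes in a parent namespace are "not" visible from a child namespace
    2) Processes in a child namespace "are" visible from the parent namespace
    3) Processes in sibling namespaces are "not visible" to each other

2. Each container has its own network stack, so it cannot access the sockets or interfaces of other containers. If the host is configured accordingly, however, containers can interact with each other just as they interact with the host. Once public ports are specified or links are used to connect two containers, the containers can communicate (and policies can be configured to restrict that communication)
From a network architecture point of view, all containers communicate through the host's bridge interface, just as physical machines communicate through a physical switch

Docker's namespaces are built on the Linux kernel's namespace framework. For background on Linux kernel namespaces, see this other article

http://www.cnblogs.com/LittleHann/p/4026781.html
//Search for: 2. Linux命名空間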

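0x2: Control groups

Control groups are another key component of the Linux container mechanism; they implement resource accounting and limiting
They provide many useful features and ensure that containers get a fair share of the host's memory, CPU, disk IO, and other resources. More importantly, control groups ensure that resource pressure inside one container does not bring down the host system
Although control groups do not isolate containers from accessing each other's data and processes, they are essential for defending against denial-of-service (DoS) attacks. They matter especially on multi-tenant platforms such as public or private PaaS, where they keep the platform running consistently and with stable performance even when some applications misbehave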

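0x3: Kernel capabilities

Capabilities are a powerful Linux kernel feature that provides fine-grained access control. The kernel has supported capabilities since version 2.2. They split root privileges into finer-grained operations and can be applied to processes as well as to files (DAC and MAC models)
For example, a web server process only needs permission to bind a port below 1024 and does not need full root privileges, so granting it the net_bind_service capability is enough
Capabilities help harden Docker containers in many ways. A typical server runs a number of processes that need elevated privileges, including ssh, cron, syslogd, hardware management tools (for example, to load modules), and network configuration tools. Containers are different, because almost all of these privileged tasks are handled by the supporting system outside the container

1. ssh: access is managed by the ssh service on the host
2. cron: should normally run as a user process, with privileges left to the application that uses it
3. Logging: managed by Docker or a third-party service
4. Network management: configured on the host; unless there is a special need, a container does not configure networking

As these examples show, in most cases a container does not need "real" root privileges, only a small set of capabilities. To improve security, a container can drop privileges it does not need

1. Forbid all mount operations
2. Forbid direct access to the host's sockets
3. Forbid certain filesystem operations, such as creating new device nodes or changing file attributes
4. Forbid loading kernel modules

By default, containers started by Docker are restricted to a subset of kernel capabilities, so even an attacker who gains root inside a container cannot obtain high privileges on the host, and the damage they can do is limited
By default Docker takes a whitelist approach and disables every capability other than the required ones. Users can of course enable additional capabilities for a container to suit their needs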
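
A process's capability set is exposed by the kernel as a hexadecimal bitmask (the CapEff line in /proc/<pid>/status), so the whitelist idea can be illustrated by testing individual bits. The sketch below is an illustration under assumptions: has_cap is a hypothetical helper, the capability numbers are the ones defined in <linux/capability.h>, and the sample mask is an illustrative value for a restricted container rather than something measured for this article.

```shell
# has_cap MASK BIT: succeed if capability number BIT is set in the
# hexadecimal mask MASK (same format as the CapEff line in /proc/<pid>/status).
has_cap() {
    [ $(( (0x$1 >> $2) & 1 )) -eq 1 ]
}

# Illustrative mask for a restricted container (assumption, not measured here).
mask=00000000a80425fb

# name:number pairs, numbers as defined in <linux/capability.h>.
for entry in CAP_NET_BIND_SERVICE:10 CAP_SYS_MODULE:16 CAP_SYS_ADMIN:21; do
    name=${entry%%:*}
    bit=${entry##*:}
    if has_cap "$mask" "$bit"; then
        echo "$name: granted"
    else
        echo "$name: dropped"
    fi
done
```

With this mask, binding low ports stays available while module loading (CAP_SYS_MODULE) and mount-style administration (CAP_SYS_ADMIN) do not. On a Linux host, grep CapEff /proc/self/status prints the mask of the current shell for comparison.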

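0x4: Security mechanisms provided natively by the system

Besides capabilities, several existing security mechanisms can be used to make Docker more secure

1. LSM (Linux Security Module)
    1) TOMOYO
    2) AppArmor
    3) SELinux
2. Compile-time and run-time security checks
    1) GRSEC
    2) PAX
    3) Address space layout randomization (ASLR) to frustrate malicious probing

3. Container templates with enhanced security features
    1) Templates with AppArmor
    2) Templates with SELinux policies

4. Custom access control mechanisms that let users define their own security policies

As with other third-party tools added to Docker containers (for example, for network topology and filesystem sharing), many such mechanisms can harden existing containers without changing the Docker core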

Relevant Link:

http://dockerpool.com/static/books/docker_practice/security/README.html

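4. Docker Internals

The core technologies underneath Docker include

1. Linux namespaces
2. Control groups
3. Union file systems
4. Container format

A traditional virtual machine relies on a hypervisor running on the host to emulate a complete hardware environment for the guest operating system. The environment a guest sees is constrained and isolated from other guests. This direct approach gives the most complete encapsulation of resources, but it often wastes system resources. For example, when the host and the guest both run Linux, applications in the guest could just as well use the runtime environment of the host
An operating system includes

1. Kernel
2. Filesystem
3. Network
4. PID
5. UID
6. IPC
7. Memory
8. Disk
9. CPU, and so on

All of these resources are shared directly by application processes. Virtualization requires isolating each of them virtually
As namespace support in Linux matured, the kernel became able to meet all of these requirements and let groups of processes run in mutually isolated namespaces. They share one kernel and parts of the runtime environment (such as some system commands and libraries), yet they cannot see each other and each behaves as if it were alone on the system. This mechanism is the container: namespaces provide isolation and access control, and cgroups provide resource allocation

Docker uses a client/server architecture. The Docker daemon is the server: it accepts requests from clients and processes them (creating, running, and distributing containers). The client and the server can run on the same machine, or communicate through a socket or a RESTful API

The Docker daemon normally runs in the background on the host and waits for messages from clients. The Docker client provides users with a set of commands for interacting with the daemon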

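0x1: Namespaces

Namespaces are a powerful Linux kernel feature. Each container has its own namespaces, and applications running inside behave as if they were running in a separate operating system. Namespaces keep containers from affecting one another

1. pid namespace
Processes of different users are isolated through pid namespaces, and different namespaces can contain the same pid. The parent of every LXC process in Docker is the Docker process, and each LXC process has its own namespace. Because pid namespaces can be nested, nested Docker containers are easy to implement

2. net namespace
With pid namespaces the pids in each namespace are isolated from one another, but network ports are still shared with the host. Network isolation is provided by net namespaces: each net namespace has its own network devices, IP addresses, routing tables, and /proc/net directory, so every container's network is isolated. By default Docker uses veth pairs to connect the virtual NIC in the container to the docker0 bridge on the host

3. ipc namespace
Processes in a container still interact through the usual Linux inter-process communication (IPC) mechanisms, including semaphores, message queues, and shared memory. Unlike with a VM, however, inter-process communication in a container is really communication between host processes in the same pid namespace, so namespace information is added when IPC resources are requested. Each IPC resource has a unique 32-bit id

4. mnt namespace
Similar to chroot, which confines a process to a particular directory. mnt namespaces let processes in different namespaces see different file hierarchies, so the directories visible to processes in each namespace are isolated. Unlike chroot, the /proc/mounts seen by a container in a namespace lists only the mount points of that namespace, which makes the mnt namespace a "chroot" in the true sense

5. uts namespace
The UTS ("UNIX Time-sharing System") namespace gives each container its own hostname and domain name, so that it can be treated as an independent node on the network rather than as a process on the host

6. user namespace
Each container can have its own user and group ids, so programs inside a container can run as a container-internal user rather than as a host user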
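
On a Linux system with /proc mounted, the namespaces a process belongs to are visible as symbolic links under /proc/<pid>/ns. The following sketch lists the six namespace kinds discussed above for the current shell; list_namespaces is a hypothetical helper name, and kinds that the running kernel does not expose are simply skipped. Two processes are in the same namespace when the bracketed identifiers match.

```shell
# list_namespaces: print the namespace identifiers of the calling process.
# Each entry under /proc/self/ns is a symlink with a target such as
# "pid:[...]"; entries the kernel does not provide are skipped.
list_namespaces() {
    for ns in pid net ipc mnt uts user; do
        if [ -L "/proc/self/ns/$ns" ]; then
            printf '%s -> %s\n' "$ns" "$(readlink "/proc/self/ns/$ns")"
        fi
    done
}

list_namespaces
```

Running the same function on the host and inside a container and comparing the output is a quick way to see which namespaces differ.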

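0x2: Control groups (cgroups)

Cgroups is short for control groups, a Linux kernel mechanism for limiting, accounting for, and isolating the physical resources (such as CPU, memory, and IO) used by process groups. It was first proposed by engineers at Google and later merged into the Linux kernel. Cgroups are also the resource management mechanism LXC uses to implement virtualization; without cgroups there would be no LXC
The original goal of cgroups was a unified framework for resource management that both integrates existing subsystems such as cpuset and offers an interface for developing new ones. Today cgroups cover many scenarios, from resource control for a single process to operating-system-level virtualization. Cgroups provide the following functions

1. Limiting the amount of resources a process group can use (Resource limiting)
    1) The memory subsystem sets an upper bound on a process group's memory; once the group has reached the limit, further memory requests trigger OOM (out of memory)
2. Prioritizing process groups (Prioritization)
    1) The cpu subsystem assigns a specific cpu share to a process group
3. Recording the amount of resources a process group uses (Accounting)
    1) The cpuacct subsystem records the cpu time used by a process group
4. Isolating process groups (Isolation)
    1) The ns subsystem lets different process groups use different namespaces for isolation, so each group has its own process, network, and filesystem mount space
5. Controlling process groups (Control)
    1) The freezer subsystem suspends and resumes a process group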
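
Which cgroup a process belongs to can be read from /proc/<pid>/cgroup, where each line has the form hierarchy-ID:controller-list:cgroup-path. The sketch below looks up the path for one subsystem. cgroup_path_for is a hypothetical helper, and the sample lines fed to it are made-up values that stand in for a real file.

```shell
# cgroup_path_for CONTROLLER: read lines in /proc/<pid>/cgroup format from
# standard input and print the cgroup path of the first hierarchy that
# includes CONTROLLER. Returns non-zero if no hierarchy matches.
cgroup_path_for() {
    controller=$1
    while IFS=: read -r id controllers path; do
        case ",$controllers," in
            *",$controller,"*) echo "$path"; return 0 ;;
        esac
    done
    return 1
}

# Made-up sample input; real data would come from /proc/self/cgroup.
printf '%s\n' '4:memory:/docker/abc' '3:cpu,cpuacct:/docker/abc' | cgroup_path_for cpuacct
```

On a host that uses the per-subsystem (v1) layout described in this section, cgroup_path_for memory < /proc/self/cgroup shows the memory cgroup of the current shell.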

Relevant Link:

https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide/ch01.html
http://www.cnblogs.com/lisperl/archive/2012/04/17/2453838.html

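0x3: Union filesystem (UnionFS)

A union filesystem (UnionFS) is a layered, lightweight, high-performance filesystem. It records each change to the filesystem as a commit stacked on top of earlier layers, and it can mount different directories under a single virtual filesystem (unite several directories into a single virtual filesystem)
Union filesystems are the foundation of Docker images. Images inherit from one another through layers: starting from a base image (one with no parent), you can build all kinds of application images. Different Docker containers can also share base filesystem layers and add their own layer of changes on top, which greatly improves storage efficiency.
AUFS (AnotherUnionFS), used by Docker, is one such union filesystem. AUFS lets each member directory (comparable to a Git branch) be marked readonly, readwrite, or whiteout-able. AUFS also has a notion of layering, so a read-only branch can be modified logically and incrementally without touching the read-only data
The storage backends Docker currently supports include

1. AUFS
2. btrfs
3. vfs
4. DeviceMapper

Unionfs is a stackable union filesystem that merges the contents of several directories (also called branches) while the directories stay physically separate. Unionfs allows read-only and read-write directories to coexist, so content can be deleted and added at the same time. Unionfs has many applications

1. Merging home directories that live on different filesystems across several disk partitions
2. Merging several CDs into one unified directory (archive)
3. With copy-on-write, Unionfs can merge read-only and read-write filesystems so that modifications to the read-only filesystem appear to succeed while actually being saved to the writable one

The Linux implementation of unionfs relies on the flexibility of the VFS design. Architecturally it sits below the VFS and above concrete filesystems such as ext3 and ext4. A read()/write() system call lands on the VFS, the VFS finds the inode of the target file (a unionfs inode), unionfs routes the call to the inode of the real file, and the operation is performed there. This is a kernel hook process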

Relevant Link:

http://www.kissthink.com/archive/9120.html
http://fengchj.com/?tag=%E8%81%94%E5%90%88%E6%96%87%E4%BB%B6%E7%B3%BB%E7%BB%9F
http://en.wikipedia.org/wiki/Union_mount
http://en.wikipedia.org/wiki/UnionFS
http://lwn.net/Articles/325369/
http://lwn.net/Articles/327738/

0x4: overlayfs

An overlay-filesystem tries to present a filesystem which is the result of overlaying one filesystem on top of the other.
This approach is 'hybrid' because the objects that appear in the filesystem do not all appear to belong to that filesystem. In many cases an object accessed in the union will be indistinguishable from accessing the corresponding object from the original filesystem. This is most obvious from the 'st_dev' (device number) field returned by stat.
While directories will report an st_dev from the overlay-filesystem, all non-directory objects will report an st_dev from the lower or upper filesystem that is providing the object. Similarly st_ino will only be unique when combined with st_dev, and both of these can change over the lifetime of a non-directory object. Many applications and tools ignore these values and will not be affected.

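overlayfs can "merge" two directories (it builds on the namespace data structures provided by the Linux kernel and on union logic that sits between the VFS and the underlying concrete filesystems to present a logical, virtual merge). For example

1. Directory dir1/
    ./fire
    ./water
2. Directory dir2/
    ./apple
    ./banana

mount -t overlayfs overlayfs -olowerdir=/dir1,upperdir=/dir2 /test1/
After the merge, the test1/ directory contains

./fire
./water
./apple
./banana

/test1/fire and /dir1/fire are in fact "the same file" and share the same page cache

Implementing "shared file directories" with overlayfs

1. Prepare a base directory and put commonly used, shareable system files into it, for example /etc, /bin, /lib, /usr
2. List the directories that each container needs for its own exclusive use, for example /store1, /store2, /store3, /store4
3. Prepare an empty directory for each container, for example /container1, /container2, /container3, /container4
4. mount
    mount -t overlayfs overlayfs -olowerdir=/base,upperdir=/store1 /container1
    mount -t overlayfs overlayfs -olowerdir=/base,upperdir=/store2 /container2
    mount -t overlayfs overlayfs -olowerdir=/base,upperdir=/store3 /container3
    mount -t overlayfs overlayfs -olowerdir=/base,upperdir=/store4 /container4
5. The operator runs
    cd /container1/
    chroot .
In this way each container has its own "independent" directory while sharing some of the system's library files

Conceptually overlayfs resembles a view: it adds a layer of abstraction over the files and directories in lowerdir and exposes a virtual view of them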

1. Upper and Lower

An overlay filesystem combines two filesystems

1. an 'upper' filesystem
The upper filesystem will normally be writable and if it is it must support the creation of trusted.* extended attributes, and must provide valid d_type in readdir responses, so NFS is not suitable.
A read-only overlay of two read-only filesystems may use any filesystem type.

2. a 'lower' filesystem
The lower filesystem can be any filesystem supported by Linux and does not need to be writable.  The lower filesystem can even be another overlayfs

When a name exists in both filesystems, the object in the 'upper' filesystem is visible while the object in the 'lower' filesystem is either hidden or, in the case of directories, merged with the 'upper' object.

2. Directories

Overlaying mainly involves directories. If a given name appears in both upper and lower filesystems and refers to a non-directory in either, then the lower object is hidden - the name refers only to the upper object.
Where both upper and lower objects are directories, a merged directory is formed.
At mount time, the two directories given as mount options "lowerdir" and "upperdir" are combined into a merged directory:

mount -t overlay overlay -olowerdir=/lower,upperdir=/upper,workdir=/work /merged
//The "workdir" needs to be an empty directory on the same filesystem as upperdir.

Then whenever a lookup is requested in such a merged directory, the lookup is performed in each actual directory and the combined result is cached in the dentry belonging to the overlay filesystem.
If both actual lookups find directories, both are stored and a merged directory is created, otherwise only one is stored: the upper if it exists, else the lower.
Only the lists of names from directories are merged. Other content such as metadata and extended attributes are reported for the upper directory only. These attributes of the lower directory are hidden.

3. whiteouts and opaque directories

In order to support rm and rmdir without changing the lower filesystem, an overlay filesystem needs to record in the upper filesystem that files have been removed. This is done using whiteouts and opaque directories (non-directories are always opaque).

1. A whiteout is created as a character device with 0/0 device number. When a whiteout is found in the upper level of a merged directory, any matching name in the lower level is ignored, and the whiteout itself is also hidden.
2. A directory is made opaque by setting the xattr "trusted.overlay.opaque" to "y".  Where the upper filesystem contains an opaque directory, any directory in the lower filesystem with the same name is ignored.

4. readdir

When a 'readdir' request is made on a merged directory, the upper and lower directories are each read and the name lists merged in the obvious way (upper is read first, then lower - entries that already exist are not re-added). This merged name list is cached in the 'struct file' and so remains as long as the file is kept open. If the directory is opened and read by two processes at the same time, they will each have separate caches. A seekdir to the start of the directory (offset 0) followed by a readdir will cause the cache to be discarded and rebuilt.
This means that changes to the merged directory do not appear while a directory is being read. This is unlikely to be noticed by many programs.
seek offsets are assigned sequentially when the directories are read. Thus if

1. read part of a directory
2. remember an offset, and close the directory
3. re-open the directory some time later
4. seek to the remembered offset

there may be little correlation between the old and new locations in the list of filenames, particularly if anything has changed in the directory.
Readdir on directories that are not merged is simply handled by the underlying directory (upper or lower).

5. Non-directories

Objects that are not directories (files, symlinks, device-special files etc.) are presented either from the upper or lower filesystem as appropriate

When a file in the lower filesystem is accessed in a way that requires write access, such as opening for write access or changing some metadata, the file is first copied from the lower filesystem to the upper filesystem (copy_up).
//Note that creating a hard-link also requires copy_up, though of course creation of a symlink does not.

The copy_up may turn out to be unnecessary, for example if the file is opened for read-write but the data is not modified.

1. The copy_up process first makes sure that the containing directory exists in the upper filesystem - creating it and any parents as necessary.
2. It then creates the object with the same metadata (owner,mode, mtime, symlink-target etc.) 
3. and then if the object is a file, the data is copied from the lower to the upper filesystem.  
4. Finally any extended attributes are copied up.
5. Once the copy_up is complete, the overlay filesystem simply provides direct access to the newly created file in the upper filesystem 
6. future operations on the file are barely noticed by the overlay filesystem (though an operation on the name of the file such as rename or unlink will of course be noticed and handled).

6. Multiple lower layers

Multiple lower layers can now be given using the colon (":") as a separator character between the directory names. For example:

mount -t overlay overlay -olowerdir=/lower1:/lower2:/lower3 /merged
//As the example shows, "upperdir=" and "workdir=" may be omitted.  In that case the overlay will be read-only.

The specified lower directories will be stacked beginning from the rightmost one and going left. In the above example lower1 will be the top, lower2 the middle and lower3 the bottom layer

7. Non-standard behavior

The copy_up operation essentially creates a new, identical file and moves it over to the old name. The new file may be on a different filesystem, so both st_dev and st_ino of the file may change.
Any open files referring to this inode will access the old data and metadata. Similarly any file locks obtained before copy_up will not apply to the copied up file.
On a file opened with O_RDONLY fchmod(2), fchown(2), futimesat(2) and fsetxattr(2) will fail with EROFS.
If a file with multiple hard links is copied up, then this will "break" the link. Changes will not be propagated to other names referring to the same inode.
Symlinks in /proc/PID/ and /proc/PID/fd which point to a non-directory object in overlayfs will not contain valid absolute paths, only relative paths leading up to the filesystem's root. This will be fixed in the future.
Some operations are not atomic, for example a crash during copy_up or rename will leave the filesystem in an inconsistent state. This will be addressed in the future.

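The flexibility of the VFS gives overlayfs a solid architectural foundation: since the operations on a file can be implemented arbitrarily, a read of file A can simply be turned into a read of another file B
This is the essence of overlayfs: an operation on a file in the overlayfs filesystem is turned into an operation on the corresponding file in lowerdir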

static struct file *ovl_open(struct dentry *dentry, int flags, const struct cred *cred) 
{ 
    int err; 
    struct path realpath; 
    enum ovl_path_type type; 

    /*
    overlayfs stores the lowerdir and upperdir information for the file in the dentry's d_fsdata member, so the file's counterpart in lowerdir can be found here
    */
    type = ovl_path_real(dentry, &realpath); 
    if (ovl_open_need_copy_up(flags, type, realpath.dentry)) 
    { 
        if (flags & O_TRUNC) 
            err = ovl_copy_up_truncate(dentry, 0); 
        else 
            err = ovl_copy_up(dentry); 
        if (err) 
            return ERR_PTR(err); 

        ovl_path_upper(dentry, &realpath); 
     } 
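    /*
    Pass the path of the corresponding real file to vfs_open, so the file actually opened is the "lower" file in lowerdir (or its copy in upperdir after copy_up); later read/mmap calls go through that file's file_operations
    */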
    return vfs_open(&realpath, flags, cred); 
}

The "virtual shared directory" that overlayfs provides is likewise implemented at the VFS layer

static int ovl_readdir(struct file *file, void *buf, filldir_t filler) 
{ 
    struct ovl_dir_file *od = file->private_data; 
    int res; 

    if (!file->f_pos) 
        ovl_dir_reset(file); 
    if (od->is_real) 
    { 
        res = vfs_readdir(od->realfile, filler, buf); 
        file->f_pos = od->realfile->f_pos; 
        return res; 
    } 

    if (!od->is_cached) 
    { 
        struct path lowerpath; 
        struct path upperpath; 
        struct ovl_readdir_data rdd = { .list = &od->cache }; 
    
        ovl_path_lower(file->f_path.dentry, &lowerpath); 
        ovl_path_upper(file->f_path.dentry, &upperpath); 
        //Read the contents of both the upper and the lower directory, merge them, and store the result in the rdd data structure
        res = ovl_dir_read_merged(&upperpath, &lowerpath, &rdd); 
        if (res) 
            return res; 
        od->cache_version = ovl_dentry_version_get(file->f_path.dentry); 
        od->is_cached = true; 
        /*
        Point od->cursor at the entry at the corresponding offset in od->cache; this entry can be thought of as the dentry of a file in this directory
        */
        ovl_seek_cursor(od, file->f_pos); 
    } 
    /*
    The list member of rdd actually points at od->cache, so moving od->cursor walks rdd->list and visits every dentry
    */
    while (od->cursor.next != &od->cache) 
    { 
        int over; 
        loff_t off; 
        struct ovl_cache_entry *p; 
        
        p = list_entry(od->cursor.next, struct ovl_cache_entry, l_node); 
        off = file->f_pos; 
        file->f_pos++; 
        list_move(&od->cursor, &p->l_node); 
        
        if (p->is_whiteout) 
            continue; 
        over = filler(buf, p->name, p->len, off, p->ino, p->type); 
        if (over) 
            break; 
     } 
    //The file names from both directory layers have now been read out and combined
    return 0; 
}

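overlayfs merges an upper and a lower directory into one filesystem. The "lower" directory is read-only, so every modification goes to the "upper" directory. In this "new" filesystem

1. Newly created files are actually created in the "upper" directory
2. When deleting a file
    1) If the file comes from the "lower" directory, it is hidden (to a user inside the container it looks deleted)
    2) If the file comes from the "upper" directory, it is simply deleted
3. When writing a file
    1) If the file comes from the "lower" directory, it is first copied to the upper directory and the write goes to the new "upper" file
    2) If the file comes from the "upper" directory, it is written directly

The flexibility of the VFS is what makes the concise implementation of overlayfs possible

1. Merging upper and lower
2. Same-name entries in upper hide those in lower
3. Copy on write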
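
Mounting a real overlay requires root, so the lookup and copy-on-write rules above are simulated here with two ordinary directories. This is a user-space sketch of the semantics only, not how the kernel implements them: overlay_resolve and overlay_append are hypothetical helpers, and whiteouts (hiding deleted lower files) are left out.

```shell
# overlay_resolve UPPER LOWER NAME: print the real path that a merged view
# would serve for NAME (an upper entry hides a lower entry of the same name).
overlay_resolve() {
    if [ -e "$1/$3" ]; then
        echo "$1/$3"
    elif [ -e "$2/$3" ]; then
        echo "$2/$3"
    else
        return 1
    fi
}

# overlay_append UPPER LOWER NAME TEXT: append TEXT to NAME. A file that only
# exists in the lower directory is copied up first, so lower is never modified.
overlay_append() {
    if [ ! -e "$1/$3" ] && [ -e "$2/$3" ]; then
        cp -p "$2/$3" "$1/$3"
    fi
    printf '%s\n' "$4" >> "$1/$3"
}

work=$(mktemp -d)
mkdir "$work/lower" "$work/upper"
echo "from lower" > "$work/lower/fire"

overlay_resolve "$work/upper" "$work/lower" fire   # resolves to the lower copy
overlay_append  "$work/upper" "$work/lower" fire "written through the merged view"
overlay_resolve "$work/upper" "$work/lower" fire   # resolves to the upper copy
cat "$work/lower/fire"                             # the lower file is unchanged
rm -rf "$work"
```

The design mirrors the three rules: resolution checks upper before lower, writes never touch lower, and the copy happens only on the first write.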

Relevant Link:

http://en.wikipedia.org/wiki/OverlayFS
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/filesystems/overlayfs.txt
https://github.com/torvalds/linux/commit/e9be9d5e76e34872f0c37d72e25bc27fe9e2c54c
http://www.phoronix.com/scan.php?page=news_item&px=MTc5OTc
http://wenku.baidu.com/view/2c82473ca32d7375a41780ab.html
https://dev.openwrt.org/browser/trunk/target/linux/generic/patches-2.6.38/209-overlayfs.patch?rev=26213
http://issuu.com/byjgli/docs/overlayfs_procfs_____________/3

5. Docker Network Configuration

Docker uses a Linux bridge to provide communication between containers; the docker0 bridge interface exists to make this easy for Docker to manage. When the Docker daemon starts, it does the following

1. creates the docker0 bridge if not present
2. searches for an IP address range which doesn't overlap with an existing route
3. picks an IP in the selected range
4. assigns this IP to the docker0 bridge
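
Step 2 comes down to an overlap test between two CIDR ranges: two ranges overlap exactly when they agree on the shorter of their two prefixes. The sketch below shows that test with plain shell arithmetic. It is not the daemon's actual code; ip_to_int and cidr_overlaps are hypothetical helpers, only IPv4 is handled, and the routes in the loop are made-up examples.

```shell
# ip_to_int A.B.C.D: print the address as a 32-bit integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1                 # split the dotted quad into four fields
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# cidr_overlaps NET1/LEN1 NET2/LEN2: succeed if the two ranges share addresses.
cidr_overlaps() {
    a=$(ip_to_int "${1%/*}"); a_len=${1#*/}
    b=$(ip_to_int "${2%/*}"); b_len=${2#*/}
    if [ "$a_len" -lt "$b_len" ]; then len=$a_len; else len=$b_len; fi
    mask=$(( (0xffffffff << (32 - $len)) & 0xffffffff ))
    [ $(( $a & $mask )) -eq $(( $b & $mask )) ]
}

candidate=172.17.0.0/16
for route in 10.10.101.0/24 192.168.0.0/16; do    # made-up host routes
    if cidr_overlaps "$candidate" "$route"; then
        echo "$candidate overlaps $route"
    else
        echo "$candidate is clear of $route"
    fi
done
```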

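0x1: Docker's four network modes

When docker run creates a container, the --net option selects the container's network mode. Docker has the following 4 network modes

1. host mode: selected with --net=host
2. container mode: selected with --net=container:NAME_or_ID
3. none mode: selected with --net=none
4. bridge mode: selected with --net=bridge, the default

1. host mode

A container started in host mode does not get its own Network Namespace but shares the host's. The container does not create its own virtual NIC or configure its own IP; it uses the host's IP and ports
For example, suppose we start a Docker container with a web application in host mode on a machine at 10.10.101.105/24, listening on tcp port 80. Any command run in the container to inspect the network, such as ifconfig, shows the host's information. Clients reach the application in the container directly at 10.10.101.105:80, without any NAT, just as if it ran on the host itself. Other aspects of the container, such as its filesystem and process list, remain isolated from the host

2. container mode

This mode makes a newly created container share a Network Namespace with an existing container rather than with the host. The new container does not create its own NIC or configure its own IP; it shares the IP, port range, and so on of the specified container. Apart from networking, the two containers remain isolated in other respects such as filesystem and process list. Processes in the two containers can communicate through the lo device

3. none mode

This mode differs from the previous two. The Docker container has its own Network Namespace, but Docker performs no network configuration for it. The container therefore has no NIC, IP, or routes, and we have to add a NIC, configure an IP, and so on ourselves

4. bridge mode

bridge mode is Docker's default network setting. It gives every container its own Network Namespace, configures its IP, and connects the Docker containers on a host to a virtual bridge. When the Docker server starts, it creates a virtual bridge named docker0 on the host, and Docker containers started on that host are attached to it. A virtual bridge works much like a physical switch, so all containers on the host are joined in one layer-2 network through it
Next, the containers need IP addresses. Docker picks an IP address and subnet that differ from the host's out of the private ranges defined in RFC1918 and assigns them to docker0; each container attached to docker0 then takes an unused IP from that subnet. Docker typically uses the 172.17.0.0/16 network and assigns 172.17.42.1/16 to the docker0 bridge (docker0 is visible with ifconfig on the host; it can be regarded as the bridge's management interface and acts as a virtual NIC on the host)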

Relevant Link:

https://opskumu.gitbooks.io/docker/content/chapter6.html
http://dockone.io/article/402
http://www.oschina.net/translate/docker-network-configuration

6. Dockerfile in Detail

docker build -t edwardsbean/centos6-jdk1.7 .
//The current directory contains a Dockerfile; build creates a new image from it and names it edwardsbean/centos6-jdk1.7

0x1: Format

# Comment
FROM {YOUR Base Image from which you are building}
INSTRUCTION arguments

Docker runs the instructions in a Dockerfile in order. The first instruction must be `FROM` in order to specify the Base Image from which you are building.
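
As a small illustration of the format, the following sketch writes a minimal Dockerfile into a scratch directory and checks which instruction comes first. The file contents reuse the example lines from this section; first_instruction is a hypothetical helper, and the check is only a rough approximation of the rule stated above (it skips blank lines and comment lines and looks at the first word that remains).

```shell
# Write a minimal Dockerfile into a scratch directory.
dir=$(mktemp -d)
cat > "$dir/Dockerfile" <<'EOF'
# Comment
FROM ubuntu
RUN echo 'we are running some # of cool things'
CMD echo hello world
EOF

# first_instruction FILE: print the first instruction keyword in FILE,
# skipping blank lines and comment lines.
first_instruction() {
    grep -v '^[[:space:]]*#' "$1" | grep -v '^[[:space:]]*$' | head -n 1 | awk '{ print $1 }'
}

if [ "$(first_instruction "$dir/Dockerfile")" = "FROM" ]; then
    echo "Dockerfile starts with FROM"
else
    echo "Dockerfile does not start with FROM"
fi
rm -rf "$dir"
```

Note that the # inside the RUN line is part of the command, not a comment, because comments are only recognized at the start of a line.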

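1. FROM: the image to build on
2. RUN: runs a command during the build, typically to install software
RUN echo 'we are running some # of cool things'

3. MAINTAINER: the creator of the image
4. CMD: the command executed when the container starts. A Dockerfile can have only one effective CMD; if there are several, only the last one is used.
CMD echo hello world
//CMD is mainly used to start a given service when the container starts. When a command is given on the docker run command line, it replaces the command specified by CMD

5. ENTRYPOINT: the command executed when the container starts. A Dockerfile can have only one effective ENTRYPOINT; if there are several, only the last one is used
//Unlike CMD, ENTRYPOINT is not replaced by a command given to docker run

6. USER: the user the container runs as
ENTRYPOINT ["memcached"]
USER daemon

7. EXPOSE: the port on which a service inside the container listens. To use it from the host, you still have to map a host port to the container port when starting the container
docker run -d -p 127.0.0.1:33301:22 centos6-ssh
//Port 22 of the container's ssh service is mapped to port 33301 on the host

8. ENV: sets environment variables
ENV LANG en_US.UTF-8
ENV LC_ALL en_US.UTF-8

9. ADD: copies the file <src> to the path <dest> in the container's filesystem
All files and directories copied into the container have mode 0755 and uid and gid 0
If the file is in a recognized archive format, docker unpacks it
To ADD a local file, the file must be inside the <PATH> directory given to docker build <PATH>
When building from a remote source, the files to ADD must be inside the remote <PATH> given to docker build <PATH>
/*
docker build github.com/creack/docker-firefox
The docker-firefox directory must contain the Dockerfile and the files to ADD
*/
Note: when building with docker build - < somefile, local files cannot be ADDed to the container directly. Only ADD url file works.
ADD runs once, when the image is built; it is not re-run when a container is started later.

10. VOLUME: mounts a local directory or another container's directory into the container.
11. WORKDIR: changes the working directory and can be used multiple times (like the cd command); it applies to RUN, CMD, and ENTRYPOINT
12. ONBUILD: the instruction given to ONBUILD is not executed when this image is built, but when a child image based on it is built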
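
The interplay between CMD, ENTRYPOINT, and the arguments given to docker run (items 4 and 5) can be modeled in a few lines. This is a simplified sketch, assuming the JSON-array (exec) form of both instructions and treating them as plain strings; effective_command is a hypothetical helper and does not call Docker.

```shell
# effective_command ENTRYPOINT CMD [RUN_ARGS...]: print the command line a
# container would start with under the simplified model described above.
# Both ENTRYPOINT and CMD must be passed (use "" for "not set").
effective_command() {
    entrypoint=$1
    cmd=$2
    shift 2
    if [ "$#" -gt 0 ]; then
        cmd="$*"              # arguments given to docker run replace CMD
    fi
    if [ -n "$entrypoint" ] && [ -n "$cmd" ]; then
        echo "$entrypoint $cmd"
    else
        echo "$entrypoint$cmd"
    fi
}

effective_command "" "echo hello world"             # only CMD is set
effective_command "" "echo hello world" /bin/bash   # run arguments replace CMD
effective_command "memcached" "-m 64" -m 128        # ENTRYPOINT stays, CMD is replaced
```

In this model CMD acts as a set of default arguments, which is why combining a fixed ENTRYPOINT with an overridable CMD is a common pattern.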

Relevant Link:

http://blog.csdn.net/wsscy2004/article/details/25878223
https://docs.docker.com/engine/reference/builder/

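7. Docker Volume

A Docker image is a stack of filesystems (read-only layers). When we start a container, Docker loads the read-only image layers and adds a read-write layer on top of the image stack. If the running container modifies an existing file, that file is copied from the read-only layer beneath into the read-write layer; the read-only version still exists but is hidden by the copy in the read-write layer
When the container is deleted and a new one is started from the image, the earlier changes are lost. In Docker, the combination of the read-only layers and the read-write layer on top is called the Union File System. This is why every newly created container starts from a "clean" copy of the image, and why changes made inside one container affect neither other containers nor the image on the host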

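0x1: Data volumes

To persist data and to share data between containers, Docker introduced the concept of a Volume. Put simply, a Volume is a directory or file that bypasses the default union filesystem and exists on the host as an ordinary file or directory (so the container and the host share it). A data volume is a specially designated directory, in one or more containers, that lives outside the Union File System. Data volumes provide several useful features for persisting and sharing data

1. Data volumes are initialized when the container is created. If the container's image contains data at the mount point, that data is copied into the new volume when the volume is initialized
2. Data volumes can be shared and reused among multiple containers
3. Changes to a data volume are made directly
4. Changes to a data volume are not included when the image is updated
5. Data volumes persist on the host even if the container is deleted

Data volumes are designed to persist data independently of the container's life cycle. Docker therefore does not delete volumes automatically when you remove a container, and it does not "garbage collect" volumes that are no longer in use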

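0x1: Adding a data volume
There are two ways to initialize a Volume, with small but important differences between them. One of them is to declare the Volume at run time with -v

docker run -it --name container-test -h CONTAINER -v /data debian /bin/bash
//This command creates a volume mounted at /data in the container that bypasses the union filesystem, and the directory can be manipulated directly on the host. Any files the image has under /data are copied into the Volume
//When -v is given a host directory (-v host_dir:container_dir), no files from the image directory are copied into the Volume; the container sees the contents of the host directory instead

Volumes in docker are read-write by default, but they can also be mounted read-only by appending :ro

sudo docker run --rm -it -v /root/Limon/samples:/home/malware/samples:ro ubuntu bash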
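
The -v option takes up to three colon-separated fields: an optional host path, the path inside the container, and an optional mode, where ro requests a read-only mount. The sketch below assembles that argument from its parts without calling Docker. volume_arg is a hypothetical helper for illustration, and the paths are the ones used in the example above.

```shell
# volume_arg HOST_DIR CONTAINER_DIR [MODE]: print the -v argument for a mount.
# HOST_DIR may be empty (volume managed by Docker); MODE is rw (default) or ro.
volume_arg() {
    host=$1
    container=$2
    mode=${3:-rw}
    case "$mode" in
        rw|ro) ;;
        *) echo "mode must be rw or ro" >&2; return 1 ;;
    esac
    if [ -z "$host" ]; then
        echo "-v $container"
    elif [ "$mode" = ro ]; then
        echo "-v $host:$container:ro"
    else
        echo "-v $host:$container"
    fi
}

# Print (rather than run) the resulting command line.
echo "docker run --rm -it $(volume_arg /root/Limon/samples /home/malware/samples ro) ubuntu bash"
```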

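0x2: Sharing data

To give one container access to another container's Volumes, run docker run with the --volumes-from option

docker run -it -h NEWCONTAINER --volumes-from container-test debian /bin/bash
//Note that this works whether or not container-test is running. A Volume is not deleted as long as some container is still connected to it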

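0x3: Deleting Volumes

If you have been removing containers with docker rm, many orphaned Volumes may still be taking up space
A Volume is deleted only in the following cases

1. The container is removed with docker rm -v (the -v is essential)
2. docker run was given the --rm option
//Even with these two commands, only Volumes with no container connected to them are deleted. Volumes bound to a user-specified host directory are never deleted by docker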

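0x4: Caveats when using shared volumes

Several containers can share one or more data volumes, but concurrent writes can conflict
Data volumes can be manipulated directly on the host with ordinary linux tools. This is not recommended, however, because the containers and applications are unaware of your changes, which can lead to conflicting data operations

Relevant Link: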

https://www.goodmemory.cc/docker%E5%AE%B9%E5%99%A8%E7%9A%84%E6%95%B0%E6%8D%AE%E7%AE%A1%E7%90%86/
http://dockone.io/article/128
