Kubernetes Design Overview

Overview

Kubernetes builds on top of Docker to construct a clustered container scheduling service.  The goal of the project is to let users ask a Kubernetes cluster to run a set of containers; the system automatically picks a worker node to run those containers on.

As container-based applications and systems get larger, some tools are provided to facilitate sanity. These include ways for containers to find and communicate with each other, and ways to work with and manage sets of containers that do similar work.

When looking at the architecture of the system, we'll break it down into the services that run on the worker nodes and the services that play a "master" role.


Key Concept: Container Pod

While Docker itself works with individual containers, Kubernetes works with a pod.  A pod is a group of containers that are scheduled onto the same physical node.  In addition to defining the containers that run in the pod, a pod defines a set of storage volumes, and the containers in the pod all share the same network namespace/IP.  Ports are also mapped on a per-pod basis.
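The pod concept can be sketched in Go as follows. The types and field names here are illustrative stand-ins, not the real Kubernetes API objects: the point is that every container in a pod shares the pod's single IP and sees the same set of volumes.

```go
package main

import "fmt"

// Container describes one container in a pod (hypothetical type).
type Container struct {
	Name  string
	Image string
	Ports []int // ports are mapped per pod, not per container
}

// Pod groups containers scheduled onto the same node; they share one
// network namespace/IP and a common set of storage volumes.
type Pod struct {
	Name       string
	IP         string   // one IP shared by every container in the pod
	Volumes    []string // volumes visible to all containers in the pod
	Containers []Container
}

func main() {
	p := Pod{
		Name:    "web",
		IP:      "10.244.1.7",
		Volumes: []string{"logs"},
		Containers: []Container{
			{Name: "nginx", Image: "nginx", Ports: []int{80}},
			{Name: "log-shipper", Image: "shipper"},
		},
	}
	// Both containers report the same pod-level IP.
	for _, c := range p.Containers {
		fmt.Printf("%s -> %s\n", c.Name, p.IP)
	}
}
```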

The Kubernetes Node

The Kubernetes node has the services necessary to run Docker containers and be managed from the master systems.

The Kubernetes node design is an extension of the Container-optimized Google Compute Engine image.  Over time the plan is for these images/nodes to merge and be the same thing used in different ways.

Each node runs Docker, of course.  Docker takes care of the details of downloading images and running containers.


Kubelet

The second component on the node is called the kubelet.  The Kubelet is the logical successor (and a rewrite in Go) of the Container Agent that is part of the Compute Engine image.

The Kubelet works in terms of a container manifest.  A container manifest (defined here) is a YAML file that describes a pod.  The Kubelet takes a set of manifests, provided through various mechanisms, and ensures that the containers described in those manifests are started and kept running.

There are 4 ways that a container manifest can be provided to the Kubelet:

  • File: a path passed as a flag on the command line.  This file is rechecked every 20 seconds (configurable with a flag).

  • HTTP endpoint: an HTTP endpoint passed as a parameter on the command line.  This endpoint is checked every 20 seconds (also configurable with a flag).

  • etcd server: the Kubelet will reach out and do a watch on an etcd server.  The etcd path that is watched is /registry/hosts/$(hostname -f).  As this is a watch, changes are noticed and acted upon very quickly.

  • HTTP server: the kubelet can also listen for HTTP and respond to a simple API (underspec'd currently) to submit a new manifest.
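Whichever of the four sources is used, the Kubelet's job reduces to the same reconciliation step: compare the manifests the source currently describes against what is running, then start and stop containers to close the gap. A minimal sketch, with hypothetical function and manifest names:

```go
package main

import "fmt"

// syncManifests is an illustrative sketch of the Kubelet's core loop
// for one polled source: given the manifests the source currently
// describes and the containers actually running, report what to start
// and what to stop. Manifests are reduced to names for simplicity.
func syncManifests(desired, running []string) (start, stop []string) {
	want := map[string]bool{}
	for _, m := range desired {
		want[m] = true
	}
	have := map[string]bool{}
	for _, m := range running {
		have[m] = true
		if !want[m] {
			stop = append(stop, m) // no longer in the manifest source
		}
	}
	for _, m := range desired {
		if !have[m] {
			start = append(start, m) // described but not yet running
		}
	}
	return start, stop
}

func main() {
	// The source now describes "web" and "db"; "old-job" must go.
	start, stop := syncManifests([]string{"web", "db"}, []string{"db", "old-job"})
	fmt.Println(start, stop) // [web] [old-job]
}
```

A real Kubelet would rerun this against the file/HTTP sources every 20 seconds, and immediately on each etcd watch notification.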


Kubernetes Proxy

Each node also runs a simple network proxy.  This reflects services as defined in the Kubernetes API on each node and can do simple TCP stream forwarding or round robin TCP forwarding across a set of backends.
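The round-robin case can be sketched as follows. The `roundRobin` type and its fields are illustrative; a real proxy would dial the chosen backend and copy the TCP stream in both directions.

```go
package main

import (
	"fmt"
	"sync"
)

// roundRobin sketches the proxy's backend selection: each call to
// pick returns the next backend address in turn.
type roundRobin struct {
	mu       sync.Mutex // pick may be called from many connections
	backends []string
	next     int
}

func (r *roundRobin) pick() string {
	r.mu.Lock()
	defer r.mu.Unlock()
	b := r.backends[r.next%len(r.backends)]
	r.next++
	return b
}

func main() {
	rr := &roundRobin{backends: []string{"10.0.0.1:80", "10.0.0.2:80"}}
	// Successive connections alternate across the backend set.
	for i := 0; i < 3; i++ {
		fmt.Println(rr.pick())
	}
}
```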

The Kubernetes Master

The Kubernetes master is split into a set of components.  These work together to provide a unified view of the cluster.

etcd

All persistent master state is stored in an instance of etcd.  This provides a great way to store configuration data reliably.  With watch support, coordinating components can be notified very quickly of changes.


Kubernetes API Server

This server serves up the main Kubernetes API.

It validates and configures data for 3 types of objects:

  • pod: Each pod has a representation at the Kubernetes API level.

  • service: A service is a configuration unit for the proxies that run on every worker node.  It is named and points to one or more Pods.

  • replicationController: A replication controller takes a template and ensures that there is a specified number of "replicas" of that template running at any one time.  If there are too many, it'll kill some.  If there are too few, it'll start more.
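The replicationController behavior above amounts to a small reconciliation rule. A sketch with hypothetical names:

```go
package main

import "fmt"

// reconcile sketches the replicationController rule: compare the
// desired replica count against the matching pods actually running,
// and report how many to start or kill to converge.
func reconcile(desired, running int) (start, kill int) {
	if running < desired {
		return desired - running, 0
	}
	return 0, running - desired
}

func main() {
	start, kill := reconcile(5, 3)
	fmt.Println(start, kill) // 2 0: two replicas short, start two
	start, kill = reconcile(5, 7)
	fmt.Println(start, kill) // 0 2: two surplus replicas, kill two
}
```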

Beyond just servicing REST operations, validating them and storing them in etcd, the API Server does two other things:

  • Schedules pods to worker nodes.  Right now the scheduler is very simple.

  • Synchronizes pod information (where they are, what ports they are exposing) with the service configuration.
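For flavor, a "very simple" scheduler could look something like the sketch below. The fewest-pods heuristic and node names are illustrative assumptions, not the actual algorithm the API Server uses.

```go
package main

import "fmt"

// schedule sketches a trivial scheduler: place the new pod on the
// node currently running the fewest pods, breaking ties by name so
// the choice is deterministic.
func schedule(podsPerNode map[string]int) string {
	best, bestCount := "", -1
	for node, n := range podsPerNode {
		if bestCount == -1 || n < bestCount || (n == bestCount && node < best) {
			best, bestCount = node, n
		}
	}
	return best
}

func main() {
	nodes := map[string]int{"node-1": 3, "node-2": 1, "node-3": 2}
	fmt.Println(schedule(nodes)) // node-2
}
```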


Kubernetes Controller Manager Server

The replicationController type described above isn't strictly necessary for Kubernetes to be useful.  It is really a service that is layered on top of the simple pod API.  To enforce this layering, the logic for the replicationController is actually broken out into another server.  This server watches etcd for changes to replicationController objects and then uses the public Kubernetes API to implement the replication algorithm.
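The watch-and-react loop this server runs can be sketched as follows. The event type, the channel standing in for the etcd watch, and the `runningPods` callback standing in for the public pod API are all illustrative, not the real Kubernetes types.

```go
package main

import "fmt"

// rcEvent stands in for a change notification about one
// replicationController object (in reality delivered by an etcd watch).
type rcEvent struct {
	Name     string
	Replicas int // desired count from the updated object
}

// run reacts to each event by comparing desired replicas to the pods
// actually running (queried through the public API) and logging the
// corrective action it would take.
func run(events <-chan rcEvent, runningPods func(name string) int, log chan<- string) {
	for ev := range events {
		delta := ev.Replicas - runningPods(ev.Name)
		switch {
		case delta > 0:
			log <- fmt.Sprintf("%s: create %d pods", ev.Name, delta)
		case delta < 0:
			log <- fmt.Sprintf("%s: kill %d pods", ev.Name, -delta)
		default:
			log <- fmt.Sprintf("%s: in sync", ev.Name)
		}
	}
	close(log)
}

func main() {
	events := make(chan rcEvent, 2)
	events <- rcEvent{Name: "frontend", Replicas: 3}
	events <- rcEvent{Name: "frontend", Replicas: 1}
	close(events)

	log := make(chan string, 2)
	run(events, func(string) int { return 2 }, log) // 2 pods are running
	for line := range log {
		fmt.Println(line)
	}
}
```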


Key Concept: Labels

Pods are organized using labels.  Each pod can have a set of key/value labels set on it.

Via a "label query" the user can identify a set of pods.  This simple mechanism is a key part of how both services and replicationControllers work.  The set of pods that a service points at is defined with a label query.  Similarly, the population of pods that a replicationController is monitoring is also defined with a label query.

Label queries would typically be used to identify and group pods into, say, a tier in an application.  You could also identify the stack, such as dev, staging, or production.


These sets could be overlapping.  For instance, a service might point to all pods with tier in (frontend), stack in (prod).  Now say you have 10 replicated pods that make up this tier.  But you want to be able to 'canary' a new version of this component.  You could set up a replicationController (with replicas set to 9) for the bulk of the replicas with labels tier=frontend,stack=prod,canary=no and another replicationController (with replicas set to 1) for the canary with labels tier=frontend,stack=prod,canary=yes.  Now the service is covering both the canary and non-canary pods.  But you can mess with the replicationControllers separately to test things out, monitor the results, etc.
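Under these label-query semantics the service's query matches both controllers' pods, while each controller selects only its own. A sketch, treating a label query as a set of exact key/value requirements (a simplifying assumption; real queries also support set operators like "in"):

```go
package main

import "fmt"

// selects reports whether a pod's labels satisfy a label query:
// every queried key/value pair must be present on the pod.
func selects(query, labels map[string]string) bool {
	for k, v := range query {
		if labels[k] != v {
			return false
		}
	}
	return true
}

func main() {
	stable := map[string]string{"tier": "frontend", "stack": "prod", "canary": "no"}
	canary := map[string]string{"tier": "frontend", "stack": "prod", "canary": "yes"}

	service := map[string]string{"tier": "frontend", "stack": "prod"} // covers both
	rcStable := map[string]string{"canary": "no"}                     // replicas: 9
	rcCanary := map[string]string{"canary": "yes"}                    // replicas: 1

	fmt.Println(selects(service, stable), selects(service, canary))   // true true
	fmt.Println(selects(rcStable, stable), selects(rcStable, canary)) // true false
	fmt.Println(selects(rcCanary, canary))                            // true
}
```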


Network Model

Kubernetes expands the default Docker networking model.  The goal is to have each pod have an IP in a shared networking namespace that has full communication with other physical computers and containers across the network.  In this way, it becomes much less necessary to map ports.

For the Google Compute Engine cluster configuration scripts, advanced routing is set up so that each VM has an extra 256 IP addresses that get routed to it.  This is in addition to the 'main' IP address assigned to the VM, which is NAT-ed for Internet access.  The networking bridge (called cbr0 to differentiate it from docker0) is set up outside of Docker proper and only does NAT for egress network traffic that isn't aimed at the virtual network.


Ports mapped in from the 'main IP' (and hence the internet if the right firewall rules are set up) are proxied in user mode by Docker.  In the future, this should be done with iptables by either the Kubelet or Docker: Issue #15.

Release Process

Right now "building" or "releasing" Kubernetes consists of some scripts (in release/) to create a tar of the necessary data and then uploading it to Google Cloud Storage.   In the future we will generate Docker images for the bulk of the above described components: Issue #19.
