Kubernetes Primer: Concepts

Overview

What is Kubernetes?

image

  • Core layer: Kubernetes' core functionality, exposing APIs on which higher-level applications are built, and providing a plugin-based application runtime environment internally
  • Application layer: deployment (stateless applications, stateful applications, batch jobs, clustered applications, etc.) and routing (service discovery, DNS resolution, etc.)
  • Governance layer: system metrics (infrastructure, container, and network metrics), automation (auto-scaling, dynamic provisioning, etc.), and policy management (RBAC, Quota, PSP, NetworkPolicy, etc.)
  • Interface layer: the kubectl command-line tool, client SDKs, and cluster federation
  • Ecosystem: the large ecosystem of container-cluster management and scheduling built on top of the interface layer, which falls into two categories
    • Outside Kubernetes: logging, monitoring, configuration management, CI, CD, workflow, FaaS, OTS applications, ChatOps, etc.
    • Inside Kubernetes: CRI, CNI, CVI, image registries, cloud providers, cluster configuration and management, etc.

Kubernetes is an open-source platform for automating deployment, scaling, and operations of application containers across clusters of hosts, providing container-centric infrastructure.

Features:

  • Deploy your applications quickly and predictably.
  • Scale your applications on the fly.
  • Seamlessly roll out new features.
  • Optimize use of your hardware by using only the resources you need.

Other characteristics:

  • portable: public, private, hybrid, multi-cloud
  • extensible: modular, pluggable, hookable, composable
  • self-healing: auto-placement, auto-restart, auto-replication, auto-scaling

Why containers?

  • Agile application creation and deployment: Increased ease and efficiency of container image creation compared to VM image use.
  • Continuous development, integration, and deployment: Provides for reliable and frequent container image build and deployment with quick and easy rollbacks (due to image immutability).
  • Dev and Ops separation of concerns: Create application container images at build/release time rather than deployment time, thereby decoupling applications from infrastructure.
  • Environmental consistency across development, testing, and production: Runs the same on a laptop as it does in the cloud.
  • Cloud and OS distribution portability: Runs on Ubuntu, RHEL, CoreOS, on-prem, Google Container Engine, and anywhere else.
  • Application-centric management: Raises the level of abstraction from running an OS on virtual hardware to run an application on an OS using logical resources.
  • Loosely coupled, distributed, elastic, liberated micro-services: Applications are broken into smaller, independent pieces and can be deployed and managed dynamically – not a fat monolithic stack running on one big single-purpose machine.
  • Resource isolation: Predictable application performance.
  • Resource utilization: High efficiency and density.

What Kubernetes provides

  • co-locating helper processes, facilitating composite applications and preserving the one-application-per-container model,
  • mounting storage systems,
  • distributing secrets,
  • application health checking,
  • replicating application instances,
  • horizontal auto-scaling,
  • naming and discovery,
  • load balancing,
  • rolling updates,
  • resource monitoring,
  • log access and ingestion,
  • support for introspection and debugging, and
  • identity and authorization.

In summary: Kubernetes schedules, manages, and scales applications (deployment/daemon set/stateful set/job, health checks, auto-scaling, rolling updates), provides an application runtime platform (logging, monitoring, service discovery, load balancing, authentication and authorization), and manages, controls, and allocates platform resources (memory, CPU, network, storage, images).

Consider the definition of an operating system:
An operating system (OS) is the collection of programs that controls and manages the hardware and software resources of an entire computer system, organizes and schedules the computer's work and resource allocation, and provides a convenient interface and environment for users and other software. Kubernetes is a distributed operating system: it manages the software and hardware resources of a cluster of machines, organizes the scheduling of programs (containers) and the allocation of resources, and provides a convenient interface and environment for users and other software.
Most concepts from single-machine operating systems already have, or are getting, a counterpart in Kubernetes. For example, systemctl has a reload operation; Kubernetes does not have this yet, but it is being worked on.

What Kubernetes is not

This section is quite interesting and well worth reading. Many of the things Kubernetes is not are things a Kubernetes distribution vendor needs to consider and provide.

  • Does not limit the types of applications supported. It does not dictate application frameworks (e.g., Wildfly), restrict the set of supported language runtimes (for example, Java, Python, Ruby), cater to only 12-factor applications, nor distinguish apps from services. Kubernetes aims to support an extremely diverse variety of workloads, including stateless, stateful, and data-processing workloads. If an application can run in a container, it should run great on Kubernetes.
  • Does not provide middleware (e.g., message buses), data-processing frameworks (for example, Spark), databases (e.g., mysql), nor cluster storage systems (e.g., Ceph) as built-in services. Such applications run on Kubernetes.
  • Does not have a click-to-deploy service marketplace.
  • Does not deploy source code and does not build your application. Continuous Integration (CI) workflow is an area where different users and projects have their own requirements and preferences, so it supports layering CI workflows on Kubernetes but doesn’t dictate how layering should work.
  • Allows users to choose their logging, monitoring, and alerting systems. (It provides some integrations as proof of concept.)
  • Does not provide nor mandate a comprehensive application configuration language/system (for example, jsonnet).
  • Does not provide nor adopt any comprehensive machine configuration, maintenance, management, or self-healing systems.

Kubernetes Components

Role Component Description
Master Components kube-apiserver kube-apiserver exposes the Kubernetes API;
- - it is the front-end for the Kubernetes control plane.
Master Components etcd Kubernetes' backing store; stores all cluster data.
Master Components kube-controller-manager A single binary that includes:
- - 1.Node Controller: noticing & responding when nodes go down.
- - 2.Replication Controller: maintains the correct number of pods for every Replication Controller object.
- - 3.Endpoints Controller: populates the Endpoints object (i.e. joins Services & Pods).
- - 4.Service Account & Token Controllers:Create default accounts,API access tokens for namespaces.
- - 5.others.
Master Components cloud-controller-manager A binary that runs controllers which interact with cloud providers, including:
- - 1.Node Controller: checks with the cloud provider to determine whether a node has been deleted in the cloud after it stops responding
- - 2.Route Controller: For setting up routes in the underlying cloud infrastructure
- - 3.Service Controller: For creating, updating and deleting cloud provider load balancers
- - 4. Volume Controller: For creating,attaching,mounting,interacting with cloud provider to orchestrate volumes
Master Components kube-scheduler kube-scheduler watches newly created pods that have no node assigned, and selects a node for them to run on.
Master Components addons Addons are pods and services that implement cluster features.
- - e.g. DNS (Cluster DNS is a DNS server, in addition to the other DNS server(s) in your environment, which serves DNS records for Kubernetes services.),
- - User interface, Container Resource Monitoring, Cluster-level Logging
Node components kubelet The primary node agent. Main functions:
- - 1.Watches for pods that have been assigned to its node (either by apiserver or via local configuration file)
- - 2.Mounts the pod’s required volumes
- - 3.Downloads the pod’s secrets
- - 4.Runs the pod’s containers via docker (or, experimentally, rkt).
- - 5.Periodically executes any requested container liveness probes.
- - 6.Reports the status of the pod back to the rest of the system, by creating a "mirror pod" if necessary
- - 7.Reports the status of the node back to the rest of the system.
Node components kube-proxy kube-proxy enables the Kubernetes service abstraction by maintaining network rules on the host and performing connection forwarding.
Node components docker/rkt for actually running containers.
Node components supervisord supervisord is a lightweight process babysitting system for keeping kubelet and docker running.
Node components fluentd fluentd is a daemon which helps provide cluster-level logging.

Kubernetes Objects

Understanding Kubernetes Objects

Classification

Category Names
Resource objects Pod, ReplicaSet, ReplicationController, Deployment, StatefulSet, DaemonSet, Job, CronJob, HorizontalPodAutoscaling
Configuration objects Node, Namespace, Service, Secret, ConfigMap, Ingress, Label, ThirdPartyResource, ServiceAccount
Storage objects Volume, PersistentVolume
Policy objects SecurityContext, ResourceQuota, LimitRange

Kubernetes Objects are persistent entities in the Kubernetes system. Kubernetes uses these entities to represent the state of your cluster. Specifically, they can describe:

  • What containerized applications are running (and on which nodes) => applications
  • The resources available to those applications => resources
  • The policies around how those applications behave, such as restart policies, upgrades, and fault-tolerance => policies

Kubernetes objects describe a desired state => state-driven

In short, Kubernetes objects are applications, resources, and policies

Object Spec and Status

Every object has two nested fields: Object Spec and Object Status.
The Object Spec describes the desired state; the Object Status describes the current state. The Object Status is driven to match the Object Spec.

The job of the Kubernetes Control Plane is to drive the object's actual state toward the object's desired state.

Reference

Name / NameSpace

Labels and Selectors

Labels are key/value pairs that are attached to objects, such as pods. Labels are intended to be used to specify identifying attributes of objects that are meaningful and relevant to users, but which do not directly imply semantics to the core system.
Labels are not unique: many objects can carry the same label(s).
Via a label selector, the client/user can identify a set of objects. The label selector is the core grouping primitive in Kubernetes.
The API currently supports two types of selectors: equality-based (e.g., environment = production) and set-based (e.g., environment in (production, qa)).

API

For examples, see kubernetes.io/docs/concep…

Labels can be used for LIST and WATCH filtering, and for set references in API objects.

An example of set references in API objects:

Some Kubernetes objects, such as services and replicationcontrollers, also use label selectors to specify sets of other resources, such as pods. These, however, support only equality-based requirement selectors:

"selector": {
    "component" : "redis"
}

Newer resources, such as Job, Deployment, Replica Set, and Daemon Set, support set-based requirements as well:

selector:
  matchLabels:
    component: redis
  matchExpressions:
    - {key: tier, operator: In, values: [cache]}
    - {key: environment, operator: NotIn, values: [dev]}

Another use case is using labels to select nodes.

Annotations

Their purpose is attaching arbitrary metadata to objects.

How they differ from labels:
You can use either labels or annotations to attach metadata to Kubernetes objects. Labels can be used to select objects and to find collections of objects that satisfy certain conditions. In contrast, annotations are not used to identify and select objects. The metadata in an annotation can be small or large, structured or unstructured, and can include characters not permitted by labels.

The Kubernetes API

Complete API details are documented using Swagger v1.2 and OpenAPI (i.e., Swagger 2.0).

API versioning

e.g., /api/v1. By stability, versions are classified as stable (v1), alpha (v1alpha1), or beta (v2beta3).

API groups

API groups exist to make it easier to extend the Kubernetes API.
Currently there are several API groups in use:

  1. the core (oftentimes called 「legacy」, due to not having explicit group name) group, which is at REST path /api/v1 and is not specified as part of the apiVersion field, e.g. apiVersion: v1.
  2. the named groups are at REST path /apis/$GROUP_NAME/$VERSION, and use apiVersion: $GROUP_NAME/$VERSION (e.g. apiVersion: batch/v1; another example: /apis/apps/v1beta2/).

There are currently two ways to extend the API: CustomResourceDefinition and kube-aggregator.

An API group can be enabled or disabled when the apiserver starts, for example:

--runtime-config=extensions/v1beta1/deployments=false,extensions/v1beta1/ingress=false

API Conventions

This part comes from github.com/kubernetes/…

Kinds fall into three categories:

  • Objects represent a persistent entity in the system. Examples: Pod, ReplicationController, Service, Namespace, Node
  • Lists are collections of resources of one (usually) or more (occasionally) kinds. Examples: PodLists, ServiceLists, NodeLists
  • Simple: used for specific actions on objects and for non-persistent entities. Many simple resources are "subresources", e.g. /binding, /status, /scale: a small piece of a resource

Resources

All JSON objects returned by an API MUST have the following fields:

  • kind: a string that identifies the schema this object should have
  • apiVersion: a string that identifies the version of the schema the object should have
Objects

Object content Description
Metadata MUST: namespace, name, uid; SHOULD: resourceVersion, generation, creationTimestamp, deletionTimestamp, labels, annotations
Spec and Status status (current) converges toward spec (desired); a /status subresource MUST be provided to enable system components to update the statuses of resources they manage; status is commonly expressed as Conditions
References to related objects ObjectReference type
Lists and Simple kinds

Differing Representations

Verbs on Resources

github.com/kubernetes/…

PATCH is special; it supports three patch formats:

  • JSON Patch
  • Merge Patch
  • Strategic Merge Patch

Idempotency

All compatible Kubernetes APIs MUST support "name idempotency" and respond with an HTTP status code 409 Conflict when a POST targets a name that already exists.

Optional vs. Required

Optional fields have the following properties:

  • They have the +optional struct tag in Go.
  • They are a pointer type in the Go definition or have a built-in nil value.
  • The API server should allow POSTing and PUTing a resource with this field unset.

Use +optional rather than omitempty.

Defaulting

Late Initialization

Concurrency Control and Consistency

resourceVersion is used for concurrency control.
All Kubernetes resources have a "resourceVersion" field as part of their metadata.
Kubernetes leverages the concept of resource versions to achieve optimistic concurrency.
The resourceVersion is changed by the server every time an object is modified.
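A sketch of what this looks like in practice (the object, name, and version value below are illustrative): a client reads an object, holds on to metadata.resourceVersion, and echoes it back unchanged on update; if another writer modified the object in the meantime, the server rejects the update with 409 Conflict and the client should re-read and retry.

```yaml
# Fragment of an object as returned by the API server (illustrative values).
apiVersion: v1
kind: ConfigMap
metadata:
  name: example-config        # hypothetical object
  namespace: default
  resourceVersion: "248576"   # opaque, server-assigned; send back verbatim on PUT
data:
  key: value
```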

Serialization Format

Units

Selecting Fields

Object references

HTTP Status codes

Response Status Kind

Which API calls return the Status kind?
Kubernetes will always return the Status kind from any API endpoint when an error occurs. Clients SHOULD handle these types of objects when appropriate.

A Status kind will be returned by the API in two cases:

  • When an operation is not successful (i.e. when the server would return a non-2xx HTTP status code).
  • When an HTTP DELETE call is successful.
$ curl -v -k -H "Authorization: Bearer WhCDvq4VPpYhrcfmF6ei7V9qlbqTubUc" https://10.240.122.184:443/api/v1/namespaces/default/pods/grafana

> GET /api/v1/namespaces/default/pods/grafana HTTP/1.1
> User-Agent: curl/7.26.0
> Host: 10.240.122.184
> Accept: */*
> Authorization: Bearer WhCDvq4VPpYhrcfmF6ei7V9qlbqTubUc
>

< HTTP/1.1 404 Not Found
< Content-Type: application/json
< Date: Wed, 20 May 2015 18:10:42 GMT
< Content-Length: 232
<
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "pods \"grafana\" not found",
  "reason": "NotFound",
  "details": {
    "name": "grafana",
    "kind": "pods"
  },
  "code": 404
}

Events

Naming conventions

Label, selector, and annotation conventions

WebSockets and SPDY

The API therefore exposes certain operations over upgradeable HTTP connections (described in RFC 2817) via the WebSocket and SPDY protocols.
Two protocols are supported:

  • Streamed channels: Kubernetes supports a SPDY based framing protocol that leverages SPDY channels and a WebSocket framing protocol that multiplexes multiple channels onto the same stream by prefixing each binary chunk with a byte indicating its channel
  • Streaming response: HTTP Chunked Transfer-Encoding

Validation

Kubernetes Architecture

Nodes

Node Status field Description
Addresses HostName/ExternalIP/InternalIP
Condition OutOfDisk / Ready / MemoryPressure / DiskPressure / NetworkUnavailable
Capacity
Info

Management

Node Controller
The node controller is a Kubernetes master component which manages various aspects of nodes.

Responsibilities:

  • assigning a CIDR block to the node when it is registered
  • keeping the node controller’s internal list of nodes up to date with the cloud provider’s list of available machines
  • monitoring the nodes’ health
  • Starting in Kubernetes 1.6, the NodeController is also responsible for evicting pods that are running on nodes with NoExecute
  • Starting in version 1.8, the node controller can be made responsible for creating taints that represent Node conditions.
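As a sketch of how such condition taints surface to workloads, a pod can declare a toleration for them; the 300-second value below is illustrative, and taint keys follow the node.kubernetes.io/* convention:

```yaml
tolerations:
- key: node.kubernetes.io/not-ready   # taint added for the NotReady node condition
  operator: Exists
  effect: NoExecute
  tolerationSeconds: 300              # stay bound up to 5 minutes after the condition appears
```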

Master-Node communication

Concepts Underlying the Cloud Controller Manager

The CCM consolidates all of the cloud-dependent logic from the preceding three components to create a single point of integration with the cloud. The new architecture with the CCM looks like this

image

TODO

Extending the Kubernetes API

Custom Resources

Custom resources

Custom controllers

CustomResourceDefinitions

API server aggregation

Extending the Kubernetes API with the aggregation layer

Containers

Images

Updating Images

The default pull policy is IfNotPresent which causes the Kubelet to not pull an image if it already exists.
To force a pull, use imagePullPolicy: Always. The recommended practice is "vX.Y + IfNotPresent" rather than "latest + Always", because with latest you cannot tell which version is actually running. In practice the pull is delegated to a runtime such as Docker, so even with Always large amounts of data are not re-downloaded when the layers already exist locally; in that sense Always is harmless.
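A minimal sketch of the recommended combination (the registry, image name, and tag are hypothetical):

```yaml
spec:
  containers:
  - name: app
    image: myregistry.example.com/app:v1.2.3   # immutable version tag, not "latest"
    imagePullPolicy: IfNotPresent              # skip the pull when the image is already on the node
```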

Using a Private Registry

Options:
Using Google Container Registry
Using AWS EC2 Container Registry
Using Azure Container Registry (ACR)

Configuring Nodes to Authenticate to a Private Repository

Via $HOME/.docker/config.json (credential expiry can be an issue?)

Pre-pulling Images

Specifying ImagePullSecrets on a Pod

Creating a Secret with a Docker Config
$ kubectl create secret docker-registry myregistrykey --docker-server=DOCKER_REGISTRY_SERVER --docker-username=DOCKER_USER --docker-password=DOCKER_PASSWORD --docker-email=DOCKER_EMAIL
secret "myregistrykey" created.
Bypassing kubectl create secrets

You can also create the secret from the contents of .docker/config.json using YAML, without going through kubectl create secret.
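A sketch of such a manifest, assuming your cluster version supports the kubernetes.io/dockerconfigjson secret type (the base64 payload below is a placeholder for your own ~/.docker/config.json):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: myregistrykey
  namespace: awesomeapps
type: kubernetes.io/dockerconfigjson
data:
  # base64-encoded contents of ~/.docker/config.json (placeholder value shown)
  .dockerconfigjson: eyJhdXRocyI6eyJodHRwczovL2V4YW1wbGUuY29tIjp7fX19
```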

Referring to an imagePullSecrets on a Pod

How to use the imagePullSecrets you created:
They can be specified in the pod spec, or applied automatically via a service account.

You can use this in conjunction with a per-node .docker/config.json. The credentials will be merged. This approach will work on Google Container Engine (GKE).

apiVersion: v1
kind: Pod
metadata:
  name: foo
  namespace: awesomeapps
spec:
  containers:
    - name: foo
      image: janedoe/awesomeapp:v1
  imagePullSecrets:
    - name: myregistrykey

Use Cases

Use cases. Worth noting: the AlwaysPullImages admission controller sometimes needs to be enabled, for example in multi-tenant clusters; otherwise one tenant's pods could use images pulled by another tenant.

Container Environment Variables

Container information

  • Much pod metadata can be exposed as environment variables via the downward API
  • Secrets can also be exposed as environment variables
  • Custom environment variables can be defined in the pod spec

For the various ways of projecting metadata into files or environment variables inside the container, see kubernetes.io/docs/tasks/… and related docs.

Cluster information

The host/port of every service that exists when a container is created is injected into the container as environment variables (apparently scoped to the pod's namespace). This guarantees that services can be reached even without the DNS addon enabled, though of course this mechanism is less reliable.
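For instance, for a Service named redis-master exposing port 6379 with cluster IP 10.0.0.11 (values illustrative), containers created afterwards in the same namespace would see variables along these lines:

```
REDIS_MASTER_SERVICE_HOST=10.0.0.11
REDIS_MASTER_SERVICE_PORT=6379
REDIS_MASTER_PORT=tcp://10.0.0.11:6379
REDIS_MASTER_PORT_6379_TCP=tcp://10.0.0.11:6379
REDIS_MASTER_PORT_6379_TCP_PROTO=tcp
REDIS_MASTER_PORT_6379_TCP_PORT=6379
REDIS_MASTER_PORT_6379_TCP_ADDR=10.0.0.11
```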

Container Lifecycle Hooks

Container Hooks

Hook Details

There are currently two hooks, PostStart and PreStop. If a hook call hangs, the pod's state transitions are blocked.

  • PostStart: executes immediately after a container is created; it is not guaranteed to run before the ENTRYPOINT
  • PreStop: called immediately before a container is terminated; runs synchronously, and its maximum execution time is bounded by the grace period

Hook Handler Implementations

Two handler implementations are supported: Exec and HTTP.
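A sketch showing both handler styles on one container (the image, path, and port are illustrative):

```yaml
spec:
  containers:
  - name: app
    image: nginx
    lifecycle:
      postStart:
        exec:                          # Exec handler: run a command inside the container
          command: ["/bin/sh", "-c", "echo started > /tmp/started"]
      preStop:
        httpGet:                       # HTTP handler: GET against the container
          path: /shutdown              # hypothetical endpoint
          port: 8080
```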

Hook Handler Execution

  • Hook handler calls are synchronous within the context of the Pod containing the Container. This means that for a PostStart hook, the Container ENTRYPOINT and hook fire asynchronously. However, if the hook takes too long to run or hangs, the Container cannot reach a running state.
  • The behavior is similar for a PreStop hook. If the hook hangs during execution, the Pod phase stays in a Terminating state and is killed after terminationGracePeriodSeconds of pod ends. If a PostStart or PreStop hook fails, it kills the Container.

Given the behavior above, PostStart and PreStop are designed for very lightweight commands; for anything heavier, consider an init container or a defer container (not yet implemented; there is an open issue).

Hook delivery guarantees

A hook is normally delivered only once, but this is not guaranteed.

Debugging Hook Handlers

If a handler fails for some reason, it broadcasts an event.
You can see these events by running kubectl describe pod

Workloads

Pods

Pod Overview

What a Pod is: the smallest unit of deployment; it encapsulates one or more application containers, (shared) storage resources, a network IP, and options.
A Pod encapsulates an application container (or, in some cases, multiple containers), storage resources, a unique network IP, and options that govern how the container(s) should run. A Pod represents a unit of deployment: a single instance of an application in Kubernetes, which might consist of either a single container or a small number of containers that are tightly coupled and that share resources.
References:
blog.kubernetes.io/2015/06/the… (use cases for multi-container pods: Sidecar (git sync, log collection, ...), Ambassador (proxy, transparent proxying), Adapter (exporters), ...)
blog.kubernetes.io/2016/06/con…

Understanding Pods

image

How Pods manage multiple Containers

An example:

image

The multiple containers in a pod share:

  • Networking
  • Storage

Working with Pods

Pods are designed as relatively ephemeral, disposable entities. Pods do not, by themselves, self-heal; Kubernetes uses a higher-level abstraction, called a Controller, that handles the work of managing the relatively disposable Pod instances.

Pods and Controllers

A Controller can create and manage multiple Pods for you, handling replication and rollout and providing self-healing capabilities at cluster scope. For example, if a Node fails, the Controller might automatically replace the Pod by scheduling an identical replacement on a different Node.

Some examples of Controllers that contain one or more pods include:

  • Deployment
  • StatefulSet
  • DaemonSet

Pod Templates

Controllers use Pod Templates to make actual pods.
Unlike a pod, which specifies the desired state of all containers belonging to it, a pod template does not specify a desired state for all replicas.

Pod Lifecycle

Pod phase

A Pod’s status field is a PodStatus object, which has a phase field.

Phase Description
Pending The Pod has been accepted by the Kubernetes system, but one or more of the Container images has not been created.
Running The Pod has been bound to a node, and all of the Containers have been created. At least one Container is still running, or is in the process of starting or restarting
Succeeded All Containers in the Pod have terminated in success, and will not be restarted.
Failed All Containers in the Pod have terminated, and at least one Container has terminated in failure.
Unknown The state of the Pod could not be obtained, typically due to an error in communicating with the host of the Pod.

Pod termination

  1. The user sends a command to delete the pod; the default grace period is 30 seconds.
  2. Once the pod exceeds the grace period, the API server updates the pod's status to "dead".
  3. The pod is shown as "Terminating" on the client command line.
  4. At the same time as step 3, when the kubelet sees that the pod has been marked "Terminating", it begins shutting the pod down:
    1. If a preStop hook is defined in the pod, it is invoked before the pod is stopped. If the preStop hook is still running when the grace period expires, step 2 is extended by a small additional grace period of 2 seconds.
    2. The TERM signal is sent to the processes in the pod.
  5. At the same time as step 3, the pod is removed from the service's endpoint lists and is no longer considered part of the replication controller's set. Pods that shut down slowly continue to serve the traffic the load balancer has already forwarded to them.
  6. When the grace period expires, any processes still running in the pod are killed with SIGKILL.
  7. The kubelet finishes deleting the pod on the API server by setting the grace period to 0 (immediate deletion). The pod disappears from the API and is no longer visible to clients.

Pod conditions

A Pod has a PodStatus, which has an array of PodConditions.Each element of the PodCondition array has a type field and a status field.

status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: 2017-10-28T06:30:03Z
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: 2017-10-28T06:30:13Z
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: 2017-10-28T06:30:03Z
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: docker://dd82608cabe226247bcbc8d5fbce6121edf935320486c41046481000dbb7784f
    image: deis/brigade-api:latest
    imageID: docker-pullable://deis/brigade-api@sha256:943cf822adddf6869ff02d2e1a55cbb19c96d01be41e88d1d56bc16a50f5c91f
    lastState: {}
    name: brigade
    ready: true
    restartCount: 0
    state:
      running:
        startedAt: 2017-10-28T06:30:06Z

Container probes

A Probe is a diagnostic performed periodically by the kubelet on a Container. To perform a diagnostic, the kubelet calls a Handler implemented by the Container.

Three check mechanisms:

  • ExecAction
  • TCPSocketAction
  • HTTPGetAction

Three results: Success, Failure, Unknown.
Two probe types: livenessProbe (interacts with the restart policy) and readinessProbe.
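A sketch combining both probe types (the image, paths, ports, and timings are illustrative):

```yaml
containers:
- name: app
  image: example/app:v1        # hypothetical image
  livenessProbe:               # failure => container is restarted per restartPolicy
    httpGet:
      path: /healthz
      port: 8080
    initialDelaySeconds: 15
    periodSeconds: 10
  readinessProbe:              # failure => pod is removed from Service endpoints
    tcpSocket:
      port: 8080
    periodSeconds: 5
```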

When should you use liveness or readiness probes?

todo

Pod and Container status

Restart policy

Pod lifetime

  • Use a Job for Pods that are expected to terminate, for example, batch computations. Jobs are appropriate only for Pods with restartPolicy equal to OnFailure or Never.
  • Use a ReplicationController, ReplicaSet, or Deployment for Pods that are not expected to terminate, for example, web servers. ReplicationControllers are appropriate only for Pods with a restartPolicy of Always.
  • Use a DaemonSet for Pods that need to run one per machine, because they provide a machine-specific system service.

image
image

Examples

A pod with a single container

Worth noting here: if a pod is designed to run to completion, its restartPolicy must not be Always.

Current pod phase Container event Pod restartPolicy Action on container Log Resulting pod phase
Running exits with success Always Restart Container Log completion event Running
Running exits with success OnFailure - Log completion event Succeeded
Running exits with success Never - Log completion event Succeeded
Running exits with failure Always Restart Container Log failure event Running
Running exits with failure OnFailure Restart Container Log failure event Running
Running exits with failure Never - Log failure event Failed
Running oom Always Restart Container Log OOM event Running
Running oom OnFailure Restart Container Log OOM event Running
Running oom Never - Log OOM event Failed
A pod with two containers
Current pod phase Container 1 event Pod restartPolicy Action on container Log Resulting pod phase
Running exits with failure Always Restart Container Log failure event Running
Running exits with failure OnFailure Restart Container Log failure event Running
Running exits with failure Never - Log failure event Running; once container 2 also exits => Failed

Init Containers

Commonly used to perform setup, or to wait for setup to complete.
Init Containers are exactly like regular Containers, except:

  • They always run to completion.
  • Each one must complete successfully before the next one is started.

Detailed behavior

  • A Pod cannot be Ready until all Init Containers have succeeded.
  • If the Pod is restarted, all Init Containers must execute again.
  • readinessProbe and the like cannot be used on Init Containers
  • Use activeDeadlineSeconds on the Pod and livenessProbe on the Container to prevent Init Containers from failing forever.
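A sketch of the wait-for-setup pattern described above (the service name and images are hypothetical):

```yaml
spec:
  initContainers:
  - name: wait-for-db                 # must run to completion before app containers start
    image: busybox
    command: ["sh", "-c", "until nslookup mydb; do echo waiting; sleep 2; done"]
  containers:
  - name: app
    image: example/app:v1
```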

Pod Preset

A pod preset is a mechanism for injecting metadata into pods.
Using a pod preset means that, for a selected class of pods, the admission controller transparently modifies the pod spec, dynamically injecting dependencies such as environment variables and volume mounts.

Behavior:
When a PodPreset is applied to one or more pods, Kubernetes modifies the pod spec. For Env, EnvFrom, and VolumeMounts, Kubernetes modifies the spec of all containers in the pod; for Volumes, Kubernetes modifies the pod spec itself.

Example:

kind: PodPreset
apiVersion: settings.k8s.io/v1alpha1
metadata:
  name: allow-database
  namespace: myns
spec:
  selector:
    matchLabels:
      role: frontend
  env:
    - name: DB_PORT
      value: "6379"
  volumeMounts:
    - mountPath: /cache
      name: cache-volume
  volumes:
    - name: cache-volume
      emptyDir: {}

Reference: www.jianshu.com/p/83fe99a5e…

Pod Security Policy

Admission control with PodSecurityPolicy governs the creation and modification of cluster resources, based on the capabilities permitted cluster-wide.
If some policy matches, the pod is admitted; if the request matches no PSP, the pod is rejected.

jimmysong.io/kubernetes-…
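A sketch of a restrictive policy, assuming the extensions/v1beta1 PSP API of this era (the name and volume list are illustrative):

```yaml
apiVersion: extensions/v1beta1        # policy/v1beta1 in newer clusters
kind: PodSecurityPolicy
metadata:
  name: restricted                    # illustrative name
spec:
  privileged: false                   # reject privileged pods
  runAsUser:
    rule: MustRunAsNonRoot            # containers must not run as root
  seLinux:
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  volumes: ["configMap", "secret", "emptyDir", "persistentVolumeClaim"]
```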

Disruptions

Voluntary and Involuntary Disruptions

  • Unavoidable cases, i.e. involuntary disruptions to an application => e.g. hardware failure, kernel panic, a node disappearing, eviction of a pod due to the node being out of resources, etc.

  • Voluntary disruptions => e.g. deleting/updating a deployment or pod, draining a node for repair or upgrade, or scaling the cluster down.

Dealing with Disruptions

How to mitigate the impact of involuntary disruptions: request the resources you need, then replicate and spread:

  • Ensure your pod requests the resources it needs.
  • Replicate your application if you need higher availability
  • For even higher availability when running replicated applications, spread applications across racks (using anti-affinity) or across zones (if using a multi-zone cluster.)

How Disruption Budgets Work

In Kubernetes, to keep a service uninterrupted (and its SLA undegraded), applications need to be deployed with multiple replicas. The PodDisruptionBudget controller lets you set the minimum number, or minimum percentage, of an application's pods that must remain running, so that voluntarily destroying pods never takes down too many at once.

Use tools that call the Eviction API, such as kubectl drain, rather than deleting pods directly, because the Eviction API respects Pod Disruption Budgets.

  • PDBs cannot prevent involuntary disruptions from occurring, but they do count against the budget.
  • Pods which are deleted or unavailable due to a rolling upgrade to an application do count against the disruption budget, but controllers (like deployment and stateful-set) are not limited by PDBs when doing rolling upgrades; the handling of failures during application updates is configured in the controller spec.
  • When a pod is evicted using the eviction API, it is gracefully terminated

References:
www.kubernetes.org.cn/2486.html
ju.outofmemory.cn/entry/32756…

PDB Example
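A sketch of a budget that keeps at least two replicas of a hypothetical zookeeper app running during voluntary disruptions:

```yaml
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: zk-pdb                 # illustrative name
spec:
  minAvailable: 2              # an absolute count; a percentage like "60%" also works
  selector:
    matchLabels:
      app: zookeeper
```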

Separating Cluster Owner and Application Owner Roles

How to perform Disruptive Actions on your Cluster

Write disruption tolerant applications and use PDBs

Controllers

Replica Sets

ReplicaSets are generally not used directly, but rather through Deployments; they are mainly used by Deployments as a mechanism to orchestrate pod creation, deletion, and updates.

When to use a ReplicaSet

A ReplicaSet ensures that a specified number of pod replicas are running at any given time.

Working with ReplicaSets

Some operations:

  • kubectl delete: kubectl will scale the ReplicaSet to zero and wait for it to delete each pod before deleting the ReplicaSet itself
  • --cascade=false deletes only the ReplicaSet, leaving its pods running
  • By modifying a pod's labels you can isolate pods from a ReplicaSet; a removed pod is replaced automatically
  • Scale via .spec.replicas
  • A ReplicaSet can also be a target for Horizontal Pod Autoscalers (HPA), i.e. automatic scaling

Replication Controller

Omitted; ReplicationControllers are no longer recommended.

Deployments

A Deployment controller provides declarative updates for Pods and ReplicaSets.

Use case

  • Create a Deployment to rollout a ReplicaSet. The ReplicaSet creates Pods in the background. Check the status of the rollout to see if it succeeds or not.
  • Declare the new state of the Pods by updating the PodTemplateSpec of the Deployment. A new ReplicaSet is created and the Deployment manages moving the Pods from the old ReplicaSet to the new one at a controlled rate. Each new ReplicaSet updates the revision of the Deployment.
  • Rollback to an earlier Deployment revision if the current state of the Deployment is not stable. Each rollback updates the revision of the Deployment.
  • Scale up the Deployment to facilitate more load.
  • Pause the Deployment to apply multiple fixes to its PodTemplateSpec and then resume it to start a new rollout.
  • Use the status of the Deployment as an indicator that a rollout has stuck.
  • Clean up older ReplicaSets that you don’t need anymore

Create

Pod-template-hash label: this label ensures that child ReplicaSets of a Deployment do not overlap. It is generated by hashing the PodTemplate of the ReplicaSet and using the resulting hash as the label value that is added to the ReplicaSet selector, Pod template labels, and in any existing Pods that the ReplicaSet might have.

Update

Deployment can ensure that only a certain number of Pods may be down while they are being updated. By default, it ensures that at least 1 less than the desired number of Pods are up (1 max unavailable).

rollout, rollout history/status, undo......

Scaling

Proportional scaling: with RollingUpdate (maxSurge, maxUnavailable), the total number of pods may briefly exceed the desired count.

$ kubectl get deploy
NAME                 DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
nginx-deployment     10        10        10           10          50s

$ kubectl set image deploy/nginx-deployment nginx=nginx:sometag
deployment "nginx-deployment" image updated

$ kubectl get rs
NAME                          DESIRED   CURRENT   READY     AGE
nginx-deployment-1989198191   5         5         0         9s
nginx-deployment-618515232    8         8         8         1m

Pausing and Resuming

Deployment status

Clean up Policy

You can set .spec.revisionHistoryLimit field in a Deployment to specify how many old ReplicaSets for this Deployment you want to retain

Note: canary deployments are not supported natively at present; the recommended approach is to implement them with multiple Deployments.
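A sketch of that pattern: two Deployments carry a shared app label plus a differentiating track label, and the Service selects only the shared label, so traffic splits roughly in proportion to replica counts. All names and images below are hypothetical.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: myapp
spec:
  selector:
    app: myapp                 # no "track" here: both tracks receive traffic
  ports:
  - port: 80
---
apiVersion: apps/v1beta2
kind: Deployment
metadata:
  name: myapp-stable
spec:
  replicas: 9                  # ~90% of endpoints
  selector:
    matchLabels: {app: myapp, track: stable}
  template:
    metadata:
      labels: {app: myapp, track: stable}
    spec:
      containers:
      - name: myapp
        image: example/myapp:v1
---
apiVersion: apps/v1beta2
kind: Deployment
metadata:
  name: myapp-canary
spec:
  replicas: 1                  # ~10% of endpoints
  selector:
    matchLabels: {app: myapp, track: canary}
  template:
    metadata:
      labels: {app: myapp, track: canary}
    spec:
      containers:
      - name: myapp
        image: example/myapp:v2
```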

StatefulSets

Introduced in 1.5 to replace PetSets. A StatefulSet manages the deployment and scaling of a set of Pods, and provides guarantees about the ordering and uniqueness of those Pods.
"Stateful" implies:

  • Stable, unique network identifiers.
  • Stable, persistent storage.
  • Ordered, graceful deployment and scaling. (A Deployment's rolling update is not this strict.)
  • Ordered, graceful deletion and termination.

Components

The components of a StatefulSet, by example:

  • A Headless Service (with a selector), named nginx, is used to control the network domain. This kind of service has no load balancing, is not handled by kube-proxy, and its DNS name resolves directly to the backend endpoints.
  • The StatefulSet, named web, has a Spec that indicates that 3 replicas of the nginx container will be launched in unique Pods.
  • The volumeClaimTemplates will provide stable storage using PersistentVolumes provisioned by a PersistentVolume Provisioner.

apiVersion: v1
kind: Service
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  ports:
  - port: 80
    name: web
  clusterIP: None
  selector:
    app: nginx
---
apiVersion: apps/v1beta2
kind: StatefulSet
metadata:
  name: web
spec:
  selector:
    matchLabels:
      app: nginx # has to match .spec.template.metadata.labels
  serviceName: "nginx"
  replicas: 3 # by default is 1
  template:
    metadata:
      labels:
        app: nginx # has to match .spec.selector.matchLabels
    spec:
      terminationGracePeriodSeconds: 10
      containers:
      - name: nginx
        image: gcr.io/google_containers/nginx-slim:0.8
        ports:
        - containerPort: 80
          name: web
        volumeMounts:
        - name: www
          mountPath: /usr/share/nginx/html
  volumeClaimTemplates:
  - metadata:
      name: www
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: my-storage-class
      resources:
        requests:
          storage: 1Gi

Pod Identity

  • Ordinal Index: each Pod in the StatefulSet will be assigned an integer ordinal, in the range [0,N), that is unique over the Set
  • Stable Network ID: the pattern for the constructed hostname is $(statefulset name)-$(ordinal). The example above will create three Pods named web-0, web-1, web-2. A StatefulSet can use a Headless Service to control the domain of its Pods.
  • Stable Storage: PersistentVolume
| Cluster Domain | Service (ns/name) | StatefulSet (ns/name) | StatefulSet Domain | Pod DNS | Pod Hostname |
|---|---|---|---|---|---|
| cluster.local | default/nginx | default/web | nginx.default.svc.cluster.local | web-{0..N-1}.nginx.default.svc.cluster.local | web-{0..N-1} |
| cluster.local | foo/nginx | foo/web | nginx.foo.svc.cluster.local | web-{0..N-1}.nginx.foo.svc.cluster.local | web-{0..N-1} |
| kube.local | foo/nginx | foo/web | nginx.foo.svc.kube.local | web-{0..N-1}.nginx.foo.svc.kube.local | web-{0..N-1} |

Deployment and Scaling Guarantees

  • For a StatefulSet with N replicas, when Pods are being deployed, they are created sequentially, in order from {0..N-1}.
  • When Pods are being deleted, they are terminated in reverse order, from {N-1..0}.
  • Before a scaling operation is applied to a Pod, all of its predecessors must be Running and Ready.
  • Before a Pod is terminated, all of its successors must be completely shutdown.

In Kubernetes 1.7 and later, StatefulSet allows you to relax its ordering guarantees while preserving its uniqueness and identity guarantees via its .spec.podManagementPolicy field.
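This is opted into with a one-field spec fragment (assuming Kubernetes 1.7+):

```yaml
spec:
  podManagementPolicy: Parallel  # launch and terminate Pods in parallel; default is OrderedReady
```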

Update Strategies

OnDelete; RollingUpdate; Partitions
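These strategies are selected under .spec.updateStrategy; a fragment sketching a partitioned rolling update (the partition value is illustrative):

```yaml
spec:
  updateStrategy:
    type: RollingUpdate     # or OnDelete
    rollingUpdate:
      partition: 2          # only Pods with ordinal >= 2 are updated
```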

Daemon Sets

Runs one Pod on each node, acting as a daemon.
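A minimal DaemonSet sketch (the name and image are illustrative; a per-node log collector is a typical use):

```yaml
apiVersion: apps/v1beta2    # use the apps group version your cluster serves
kind: DaemonSet
metadata:
  name: fluentd
  namespace: kube-system
spec:
  selector:
    matchLabels:
      name: fluentd
  template:
    metadata:
      labels:
        name: fluentd
    spec:
      containers:
      - name: fluentd
        image: fluent/fluentd:v0.14   # illustrative image
```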

Alternatives to DaemonSet

  • Init Scripts
  • Static Pods: create Pods by writing a file to a certain directory watched by Kubelet.

Garbage Collection

Owners and dependents

When you delete an object, you can specify whether the object's dependents are also deleted automatically. Deleting dependents automatically is called cascading deletion. There are two modes of cascading deletion: background and foreground.
Foreground deletion: the root object first enters a "deletion in progress" state => the garbage collector deletes all of the object's dependents => the owner object is deleted.
Background deletion: Kubernetes deletes the owner object immediately, then the garbage collector deletes the dependents in the background.

Deployments must use propagationPolicy: Foreground (to cascade down to their ReplicaSets)
Custom resources do not currently support garbage collection

Setting the cascading deletion policy

To control the cascading deletion policy, set the deleteOptions.propagationPolicy field on your owner object. Possible values include "Orphan", "Foreground", or "Background".
The default garbage collection policy for many controller resources is orphan, including ReplicationController, ReplicaSet, StatefulSet, DaemonSet, and Deployment.

Jobs - Run to Completion

todo

Cron Jobs

todo

Configuration

Configuration Best Practices

These read a bit like an "Effective Kubernetes":

General Config Tips

  • Version your configuration so that it can be rolled back
  • YAML is preferable to JSON
  • Group related objects into a single YAML file
  • Don’t specify default values unnecessarily – simple and minimal configs will reduce errors.
  • Put an object description in an annotation to allow better introspection.

Services

  • Create a Service before its corresponding ReplicationControllers/Deployments; this lets the scheduler spread the Pods that comprise the Service.
  • Don't use hostPort (use a NodePort Service instead) or hostNetwork unless it is absolutely necessary
  • Use headless services for easy service discovery when you don’t need kube-proxy load balancing.

Using Labels

todo

Container Images

Using kubectl

  • Use kubectl create -f where possible.
  • Use kubectl run and expose to quickly create and expose single container Deployments.

Managing Compute Resources for Containers

Resource requests and limits

todo
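A fragment showing per-container requests and limits (the values are illustrative):

```yaml
spec:
  containers:
  - name: app
    image: nginx
    resources:
      requests:
        cpu: 250m        # the scheduler guarantees this much
        memory: 64Mi
      limits:
        cpu: 500m        # the container is throttled above this
        memory: 128Mi    # exceeding this can get the container killed
```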

Assigning Pods to Nodes

nodeSelector

apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    env: test
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
  nodeSelector:
    disktype: ssd

Interlude: built-in node labels

kubernetes.io/hostname
failure-domain.beta.kubernetes.io/zone
failure-domain.beta.kubernetes.io/region
beta.kubernetes.io/instance-type
beta.kubernetes.io/os
beta.kubernetes.io/arch

Affinity and anti-affinity

  • nodeAffinity
    • requiredDuringSchedulingIgnoredDuringExecution
    • preferredDuringSchedulingIgnoredDuringExecution
  • podAffinity
    • requiredDuringSchedulingIgnoredDuringExecution
    • preferredDuringSchedulingIgnoredDuringExecution
  • podAntiAffinity
    • requiredDuringSchedulingIgnoredDuringExecution
    • preferredDuringSchedulingIgnoredDuringExecution
apiVersion: v1
kind: Pod
metadata:
  name: with-node-affinity
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/e2e-az-name
            operator: In
            values:
            - e2e-az1
            - e2e-az2
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: another-node-label-key
            operator: In
            values:
            - another-node-label-value
  containers:
  - name: with-node-affinity
    image: gcr.io/google_containers/pause:2.0

Taints and Tolerations

Node affinity is written on the Pod and describes which nodes the Pod wants. Taints are the inverse: a taint is set on a node to repel Pods, and only Pods with a matching toleration can be scheduled onto (or remain on) that node.
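A sketch of the Pod side (the key, value, and node name are illustrative; the matching taint would be added with kubectl taint nodes node1 dedicated=gpu:NoSchedule):

```yaml
spec:
  tolerations:
  - key: "dedicated"      # matches the taint's key
    operator: "Equal"
    value: "gpu"          # matches the taint's value
    effect: "NoSchedule"  # matches the taint's effect
```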

Secrets

Organizing Cluster Access Using kubeconfig Files

Pod Priority and Preemption

Cluster Administration

Managing Resources

Organizing resource configurations

  • Different resources can be written in a single YAML file
  • kubectl create/delete -f a file, URL, or directory (--recursive) to create and delete
  • kubectl get to retrieve
  • kubectl label/annotate to label and annotate
  • kubectl scale/autoscale/apply/edit/patch/replace to update

Cluster Networking

  • Highly-coupled container-to-container communications: this is solved by pods and localhost communications.
  • Pod-to-Pod communications: this is the primary focus of this document.
  • Pod-to-Service communications: this is covered by services.
  • External-to-Service communications: this is covered by services.

Kubernetes model

  • all containers can communicate with all other containers without NAT
  • all nodes can communicate with all containers (and vice-versa) without NAT
  • the IP that a container sees itself as is the same IP that others see it as

Implementations: Contiv, Contrail, Flannel, GCE, L2 networks and linux bridging, Nuage, OpenVSwitch, OVN, Calico, Romana, Weave Net

Network Plugins

  • CNI plugins: adhere to the appc/CNI specification, designed for interoperability.
  • Kubenet plugin: implements basic cbr0 using the bridge and host-local CNI plugins

Logging and Monitoring Cluster Activity

Auditing

Kubernetes audit is part of kube-apiserver logging all requests coming to the server.

Resource Usage Monitoring


Configuring Out Of Resource Handling

Eviction Policy

The kubelet can pro-actively monitor for and prevent against total starvation of a compute resource. In those cases, the kubelet can pro-actively fail one or more pods in order to reclaim the starved resource. When the kubelet fails a pod, it terminates all containers in the pod, and the PodPhase is transitioned to Failed.
Eviction Thresholds:

  • A soft eviction threshold pairs an eviction threshold with a required administrator specified grace period
  • A hard eviction threshold has no grace period, and if observed, the kubelet will take immediate action to reclaim the associated starved resource
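On older kubelets these thresholds are command-line flags (e.g. --eviction-hard=memory.available<100Mi); newer kubelets also accept a config file, sketched here with illustrative values:

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
evictionHard:
  memory.available: "100Mi"    # reclaim immediately below this
  nodefs.available: "10%"
evictionSoft:
  memory.available: "300Mi"    # reclaim only after the grace period
evictionSoftGracePeriod:
  memory.available: "1m30s"
```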

Using Multiple Clusters

Federation

Federation makes it easy to manage multiple clusters. It does so by providing 2 major building blocks:
- Sync resources across clusters: Federation provides the ability to keep resources in multiple clusters in sync. This can be used, for example, to ensure that the same deployment exists in multiple clusters.
- Cross cluster discovery: It provides the ability to auto-configure DNS servers and load balancers with backends from all clusters. This can be used, for example, to ensure that a global VIP or DNS record can be used to access backends from multiple clusters.

Setting up Cluster Federation with Kubefed

Cross-cluster Service Discovery using Federated Services

Guaranteed Scheduling For Critical Add-On Pods

Rescheduler ensures that critical add-ons are always scheduled. If the scheduler determines that no node has enough free resources to run the critical add-on pod given the pods already running in the cluster, the rescheduler tries to free up space for the add-on by evicting some pods; then the scheduler will schedule the add-on pod.
A temporary taint "CriticalAddonsOnly" is set on the node, reserving it for critical add-on Pods and preventing other Pods from being scheduled onto it in the meantime.

Static Pods

Static pods are managed directly by the kubelet daemon on a specific node, without the API server observing them. They are not associated with any replication controller; the kubelet daemon itself watches a static pod and restarts it when it crashes. There is no health check, though. Static pods are always bound to one kubelet daemon and always run on the same node as it.
Kubelet automatically creates so-called mirror pod on Kubernetes API server for each static pod, so the pods are visible there, but they cannot be controlled from the API server.

If you are running clustered Kubernetes and are using static pods to run a pod on every node, you should probably be using a DaemonSet!

Configured via --pod-manifest-path or --manifest-url.

Using Sysctls in a Kubernetes Cluster

  • In Linux, the sysctl interface allows an administrator to modify kernel parameters at runtime. Parameters are available via the /proc/sys/ virtual process file system.
  • A number of sysctls are namespaced in today’s Linux kernels. This means that they can be set independently for each pod on a node.

Safe sysctl: In addition to proper namespacing a safe sysctl must be properly isolated between pods on the same node.
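A sketch of setting a namespaced safe sysctl on a Pod, assuming the alpha annotation form of this era (newer clusters express the same thing under spec.securityContext.sysctls):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: sysctl-example
  annotations:
    security.alpha.kubernetes.io/sysctls: kernel.shm_rmid_forced=1  # comma-separated name=value pairs
spec:
  containers:
  - name: app
    image: nginx
```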

Accessing Clusters

# Ways to access the REST API
# 1. via kubectl proxy
kubectl proxy --port=8083 &
curl localhost:8083/api

# 2. direct access with a bearer token
APISERVER=$(kubectl config view | grep server | cut -f 2- -d ":" | tr -d " ")
TOKEN=$(kubectl describe secret $(kubectl get secrets | grep default | cut -f1 -d ' ') | grep -E '^token' | cut -f2 -d':' | tr -d '\t')
curl $APISERVER/api --header "Authorization: Bearer $TOKEN" --insecure

several options for connecting to nodes, pods and services from outside the cluster:

  • Access services through public IPs: Use a service with type NodePort or LoadBalancer to make the service reachable outside the cluster. See
  • Access services, nodes, or pods using the Proxy Verb : Does apiserver authentication and authorization prior to accessing the remote service. Use this if the services are not secure enough to expose to the internet, or to gain access to ports on the node IP, or for debugging.
  • Access from a node or pod in the cluster : Run a pod, and then connect to a shell in it using kubectl exec. Connect to other nodes, pods, and services from that shell.
# Discovering built-in services
kubectl cluster-info

Types of Kubernetes proxies

  1. The kubectl proxy: - runs on a user’s desktop or in a pod - proxies from a localhost address to the Kubernetes apiserver - client to proxy uses HTTP - proxy to apiserver uses HTTPS - locates apiserver - adds authentication headers
  2. The apiserver proxy: - is a bastion built into the apiserver - connects a user outside of the cluster to cluster IPs which otherwise might not be reachable - runs in the apiserver processes - client to proxy uses HTTPS (or http if apiserver so configured) - proxy to target may use HTTP or HTTPS as chosen by proxy using available information - can be used to reach a Node, Pod, or Service - does load balancing when used to reach a Service
  3. The kube proxy: - runs on each node - proxies UDP and TCP - does not understand HTTP - provides load balancing - is just used to reach services
  4. A Proxy/Load-balancer in front of apiserver(s): - existence and implementation varies from cluster to cluster (e.g. nginx) - sits between all clients and one or more apiservers - acts as load balancer if there are several apiservers.
  5. Cloud Load Balancers on external services: - are provided by some cloud providers (e.g. AWS ELB, Google Cloud Load Balancer) - are created automatically when the Kubernetes service has type LoadBalancer - use UDP/TCP only - implementation varies by cloud provider.

Authenticating Across Clusters with kubeconfig

Storage

Volumes

Persistent Volumes

Services, Load Balancing, and Networking

  • Pods are mortal, and Pod IP addresses cannot be relied upon to be stable over time
  • hence Services
  • A Service is (usually) determined by a Label Selector
  • For Kubernetes-native applications, Kubernetes offers a simple Endpoints API that is updated whenever the set of Pods in a Service changes. For non-native applications, Kubernetes offers a virtual-IP-based bridge to Services which redirects to the backend Pods
  • Creating a Service with a selector automatically creates an Endpoints object that selects the backends; alternatively, omit the selector and create a same-named Endpoints object by hand, or use type: ExternalName to forward traffic to an external service
  • Except for ExternalName, a Service's virtual IP is implemented by kube-proxy
  • Ingress works at layer 7 and routes to Services, which work at layer 4
  • port and nodePort are both Service ports; the former is what clients inside the cluster use to reach the service
  • Service load balancing has two modes: traffic through kube-proxy (userspace) or through iptables
  • Environment variables such as {SVCNAME}_SERVICE_HOST and {SVCNAME}_SERVICE_PORT are injected into Pods
  • Setting spec.clusterIP: None => headless Service => the DNS name resolves to all Endpoints
  • ServiceType: ClusterIP (default), NodePort (opens a port on every node that forwards to the Service), LoadBalancer (relies on the IaaS; gets an EXTERNAL-IP), ExternalName
  • Any of these Service types can additionally be exposed on external IPs

kind: Service
apiVersion: v1
metadata:
  name: my-service
spec:
  selector:
    app: MyApp
  ports:
  - protocol: TCP
    port: 80          # the port the Service exposes
    targetPort: 9376  # the Pod port to forward to; defaults to the value of port
kind: Service
apiVersion: v1
metadata:
  name: my-service
  namespace: prod
spec:
  type: ExternalName
  externalName: my.database.example.com
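For comparison with the ExternalName example above, a NodePort sketch (the nodePort value is illustrative and must fall in the cluster's node-port range, 30000-32767 by default):

```yaml
kind: Service
apiVersion: v1
metadata:
  name: my-service
spec:
  type: NodePort
  selector:
    app: MyApp
  ports:
  - protocol: TCP
    port: 80          # cluster-internal port
    targetPort: 9376  # Pod port
    nodePort: 30080   # opened on every node
```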

DNS Pods and Services

  • Supports records such as my-svc.my-namespace.svc.cluster.local and pod-ip-address.my-namespace.pod.cluster.local
  • Built-in defaults such as kubernetes

Connecting Applications with Services

tutorial

Ingress Resources

tutorial

Network Policies

tutorial

kubectl exec -ti busybox -- nslookup kubernetes.default
