Kubernetes Cluster "Health Checkups" with Polaris


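1 Introduction to Polaris

As Kubernetes adoption has become widespread, keeping clusters running reliably has become a central concern for development and operations teams. When deploying applications to a cluster, something as simple as forgetting to configure resource requests or limits can break autoscaling or even let a workload exhaust resources. Configuration problems like these frequently cause production outages, and Polaris is a tool for preventing them. Polaris is an open-source Kubernetes cluster health-check tool developed by Fairwinds. It analyzes the deployment configuration in a cluster to find and prevent configuration problems that affect the cluster's stability, reliability, scalability, and security.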

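2 Polaris Features

Polaris is a health-check tool that finds problems in a cluster by analyzing deployment configuration. Its goal is not only to find problems but also to offer ways to avoid them, so that the cluster stays healthy. Polaris consists of three components, each with a different role:

  • Dashboard - shows the current state of Kubernetes workloads and suggested improvements as charts.
  • Webhook - blocks applications that do not meet the standards from being installed in the cluster.
  • CLI - checks local YAML files and can be integrated with CI/CD.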

2.1 Dashboard

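The Dashboard is the visualization tool that Polaris provides. It gives an overview of the state of Kubernetes workloads along with suggested improvements, and the results can also be viewed by category, namespace, and workload.

2.1.1 Cluster status overview

  • View the cluster health score
  • View the cluster check results
  • View the cluster version and the number of nodes, pods, and namespaces

# kubectl apply -f https://github.com/fairwindsops/polaris/releases/latest/download/dashboard.yaml
# kubectl port-forward --namespace polaris svc/polaris-dashboard 8080:80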

View check results by category

  • Health Checks
  • Images
  • Networking
  • Resources
  • Security

View check results by namespace

2.1.2 Running against local YAML files

polaris dashboard --port 8080 --audit-path=/Users/mervinwang/Tencent/Code/Kubernetes/app/nginx

2.2 Webhook

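Polaris can run as an admission controller that acts as a validating webhook. It accepts the same configuration as the dashboard and can run the same validations. The webhook rejects any workload that triggers a validation error. This reflects Polaris's larger goal: not merely to encourage better configuration through the visibility the dashboard provides, but to actually enforce it through the webhook. Polaris does not fix workloads; it only blocks them.

  • Uses the same configuration as the dashboard
  • Blocks every application whose deployment configuration fails validation from being installed in the cluster
  • Prevents defects in addition to showing the ones that already exist in the cluster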
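
To make the "block, do not fix" behaviour concrete, the sketch below builds the kind of AdmissionReview response a validating webhook sends back to the API server. This is not Polaris's own code: `review_response` and the list of failed check IDs are hypothetical names chosen for illustration, and only the `admission.k8s.io/v1` response shape comes from Kubernetes itself.

```python
import json


def review_response(uid, failures):
    """Build an AdmissionReview response that rejects when any check failed.

    `failures` is a hypothetical list of failed check IDs, for example
    ["cpuLimitsMissing"]. A validating webhook can only allow or deny
    a request; it never rewrites the submitted object.
    """
    response = {"uid": uid, "allowed": not failures}
    if failures:
        response["status"] = {
            "code": 403,
            "message": "rejected by policy: " + ", ".join(sorted(failures)),
        }
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": response,
    }


if __name__ == "__main__":
    # A workload with one failed check is denied with an explanatory message.
    print(json.dumps(review_response("example-uid", ["cpuLimitsMissing"]), indent=2))
```

The response carries only an allow/deny decision and a message, which is why a webhook of this kind can stop a bad workload but cannot repair it.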

2.3 CLI

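Polaris can also be used on the command line to audit local files or a running cluster. This is especially helpful for running Polaris against infrastructure code in a CI/CD pipeline. Command-line flags can make the CI/CD run fail if the audit score that Polaris reports falls below a threshold or if any error-level check fails.

  • Checks local files or a running cluster
  • Can be integrated with CI/CD so that the pipeline fails as soon as a deployment configuration does not pass validation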
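
The gating idea can also be sketched outside the CLI. The example below applies the same two rules (score below a threshold, or any failed danger-level check) to an already parsed JSON report. The function name `gate` and the default threshold of 80 are assumptions made for illustration, and the sketch only walks container-level results.

```python
def gate(report, min_score=80):
    """Return the reasons a pipeline should fail; an empty list means pass.

    `report` is a parsed Polaris JSON report. Only container-level
    results are inspected in this sketch.
    """
    reasons = []
    score = report.get("Score", 0)
    if score < min_score:
        reasons.append(f"score {score} is below {min_score}")
    for workload in report.get("Results", []):
        # PodResult is null for objects such as ConfigMaps.
        pod = workload.get("PodResult") or {}
        for container in pod.get("ContainerResults", []):
            for check in container.get("Results", {}).values():
                if not check["Success"] and check["Severity"] == "danger":
                    reasons.append(
                        f"{workload['Name']}/{container['Name']}: {check['ID']}"
                    )
    return reasons


if __name__ == "__main__":
    sample = {"Score": 50, "Results": []}
    print(gate(sample))
```

In a pipeline step, the report would be loaded with `json.load` and the step would exit with a non-zero status whenever the returned list is not empty.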

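3 Installation and Usage

Polaris supports three installation methods: kubectl, Helm, and a local binary. This article uses the simplest method for each of the three components.

3.1 Installing the Dashboard

Helm

Add the Helm charts repository: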

helm repo add reactiveops-stable https://charts.reactiveops.com/stable 

Update the charts repository and install the Dashboard component:

helm upgrade --install polaris reactiveops-stable/polaris --namespace polaris 

To view the Dashboard locally, forward a local port with the following command:

kubectl port-forward --namespace polaris svc/polaris-dashboard 8080:80 

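3.2 Installing the Webhook

Once the Webhook component is installed in the cluster, it blocks applications that do not meet the standards from being deployed.

Helm

Add the Helm charts repository: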

helm repo add reactiveops-stable https://charts.reactiveops.com/stable 

Update the charts repository and install the Webhook component:

helm upgrade --install polaris reactiveops-stable/polaris --namespace polaris \
  --set webhook.enable=true --set dashboard.enable=false 

3.3 Installing the CLI

To test Polaris locally, download the binary from the releases page, or install it with Homebrew:

brew tap reactiveops/tap
brew install reactiveops/tap/polaris
polaris --version

Check local configuration files with the CLI:

polaris --audit --audit-path ./deploy/ 

The scan results can be saved to a YAML file:

polaris --audit --output-format yaml > report.yaml 

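4 Using Polaris

The sections above briefly covered installing Polaris and its basic usage. To fit Polaris to the actual needs of a project, however, the default configuration is not enough, so we also need to know how to write the configuration file that defines Polaris's check rules. Before customizing the configuration, it helps to understand the severity levels Polaris uses and the types of checks it supports. Check severities are error, warning, and ignore (the tables below show the error level as danger); Polaris does not evaluate checks set to ignore. The supported check types are Health Checks, Images, Networking, Resources, and Security, described in turn below:

4.1 Health Checks

Polaris can verify that pods define readiness and liveness probes.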

key                            default  description
readinessProbeMissing          warning  Fails when no readiness probe is configured for the pod.
livenessProbeMissing           warning  Fails when no liveness probe is configured for the pod.
tagNotSpecified                danger   Fails when the image has no tag or uses the latest tag.
pullPolicyNotAlways            warning  Fails when the image pull policy is not Always.
priorityClassNotSet            ignore   Fails when no priorityClassName is configured for the pod.
multipleReplicasForDeployment  ignore   Fails when a Deployment has only one replica.
missingPodDisruptionBudget     ignore
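
As a rough illustration of what the first four checks in the table look at, the sketch below evaluates a container spec given as a plain dictionary. It is not how Polaris implements these checks; `image_tag` and `failed_checks` are hypothetical helpers, and digest-pinned images are simply treated as untagged here.

```python
def image_tag(image):
    """Return the tag of an image reference, or None when there is none."""
    name = image.split("@", 1)[0]    # drop a digest; a simplification
    last = name.rsplit("/", 1)[-1]   # skip a registry host:port prefix
    return last.split(":", 1)[1] if ":" in last else None


def failed_checks(container):
    """List the check IDs from the table above that this container fails."""
    failed = []
    if "readinessProbe" not in container:
        failed.append("readinessProbeMissing")
    if "livenessProbe" not in container:
        failed.append("livenessProbeMissing")
    if image_tag(container.get("image", "")) in (None, "latest"):
        failed.append("tagNotSpecified")
    if container.get("imagePullPolicy") != "Always":
        failed.append("pullPolicyNotAlways")
    return failed


if __name__ == "__main__":
    print(failed_checks({"name": "nginx", "image": "nginx:latest"}))
```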

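4.2 Resources

Polaris can verify that memory and CPU requests and limits are configured.

key                    default  description
cpuRequestsMissing     warning  Fails when resources.requests.cpu is not configured.
memoryRequestsMissing  warning  Fails when resources.requests.memory is not configured.
cpuLimitsMissing       warning  Fails when resources.limits.cpu is not configured.
memoryLimitsMissing    warning  Fails when resources.limits.memory is not configured.

For resources such as memory and CPU, range checks can also be configured. The check passes only when the configured value falls within the specified range.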

limits:
  type: object
  required:
  - memory
  - cpu
  properties:
    memory:
      type: string
      resourceMinimum: 100M
      resourceMaximum: 6G
    cpu:
      type: string
      resourceMinimum: 100m
      resourceMaximum: "2"
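
To show what a range check has to do with values such as 100m, 6G, or "2", here is a minimal sketch that converts Kubernetes-style quantities to numbers and compares them. The helper names are hypothetical, only the suffixes listed in `SUFFIXES` are handled, and treating both bounds as inclusive is an assumption.

```python
from decimal import Decimal

# Multipliers for the quantity suffixes handled by this sketch.
SUFFIXES = {
    "m": Decimal("0.001"),
    "k": Decimal(10) ** 3,
    "M": Decimal(10) ** 6,
    "G": Decimal(10) ** 9,
    "T": Decimal(10) ** 12,
    "Ki": Decimal(2) ** 10,
    "Mi": Decimal(2) ** 20,
    "Gi": Decimal(2) ** 30,
    "Ti": Decimal(2) ** 40,
}


def parse_quantity(text):
    """Convert a quantity such as '100m', '6G' or '2' to a Decimal."""
    # Try two-letter suffixes first so that 'Mi' is not read as 'i'-less 'M'.
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if text.endswith(suffix):
            return Decimal(text[: -len(suffix)]) * SUFFIXES[suffix]
    return Decimal(text)


def in_range(value, minimum, maximum):
    """True when value lies between minimum and maximum (inclusive here)."""
    return parse_quantity(minimum) <= parse_quantity(value) <= parse_quantity(maximum)


if __name__ == "__main__":
    print(in_range("500m", "100m", "2"))   # CPU limit against the range above
    print(in_range("8G", "100M", "6G"))    # memory limit against the range above
```

Using `Decimal` avoids floating-point rounding when millicore values such as 100m are compared.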

4.3 Security

key                         default  description
hostIPCSet                  danger   Fails when hostIPC attribute is configured.
hostPIDSet                  danger   Fails when hostPID attribute is configured.
notReadOnlyRootFilesystem   warning  Fails when securityContext.readOnlyRootFilesystem is not true.
privilegeEscalationAllowed  danger   Fails when securityContext.allowPrivilegeEscalation is true.
runAsRootAllowed            warning  Fails when securityContext.runAsNonRoot is not true.
runAsPrivileged             danger   Fails when securityContext.privileged is true.
insecureCapabilities        warning  Fails when securityContext.capabilities includes one of the capabilities that Polaris lists as insecure.
dangerousCapabilities       danger   Fails when securityContext.capabilities includes one of the capabilities that Polaris lists as dangerous.
hostNetworkSet              warning  Fails when hostNetwork attribute is configured.
hostPortSet                 warning  Fails when hostPort attribute is configured.
tlsSettingsMissing          warning  Fails when an Ingress lacks TLS settings.
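
The securityContext rows of this table can be read as a small rule set: a check ID, its default severity, and a condition under which it fails. The sketch below encodes four of them that way. It is an illustration under assumptions rather than Polaris's implementation: `RULES` and `security_findings` are hypothetical names, and only the container-level securityContext is inspected.

```python
# check ID -> (default severity from the table, predicate that is True on failure)
RULES = {
    "notReadOnlyRootFilesystem": (
        "warning", lambda ctx: ctx.get("readOnlyRootFilesystem") is not True),
    "privilegeEscalationAllowed": (
        "danger", lambda ctx: ctx.get("allowPrivilegeEscalation") is True),
    "runAsRootAllowed": (
        "warning", lambda ctx: ctx.get("runAsNonRoot") is not True),
    "runAsPrivileged": (
        "danger", lambda ctx: ctx.get("privileged") is True),
}


def security_findings(container):
    """Map each failed check ID to its severity for one container spec."""
    ctx = container.get("securityContext") or {}
    return {
        check: severity
        for check, (severity, fails) in RULES.items()
        if fails(ctx)
    }


if __name__ == "__main__":
    print(security_findings({"securityContext": {"privileged": True}}))
```

Note the asymmetry the table implies: some checks fail when a field is absent ("is not true"), while others fail only when a field is explicitly set to true.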

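4.4 Custom scan rules

With the checks described above, we can already define a scan configuration that fits a project. If the checks Polaris provides are not enough, we can also define custom checks. For example, a custom rule can check where an image comes from and raise a warning when it comes from quay.io: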

checks:
  imageRegistry: warning
customChecks:
  imageRegistry:
    successMessage: Image comes from allowed registries
    failureMessage: Image should not be from disallowed registry
    category: Images
    target: Container # target can be "Container" or "Pod"
    schema:
      '$schema': http://json-schema.org/draft-07/schema
      type: object
      properties:
        image:
          type: string
          not:
            pattern: ^quay.io 

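
The schema's `not` / `pattern` pair can be tried out with an ordinary regular expression. The sketch below is only an approximation of how a JSON Schema validator would treat the rule, with `image_allowed` as a hypothetical helper. It also shows a side effect of the pattern as written: the unescaped dot matches any character, so a stricter variant (an assumption, not part of the original configuration) is included for comparison.

```python
import re

# Same pattern as in the custom check above; "." matches any character.
DISALLOWED = re.compile(r"^quay.io")
# A stricter variant, shown only for comparison.
STRICT = re.compile(r"^quay\.io/")


def image_allowed(image, pattern=DISALLOWED):
    """True when the image does NOT match the disallowed-registry pattern.

    JSON Schema `pattern` is an unanchored search, which re.search mirrors;
    the leading "^" in the pattern is what pins it to the start.
    """
    return pattern.search(image) is None


if __name__ == "__main__":
    for image in ("nginx:1.21", "quay.io/coreos/etcd:v3.5.0", "quayXio/example"):
        print(image, image_allowed(image), image_allowed(image, STRICT))
```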

You can also set the severity of individual checks:

checks:
  cpuRequestsMissing: danger
  memoryRequestsMissing: danger
  cpuLimitsMissing: danger
  memoryLimitsMissing: danger

Pass the configuration file to the audit with -c:

polaris audit -c check_config.yaml --.......

5 Check results

{
  "PolarisOutputVersion": "1.0",
  "AuditTime": "2021-07-01T15:07:00+08:00",
  "SourceType": "Path",
  "SourceName": "/Users/mervinwang/Tencent/Code/Kubernetes/app/nginx",
  "DisplayName": "/Users/mervinwang/Tencent/Code/Kubernetes/app/nginx",
  "ClusterInfo": {
    "Version": "unknown",
    "Nodes": 0,
    "Pods": 0,
    "Namespaces": 0,
    "Controllers": 1
  },
  "Results": [
    {
      "Name": "nginx-config",
      "Namespace": "",
      "Kind": "ConfigMap",
      "Results": {},
      "PodResult": null,
      "CreatedTime": "0001-01-01T00:00:00Z"
    },
    {
      "Name": "nginx-deployment",
      "Namespace": "",
      "Kind": "Deployment",
      "Results": {},
      "PodResult": {
        "Name": "",
        "Results": {},
        "ContainerResults": [
          {
            "Name": "nginx",
            "Results": {
              "cpuLimitsMissing": {
                "ID": "cpuLimitsMissing",
                "Message": "CPU limits should be set",
                "Details": null,
                "Success": false,
                "Severity": "danger",
                "Category": "Efficiency"
              },
              "cpuRequestsMissing": {
                "ID": "cpuRequestsMissing",
                "Message": "CPU requests should be set",
                "Details": null,
                "Success": false,
                "Severity": "danger",
                "Category": "Efficiency"
              },
              "memoryLimitsMissing": {
                "ID": "memoryLimitsMissing",
                "Message": "Memory limits should be set",
                "Details": null,
                "Success": false,
                "Severity": "danger",
                "Category": "Efficiency"
              },
              "memoryRequestsMissing": {
                "ID": "memoryRequestsMissing",
                "Message": "Memory requests should be set",
                "Details": null,
                "Success": false,
                "Severity": "danger",
                "Category": "Efficiency"
              }
            }
          }
        ]
      },
      "CreatedTime": "0001-01-01T00:00:00Z"
    }
  ],
  "Score": 0
}
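
A report like this is easier to scan once it is summarized. The following sketch tallies failed container-level checks by severity and category; `tally_failures` is a hypothetical helper, and it assumes the field names shown in the report above. For that report it would count four findings under danger / Efficiency.

```python
from collections import Counter


def tally_failures(report):
    """Count failed container checks per (Severity, Category) pair."""
    counts = Counter()
    for workload in report["Results"]:
        # PodResult is null for objects such as ConfigMaps.
        pod = workload.get("PodResult") or {}
        for container in pod.get("ContainerResults", []):
            for check in container["Results"].values():
                if not check["Success"]:
                    counts[(check["Severity"], check["Category"])] += 1
    return counts


if __name__ == "__main__":
    sample = {"Results": [{"Name": "nginx-config", "PodResult": None}]}
    print(tally_failures(sample))
```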

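6 Processing the results with Python

A Polaris audit of a cluster returns its results as JSON, which is not easy to read. We use Python to process the results and write them to an Excel workbook for easier viewing: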

import yaml
import os
import xlsxwriter

# config
fileNamePath = os.path.split(os.path.realpath(__file__))[0]
config = os.path.join(fileNamePath,'check_config.yaml')
cluster_config = os.path.join(fileNamePath,'cluster_list.yaml')

# variable
scan_controller_type = ["Deployment", "DaemonSet", "StatefulSet"]


def read_cluster():
    # Load the list of clusters; the with block closes the file afterwards.
    with open(cluster_config, 'r', encoding='utf-8') as f:
        return yaml.safe_load(f)


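def generate_report(cluster_id: str):
    # Write the report next to this script, where format_data() reads it back.
    os.makedirs(os.path.join(fileNamePath, 'result'), exist_ok=True)
    report = os.path.join(fileNamePath, 'result', f'{cluster_id}.yaml')
    # Note: every cluster is scanned through the same kubeconfig; switch the
    # context per cluster if the clusters differ.
    scan_command = f"polaris audit -c {config} --kubeconfig ~/.kube/config --only-show-failed-tests true --output-file {report}"
    # os.system does not raise on failure; it returns the exit status instead.
    status = os.system(scan_command)
    if status != 0:
        print(f"polaris audit for {cluster_id} exited with status {status}")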

def format_data(cluster):
    cluster_report = os.path.join(fileNamePath, 'result/{}.yaml'.format(cluster))
    with open(cluster_report, 'r', encoding='utf-8') as f:
        report = yaml.safe_load(f)
    data_list = []
    for item in report["Results"]:
        # Skip non-workload objects (their PodResult is null).
        if item["Kind"] not in scan_controller_type or not item["PodResult"]:
            continue
        for container in item["PodResult"]["ContainerResults"]:
            # IDs of the checks this container failed.
            failed_checks = list(container["Results"])
            if not failed_checks:
                continue
            # One row per container, each with its own failed checks.
            data_list.append([cluster, item["Kind"], item["Namespace"], item["Name"],
                              container["Name"], str(failed_checks)])
    return data_list

def excel_config(workbook):
    column_name = ['ClusterID', 'Kind', 'NameSpace', 'Name', 'PodName', 'Scan Result']

    merge_format = workbook.add_format({
        'font_size': 22,
        'bold': True,
        'font_color': '#FFFFFF',
        'border': 1,
        'font_name':u'蘋方-簡',
        'align': 'center',
        'valign': 'vcenter',
        'fg_color': '#0174DF'
    })
    Title_format = workbook.add_format({
        'font_size': 18,
        'border': 1,
        'bold': True,
        'align': 'center',
        'font_name': u'蘋方-簡',
        'valign': 'vcenter',
    })
    data_format = workbook.add_format({
        'font_size': 16,
        'border': 1,
        'align': 'center',
        'font_name': u'蘋方-簡',
        'valign': 'vcenter',
    })
    return column_name, merge_format, Title_format, data_format


def generate_excel():
    workbook = xlsxwriter.Workbook("scan_result.xlsx")
    column_name, merge_format, Title_format, data_format = excel_config(workbook)
    for cluster in read_cluster()["clusters"]:
        print(f"Scan cluster start: {cluster}")
        generate_report(cluster)
        worksheet = workbook.add_worksheet(cluster)
        worksheet.merge_range('A1:F1', f'Cluster {cluster} Requests/Limits scan results', merge_format)
        worksheet.set_column('A:F', 35)
        worksheet.set_column('F:F', 130)
        worksheet.set_row(0, 50)
        scan_result = format_data(cluster)
        if scan_result:
            # Findings exist: write the header row, then one row per container.
            worksheet.write_row('A2', column_name, Title_format)
            for row_number, item in enumerate(scan_result, start=3):
                worksheet.write_row('A' + str(row_number), item, data_format)
        else:
            # No findings for this cluster: write a placeholder row.
            worksheet.merge_range('A3:F3', 'NOT Found INFO', data_format)

    workbook.close()

if __name__ == '__main__':
    generate_excel()

