As a CNCF member, [Weave Flagger](flagger.app) provides continuous integration and continuous delivery capabilities. Flagger summarizes progressive delivery into three categories:

- **Canary releases**: progressively shifting traffic to the canary version (progressive traffic shifting)
- **A/B testing**: routing traffic to a specific version based on request information such as HTTP headers or cookies
- **Blue/Green releases**: traffic switching and mirroring

This article walks through a hands-on practice of progressive canary releases with Flagger on ASM.
Run the following commands to deploy Flagger (for the complete script, see `demo_canary.sh`):
```bash
alias k="kubectl --kubeconfig $USER_CONFIG"
alias h="helm --kubeconfig $USER_CONFIG"

cp $MESH_CONFIG kubeconfig
k -n istio-system create secret generic istio-kubeconfig --from-file kubeconfig
k -n istio-system label secret istio-kubeconfig istio/multiCluster=true

h repo add flagger https://flagger.app
h repo update
k apply -f $FLAAGER_SRC/artifacts/flagger/crd.yaml
h upgrade -i flagger flagger/flagger --namespace=istio-system \
    --set crd.create=false \
    --set meshProvider=istio \
    --set metricsServer=http://prometheus:9090 \
    --set istio.kubeconfig.secretName=istio-kubeconfig \
    --set istio.kubeconfig.key=kubeconfig
```
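Before continuing, it can help to confirm that the Flagger controller is up and that the `istio-kubeconfig` secret exists. A quick check, using the aliases defined above, might look like this:

```bash
# Check that the Flagger deployment is available and the mesh kubeconfig secret exists.
k -n istio-system get deploy flagger
k -n istio-system get secret istio-kubeconfig

# Tail the controller logs to confirm it connected to Prometheus and the Istio control plane.
k -n istio-system logs deploy/flagger --tail=20
```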
During a canary release, Flagger asks ASM to update the VirtualService used for canary traffic routing, and that VirtualService references a Gateway named `public-gateway`. We therefore create the Gateway configuration file `public-gateway.yaml` as follows:
```yaml
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: public-gateway
  namespace: istio-system
spec:
  selector:
    istio: ingressgateway
  servers:
    - port:
        number: 80
        name: http
        protocol: HTTP
      hosts:
        - "*"
```
Run the following command to deploy the Gateway:
```bash
kubectl --kubeconfig "$MESH_CONFIG" apply -f resources_canary/public-gateway.yaml
```
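Since the Gateway is applied against the ASM control plane (`$MESH_CONFIG`), it can be verified there as well, for example:

```bash
# Confirm the Gateway object exists in the mesh control plane.
kubectl --kubeconfig "$MESH_CONFIG" -n istio-system get gateway public-gateway -o yaml
```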
flagger-loadtester is the application used during the canary release phase to probe the canary pod instances.

Run the following command to deploy flagger-loadtester:
```bash
kubectl --kubeconfig "$USER_CONFIG" apply -k "https://github.com/fluxcd/flagger//kustomize/tester?ref=main"
```
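The kustomize overlay referenced above deploys the tester into the `test` namespace; a quick sanity check could be:

```bash
# The loadtester pod should reach Running (2/2 once the Istio sidecar is injected).
kubectl --kubeconfig "$USER_CONFIG" -n test get deploy,svc flagger-loadtester
kubectl --kubeconfig "$USER_CONFIG" -n test get pod
```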
We first use the HPA configuration shipped with the Flagger release (an ops-level HPA); after walking through the full flow, we will switch to an application-level HPA.

Run the following command to deploy PodInfo and its HPA:
```bash
kubectl --kubeconfig "$USER_CONFIG" apply -k "https://github.com/fluxcd/flagger//kustomize/podinfo?ref=main"
```
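After this step, the `podinfo` Deployment and the ops-level HPA (also named `podinfo`) should exist in the `test` namespace; for example:

```bash
# The HPA named podinfo is the CPU-based autoscaler shipped with Flagger's kustomize overlay.
kubectl --kubeconfig "$USER_CONFIG" -n test get deploy,hpa podinfo
```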
Canary is the core CRD for canary releases with Flagger; see How it works for details. We first deploy the following Canary configuration file `podinfo-canary.yaml` to complete the full progressive canary flow, and then introduce application-level monitoring metrics on top of it to achieve application-aware progressive delivery.
```yaml
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: podinfo
  namespace: test
spec:
  # deployment reference
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  # the maximum time in seconds for the canary deployment
  # to make progress before it is rollback (default 600s)
  progressDeadlineSeconds: 60
  # HPA reference (optional)
  autoscalerRef:
    apiVersion: autoscaling/v2beta2
    kind: HorizontalPodAutoscaler
    name: podinfo
  service:
    # service port number
    port: 9898
    # container port number or name (optional)
    targetPort: 9898
    # Istio gateways (optional)
    gateways:
    - public-gateway.istio-system.svc.cluster.local
    # Istio virtual service host names (optional)
    hosts:
    - '*'
    # Istio traffic policy (optional)
    trafficPolicy:
      tls:
        # use ISTIO_MUTUAL when mTLS is enabled
        mode: DISABLE
    # Istio retry policy (optional)
    retries:
      attempts: 3
      perTryTimeout: 1s
      retryOn: "gateway-error,connect-failure,refused-stream"
  analysis:
    # schedule interval (default 60s)
    interval: 1m
    # max number of failed metric checks before rollback
    threshold: 5
    # max traffic percentage routed to canary
    # percentage (0-100)
    maxWeight: 50
    # canary increment step
    # percentage (0-100)
    stepWeight: 10
    metrics:
    - name: request-success-rate
      # minimum req success rate (non 5xx responses)
      # percentage (0-100)
      thresholdRange:
        min: 99
      interval: 1m
    - name: request-duration
      # maximum req duration P99
      # milliseconds
      thresholdRange:
        max: 500
      interval: 30s
    # testing (optional)
    webhooks:
      - name: acceptance-test
        type: pre-rollout
        url: http://flagger-loadtester.test/
        timeout: 30s
        metadata:
          type: bash
          cmd: "curl -sd 'test' http://podinfo-canary:9898/token | grep token"
      - name: load-test
        url: http://flagger-loadtester.test/
        timeout: 5s
        metadata:
          cmd: "hey -z 1m -q 10 -c 2 http://podinfo-canary.test:9898/"
```
Run the following command to deploy the Canary:
```bash
kubectl --kubeconfig "$USER_CONFIG" apply -f resources_canary/podinfo-canary.yaml
```
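Flagger needs a short while to initialize the canary; its status can be watched until it reports `Initialized`, for example:

```bash
# The STATUS column should eventually show Initialized once podinfo-primary is ready.
watch kubectl --kubeconfig "$USER_CONFIG" -n test get canary podinfo
```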
After the Canary is deployed, Flagger copies the Deployment named `podinfo` into a Deployment named `podinfo-primary` and scales `podinfo-primary` up to the minimum number of pods defined by the HPA. It then gradually scales the Deployment named `podinfo` down to zero. In other words, `podinfo` acts as the canary-version Deployment, while `podinfo-primary` acts as the production-version Deployment.

At the same time, three Services are created: `podinfo`, `podinfo-primary`, and `podinfo-canary`. The first two point to the `podinfo-primary` Deployment, and the last one points to the `podinfo` Deployment.
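The effect of this bootstrap can be seen directly from the objects in the `test` namespace; an illustrative check might be:

```bash
# Expect two Deployments (podinfo scaled down to 0, podinfo-primary scaled up)
# and three Services (podinfo, podinfo-primary, podinfo-canary).
kubectl --kubeconfig "$USER_CONFIG" -n test get deploy,svc
```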
Run the following command to upgrade the canary Deployment `podinfo` from version `3.1.0` to `3.1.1`:
```bash
kubectl --kubeconfig "$USER_CONFIG" -n test set image deployment/podinfo podinfod=stefanprodan/podinfo:3.1.1
```
At this point, Flagger starts the progressive canary release flow described in the first article of this series. In brief, the main steps are:

- Flagger detects the new revision and scales up the canary Deployment `podinfo`;
- once the canary pods are ready, the pre-rollout `acceptance-test` webhook runs and the `load-test` webhook starts generating traffic;
- traffic is shifted to the canary in steps of `stepWeight` (10%) up to `maxWeight` (50%), while `request-success-rate` and `request-duration` are checked against their thresholds at every `interval`;
- if the checks keep passing, the canary is promoted (its spec is copied to `podinfo-primary`) and the canary Deployment is scaled back down to zero; if the checks fail more than `threshold` (5) times, the release is rolled back.
We can observe this progressive traffic shifting with the following command:
```bash
while true; do kubectl --kubeconfig "$USER_CONFIG" -n test describe canary/podinfo; sleep 10s; done
```
The output events look like this:
```
Events:
  Type     Reason  Age                From     Message
  ----     ------  ----               ----     -------
  Warning  Synced  39m                flagger  podinfo-primary.test not ready: waiting for rollout to finish: observed deployment generation less then desired generation
  Normal   Synced  38m (x2 over 39m)  flagger  all the metrics providers are available!
  Normal   Synced  38m                flagger  Initialization done! podinfo.test
  Normal   Synced  37m                flagger  New revision detected! Scaling up podinfo.test
  Normal   Synced  36m                flagger  Starting canary analysis for podinfo.test
  Normal   Synced  36m                flagger  Pre-rollout check acceptance-test passed
  Normal   Synced  36m                flagger  Advance podinfo.test canary weight 10
  Normal   Synced  35m                flagger  Advance podinfo.test canary weight 20
  Normal   Synced  34m                flagger  Advance podinfo.test canary weight 30
  Normal   Synced  33m                flagger  Advance podinfo.test canary weight 40
  Normal   Synced  29m (x4 over 32m)  flagger  (combined from similar events): Promotion completed! Scaling down podinfo.test
```
The corresponding (optional) Kiali view is shown in the figure below:
With that, we have completed a full progressive canary release flow. What follows is extended reading.
Building on the progressive canary flow above, let us now take a closer look at the HPA-related part of the Canary configuration:
```yaml
  autoscalerRef:
    apiVersion: autoscaling/v2beta2
    kind: HorizontalPodAutoscaler
    name: podinfo
```
This HPA named `podinfo` is the configuration shipped with Flagger; it scales the canary Deployment up when its CPU utilization reaches 99%. The full configuration is as follows:
```yaml
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: podinfo
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  minReplicas: 2
  maxReplicas: 4
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          # scale up if usage is above
          # 99% of the requested CPU (100m)
          averageUtilization: 99
```
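To compare the current CPU utilization of the pods against this 99% target, the HPA status can be inspected, e.g.:

```bash
# The TARGETS column shows current/target CPU utilization for the podinfo HPA.
kubectl --kubeconfig "$USER_CONFIG" -n test get hpa podinfo
```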
The previous article in this series described application-level scaling in practice; here we apply it to the canary release process.
Run the following command to deploy an HPA that is aware of the number of application requests and scales up when the QPS reaches 10 (for the complete script, see `advanced_canary.sh`):
```bash
kubectl --kubeconfig "$USER_CONFIG" apply -f resources_hpa/requests_total_hpa.yaml
```
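The contents of `resources_hpa/requests_total_hpa.yaml` are not reproduced in this article. Based on the values visible later in the experiment output (HPA `podinfo-total`, min 1, max 5, average target 10), a minimal sketch of such an application-level HPA could look like the following, assuming a per-pod custom metric (here hypothetically named `istio_requests_per_second`) is already exposed through a Prometheus adapter as set up in the previous article:

```bash
# Hedged sketch only: the custom metric name and the adapter setup are assumptions,
# not the exact contents of resources_hpa/requests_total_hpa.yaml.
kubectl --kubeconfig "$USER_CONFIG" apply -f - <<'EOF'
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: podinfo-total
  namespace: test
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  minReplicas: 1
  maxReplicas: 5
  metrics:
    - type: Pods
      pods:
        metric:
          # hypothetical per-pod custom metric served by the Prometheus adapter
          name: istio_requests_per_second
        target:
          type: AverageValue
          averageValue: "10"
EOF
```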
Accordingly, the `autoscalerRef` in the Canary configuration is updated to:
```yaml
  autoscalerRef:
    apiVersion: autoscaling/v2beta2
    kind: HorizontalPodAutoscaler
    name: podinfo-total
```
Run the following command to upgrade the canary Deployment `podinfo` from version `3.1.0` to `3.1.1`:
```bash
kubectl --kubeconfig "$USER_CONFIG" -n test set image deployment/podinfo podinfod=stefanprodan/podinfo:3.1.1
```
Observe the progressive traffic shifting with the following command (`k` is the alias defined earlier):
```bash
while true; do k -n test describe canary/podinfo; sleep 10s; done
```
During the progressive canary release (after the `Advance podinfo.test canary weight 10` event appears; see the figure below), use the following commands to send requests through the ingress gateway and raise the QPS:
```bash
INGRESS_GATEWAY=$(kubectl --kubeconfig $USER_CONFIG -n istio-system get service istio-ingressgateway -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
hey -z 20m -c 2 -q 10 http://$INGRESS_GATEWAY
```
Observe the progress of the canary release with the following command:
```bash
watch kubectl --kubeconfig $USER_CONFIG get canaries --all-namespaces
```
Observe the change in the HPA replica count with the following command:
```bash
watch kubectl --kubeconfig $USER_CONFIG -n test get hpa/podinfo-total
```
The result is shown in the figure below: at a moment during the progressive release when 30% of the traffic has been shifted, the canary Deployment has 4 replicas:
Having added application-level scaling to the canary release, let us finally look at the metrics configuration in the Canary resource above:
```yaml
  analysis:
    metrics:
    - name: request-success-rate
      # minimum req success rate (non 5xx responses)
      # percentage (0-100)
      thresholdRange:
        min: 99
      interval: 1m
    - name: request-duration
      # maximum req duration P99
      # milliseconds
      thresholdRange:
        max: 500
      interval: 30s
    # testing (optional)
```
So far, the metrics used in the Canary have been Flagger's two built-in metrics: request success rate (`request-success-rate`) and request duration (`request-duration`). The figure below shows how these built-in metrics are defined for the different platforms Flagger supports; for Istio, they are based on the Mixerless Telemetry data introduced in the first article of this series.
爲了展現灰度發佈過程當中,遙測數據爲驗證灰度環境帶來的更多靈活性,咱們再次以istio_requests_total
爲例,建立一個名爲not-found-percentage
的MetricTemplate,統計請求返回404錯誤碼的數量佔請求總數的比例。
The configuration file `metrics-404.yaml` is as follows (for the complete script, see `advanced_canary.sh`):
```yaml
apiVersion: flagger.app/v1beta1
kind: MetricTemplate
metadata:
  name: not-found-percentage
  namespace: istio-system
spec:
  provider:
    type: prometheus
    address: http://prometheus.istio-system:9090
  query: |
    100 - sum(
        rate(
            istio_requests_total{
              reporter="destination",
              destination_workload_namespace="{{ namespace }}",
              destination_workload="{{ target }}",
              response_code!="404"
            }[{{ interval }}]
        )
    )
    /
    sum(
        rate(
            istio_requests_total{
              reporter="destination",
              destination_workload_namespace="{{ namespace }}",
              destination_workload="{{ target }}"
            }[{{ interval }}]
        )
    ) * 100
```
Run the following command to create the MetricTemplate above:
```bash
k apply -f resources_canary2/metrics-404.yaml
```
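Before wiring the template into the Canary, the underlying PromQL can be sanity-checked directly against the Prometheus instance the template points to, with the `{{ namespace }}`, `{{ target }}`, and `{{ interval }}` placeholders substituted by hand. A port-forward is used here as an assumed convenience; the service name matches the address in the MetricTemplate:

```bash
# Forward the in-cluster Prometheus locally, then evaluate the substituted query.
kubectl --kubeconfig "$USER_CONFIG" -n istio-system port-forward svc/prometheus 9090:9090 &
sleep 3
curl -s http://127.0.0.1:9090/api/v1/query --data-urlencode \
  'query=100 - sum(rate(istio_requests_total{reporter="destination",destination_workload_namespace="test",destination_workload="podinfo",response_code!="404"}[1m])) / sum(rate(istio_requests_total{reporter="destination",destination_workload_namespace="test",destination_workload="podinfo"}[1m])) * 100'
kill %1  # stop the port-forward
```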
Accordingly, the metrics configuration in the Canary is updated to:
```yaml
  analysis:
    metrics:
      - name: "404s percentage"
        templateRef:
          name: not-found-percentage
          namespace: istio-system
        thresholdRange:
          max: 5
        interval: 1m
```
Finally, we run the complete experiment in one go. The script `advanced_canary.sh` is shown below:
```bash
#!/usr/bin/env sh
SCRIPT_PATH="$(
    cd "$(dirname "$0")" >/dev/null 2>&1
    pwd -P
)/"
cd "$SCRIPT_PATH" || exit

source config

alias k="kubectl --kubeconfig $USER_CONFIG"
alias m="kubectl --kubeconfig $MESH_CONFIG"
alias h="helm --kubeconfig $USER_CONFIG"

echo "#### I Bootstrap ####"
echo "1 Create a test namespace with Istio sidecar injection enabled:"
k delete ns test
m delete ns test
k create ns test
m create ns test
m label namespace test istio-injection=enabled

echo "2 Create a deployment and a horizontal pod autoscaler:"
k apply -f $FLAAGER_SRC/kustomize/podinfo/deployment.yaml -n test
k apply -f resources_hpa/requests_total_hpa.yaml
k get hpa -n test

echo "3 Deploy the load testing service to generate traffic during the canary analysis:"
k apply -k "https://github.com/fluxcd/flagger//kustomize/tester?ref=main"
k get pod,svc -n test
echo "......"
sleep 40s

echo "4 Create a canary custom resource:"
k apply -f resources_canary2/metrics-404.yaml
k apply -f resources_canary2/podinfo-canary.yaml
k get pod,svc -n test
echo "......"
sleep 120s

echo "#### III Automated canary promotion ####"

echo "1 Trigger a canary deployment by updating the container image:"
k -n test set image deployment/podinfo podinfod=stefanprodan/podinfo:3.1.1

echo "2 Flagger detects that the deployment revision changed and starts a new rollout:"

while true; do k -n test describe canary/podinfo; sleep 10s; done
```
Run the complete experiment script with the following command:
```bash
sh progressive_delivery/advanced_canary.sh
```
The experiment output looks like this:
```
#### I Bootstrap ####
1 Create a test namespace with Istio sidecar injection enabled:
namespace "test" deleted
namespace "test" deleted
namespace/test created
namespace/test created
namespace/test labeled

2 Create a deployment and a horizontal pod autoscaler:
deployment.apps/podinfo created
horizontalpodautoscaler.autoscaling/podinfo-total created
NAME            REFERENCE            TARGETS              MINPODS   MAXPODS   REPLICAS   AGE
podinfo-total   Deployment/podinfo   <unknown>/10 (avg)   1         5         0          0s

3 Deploy the load testing service to generate traffic during the canary analysis:
service/flagger-loadtester created
deployment.apps/flagger-loadtester created
NAME                                      READY   STATUS     RESTARTS   AGE
pod/flagger-loadtester-76798b5f4c-ftlbn   0/2     Init:0/1   0          1s
pod/podinfo-689f645b78-65n9d              1/1     Running    0          28s

NAME                         TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
service/flagger-loadtester   ClusterIP   172.21.15.223   <none>        80/TCP    1s
......

4 Create a canary custom resource:
metrictemplate.flagger.app/not-found-percentage created
canary.flagger.app/podinfo created
NAME                                      READY   STATUS    RESTARTS   AGE
pod/flagger-loadtester-76798b5f4c-ftlbn   2/2     Running   0          41s
pod/podinfo-689f645b78-65n9d              1/1     Running   0          68s

NAME                         TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
service/flagger-loadtester   ClusterIP   172.21.15.223   <none>        80/TCP    41s
......

#### III Automated canary promotion ####
1 Trigger a canary deployment by updating the container image:
deployment.apps/podinfo image updated

2 Flagger detects that the deployment revision changed and starts a new rollout:

Events:
  Type     Reason  Age                  From     Message
  ----     ------  ----                 ----     -------
  Warning  Synced  10m                  flagger  podinfo-primary.test not ready: waiting for rollout to finish: observed deployment generation less then desired generation
  Normal   Synced  9m23s (x2 over 10m)  flagger  all the metrics providers are available!
  Normal   Synced  9m23s                flagger  Initialization done! podinfo.test
  Normal   Synced  8m23s                flagger  New revision detected! Scaling up podinfo.test
  Normal   Synced  7m23s                flagger  Starting canary analysis for podinfo.test
  Normal   Synced  7m23s                flagger  Pre-rollout check acceptance-test passed
  Normal   Synced  7m23s                flagger  Advance podinfo.test canary weight 10
  Normal   Synced  6m23s                flagger  Advance podinfo.test canary weight 20
  Normal   Synced  5m23s                flagger  Advance podinfo.test canary weight 30
  Normal   Synced  4m23s                flagger  Advance podinfo.test canary weight 40
  Normal   Synced  23s (x4 over 3m23s)  flagger  (combined from similar events): Promotion completed! Scaling down podinfo.test
```
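As an optional final check after the promotion completes, one can confirm that the production Deployment now runs the new image while the canary Deployment has been scaled back down, for example:

```bash
# After promotion, podinfo-primary should carry the 3.1.1 image and podinfo should be scaled to 0.
kubectl --kubeconfig "$USER_CONFIG" -n test get deploy podinfo-primary \
  -o jsonpath='{.spec.template.spec.containers[0].image}{"\n"}'
kubectl --kubeconfig "$USER_CONFIG" -n test get deploy podinfo podinfo-primary
```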