Container Monitoring in Practice: Cortex

Date: 2019-12-06


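I. Overview

Cortex: a multi-tenant, horizontally scalable Prometheus service.

I originally looked into Cortex after reading the description of the monitoring module in the commercial product Weave Cloud. Weave, also known as Weaveworks (official site: https://cloud.weave.works), is a PaaS platform focused on containers and microservices.

Weave Cloud's monitoring module makes the most of Prometheus and adds a number of components on top of it to provide multi-tenant management and a highly available monitoring cluster. The core monitoring component it uses is Cortex.

This article focuses on how Cortex works. For Weave Cloud's product positioning and features, see the follow-up article: [Commercial solutions: Weave Works]()

Cortex is a CNCF sandbox project and is currently used by several production products: Weave Cloud, Grafana Cloud, and FreshTracks.io.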

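Why use Cortex instead of running Prometheus directly?

Note: taken from the Cortex talk at KubeCon.

As a service, Cortex provides authentication and access control
Data is retained permanently, and state can be managed
Provides durability, high availability, and scalability
Provides better query performance, especially for long-range queries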

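II. Main features

To meet the needs above, Cortex offers the following main features:

Multi-tenancy: Prometheus itself has no concept of tenants, which means it cannot offer any fine-grained control over tenant-specific data access or resource quotas. Cortex can ingest data from multiple independent Prometheus instances and manage it per tenant.
Long-term storage: built on the remote write mechanism, with four long-term storage systems supported out of the box: AWS DynamoDB, AWS S3, Apache Cassandra, and Google Cloud Bigtable.
Global view: provides a single, consistent "global" view of the time series data aggregated from all Prometheus servers.
High availability: horizontal scaling of service instances, federated clusters, and so on
Makes the most of Prometheus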

Similar alternatives:

Prometheus + InfluxDB: relies on InfluxData
Prometheus + Thanos: long-term storage, global view
Timbala: multiple replicas, global view; written by Matt Bostock
M3DB: automatic scaling; from Uber

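Product overview

Note: screenshots taken while trying out the monitoring module on Weave Cloud.

1. Installing the monitoring agent:

2. Overview view

3. Resource monitoring dashboard

4. Monitoring details page

5. Adding monitoring

6. Configuring alerts

The YAML manifests required to deploy into a k8s cluster are listed at: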

[https://github.com/weaveworks...](https://github.com/weaveworks...)

The script used to deploy the agent is:

#!/bin/sh
set -e
# Create a temporary file for the bootstrap binary
TMPFILE="$(mktemp -qt weave_bootstrap.XXXXXXXXXX)" || exit 1
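finish(){
  # Capture the exit status first: every later command, including `[`, overwrites $?
  status=$?
  # Send only when this script errors out
  # Filter out the bootstrap errors
  if [ "$status" -ne 111 ] && [ "$status" -ne 0 ]; then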
    curl -s >/dev/null 2>/dev/null -H "Accept: application/json" -H "Authorization: Bearer $token" -X POST -d \
        '{"type": "onboarding_failed", "messages": {"browser": { "type": "onboarding_failed", "text": "Installation of Weave Cloud agents did not finish."}}}' \
        https://cloud.weave.works/api/notification/external/events || true
  fi
  # Arrange for the bootstrap binary to be deleted
  rm -f "$TMPFILE"
}
# Call finish function on exit
trap finish EXIT
# Parse command-line arguments
for arg in "$@"; do
    case $arg in
        --token=*)
            # Strip the prefix rather than splitting on '=', so a token containing '=' is kept whole
            token=${arg#--token=}
            ;;
    esac
done
if [ -z "$token" ]; then
    echo "error: please specify the instance token with --token=<TOKEN>"
    exit 1
fi
# Notify installation has started
curl -s >/dev/null 2>/dev/null -H "Accept: application/json" -H "Authorization: Bearer $token" -X POST -d \
    '{"type": "onboarding_started", "messages": {"browser": { "type": "onboarding_started", "text": "Installation of Weave Cloud agents has started"}}}' \
    https://cloud.weave.works/api/notification/external/events || true
# Get distribution
unamestr=$(uname)
if [ "$unamestr" = 'Darwin' ]; then
    dist='darwin'
elif [ "$unamestr" = 'Linux' ]; then
    dist='linux'
else
  echo "This OS is not supported"
  exit 1
fi
# Download the bootstrap binary
echo "Downloading the Weave Cloud installer...  "
curl -Ls "https://get.weave.works/bootstrap?dist=$dist" >> "$TMPFILE"
# Make the bootstrap binary executable
chmod +x "$TMPFILE"
# Execute the bootstrap binary
"$TMPFILE" "--scheme=https" "--wc.launcher=get.weave.works" "--wc.hostname=cloud.weave.works" "--report-errors" "$@"

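III. How it works

Interaction between Cortex and Prometheus:

Architecture diagram:

Roles of the Cortex components:

Retrieval: the collection component. It runs in the user's k8s cluster, pulls monitoring metrics from the user's applications, and pushes the data to the cloud platform's service
Frontend: load balancing, request routing, and authentication. It accepts the requests sent by Retrieval; nginx is used here
Distributor: takes the metrics pushed by users, applies consistent hashing over the user ID, metric name, and labels, and hands them in parallel to multiple downstream ingesters (over gRPC). It is the first stop on the write path
Ingester: the processor. It stores monitoring data using Prometheus storage, with a heavily customized MemorySeriesStorage module: data is stored in chunks, written to memory and indexed (using AWS DynamoDB), and finally written to disk
Read/write separation: ingest and query run as two separate services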

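Cortex consists of multiple horizontally scalable microservices. Each microservice uses the most appropriate technique for horizontal scaling; most are stateless, while some (namely the ingesters) are semi-stateful and depend on consistent hashing.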
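The consistent hashing used on the write path can be sketched roughly as follows. This is a toy illustration in Python, not Cortex's actual ring (Cortex is written in Go): the `HashRing` class, the number of tokens per ingester, the ingester names, and the use of SHA-256 as the hash function are all assumptions made for the example.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a 32-bit position on the ring (stand-in hash function)."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:4], "big")


class HashRing:
    """Toy consistent-hash ring: each ingester owns several tokens on the ring."""

    def __init__(self, ingesters, tokens_per_ingester=64):
        # Sorted list of (token, ingester) pairs; tokens mark positions on the ring.
        self._ring = sorted(
            (_hash(f"{name}#{i}"), name)
            for name in ingesters
            for i in range(tokens_per_ingester)
        )
        self._tokens = [token for token, _ in self._ring]

    def lookup(self, tenant_id, metric_name, labels, replicas=1):
        """Return up to `replicas` distinct ingesters responsible for a series."""
        # Sort labels so the same series always produces the same key.
        label_part = ",".join(f"{k}={v}" for k, v in sorted(labels.items()))
        key = f"{tenant_id}/{metric_name}/{label_part}"
        # Walk clockwise from the first token after the key's position.
        start = bisect.bisect_right(self._tokens, _hash(key))
        owners = []
        for offset in range(len(self._ring)):
            name = self._ring[(start + offset) % len(self._ring)][1]
            if name not in owners:
                owners.append(name)
            if len(owners) == replicas:
                break
        return owners


ring = HashRing(["ingester-0", "ingester-1", "ingester-2"])
owners = ring.lookup("tenant-a", "http_requests_total", {"job": "api", "code": "200"}, replicas=2)
```

Because the key includes the tenant ID, the same metric pushed by two tenants generally lands on different positions of the ring, and adding or removing an ingester only moves the series whose tokens are adjacent to the change.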

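Prometheus instances scrape samples from various targets and then push them to Cortex using Prometheus's remote write API, sending Protocol Buffers messages compressed with Snappy.

Cortex requires every HTTP request to carry a header that specifies the tenant ID for the request. Request authentication and authorization are handled by an external reverse proxy.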
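As a rough illustration of the tenant header, the sketch below builds (without sending) a query request in Python. Cortex reads the tenant ID from the `X-Scope-OrgID` header; the host name, the `/prometheus` path prefix, and the `build_query_request` helper are assumptions for the example and depend on how a given deployment is configured.

```python
import urllib.parse
import urllib.request


def build_query_request(base_url: str, tenant_id: str, promql: str) -> urllib.request.Request:
    """Build (but do not send) an instant-query request scoped to one tenant."""
    params = urllib.parse.urlencode({"query": promql})
    request = urllib.request.Request(f"{base_url}/api/v1/query?{params}")
    # Cortex takes the tenant ID from this header. The reverse proxy in front
    # of Cortex is expected to have authenticated the caller already.
    # (urllib stores the name as "X-scope-orgid"; HTTP header names are case-insensitive.)
    request.add_header("X-Scope-OrgID", tenant_id)
    return request


# Hypothetical endpoint and tenant, for illustration only.
req = build_query_request("http://cortex.example.internal/prometheus", "tenant-a", "up")
```

In a real setup the client would normally not set this header itself: the reverse proxy would authenticate the request and inject the tenant ID before forwarding it.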

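Incoming samples (writes from Prometheus) are handled by the Distributor, while incoming reads (PromQL queries) are handled by the query frontend.

Query caching:

The query frontend caches query results and reuses them in subsequent queries. If the cached result is incomplete, the query frontend works out the required subqueries and executes them in parallel on downstream queriers.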
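The idea of filling cache gaps with parallel subqueries can be sketched as follows. This is a simplified illustration, not the query frontend's real code: the half-open `(start, end)` interval representation, the `missing_ranges` and `run_subqueries` helpers, and the `fetch` callback standing in for a downstream querier are all assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def missing_ranges(start, end, cached):
    """Return the sub-ranges of [start, end) not covered by cached extents.

    `cached` is a list of (start, end) half-open intervals, in any order.
    """
    gaps = []
    cursor = start
    for c_start, c_end in sorted(cached):
        if c_end <= cursor:
            continue  # extent ends before the part still uncovered
        if c_start >= end:
            break  # extent starts after the query range
        if c_start > cursor:
            gaps.append((cursor, c_start))
        cursor = max(cursor, c_end)
        if cursor >= end:
            break
    if cursor < end:
        gaps.append((cursor, end))
    return gaps


def run_subqueries(gaps, fetch, max_workers=4):
    """Call `fetch(start, end)` for every gap in parallel, preserving gap order."""
    if not gaps:
        return []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda gap: fetch(*gap), gaps))


# The cache already holds [10, 30) and [50, 60) of a query over [0, 100).
gaps = missing_ranges(0, 100, [(10, 30), (50, 60)])
```

Only the returned gaps need to be sent to the queriers; the results are then merged with the cached extents to answer the original query.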

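Parallel queries:

The query frontend job accepts gRPC streaming requests from the queriers. For high availability, it is recommended to run multiple frontends, with fewer frontends than queriers. In most cases, two should be enough.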