Grafana Backup and Restore Tutorial

Original post: fuckcloudnative.io/posts/how-t…

Our k8s cluster's Grafana currently uses Ceph for persistent storage. Once the Grafana Deployment is deleted and recreated, all of the previous data is gone, because the recreated PV maps to a new location in the backend storage. Fortunately, I really was itchy-fingered enough to recreate it without backing anything up first... "fortunately" my foot.

After spending 250 minutes rebuilding the dashboards, I could not calm down for a long while; an "MMP" nearly slipped out of my mouth.

1. The Low-Tech Approach

If this keeps up I will really turn into a 250 (that is, an idiot) myself. That I could not stand, so I immediately opened Google and dug into every trick for backing up Grafana. Most of the backup solutions turn out to be shell scripts that call Grafana's API to export the various configurations, and most of those scripts are collected in this gist:

I have picked out a few that work well; feel free to choose others yourself.

Export script

#!/bin/bash

# Usage:
#
# export_grafana_dashboards.sh https://admin:REDACTED@grafana.dedevsecops.com

create_slug () {
  echo "$1" | iconv -t ascii//TRANSLIT | sed -r 's/[^a-zA-Z0-9]+/-/g' | sed -r 's/^-+|-+$//g' | tr A-Z a-z
}

full_url=$1
username=$(echo "${full_url}" | cut -d/ -f 3 | cut -d: -f 1)
base_url=$(echo "${full_url}" | cut -d@ -f 2)
folder=$(create_slug "${username}-${base_url}")

mkdir "${folder}"
for db_uid in $(curl -s "${full_url}/api/search" | jq -r .[].uid); do
  db_json=$(curl -s "${full_url}/api/dashboards/uid/${db_uid}")
  db_slug=$(echo "${db_json}" | jq -r .meta.slug)
  db_title=$(echo "${db_json}" | jq -r .dashboard.title)
  filename="${folder}/${db_slug}.json"
  echo "Exporting \"${db_title}\" to \"${filename}\"..."
  echo "${db_json}" | jq -r . > "${filename}"
done
echo "Done"

This script is fairly simple: it just dumps every dashboard's JSON and records no folder information. If you restore Grafana from configs exported this way, every dashboard lands in Grafana's General folder, which is not very friendly.
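As a rough sketch (not part of the gist above), you could record the folder as well by reading the folderTitle field that /api/search returns for dashboards living inside folders; I am assuming here that your Grafana version exposes that field:

#!/bin/bash
# Hypothetical variant: group exported dashboards by their Grafana folder.
# Assumes /api/search returns folderTitle for dashboards inside a folder
# (dashboards in General carry no folderTitle).
full_url=$1
outdir="backup"

curl -s "${full_url}/api/search?type=dash-db" | jq -c '.[]' | while read -r item; do
  db_uid=$(echo "${item}" | jq -r '.uid')
  folder=$(echo "${item}" | jq -r '.folderTitle // "General"')
  mkdir -p "${outdir}/${folder}"
  curl -s "${full_url}/api/dashboards/uid/${db_uid}" | jq -r . > "${outdir}/${folder}/${db_uid}.json"
done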

Import script

grafana-dashboard-importer.sh

#!/bin/bash
#
# add the "-x" option to the shebang line if you want a more verbose output
#
#
OPTSPEC=":hp:t:k:"

show_help() {
cat << EOF
Usage: $0 [-p PATH] [-t TARGET_HOST] [-k API_KEY]
Script to import dashboards into Grafana

   -p      Required. Root path containing JSON exports of the dashboards you want imported.
   -t      Required. The full URL of the target host
   -k      Required. The API key to use on the target host
   -h      Display this help and exit.
EOF
}

###### Check script invocation options ######
while getopts "$OPTSPEC" optchar; do
    case "$optchar" in
        h)
            show_help
            exit
            ;;
        p)
            DASH_DIR="$OPTARG";;
        t)
            HOST="$OPTARG";;
        k)
            KEY="$OPTARG";;
        \?)
          echo "Invalid option: -$OPTARG" >&2
          exit 1
          ;;
        :)
          echo "Option -$OPTARG requires an argument." >&2
          exit 1
          ;;
    esac
done

if [ -z "$DASH_DIR" ] || [ -z "$HOST" ] || [ -z "$KEY" ]; then
    show_help
    exit 1
fi

# set some colors for status OK, FAIL and titles
SETCOLOR_SUCCESS="echo -en \\033[0;32m"
SETCOLOR_FAILURE="echo -en \\033[1;31m"
SETCOLOR_NORMAL="echo -en \\033[0;39m"
SETCOLOR_TITLE_PURPLE="echo -en \\033[0;35m" # purple

# usage log "string to log" "color option"
function log_success() {
   if [ $# -lt 1 ]; then
       ${SETCOLOR_FAILURE}
       echo "Not enough arguments for log function! Expecting 1 argument got $#"
       exit 1
   fi

   timestamp=$(date "+%Y-%m-%d %H:%M:%S %Z")

   ${SETCOLOR_SUCCESS}
   printf "[%s] $1\n" "$timestamp"
   ${SETCOLOR_NORMAL}
}

function log_failure() {
   if [ $# -lt 1 ]; then
       ${SETCOLOR_FAILURE}
       echo "Not enough arguments for log function! Expecting 1 argument got $#"
       exit 1
   fi

   timestamp=$(date "+%Y-%m-%d %H:%M:%S %Z")

   ${SETCOLOR_FAILURE}
   printf "[%s] $1\n" "$timestamp"
   ${SETCOLOR_NORMAL}
}

function log_title() {
   if [ $# -lt 1 ]; then
       ${SETCOLOR_FAILURE}
       log_failure "Not enough arguments for log function! Expecting 1 argument got $#"
       exit 1
   fi

   ${SETCOLOR_TITLE_PURPLE}
   printf "|-------------------------------------------------------------------------|\n"
   printf "|%s|\n" "$1";
   printf "|-------------------------------------------------------------------------|\n"
   ${SETCOLOR_NORMAL}
}

if [ -d "$DASH_DIR" ]; then
    DASH_LIST=$(find "$DASH_DIR" -mindepth 1 -name \*.json)
    if [ -z "$DASH_LIST" ]; then
        log_title "----------------- $DASH_DIR contains no JSON files! -----------------"
        log_failure "Directory $DASH_DIR does not appear to contain any JSON files for import. Check your path and try again."
        exit 1
    else
        FILESTOTAL=$(echo "$DASH_LIST" | wc -l)
        log_title "----------------- Starting import of $FILESTOTAL dashboards -----------------"
    fi
else
    log_title "----------------- $DASH_DIR directory not found! -----------------"
    log_failure "Directory $DASH_DIR does not exist. Check your path and try again."
    exit 1
fi

NUMSUCCESS=0
NUMFAILURE=0
COUNTER=0

for DASH_FILE in $DASH_LIST; do
    COUNTER=$((COUNTER + 1))
    echo "Import $COUNTER/$FILESTOTAL: $DASH_FILE..."
    RESULT=$(cat "$DASH_FILE" | jq '. * {overwrite: true, dashboard: {id: null}}' | curl -s -X POST -H "Content-Type: application/json" -H "Authorization: Bearer $KEY" "$HOST"/api/dashboards/db -d @-)
    if [[ "$RESULT" == *"success"* ]]; then
        log_success "$RESULT"
        NUMSUCCESS=$((NUMSUCCESS + 1))
    else
        log_failure "$RESULT"
        NUMFAILURE=$((NUMFAILURE + 1))
    fi
done

log_title "Import complete. $NUMSUCCESS dashboards were successfully imported. $NUMFAILURE dashboard imports failed.";
log_title "------------------------------ FINISHED ---------------------------------";

The import script requires Grafana to already be running on the target machine, and it needs an admin API key. Log in to the Grafana web UI and open API Keys:

Create a new API key, choose the Admin role, and set the expiration to taste:
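If you would rather stay on the command line, the key can also be created through the HTTP API. A sketch, assuming basic-auth admin credentials and the standard /api/auth/keys endpoint:

$ curl -s -X POST "http://admin:admin@<grafana_ip>:<grafana_port>/api/auth/keys" \
    -H "Content-Type: application/json" \
    -d '{"name": "dashboard-backup", "role": "Admin", "secondsToLive": 2592000}'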

To import:

$ ./grafana-dashboard-importer.sh -t http://<grafana_svc_ip>:<grafana_svc_port> -k <api_key> -p <backup folder>

The -p flag points at the directory containing the JSON files exported earlier.

The pain point of this approach is that it only backs up dashboards and nothing else (data sources, users, API keys, and so on), and it does not associate dashboards with their folders, i.e. folders are not backed up at all. Below is a much more complete backup and restore solution that covers every kind of configuration. It is simply too good.

2. The Advanced Approach

Someone has already built a more advanced solution; the project lives here:

The backup tool supports the following kinds of configuration:

  • Folders
  • Dashboards
  • Data sources
  • Grafana alert channels (Alert Channel)
  • Organizations (Organization)
  • Users (User)

Using it is simple: just run it in a container. I was not entirely happy with the Dockerfile the author provides, though, so I tweaked it a bit:

FROM alpine:latest

LABEL maintainer="grafana-backup-tool Docker Maintainers https://fuckcloudnative.io"

ENV ARCHIVE_FILE ""

RUN echo "@edge http://dl-cdn.alpinelinux.org/alpine/edge/community" >> /etc/apk/repositories; \
    apk --no-cache add python3 py3-pip py3-cffi py3-cryptography ca-certificates bash git; \
    git clone https://github.com/ysde/grafana-backup-tool /opt/grafana-backup-tool; \
    cd /opt/grafana-backup-tool; \
    pip3 --no-cache-dir install .; \
    chown -R 1337:1337 /opt/grafana-backup-tool

WORKDIR /opt/grafana-backup-tool

USER 1337
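To smoke-test the image locally before wiring up any CI, a plain build is enough; the path below assumes the Dockerfile sits in a grafana-backup-tool/ subdirectory, as in the workflow that follows:

$ docker build -t grafana-backup-tool:dev -f grafana-backup-tool/Dockerfile grafana-backup-tool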

A Dockerfile by itself is not enough; it still needs CI/CD to build the image automatically and push it to docker.io. Don't ask what I use: of course I am freeloading on GitHub Actions. The workflow looks like this:

#=================================================
# https://github.com/yangchuansheng/docker-image
# Description: Build and push grafana-backup-tool Docker image
# License: MIT
# Author: Ryan
# Blog: https://fuckcloudnative.io
#=================================================

name: Build and push grafana-backup-tool Docker image

# Controls when the action will run. Triggers the workflow on push or pull request
# events but only for the master branch
on:
  push:
    branches: [ master ]
    paths: 
      - 'grafana-backup-tool/Dockerfile'
      - '.github/workflows/grafana-backup-tool.yml'
  pull_request:
    branches: [ master ]
    paths: 
      - 'grafana-backup-tool/Dockerfile'
  #watch:
    #types: started

# A workflow run is made up of one or more jobs that can run sequentially or in parallel
jobs:
  # This workflow contains a single job called "build"
  build:
    # The type of runner that the job will run on
    runs-on: ubuntu-latest

    # Steps represent a sequence of tasks that will be executed as part of the job
    steps:
    # Checks-out your repository under $GITHUB_WORKSPACE, so your job can access it
    - uses: actions/checkout@v2

    - name: Set up QEMU
      uses: docker/setup-qemu-action@v1

    - name: Set up Docker Buildx
      uses: docker/setup-buildx-action@v1

    - name: Login to DockerHub
      uses: docker/login-action@v1 
      with:
        username: ${{ secrets.DOCKER_USERNAME }}
        password: ${{ secrets.DOCKER_PASSWORD }}
        
    - name: Login to GitHub Package Registry
      env:
        username: ${{ github.repository_owner }}
        password: ${{ secrets.GHCR_TOKEN }}
      run: echo ${{ env.password }} | docker login ghcr.io -u ${{ env.username }} --password-stdin  

    # Runs a single command using the runners shell
    - name: Build and push Docker images to docker.io and ghcr.io
      uses: docker/build-push-action@v2
      with:
        file: 'grafana-backup-tool/Dockerfile'
        platforms: linux/386,linux/amd64,linux/arm/v6,linux/arm/v7,linux/arm64,linux/ppc64le,linux/s390x
        context: grafana-backup-tool
        push: true
        tags: |
          yangchuansheng/grafana-backup-tool:latest
          ghcr.io/yangchuansheng/grafana-backup-tool:latest
    #- name: Update repo description
      #uses: peter-evans/dockerhub-description@v2
      #env:
        #DOCKERHUB_USERNAME: ${{ secrets.DOCKER_USERNAME }}
        #DOCKERHUB_PASSWORD: ${{ secrets.DOCKER_PASSWORD }}
        #DOCKERHUB_REPOSITORY: yangchuansheng/grafana-backup-tool
        #README_FILEPATH: grafana-backup-tool/readme.md

I am not going to walk through the workflow here; anyone with a bit of background can follow it, and if not, I will explain it in a separate article someday (more filler posts for me~). What this workflow does is automatically build images for every CPU architecture and push them to docker.io and ghcr.io. Damn, that is good!
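If you want to double-check that the multi-arch manifest actually landed in the registry, recent Docker versions ship buildx, and the following should list every architecture:

$ docker buildx imagetools inspect yangchuansheng/grafana-backup-tool:latest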

Tell me that isn't satisfying.

You can follow my repo directly:

Once the image is built, you can run a container to perform backups and restores. If you want to do this inside the cluster, use a Deployment or a Job; if you want to do it locally or outside the k8s cluster, docker run is fine, no objection, and docker-compose works too. But let me show you an even slicker trick, one you won't be able to put down.
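If you do take the plain docker run route, a minimal sketch could look like this; the environment variables simply mirror the Deployment manifest shown further down, and all the values are placeholders:

$ docker run --rm -it \
    -e GRAFANA_TOKEN="<api_key>" \
    -e GRAFANA_URL="http://<grafana_ip>:<grafana_port>" \
    -e VERIFY_SSL="False" \
    -v "$(pwd)/backup:/opt/grafana-backup-tool" \
    yangchuansheng/grafana-backup-tool:latest \
    grafana-backup save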

First you need Podman installed locally or outside the cluster. If you are on Windows 10, consider installing it via WSL; if you are on Linux, no explanation needed; if you are on macOS, see my previous article: Using Podman on macOS.
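For reference, installation is usually a one-liner; package availability varies by distro and version, so treat these as rough pointers rather than gospel:

# Debian / Ubuntu
$ sudo apt-get -y install podman
# Fedora / CentOS
$ sudo dnf -y install podman
# macOS (Homebrew)
$ brew install podman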

With Podman installed, the fun begins. Keep your eyes open.

First, write a Deployment manifest (what, a Deployment? Yes, you read that right):

grafana-backup-deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana-backup
  labels:
    app: grafana-backup
spec:
  replicas: 1
  selector:
    matchLabels:
      app: grafana-backup
  template:
    metadata:
      labels:
        app: grafana-backup
    spec:
      containers:
      - name: grafana-backup
        image: yangchuansheng/grafana-backup-tool:latest
        imagePullPolicy: IfNotPresent
        command: ["/bin/bash"]
        tty: true
        stdin: true
        env:
        - name: GRAFANA_TOKEN
          value: "eyJr0NkFBeWV1QVpMNjNYWXA3UXNOM2JWMWdZOTB2ZFoiLCJuIjoiYWRtaW4iLCJpZCI6MX0="
        - name: GRAFANA_URL
          value: "http://<grafana_ip>:<grafana_port>"
        - name: GRAFANA_ADMIN_ACCOUNT
          value: "admin"
        - name: GRAFANA_ADMIN_PASSWORD
          value: "admin"
        - name: VERIFY_SSL
          value: "False"
        volumeMounts:
        - mountPath: /opt/grafana-backup-tool
          name: data
      volumes:
      - name: data
        hostPath:
          path: /mnt/manifest/grafana/backup

Adjust the environment variables here to your own situation; whatever you do, do not copy mine verbatim!

Don't look so baffled. Let me explain why we prepared a Deployment manifest: Podman can run containers directly from it, with this command:

$ podman play kube grafana-backup-deployment.yaml

The first time I saw this I could not stop swearing: this actually works? It does. To be fair, Podman merely translates the manifest and runs a container; it is not really running a Deployment, because there is no controller behind it. Still, delicious!

Think about it: you can take a manifest from your k8s cluster and run it directly on your laptop or a test box. No more keeping one YAML for the cluster and another for docker-compose; one YAML goes everywhere. Impressed yet?

It is a little sad how far docker-compose has fallen.

Observant readers will have noticed that the manifest above looks a bit odd, and so does the Dockerfile. The Dockerfile has no CMD or ENTRYPOINT, and the Deployment sets the startup command to bash. That is because in my earlier testing the container had a problem: it got stuck in a loop, backing up over and over again and littering the backup directory with a pile of archives. I have not found a good fix yet, so for now the container's command is set to bash, and once it is running you exec in and run the backup by hand:

$ podman pod ls
POD ID        NAME                  STATUS   CREATED        # OF CONTAINERS INFRA ID
728aec216d66  grafana-backup-pod-0  Running  3 minutes ago  2                92aa0824fe7d

$ podman ps
CONTAINER ID  IMAGE                                      COMMAND    CREATED        STATUS            PORTS   NAMES
b523fa8e4819  yangchuansheng/grafana-backup-tool:latest  /bin/bash  3 minutes ago  Up 3 minutes ago          grafana-backup-pod-0-grafana-backup
92aa0824fe7d  k8s.gcr.io/pause:3.2                                  3 minutes ago  Up 3 minutes ago          728aec216d66-infra

$ podman exec -it grafana-backup-pod-0-grafana-backup bash
bash-5.0$ grafana-backup save
...
...
########################################

backup folders at: _OUTPUT_/folders/202012111556
backup datasources at: _OUTPUT_/datasources/202012111556
backup dashboards at: _OUTPUT_/dashboards/202012111556
backup alert_channels at: _OUTPUT_/alert_channels/202012111556
backup organizations at: _OUTPUT_/organizations/202012111556
backup users at: _OUTPUT_/users/202012111556

created archive at: _OUTPUT_/202012111556.tar.gz

By default every component is backed up, but you can also specify which components to include:

$ grafana-backup save --components=<folders,dashboards,datasources,alert-channels,organizations,users>

For example, to back up only dashboards and folders:

$ grafana-backup save --components=folders,dashboards

Of course, you can also back up everything and then pick which components to restore later:

$ grafana-backup restore --components=folders,dashboards
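One note from my own usage: as far as I can tell, restore also expects the archive produced by save as an argument, so a full invocation would look roughly like this (archive name taken from the save output above):

$ grafana-backup restore _OUTPUT_/202012111556.tar.gz --components=folders,dashboards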

With that, there is no more need to fear dashboards being changed or deleted.

One final reminder: the Grafana that ships with the Prometheus Operator project pre-imports some default dashboards via provisioning. That is normally fine, but grafana-backup-tool cannot skip configurations that already exist; if the restore runs into one, it errors out and exits. Normally that would also be easy to solve, since you could just delete all the dashboards in the Grafana web UI, but dashboards imported via provisioning cannot be deleted, which is rather awkward.

Until the author fixes this bug, there are two ways to work around the problem:

The first is to strip all the provisioning-related configuration out of the Grafana Deployment before restoring, that is, all of this:

        volumeMounts:
        - mountPath: /etc/grafana/provisioning/datasources
          name: grafana-datasources
          readOnly: false
        - mountPath: /etc/grafana/provisioning/dashboards
          name: grafana-dashboards
          readOnly: false
        - mountPath: /grafana-dashboard-definitions/0/apiserver
          name: grafana-dashboard-apiserver
          readOnly: false
        - mountPath: /grafana-dashboard-definitions/0/cluster-total
          name: grafana-dashboard-cluster-total
          readOnly: false
        - mountPath: /grafana-dashboard-definitions/0/controller-manager
          name: grafana-dashboard-controller-manager
          readOnly: false
        - mountPath: /grafana-dashboard-definitions/0/k8s-resources-cluster
          name: grafana-dashboard-k8s-resources-cluster
          readOnly: false
        - mountPath: /grafana-dashboard-definitions/0/k8s-resources-namespace
          name: grafana-dashboard-k8s-resources-namespace
          readOnly: false
        - mountPath: /grafana-dashboard-definitions/0/k8s-resources-node
          name: grafana-dashboard-k8s-resources-node
          readOnly: false
        - mountPath: /grafana-dashboard-definitions/0/k8s-resources-pod
          name: grafana-dashboard-k8s-resources-pod
          readOnly: false
        - mountPath: /grafana-dashboard-definitions/0/k8s-resources-workload
          name: grafana-dashboard-k8s-resources-workload
          readOnly: false
        - mountPath: /grafana-dashboard-definitions/0/k8s-resources-workloads-namespace
          name: grafana-dashboard-k8s-resources-workloads-namespace
          readOnly: false
        - mountPath: /grafana-dashboard-definitions/0/kubelet
          name: grafana-dashboard-kubelet
          readOnly: false
        - mountPath: /grafana-dashboard-definitions/0/namespace-by-pod
          name: grafana-dashboard-namespace-by-pod
          readOnly: false
        - mountPath: /grafana-dashboard-definitions/0/namespace-by-workload
          name: grafana-dashboard-namespace-by-workload
          readOnly: false
        - mountPath: /grafana-dashboard-definitions/0/node-cluster-rsrc-use
          name: grafana-dashboard-node-cluster-rsrc-use
          readOnly: false
        - mountPath: /grafana-dashboard-definitions/0/node-rsrc-use
          name: grafana-dashboard-node-rsrc-use
          readOnly: false
        - mountPath: /grafana-dashboard-definitions/0/nodes
          name: grafana-dashboard-nodes
          readOnly: false
        - mountPath: /grafana-dashboard-definitions/0/persistentvolumesusage
          name: grafana-dashboard-persistentvolumesusage
          readOnly: false
        - mountPath: /grafana-dashboard-definitions/0/pod-total
          name: grafana-dashboard-pod-total
          readOnly: false
        - mountPath: /grafana-dashboard-definitions/0/prometheus-remote-write
          name: grafana-dashboard-prometheus-remote-write
          readOnly: false
        - mountPath: /grafana-dashboard-definitions/0/prometheus
          name: grafana-dashboard-prometheus
          readOnly: false
        - mountPath: /grafana-dashboard-definitions/0/proxy
          name: grafana-dashboard-proxy
          readOnly: false
        - mountPath: /grafana-dashboard-definitions/0/scheduler
          name: grafana-dashboard-scheduler
          readOnly: false
        - mountPath: /grafana-dashboard-definitions/0/statefulset
          name: grafana-dashboard-statefulset
          readOnly: false
        - mountPath: /grafana-dashboard-definitions/0/workload-total
          name: grafana-dashboard-workload-total
          readOnly: false
...
...
      volumes:
      - name: grafana-datasources
        secret:
          secretName: grafana-datasources
      - configMap:
          name: grafana-dashboards
        name: grafana-dashboards
      - configMap:
          name: grafana-dashboard-apiserver
        name: grafana-dashboard-apiserver
      - configMap:
          name: grafana-dashboard-cluster-total
        name: grafana-dashboard-cluster-total
      - configMap:
          name: grafana-dashboard-controller-manager
        name: grafana-dashboard-controller-manager
      - configMap:
          name: grafana-dashboard-k8s-resources-cluster
        name: grafana-dashboard-k8s-resources-cluster
      - configMap:
          name: grafana-dashboard-k8s-resources-namespace
        name: grafana-dashboard-k8s-resources-namespace
      - configMap:
          name: grafana-dashboard-k8s-resources-node
        name: grafana-dashboard-k8s-resources-node
      - configMap:
          name: grafana-dashboard-k8s-resources-pod
        name: grafana-dashboard-k8s-resources-pod
      - configMap:
          name: grafana-dashboard-k8s-resources-workload
        name: grafana-dashboard-k8s-resources-workload
      - configMap:
          name: grafana-dashboard-k8s-resources-workloads-namespace
        name: grafana-dashboard-k8s-resources-workloads-namespace
      - configMap:
          name: grafana-dashboard-kubelet
        name: grafana-dashboard-kubelet
      - configMap:
          name: grafana-dashboard-namespace-by-pod
        name: grafana-dashboard-namespace-by-pod
      - configMap:
          name: grafana-dashboard-namespace-by-workload
        name: grafana-dashboard-namespace-by-workload
      - configMap:
          name: grafana-dashboard-node-cluster-rsrc-use
        name: grafana-dashboard-node-cluster-rsrc-use
      - configMap:
          name: grafana-dashboard-node-rsrc-use
        name: grafana-dashboard-node-rsrc-use
      - configMap:
          name: grafana-dashboard-nodes
        name: grafana-dashboard-nodes
      - configMap:
          name: grafana-dashboard-persistentvolumesusage
        name: grafana-dashboard-persistentvolumesusage
      - configMap:
          name: grafana-dashboard-pod-total
        name: grafana-dashboard-pod-total
      - configMap:
          name: grafana-dashboard-prometheus-remote-write
        name: grafana-dashboard-prometheus-remote-write
      - configMap:
          name: grafana-dashboard-prometheus
        name: grafana-dashboard-prometheus
      - configMap:
          name: grafana-dashboard-proxy
        name: grafana-dashboard-proxy
      - configMap:
          name: grafana-dashboard-scheduler
        name: grafana-dashboard-scheduler
      - configMap:
          name: grafana-dashboard-statefulset
        name: grafana-dashboard-statefulset
      - configMap:
          name: grafana-dashboard-workload-total
        name: grafana-dashboard-workload-total

The second is to delete the Grafana that comes with Prometheus Operator and deploy your own Grafana, via Helm or plain manifests, without using provisioning.
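As a sketch of the Helm route, using the upstream grafana chart; the namespace and persistence values are just examples, so adjust them to your environment:

$ helm repo add grafana https://grafana.github.io/helm-charts
$ helm repo update
$ helm install grafana grafana/grafana \
    --namespace monitoring \
    --set persistence.enabled=true \
    --set persistence.size=10Gi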

If you neither want to delete the provisioning configuration nor deploy Grafana yourself, then the low-tech approach described above is your only option.
