Hands-On Notes on Deploying Ceph with Rook 1.5.1

1. Rook Overview

1.1 What is Rook

Rook is an open-source cloud-native storage orchestrator: it provides a platform, a framework, and support so that various storage solutions can integrate natively with cloud-native environments. It currently focuses on file, block, and object storage services for cloud-native workloads, and implements a distributed storage service that is self-managing, self-scaling, and self-healing.

Rook supports automated deployment, bootstrapping, configuration, provisioning, scaling up and down, upgrades, migration, disaster recovery, monitoring, and resource management. To deliver all of this, Rook relies on an underlying container orchestration platform such as Kubernetes.

Rook currently supports deploying Ceph, NFS, MinIO Object Store, EdgeFS, Cassandra, and CockroachDB storage.

Project: https://github.com/rook/rook

Website: https://rook.io/


1.2 Rook Components

Rook has three main components, with the following roles:

  1. Rook Operator

    • The component through which Rook interacts with Kubernetes
    • There is only one per Rook cluster
  2. Agent or Driver

    • Flex Driver

    A deprecated driver. Before installing, check whether your Kubernetes cluster version supports CSI; if it does not, or you do not want to use CSI, choose Flex.

    • Ceph CSI Driver

    Installed on every node by default; you can restrict it to specific nodes with node affinity.

  3. Device discovery

    Detects whether newly attached devices can be used as storage; enable it by setting ROOK_ENABLE_DISCOVERY_DAEMON to "true" in the operator configuration.

1.3 Rook & Ceph Architecture

How Rook integrates with Kubernetes is shown in the figure below:
[Figure: Rook integration with Kubernetes]

The architecture of a Ceph cluster deployed with Rook:
[Figure: architecture of a Rook-deployed Ceph cluster]

The deployed Ceph system can serve the following three kinds of volume claims:

  • Block Storage: currently the most stable;
  • FileSystem: requires deploying MDS and has kernel requirements;
  • Object: requires deploying RGW;

2. Deploying Rook

2.1 Preparation

2.1.1 Version Requirements

Kubernetes v1.11 or later

2.1.2 Storage Requirements

Ceph deployed by Rook does not support using an LVM logical volume directly as an OSD device. If you want to use LVM, you can do it through PVCs; the approach is described later (see section 5.1).

To configure a Ceph storage cluster, at least one of the following local storage options is required:

  • Raw devices (no partitions or formatted filesystem)
  • Raw partitions (no formatted filesystem); you can check with lsblk -f, and if FSTYPE is not empty a filesystem is present (see the check below)
  • PVs available from a storage class in block mode
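A quick way to run this check (a sketch; the device name nvme0n1p1 is just an example from this cluster):

lsblk -f
# a device or partition is usable by Rook only if its FSTYPE column is empty
lsblk -f /dev/nvme0n1p1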

2.1.3 System Requirements

The environment for this installation:

  • Kubernetes 1.18
  • CentOS 7.8
  • kernel 5.4.65-200.el7.x86_64
  • Calico 3.16
2.1.3.1 Install the lvm2 package
sudo yum install -y lvm2
2.1.3.2 Kernel Requirements

RBD

Most distribution kernels are built with it, but it is best to confirm:

foxchan@~$ lsmod|grep rbd
rbd                   114688  0 
libceph               368640  1 rbd

You can make it load at boot with the following:

cat > /etc/sysconfig/modules/rbd.modules << EOF
modprobe rbd
EOF
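The file above only helps if the boot process actually executes it; the usual convention for scripts under /etc/sysconfig/modules/ is to make them executable, and on a systemd-based system an entry under /etc/modules-load.d/ is the native alternative. You can also load the module immediately without rebooting. A sketch:

chmod +x /etc/sysconfig/modules/rbd.modules
modprobe rbd                               # load it right now
echo "rbd" > /etc/modules-load.d/rbd.conf  # systemd-native way to load it at boot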

CephFS

If you want to use CephFS, the minimum required kernel version is 4.17.
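A quick sanity check of the kernel version and the ceph kernel module before relying on CephFS (a sketch):

uname -r                                 # should report 4.17 or newer
modprobe ceph && lsmod | grep '^ceph'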

2.2 Deploying Rook

Download the latest Rook release from GitHub:

git clone --single-branch --branch v1.5.1 https://github.com/rook/rook.git

Install the common resources:

cd rook/cluster/examples/kubernetes/ceph
kubectl create -f crds.yaml -f common.yaml

Install the operator:

kubectl apply -f operator.yaml

If this is going into production, plan ahead: the operator configuration cannot be changed after Ceph is installed, otherwise Rook will delete the cluster and rebuild it.

The changes to make are as follows:

# Enable CephFS
ROOK_CSI_ENABLE_CEPHFS: "true"
# Use the kernel driver instead of ceph-fuse
CSI_FORCE_CEPHFS_KERNEL_CLIENT: "true"
# Point the CSI images at a private registry to speed up deployment
ROOK_CSI_CEPH_IMAGE: "harbor.foxchan.com/google_containers/cephcsi/cephcsi:v3.1.2"
ROOK_CSI_REGISTRAR_IMAGE: "harbor.foxchan.com/google_containers/k8scsi/csi-node-driver-registrar:v2.0.1"
ROOK_CSI_RESIZER_IMAGE: "harbor.foxchan.com/google_containers/k8scsi/csi-resizer:v1.0.0"
ROOK_CSI_PROVISIONER_IMAGE: "harbor.foxchan.com/google_containers/k8scsi/csi-provisioner:v2.0.0"
ROOK_CSI_SNAPSHOTTER_IMAGE: "harbor.foxchan.com/google_containers/k8scsi/csi-snapshotter:v3.0.0"
ROOK_CSI_ATTACHER_IMAGE: "harbor.foxchan.com/google_containers/k8scsi/csi-attacher:v3.0.0"
# NODE_AFFINITY can be set to choose which nodes run the CSI components.
# I separate the plugin from the provisioner; schedule them according to your cluster resources.
CSI_PROVISIONER_NODE_AFFINITY: "app.rook.role=csi-provisioner"
CSI_PLUGIN_NODE_AFFINITY: "app.rook.plugin=csi"
# Change the metrics ports (optional). My cluster uses host networking, so I changed them to avoid port conflicts.
# Configure CSI CephFS grpc and liveness metrics port
CSI_CEPHFS_GRPC_METRICS_PORT: "9491"
CSI_CEPHFS_LIVENESS_METRICS_PORT: "9481"
# Configure CSI RBD grpc and liveness metrics port
CSI_RBD_GRPC_METRICS_PORT: "9490"
CSI_RBD_LIVENESS_METRICS_PORT: "9480"
# Change the rook image to a private registry to speed up deployment
image: harbor.foxchan.com/google_containers/rook/ceph:v1.5.1
# Pin the discovery agent to the storage nodes (see the node-labeling example below)
        - name: DISCOVER_AGENT_NODE_AFFINITY
          value: "app.rook=storage"
# Enable automatic device discovery
        - name: ROOK_ENABLE_DISCOVERY_DAEMON
          value: "true"

2.3 Deploying the Ceph Cluster

The contents of cluster.yaml must be adapted to your own hardware; read the comments in the file carefully to avoid the pitfalls I ran into.

修改內容以下:

Apart from adding or removing OSD devices, any change to this file only takes effect after the Ceph cluster is reinstalled, so plan the cluster ahead of time. If you modify it and apply it without uninstalling Ceph first, a cluster reinstall is triggered and the cluster crashes.

apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
# Namespace name; only one cluster is supported per namespace
  name: rook-ceph
  namespace: rook-ceph
spec:
# Ceph release
# v13 is mimic, v14 is nautilus, and v15 is octopus.
  cephVersion:
# Change the Ceph image to a private registry to speed up deployment
    image: harbor.foxchan.com/google_containers/ceph/ceph:v15.2.5
# Whether to allow unsupported Ceph versions
    allowUnsupported: false
# Host path on each node where Rook stores its data
  dataDirHostPath: /data/rook
# Whether to continue if the upgrade checks fail
  skipUpgradeChecks: false
# Starting with 1.5, the number of mons must be odd
  mon:
    count: 3
# Whether to allow multiple mon pods on a single node
    allowMultiplePerNode: false
  mgr:
    modules:
    - name: pg_autoscaler
      enabled: true
# Enable the dashboard, disable SSL, and serve it on port 7000. You can keep the default HTTPS settings; I do this to keep my ingress configuration simple.
  dashboard:
    enabled: true
    port: 7000
    ssl: false
# Enable prometheusRule
  monitoring:
    enabled: true
# Namespace in which to deploy the PrometheusRule; defaults to the namespace of this CR
    rulesNamespace: rook-ceph
# Use host networking to work around the issue where CephFS PVCs cannot be provisioned (see section 5.2)
  network:
    provider: host
# Enable the crash collector; a crash collector pod is created on every node that runs a Ceph daemon
  crashCollector:
    disable: false
# Node affinity to pin each component to labeled nodes (see the labeling example after this manifest)
  placement:
    mon:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: ceph-mon
              operator: In
              values:
              - enabled

    osd:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: ceph-osd
              operator: In
              values:
              - enabled

    mgr:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: ceph-mgr
              operator: In
              values:
              - enabled 
# Storage settings. The defaults are all true, which would wipe and initialize every device on every node in the cluster.
  storage: # cluster level storage configuration and selection
    useAllNodes: false     # do not use all nodes
    useAllDevices: false   # do not use all devices
    nodes:
    - name: "192.168.1.162"  #指定存儲節點主機
      devices:
      - name: "nvme0n1p1"    #指定磁盤爲nvme0n1p1
    - name: "192.168.1.163"
      devices:
      - name: "nvme0n1p1"
    - name: "192.168.1.164"
      devices:
      - name: "nvme0n1p1"
    - name: "192.168.1.213"
      devices:
      - name: "nvme0n1p1"

For more CephCluster CRD configuration options, refer to the Rook documentation.

Run the installation:

kubectl apply -f cluster.yaml
# wait a while until all pods have started successfully
[foxchan@k8s-master ceph]$ kubectl get pods -n rook-ceph 
NAME                                                      READY   STATUS      RESTARTS   AGE
csi-cephfsplugin-b5tlr                                    3/3     Running     0          19h
csi-cephfsplugin-mjssm                                    3/3     Running     0          19h
csi-cephfsplugin-provisioner-5cf5ffdc76-mhdgz             6/6     Running     0          19h
csi-cephfsplugin-provisioner-5cf5ffdc76-rpdl8             6/6     Running     0          19h
csi-cephfsplugin-qmvkc                                    3/3     Running     0          19h
csi-cephfsplugin-tntzd                                    3/3     Running     0          19h
csi-rbdplugin-4p75p                                       3/3     Running     0          19h
csi-rbdplugin-89mzz                                       3/3     Running     0          19h
csi-rbdplugin-cjcwr                                       3/3     Running     0          19h
csi-rbdplugin-ndjcj                                       3/3     Running     0          19h
csi-rbdplugin-provisioner-658dd9fbc5-fwkmc                6/6     Running     0          19h
csi-rbdplugin-provisioner-658dd9fbc5-tlxd8                6/6     Running     0          19h
prometheus-rook-prometheus-0                              2/2     Running     1          3d17h
rook-ceph-mds-myfs-a-5cbcdc6f9c-7mdsv                     1/1     Running     0          19h
rook-ceph-mds-myfs-b-5f4cc54b87-m6m6f                     1/1     Running     0          19h
rook-ceph-mgr-a-f98d4455b-bwhw7                           1/1     Running     0          20h
rook-ceph-mon-a-5d445d4b8d-lmg67                          1/1     Running     1          20h
rook-ceph-mon-b-769c6fd76f-jrlc8                          1/1     Running     0          20h
rook-ceph-mon-c-6bfd8954f5-tbsnd                          1/1     Running     0          20h
rook-ceph-operator-7d8cc65dc-8wtl8                        1/1     Running     0          20h
rook-ceph-osd-0-c558ff759-bzbgw                           1/1     Running     0          20h
rook-ceph-osd-1-5c97d69d78-dkxbb                          1/1     Running     0          20h
rook-ceph-osd-2-7dddc7fd56-p58mw                          1/1     Running     0          20h
rook-ceph-osd-3-65ff985c7d-9gfgj                          1/1     Running     0          20h
rook-ceph-osd-prepare-192.168.1.213-pw5gr                 0/1     Completed   0          19h
rook-ceph-osd-prepare-192.168.1.162-wtkm8                 0/1     Completed   0          19h
rook-ceph-osd-prepare-192.168.1.163-b86r2                 0/1     Completed   0          19h
rook-ceph-osd-prepare-192.168.1.164-tj79t                 0/1     Completed   0          19h
rook-discover-89v49                                       1/1     Running     0          20h
rook-discover-jdzhn                                       1/1     Running     0          20h
rook-discover-sl9bv                                       1/1     Running     0          20h
rook-discover-wg25w                                       1/1     Running     0          20h

2.4 Adding and Removing OSDs

2.4.1 Add the required labels

kubectl label nodes 192.168.1.165 app.rook=storage
kubectl label nodes 192.168.1.165 ceph-osd=enabled

2.4.2 Modify cluster.yaml

nodes:
    - name: "192.168.1.162"
      devices:
      - name: "nvme0n1p1" 
    - name: "192.168.1.163"
      devices:
      - name: "nvme0n1p1"
    - name: "192.168.1.164"
      devices:
      - name: "nvme0n1p1"
    - name: "192.168.17.213"
      devices:
      - name: "nvme0n1p1"
    # add the disks of node 192.168.1.165
    - name: "192.168.1.165"
      devices:
      - name: "nvme0n1p1"

2.4.3 apply cluster.yaml

kubectl apply -f cluster.yaml

2.4.4 Remove an OSD

Remove the relevant node from cluster.yaml and apply it again; a sketch of the remaining manual cleanup steps is shown below.
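Removing the node from cluster.yaml only stops Rook from managing new OSDs there; the existing OSD still has to be drained and removed from Ceph. A sketch of the usual manual steps via the toolbox, assuming the OSD to remove is osd.3:

kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph osd out osd.3
# wait for data to migrate off, then confirm it is safe to remove
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph osd safe-to-destroy osd.3
kubectl -n rook-ceph delete deployment rook-ceph-osd-3
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph osd purge 3 --yes-i-really-mean-it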

2.5 Installing the Dashboard

This is my own Traefik IngressRoute; the examples directory ships several other ways to expose the dashboard, so pick whichever suits you.

The dashboard itself was already enabled in the steps above; all that is left is to expose the dashboard service. There are several ways to do that, and I use an ingress:

apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: traefik-ceph-dashboard
  annotations:
    kubernetes.io/ingress.class: traefik-v2.3
spec:
  entryPoints:
    - web
  routes:
  - match: Host(`ceph.foxchan.com`)
    kind: Rule
    services:
    - name: rook-ceph-mgr-dashboard
      namespace: rook-ceph
      port: 7000
    middlewares:
      - name: gs-ipwhitelist

Logging in to the dashboard requires credentials. Rook creates a default user named admin in the namespace where the Rook Ceph cluster runs and generates a secret called rook-ceph-dashboard-password.

To retrieve the generated password, run the following command:

kubectl -n rook-ceph get secret rook-ceph-dashboard-password -o jsonpath="{['data']['password']}" | base64 --decode && echo

2.6 Installing the Toolbox

Run the following command:

kubectl apply -f toolbox.yaml

Once applied, use the following command to confirm that the toolbox pod has started successfully:

kubectl -n rook-ceph get pod -l "app=rook-ceph-tools"

Then log in to the pod with the following command and run ceph commands as needed:

kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- bash

For example:

  • ceph status
  • ceph osd status
  • ceph df
  • rados df

To remove the toolbox:

kubectl -n rook-ceph delete deploy/rook-ceph-tools

2.7 Prometheus Monitoring

The monitoring setup is simple: use the Prometheus Operator and deploy a dedicated Prometheus instance.

Install the Prometheus Operator:

kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/v0.40.0/bundle.yaml

Install Prometheus:

git clone --single-branch --branch v1.5.1 https://github.com/rook/rook.git
cd rook/cluster/examples/kubernetes/ceph/monitoring
kubectl create -f service-monitor.yaml
kubectl create -f prometheus.yaml
kubectl create -f prometheus-service.yaml

By default it is exposed via NodePort:

echo "http://$(kubectl -n rook-ceph -o jsonpath={.status.hostIP} get pod prometheus-rook-prometheus-0):30900"

Enabling Prometheus Alerts

This must be done before the Ceph cluster is installed.

Install the RBAC resources:

kubectl create -f cluster/examples/kubernetes/ceph/monitoring/rbac.yaml

Make sure monitoring is enabled in cluster.yaml:

apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
[...]
spec:
[...]
  monitoring:
    enabled: true
    rulesNamespace: "rook-ceph"
[...]

Grafana Dashboards

Grafana 7.2.0 or later is required.

Recommended dashboards:

2.8 Removing the Ceph Cluster

Before removing the Ceph cluster, clean up the pods that consume its storage first.

Delete the block storage and file storage:

kubectl delete -n rook-ceph cephblockpool replicapool
kubectl delete storageclass rook-ceph-block
kubectl delete -f csi/cephfs/filesystem.yaml
kubectl delete storageclass csi-cephfs rook-ceph-block

Delete the operator and the related CRDs:

kubectl delete -f operator.yaml
kubectl delete -f common.yaml
kubectl delete -f crds.yaml

Clean up the data on the hosts

After the Ceph cluster is deleted, configuration data from the cluster is left behind under /data/rook/ on the nodes where Ceph components were deployed.

If you deploy a new Ceph cluster later, delete this leftover data first, otherwise the new monitors will fail to start:

# cat clean-rook-dir.sh
hosts=(
  192.168.1.213
  192.168.1.162
  192.168.1.163
  192.168.1.164
)

for host in ${hosts[@]} ; do
  ssh $host "rm -rf /data/rook/*"
done

Wipe the devices

#!/usr/bin/env bash
DISK="/dev/nvme0n1p1"
# Zap the disk to a fresh, usable state (zap-all is important, b/c MBR has to be clean)
# You will have to run this step for all disks.
sgdisk --zap-all $DISK
# for HDDs, use the following command
dd if=/dev/zero of="$DISK" bs=1M count=100 oflag=direct,dsync
# for SSDs, use the following command
blkdiscard $DISK

# These steps only have to be run once on each node
# If rook sets up osds using ceph-volume, teardown leaves some devices mapped that lock the disks.
ls /dev/mapper/ceph-* | xargs -I% -- dmsetup remove %
# ceph-volume setup can leave ceph-<UUID> directories in /dev (unnecessary clutter)
rm -rf /dev/ceph-*

If deleting the Ceph cluster gets stuck for some reason, run the following command first and the deletion will no longer hang:

kubectl -n rook-ceph patch cephclusters.ceph.rook.io rook-ceph -p '{"metadata":{"finalizers": []}}' --type=merge

2.9 Upgrading Rook

2.9.1 Patch-release upgrades

Rook v1.5.0 to Rook v1.5.1

git clone --single-branch --branch v1.5.1 https://github.com/rook/rook.git
cd $YOUR_ROOK_REPO/cluster/examples/kubernetes/ceph/
kubectl apply -f common.yaml -f crds.yaml
kubectl -n rook-ceph set image deploy/rook-ceph-operator rook-ceph-operator=rook/ceph:v1.5.1

2.9.2 Upgrading across releases

Rook v1.4.x to Rook v1.5.x.

Preparation

Set the environment variables:

# Parameterize the environment
export ROOK_SYSTEM_NAMESPACE="rook-ceph"
export ROOK_NAMESPACE="rook-ceph"

Before upgrading, make sure the cluster is healthy.

All pods should be Running:

kubectl -n $ROOK_NAMESPACE get pods

Use the toolbox to check that the Ceph cluster status is healthy:

TOOLS_POD=$(kubectl -n $ROOK_NAMESPACE get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}')
kubectl -n $ROOK_NAMESPACE exec -it $TOOLS_POD -- ceph status
cluster:
    id:     194d139f-17e7-4e9c-889d-2426a844c91b
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum a,b,c (age 25h)
    mgr: a(active, since 5h)
    mds: myfs:1 {0=myfs-b=up:active} 1 up:standby-replay
    osd: 4 osds: 4 up (since 25h), 4 in (since 25h)

  task status:
    scrub status:
        mds.myfs-a: idle
        mds.myfs-b: idle

  data:
    pools:   4 pools, 97 pgs
    objects: 2.08k objects, 7.6 GiB
    usage:   26 GiB used, 3.3 TiB / 3.3 TiB avail
    pgs:     97 active+clean

  io:
    client:   1.2 KiB/s rd, 2 op/s rd, 0 op/s wr

Upgrade the operator

1. Upgrade common and the CRDs

git clone --single-branch --branch v1.5.1 https://github.com/rook/rook.git
cd rook/cluster/examples/kubernetes/ceph
kubectl apply -f common.yaml -f crds.yaml

2. Upgrade the Ceph CSI versions

You can edit the ConfigMap to pin your own image versions; if you use the default configuration, no change is needed.

kubectl -n rook-ceph get configmap rook-ceph-operator-config

ROOK_CSI_CEPH_IMAGE: "harbor.foxchan.com/google_containers/cephcsi/cephcsi:v3.1.1"
ROOK_CSI_REGISTRAR_IMAGE: "harbor.foxchan.com/google_containers/k8scsi/csi-node-driver-registrar:v2.0.1"
ROOK_CSI_PROVISIONER_IMAGE: "harbor.foxchan.com/google_containers/k8scsi/csi-provisioner:v2.0.0"
ROOK_CSI_SNAPSHOTTER_IMAGE: "harbor.foxchan.com/google_containers/k8scsi/csi-snapshotter:v3.0.0"
ROOK_CSI_ATTACHER_IMAGE: "harbor.foxchan.com/google_containers/k8scsi/csi-attacher:v3.0.0"
ROOK_CSI_RESIZER_IMAGE: "harbor.foxchan.com/google_containers/k8scsi/csi-resizer:v1.0.0"

3. Upgrade the Rook Operator

kubectl -n $ROOK_SYSTEM_NAMESPACE set image deploy/rook-ceph-operator rook-ceph-operator=rook/ceph:v1.5.1

4. Wait for the cluster upgrade to finish

watch --exec kubectl -n $ROOK_NAMESPACE get deployments -l rook_cluster=$ROOK_NAMESPACE -o jsonpath='{range .items[*]}{.metadata.name}{"  \treq/upd/avl: "}{.spec.replicas}{"/"}{.status.updatedReplicas}{"/"}{.status.readyReplicas}{"  \trook-version="}{.metadata.labels.rook-version}{"\n"}{end}'

5. Verify that the cluster upgrade is complete

kubectl -n $ROOK_NAMESPACE get deployment -l rook_cluster=$ROOK_NAMESPACE -o jsonpath='{range .items[*]}{"rook-version="}{.metadata.labels.rook-version}{"\n"}{end}' | sort | uniq

Upgrade the Ceph version

If the cluster is not healthy, the operator will refuse to upgrade.

1. Update the Ceph image

NEW_CEPH_IMAGE='ceph/ceph:v15.2.5'
CLUSTER_NAME=rook-ceph  
kubectl -n rook-ceph patch CephCluster rook-ceph --type=merge -p "{\"spec\": {\"cephVersion\": {\"image\": \"$NEW_CEPH_IMAGE\"}}}"

2. Watch the pods upgrade

watch --exec kubectl -n $ROOK_NAMESPACE get deployments -l rook_cluster=$ROOK_NAMESPACE -o jsonpath='{range .items[*]}{.metadata.name}{"  \treq/upd/avl: "}{.spec.replicas}{"/"}{.status.updatedReplicas}{"/"}{.status.readyReplicas}{"  \tceph-version="}{.metadata.labels.ceph-version}{"\n"}{end}'

3. Check that the Ceph cluster is healthy

kubectl -n $ROOK_NAMESPACE get deployment -l rook_cluster=$ROOK_NAMESPACE -o jsonpath='{range .items[*]}{"ceph-version="}{.metadata.labels.ceph-version}{"\n"}{end}' | sort | uniq

3. Deploying Block Storage

3.1 Create a Pool and a StorageClass

# Define a block storage pool
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: replicapool
  namespace: rook-ceph
spec:
  # Each data replica must be spread across different failure domains; host means every replica is placed on a different machine
  failureDomain: host
  # Number of replicas
  replicated:
    size: 3
    # Disallow setting pool with replica 1, this could lead to data loss without recovery.
    # Make sure you're *ABSOLUTELY CERTAIN* that is what you want
    requireSafeReplicaSize: true
    # gives a hint (%) to Ceph in terms of expected consumption of the total cluster capacity of a given pool
    # for more info: https://docs.ceph.com/docs/master/rados/operations/placement-groups/#specifying-expected-pool-size
    #targetSizeRatio: .5
---
# Define a StorageClass
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
   name: rook-ceph-block
# Provisioner of this StorageClass; the rook-ceph prefix is the namespace the operator runs in
provisioner: rook-ceph.rbd.csi.ceph.com
parameters:
    # clusterID is the namespace the cluster runs in
    # If you change this namespace, also change the namespace below where the secret namespaces are defined
    clusterID: rook-ceph

    # If you want to use erasure coded pool with RBD, you need to create
    # two pools. one erasure coded and one replicated.
    # You need to specify the replicated pool here in the `pool` parameter, it is
    # used for the metadata of the images.
    # The erasure coded pool must be set as the `dataPool` parameter below.
    #dataPool: ec-data-pool
    # Pool in which RBD images are created
    pool: replicapool

    # RBD image format. Defaults to "2".
    imageFormat: "2"

    # Image features; CSI RBD currently only supports layering
    imageFeatures: layering

    # Ceph admin credentials, generated automatically by the operator
    # in the same namespace as the cluster.
    csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
    csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
    csi.storage.k8s.io/controller-expand-secret-name: rook-csi-rbd-provisioner
    csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph
    csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
    csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
    # Filesystem type of the volume. Defaults to ext4; xfs is not recommended because of a potential deadlock when, in a hyperconverged setup, the volume is mounted on the same node as an OSD
    csi.storage.k8s.io/fstype: ext4
# uncomment the following to use rbd-nbd as mounter on supported nodes
# **IMPORTANT**: If you are using rbd-nbd as the mounter, during upgrade you will be hit a ceph-csi
# issue that causes the mount to be disconnected. You will need to follow special upgrade steps
# to restart your application pods. Therefore, this option is not recommended.
#mounter: rbd-nbd
allowVolumeExpansion: true
reclaimPolicy: Delete

3.2 Demo

I recommend putting the PVC and the application in the same YAML file:

# Create the PVC
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rbd-demo-pvc
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: rook-ceph-block
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: csirbd-demo-pod
  labels:
    test-cephrbd: "true"
spec:
  replicas: 1
  selector:
    matchLabels:
      test-cephrbd: "true"
  template:
    metadata:
      labels:
        test-cephrbd: "true"
    spec:
      containers:
       - name: web-server-rbd
         image: harbor.foxchan.com/sys/nginx:1.19.4-alpine
         volumeMounts:
           - name: mypvc
             mountPath: /usr/share/nginx/html
      volumes:
       - name: mypvc
         persistentVolumeClaim:
           claimName: rbd-demo-pvc
           readOnly: false
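After applying the manifest, a quick way to confirm that the PVC bound and that an RBD image was created in the pool (using the toolbox from section 2.6):

kubectl get pvc rbd-demo-pvc
kubectl get pods -l test-cephrbd=true
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- rbd ls -p replicapool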

4. Deploying a Filesystem

4.1 Create a CephFS

The CephFS CSI driver uses quotas to enforce the size declared in a PVC, and only kernels 4.17 and newer support CephFS quotas.

If your kernel does not support them and you still need quota enforcement, set the operator environment variable CSI_FORCE_CEPHFS_KERNEL_CLIENT: false to use the FUSE client instead.

When the FUSE client is used, application pods lose their mounts whenever the Ceph cluster is upgraded and must be restarted before the PV can be used again.

apiVersion: ceph.rook.io/v1
kind: CephFilesystem
metadata:
  name: myfs
  namespace: rook-ceph
spec:
  # The metadata pool spec. Must use replication.
  metadataPool:
    replicated:
      size: 3
      requireSafeReplicaSize: true
    parameters:
      # Inline compression mode for the data pool
      # Further reference: https://docs.ceph.com/docs/nautilus/rados/configuration/bluestore-config-ref/#inline-compression
      compression_mode: none
        # gives a hint (%) to Ceph in terms of expected consumption of the total cluster capacity of a given pool
      # for more info: https://docs.ceph.com/docs/master/rados/operations/placement-groups/#specifying-expected-pool-size
      #target_size_ratio: ".5"
  # The list of data pool specs. Can use replication or erasure coding.
  dataPools:
    - failureDomain: host
      replicated:
        size: 3
        # Disallow setting pool with replica 1, this could lead to data loss without recovery.
        # Make sure you're *ABSOLUTELY CERTAIN* that is what you want
        requireSafeReplicaSize: true
      parameters:
        # Inline compression mode for the data pool
        # Further reference: https://docs.ceph.com/docs/nautilus/rados/configuration/bluestore-config-ref/#inline-compression
        compression_mode: none
          # gives a hint (%) to Ceph in terms of expected consumption of the total cluster capacity of a given pool
        # for more info: https://docs.ceph.com/docs/master/rados/operations/placement-groups/#specifying-expected-pool-size
        #target_size_ratio: ".5"
  # Whether to preserve filesystem after CephFilesystem CRD deletion
  preserveFilesystemOnDelete: true
  # The metadata service (mds) configuration
  metadataServer:
    # The number of active MDS instances
    activeCount: 1
    # Whether each active MDS instance will have an active standby with a warm metadata cache for faster failover.
    # If false, standbys will be available, but will not have a warm cache.
    activeStandby: true
    # The affinity rules to apply to the mds deployment
    placement:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: app.storage
              operator: In
              values:
              - rook-ceph
    #  topologySpreadConstraints:
    #  tolerations:
    #  - key: mds-node
    #    operator: Exists
    #  podAffinity:
      podAntiAffinity:
         requiredDuringSchedulingIgnoredDuringExecution:
         - labelSelector:
             matchExpressions:
             - key: ceph-mds
               operator: In
               values:
               - enabled
            # topologyKey: kubernetes.io/hostname will place MDS across different hosts
           topologyKey: kubernetes.io/hostname
         preferredDuringSchedulingIgnoredDuringExecution:
         - weight: 100
           podAffinityTerm:
             labelSelector:
               matchExpressions:
               - key: ceph-mds
                 operator: In
                 values:
                  - enabled
              # topologyKey: */zone can be used to spread MDS across different AZ
              # Use <topologyKey: failure-domain.beta.kubernetes.io/zone> in k8s cluster if your cluster is v1.16 or lower
              # Use <topologyKey: topology.kubernetes.io/zone>  in k8s cluster is v1.17 or upper
             topologyKey: topology.kubernetes.io/zone
    # A key/value list of annotations
    annotations:
    #  key: value
    # A key/value list of labels
    labels:
    #  key: value
    resources:
    # The requests and limits set here, allow the filesystem MDS Pod(s) to use half of one CPU core and 1 gigabyte of memory
    #  limits:
    #    cpu: "500m"
    #    memory: "1024Mi"
    #  requests:
    #    cpu: "500m"
    #    memory: "1024Mi"
    # priorityClassName: my-priority-class

4.2 Create the StorageClass

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-cephfs
provisioner: rook-ceph.cephfs.csi.ceph.com
parameters:
  # clusterID is the namespace where operator is deployed.
  clusterID: rook-ceph

  # CephFS filesystem name into which the volume shall be created
  fsName: myfs

  # Ceph pool into which the volume shall be created
  # Required for provisionVolume: "true"
  pool: myfs-data0

  # Root path of an existing CephFS volume
  # Required for provisionVolume: "false"
  # rootPath: /absolute/path

  # The secrets contain Ceph admin credentials. These are generated automatically by the operator
  # in the same namespace as the cluster.
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-cephfs-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
  csi.storage.k8s.io/controller-expand-secret-name: rook-csi-cephfs-provisioner
  csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-cephfs-node
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph

  # (optional) The driver can use either ceph-fuse (fuse) or ceph kernel client (kernel)
  # If omitted, default volume mounter will be used - this is determined by probing for ceph-fuse
  # or by setting the default mounter explicitly via --volumemounter command-line argument.
  # use the kernel client
  mounter: kernel
reclaimPolicy: Delete
allowVolumeExpansion: true
mountOptions:
  # uncomment the following line for debugging
  #- debug

4.3 Create a PVC

When creating a CephFS PVC I found it stuck in Pending forever. People in the community attribute this to differences between network plugins: it did not work for me with Calico, where I had to switch the cluster network to host mode, while Flannel works fine.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cephfs-pvc
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
  storageClassName: rook-cephfs

4.4 Demo

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cephfs-demo-pvc
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
  storageClassName: rook-cephfs
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: csicephfs-demo-pod
  labels:
    test-cephfs: "true"
spec:
  replicas: 2
  selector:
    matchLabels:
      test-cephfs: "true"
  template:
    metadata:
      labels:
        test-cephfs: "true"
    spec:
      containers:
      - name: web-server
        image: harbor.foxchan.com/sys/nginx:1.19.4-alpine
        imagePullPolicy: Always
        volumeMounts:
        - name: mypvc
          mountPath: /usr/share/nginx/html
      volumes:
      - name: mypvc
        persistentVolumeClaim:
          claimName: cephfs-demo-pvc
          readOnly: false
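As with the RBD demo, you can confirm that the PVC is Bound and that the filesystem is serving clients (again via the toolbox):

kubectl get pvc cephfs-demo-pvc
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph fs status myfs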

5. Problems Encountered

5.1 LVM volumes cannot be used directly as OSD storage

Upstream issues:

https://github.com/rook/rook/issues/5751

https://github.com/rook/rook/issues/2047

Workaround:

You can create local PVs by hand, backed by the LVM volumes, and use them as OSD devices. If doing that manually is too much trouble, local-path-provisioner can do it for you; a sketch of the PVC-based cluster spec follows.
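For the PVC-based approach, the CephCluster CRD supports storageClassDeviceSets, which provision OSDs on top of PVCs from a StorageClass instead of raw node devices. A minimal sketch that would replace the storage.nodes section used earlier (the StorageClass name local-path and the size are assumptions; any Block-mode StorageClass works):

  storage:
    storageClassDeviceSets:
    - name: set1
      count: 4                 # number of OSDs to create
      portable: false          # local PVs cannot move between nodes
      volumeClaimTemplates:
      - metadata:
          name: data
        spec:
          resources:
            requests:
              storage: 500Gi
          storageClassName: local-path   # assumed local StorageClass
          volumeMode: Block
          accessModes:
          - ReadWriteOnce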

5.2 CephFS PVC stuck in Pending

Upstream issues:

https://github.com/rook/rook/issues/6183

https://github.com/rook/rook/issues/4006

Workaround:

Switch the Kubernetes network plugin, or enable host networking for the Ceph cluster.
