Rook is an open-source cloud-native storage orchestrator: it provides a platform, a framework, and support for a variety of storage solutions so that they integrate natively with cloud-native environments. Today it focuses on file, block, and object storage services for cloud-native environments, and it implements a distributed storage service that is self-managing, self-scaling, and self-healing.
Rook supports automated deployment, bootstrapping, configuration, provisioning, scaling up and down, upgrades, migration, disaster recovery, monitoring, and resource management. To do all of this, Rook relies on the underlying container orchestration platform, such as Kubernetes or CoreOS.
Rook currently supports building storage with Ceph, NFS, the Minio Object Store, EdgeFS, Cassandra, and CockroachDB.
Project: https://github.com/rook/rook
Website: https://rook.io/
<!--more-->
Rook has three main components, with the following roles:
Rook Operator
Agent or Driver
The flex driver is the now-deprecated option. Before installing, check whether your Kubernetes cluster version supports CSI; if it does not, or you do not want to use CSI, choose flex (a quick version check is sketched after this component overview).
The driver is installed on every node by default; you can restrict it to specific nodes with node affinity.
Device discovery
Discovers whether newly attached devices can be used for storage. It is enabled by setting ROOK_ENABLE_DISCOVERY_DAEMON in the operator configuration.
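A quick way to decide between CSI and flex is to look at the server version your cluster reports; CSI itself went GA in Kubernetes v1.13, so anything older is effectively limited to flex. A minimal check:
```
kubectl version --short
# the "Server Version" line is what matters; clusters older than v1.13 are limited to flex
```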
How Rook integrates with Kubernetes is shown in the figure below:
The architecture of a Ceph cluster deployed with Rook is as follows:
The deployed Ceph system can serve the following three kinds of volume claims: block, file, and object.
Kubernetes v1.11 or later.
Ceph deployed by Rook does not support using LVM volumes directly as OSD storage devices; if you want to use LVM, you can do it through PVCs (an approach is shown later in this post).
To configure the Ceph storage cluster, at least one of the following types of local storage is required:
- raw devices (no partitions or formatted filesystems)
- raw partitions (no formatted filesystem)
- PersistentVolumes available from a storage class in block mode
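A simple sanity check for the raw-device options is that the disks you plan to give to Ceph show no filesystem signature, for example:
```
lsblk -f
# devices/partitions that Rook can consume should show an empty FSTYPE column
```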
Environment for this installation: Ceph OSDs rely on LVM, so install lvm2 on every storage node:
```
sudo yum install -y lvm2
```
RBD
Most distribution kernels already have the RBD module built, but it is best to confirm:
```
foxchan@~$ lsmod | grep rbd
rbd                   114688  0
libceph               368640  1 rbd
```
You can have it loaded at boot by dropping it into a modules file:
```
cat > /etc/sysconfig/modules/rbd.modules << EOF
modprobe rbd
EOF
```
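On systems that still process /etc/sysconfig/modules/ the file has to be executable to run at boot, and the module can also be loaded immediately without rebooting:
```
chmod +x /etc/sysconfig/modules/rbd.modules
modprobe rbd
```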
CephFS
If you want to use CephFS, the minimum kernel version is 4.17.
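You can check the running kernel on each node with:
```
uname -r
```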
Download the latest Rook release from GitHub:
```
git clone --single-branch --branch v1.5.1 https://github.com/rook/rook.git
```
Install the common resources:
```
cd rook/cluster/examples/kubernetes/ceph
kubectl create -f crds.yaml -f common.yaml
```
Install the operator:
```
kubectl apply -f operator.yaml
```
If this is going to production, plan ahead: the operator configuration cannot be changed after Ceph is installed, otherwise Rook will tear the cluster down and rebuild it.
The changes are as follows:
```yaml
# Enable the CephFS CSI driver
ROOK_CSI_ENABLE_CEPHFS: "true"
# Use the kernel client instead of ceph-fuse
CSI_FORCE_CEPHFS_KERNEL_CLIENT: "true"
# Point the CSI images at a private registry to speed up deployment
ROOK_CSI_CEPH_IMAGE: "harbor.foxchan.com/google_containers/cephcsi/cephcsi:v3.1.2"
ROOK_CSI_REGISTRAR_IMAGE: "harbor.foxchan.com/google_containers/k8scsi/csi-node-driver-registrar:v2.0.1"
ROOK_CSI_RESIZER_IMAGE: "harbor.foxchan.com/google_containers/k8scsi/csi-resizer:v1.0.0"
ROOK_CSI_PROVISIONER_IMAGE: "harbor.foxchan.com/google_containers/k8scsi/csi-provisioner:v2.0.0"
ROOK_CSI_SNAPSHOTTER_IMAGE: "harbor.foxchan.com/google_containers/k8scsi/csi-snapshotter:v3.0.0"
ROOK_CSI_ATTACHER_IMAGE: "harbor.foxchan.com/google_containers/k8scsi/csi-attacher:v3.0.0"
# NODE_AFFINITY can be set to pin the CSI components to specific nodes.
# I split the plugin and the provisioner; schedule them according to your cluster resources.
CSI_PROVISIONER_NODE_AFFINITY: "app.rook.role=csi-provisioner"
CSI_PLUGIN_NODE_AFFINITY: "app.rook.plugin=csi"
# Change the metrics ports (optional). My cluster uses host networking, so I move them to avoid port conflicts.
# Configure CSI CephFS grpc and liveness metrics port
CSI_CEPHFS_GRPC_METRICS_PORT: "9491"
CSI_CEPHFS_LIVENESS_METRICS_PORT: "9481"
# Configure CSI RBD grpc and liveness metrics port
CSI_RBD_GRPC_METRICS_PORT: "9490"
CSI_RBD_LIVENESS_METRICS_PORT: "9480"

# Pull the Rook image from a private registry to speed up deployment
image: harbor.foxchan.com/google_containers/rook/ceph:v1.5.1

# Pin the discovery agent to the storage nodes
- name: DISCOVER_AGENT_NODE_AFFINITY
  value: "app.rook=storage"
# Enable automatic device discovery
- name: ROOK_ENABLE_DISCOVERY_DAEMON
  value: "true"
```
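The affinity values above are ordinary node labels, so the matching nodes have to be labeled; a sketch (node names are placeholders, pick the nodes that should run each role):
```
# nodes that should run the CSI provisioners
kubectl label nodes <node-name> app.rook.role=csi-provisioner
# nodes that should run the CSI plugin, i.e. every node that will mount volumes
kubectl label nodes <node-name> app.rook.plugin=csi
# nodes the discovery agent should scan for disks
kubectl label nodes <node-name> app.rook=storage
```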
The contents of cluster.yaml must be adapted to your own hardware; read the comments in the file carefully to avoid the pitfalls I ran into. Apart from adding or removing OSD devices, any other change to this file only takes effect by reinstalling the Ceph cluster, so plan the cluster ahead of time. If you apply a modified file without uninstalling Ceph first, it will trigger a Ceph cluster reinstall and the cluster will crash.
The changes are as follows:
```yaml
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  # Cluster name; a single namespace supports only one cluster
  name: rook-ceph
  namespace: rook-ceph
spec:
  # Ceph version:
  # v13 is mimic, v14 is nautilus, and v15 is octopus.
  cephVersion:
    # Pull the Ceph image from a private registry to speed up deployment
    image: harbor.foxchan.com/google_containers/ceph/ceph:v15.2.5
    # Whether unsupported Ceph versions are allowed
    allowUnsupported: false
  # Host path where Rook stores its data on each node
  dataDirHostPath: /data/rook
  # Whether to continue an upgrade if the pre-upgrade checks fail
  skipUpgradeChecks: false
  # Since 1.5 the number of mons must be odd
  mon:
    count: 3
    # Whether multiple mon pods may run on a single node
    allowMultiplePerNode: false
  mgr:
    modules:
    - name: pg_autoscaler
      enabled: true
  # Enable the dashboard on port 7000 with SSL disabled. You can keep the default
  # https settings; I did it this way to keep the ingress setup simple.
  dashboard:
    enabled: true
    port: 7000
    ssl: false
  # Enable PrometheusRule
  monitoring:
    enabled: true
    # Namespace to deploy the PrometheusRule into; defaults to the namespace of this CR
    rulesNamespace: rook-ceph
  # Use host networking to work around CephFS PVCs not being usable otherwise
  network:
    provider: host
  # Enable the crash collector; a crash collector pod runs on every node with a Ceph daemon
  crashCollector:
    disable: false
  # Node affinity: pin each component to labeled nodes
  placement:
    mon:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: ceph-mon
              operator: In
              values:
              - enabled
    osd:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: ceph-osd
              operator: In
              values:
              - enabled
    mgr:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: ceph-mgr
              operator: In
              values:
              - enabled
  # Storage selection. The defaults are all true, which would wipe and
  # initialize every device on every node in the cluster.
  storage: # cluster level storage configuration and selection
    useAllNodes: false     # do not use all nodes
    useAllDevices: false   # do not use all devices
    nodes:
    - name: "192.168.1.162"   # storage node
      devices:
      - name: "nvme0n1p1"     # use disk nvme0n1p1
    - name: "192.168.1.163"
      devices:
      - name: "nvme0n1p1"
    - name: "192.168.1.164"
      devices:
      - name: "nvme0n1p1"
    - name: "192.168.1.213"
      devices:
      - name: "nvme0n1p1"
```
For more CephCluster CRD configuration options, see the official Rook documentation.
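The placement section above selects nodes by the ceph-mon, ceph-osd and ceph-mgr labels, so label the nodes before applying cluster.yaml. A sketch using the node names of this cluster (which nodes get which role is up to you):
```
kubectl label nodes 192.168.1.162 192.168.1.163 192.168.1.164 ceph-mon=enabled
kubectl label nodes 192.168.1.162 192.168.1.163 192.168.1.164 192.168.1.213 ceph-osd=enabled
kubectl label nodes 192.168.1.162 ceph-mgr=enabled
```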
Run the installation:
```
kubectl apply -f cluster.yaml
```
It takes a while; wait until all pods are up and running:
```
[foxchan@k8s-master ceph]$ kubectl get pods -n rook-ceph
NAME                                            READY   STATUS      RESTARTS   AGE
csi-cephfsplugin-b5tlr                          3/3     Running     0          19h
csi-cephfsplugin-mjssm                          3/3     Running     0          19h
csi-cephfsplugin-provisioner-5cf5ffdc76-mhdgz   6/6     Running     0          19h
csi-cephfsplugin-provisioner-5cf5ffdc76-rpdl8   6/6     Running     0          19h
csi-cephfsplugin-qmvkc                          3/3     Running     0          19h
csi-cephfsplugin-tntzd                          3/3     Running     0          19h
csi-rbdplugin-4p75p                             3/3     Running     0          19h
csi-rbdplugin-89mzz                             3/3     Running     0          19h
csi-rbdplugin-cjcwr                             3/3     Running     0          19h
csi-rbdplugin-ndjcj                             3/3     Running     0          19h
csi-rbdplugin-provisioner-658dd9fbc5-fwkmc      6/6     Running     0          19h
csi-rbdplugin-provisioner-658dd9fbc5-tlxd8      6/6     Running     0          19h
prometheus-rook-prometheus-0                    2/2     Running     1          3d17h
rook-ceph-mds-myfs-a-5cbcdc6f9c-7mdsv           1/1     Running     0          19h
rook-ceph-mds-myfs-b-5f4cc54b87-m6m6f           1/1     Running     0          19h
rook-ceph-mgr-a-f98d4455b-bwhw7                 1/1     Running     0          20h
rook-ceph-mon-a-5d445d4b8d-lmg67                1/1     Running     1          20h
rook-ceph-mon-b-769c6fd76f-jrlc8                1/1     Running     0          20h
rook-ceph-mon-c-6bfd8954f5-tbsnd                1/1     Running     0          20h
rook-ceph-operator-7d8cc65dc-8wtl8              1/1     Running     0          20h
rook-ceph-osd-0-c558ff759-bzbgw                 1/1     Running     0          20h
rook-ceph-osd-1-5c97d69d78-dkxbb                1/1     Running     0          20h
rook-ceph-osd-2-7dddc7fd56-p58mw                1/1     Running     0          20h
rook-ceph-osd-3-65ff985c7d-9gfgj                1/1     Running     0          20h
rook-ceph-osd-prepare-192.168.1.213-pw5gr       0/1     Completed   0          19h
rook-ceph-osd-prepare-192.168.1.162-wtkm8       0/1     Completed   0          19h
rook-ceph-osd-prepare-192.168.1.163-b86r2       0/1     Completed   0          19h
rook-ceph-osd-prepare-192.168.1.164-tj79t       0/1     Completed   0          19h
rook-discover-89v49                             1/1     Running     0          20h
rook-discover-jdzhn                             1/1     Running     0          20h
rook-discover-sl9bv                             1/1     Running     0          20h
rook-discover-wg25w                             1/1     Running     0          20h
```
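Besides the pod list, the CephCluster resource itself reports the overall state once the operator has finished; a quick check:
```
kubectl -n rook-ceph get cephcluster
# the PHASE column should reach Ready and HEALTH should report HEALTH_OK
```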
To add a new OSD node (192.168.1.165 in this example), first label it so that the placement rules and the discovery agent can schedule onto it:
```
kubectl label nodes 192.168.1.165 app.rook=storage
kubectl label nodes 192.168.1.165 ceph-osd=enabled
```
Then add the node and its disk to the nodes list in cluster.yaml:
```yaml
nodes:
- name: "192.168.1.162"
  devices:
  - name: "nvme0n1p1"
- name: "192.168.1.163"
  devices:
  - name: "nvme0n1p1"
- name: "192.168.1.164"
  devices:
  - name: "nvme0n1p1"
- name: "192.168.1.213"
  devices:
  - name: "nvme0n1p1"
# add the disk on 192.168.1.165
- name: "192.168.1.165"
  devices:
  - name: "nvme0n1p1"
```
Apply the change:
```
kubectl apply -f cluster.yaml
```
To remove an OSD node, delete its entry from cluster.yaml and apply again.
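Dropping the node from cluster.yaml does not remove the OSD that already exists in Ceph; a sketch of the manual cleanup, assuming the OSD to remove has id 4 (check ceph osd tree in the toolbox first):
```
# in the toolbox: take the OSD out and wait for the data to migrate
ceph osd out osd.4
ceph status                      # wait until all PGs are active+clean again

# outside the toolbox: stop the OSD pod
# kubectl -n rook-ceph scale deploy rook-ceph-osd-4 --replicas=0

# back in the toolbox: remove it from the cluster
ceph osd purge 4 --yes-i-really-mean-it
```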
The dashboard is already enabled by the steps above; all that is left is to expose the dashboard service. There are several ways to do this (the yaml directory contains plenty of examples), pick whichever suits you. Below is my own Traefik ingress:
```yaml
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: traefik-ceph-dashboard
  annotations:
    kubernetes.io/ingress.class: traefik-v2.3
spec:
  entryPoints:
  - web
  routes:
  - match: Host(`ceph.foxchan.com`)
    kind: Rule
    services:
    - name: rook-ceph-mgr-dashboard
      namespace: rook-ceph
      port: 7000
    middlewares:
    - name: gs-ipwhitelist
```
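The gs-ipwhitelist middleware referenced above is not shown in this post; a minimal sketch of what it could look like with Traefik v2's IP whitelist middleware (the CIDR is a placeholder, and the middleware must live in the same namespace as the IngressRoute):
```yaml
apiVersion: traefik.containo.us/v1alpha1
kind: Middleware
metadata:
  name: gs-ipwhitelist
spec:
  ipWhiteList:
    sourceRange:
    - 192.168.0.0/16   # placeholder: networks allowed to reach the dashboard
```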
Logging into the dashboard requires credentials. Rook creates a default admin user in the namespace where the Rook Ceph cluster is running and stores a generated password in the secret rook-ceph-dashboard-password. To retrieve it, run:
```
kubectl -n rook-ceph get secret rook-ceph-dashboard-password -o jsonpath="{['data']['password']}" | base64 --decode && echo
```
To deploy the Rook toolbox, run:
```
kubectl apply -f toolbox.yaml
```
Once that succeeds, confirm that the toolbox pod has started:
```
kubectl -n rook-ceph get pod -l "app=rook-ceph-tools"
```
Then open a shell in the pod and run whatever ceph commands you need:
```
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- bash
```
For example:
```
ceph status
ceph osd status
ceph df
rados df
```
To delete the toolbox:
```
kubectl -n rook-ceph delete deploy/rook-ceph-tools
```
Deploying monitoring is straightforward: use the Prometheus Operator to deploy a standalone Prometheus for the cluster.
Install the Prometheus Operator:
```
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/v0.40.0/bundle.yaml
```
Install Prometheus:
```
git clone --single-branch --branch v1.5.1 https://github.com/rook/rook.git
cd rook/cluster/examples/kubernetes/ceph/monitoring
kubectl create -f service-monitor.yaml
kubectl create -f prometheus.yaml
kubectl create -f prometheus-service.yaml
```
By default it is exposed as a NodePort service:
```
echo "http://$(kubectl -n rook-ceph -o jsonpath={.status.hostIP} get pod prometheus-rook-prometheus-0):30900"
```
Enable Prometheus alerts.
This must be done before the Ceph cluster is installed.
Install the RBAC resources:
```
kubectl create -f cluster/examples/kubernetes/ceph/monitoring/rbac.yaml
```
Make sure monitoring is enabled in cluster.yaml:
```yaml
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
[...]
spec:
[...]
  monitoring:
    enabled: true
    rulesNamespace: "rook-ceph"
[...]
```
Grafana Dashboards
Grafana 7.2.0 or newer is required.
Recommended dashboards: the Ceph Cluster, Ceph OSD, and Ceph Pools dashboards linked from the Rook monitoring documentation.
Before deleting the Ceph cluster, clean up the related pods first.
Delete the block storage and file storage:
```
kubectl delete -n rook-ceph cephblockpool replicapool
kubectl delete storageclass rook-ceph-block
kubectl delete -f csi/cephfs/filesystem.yaml
kubectl delete storageclass csi-cephfs rook-ceph-block
```
Delete the operator and the related CRDs:
```
kubectl delete -f operator.yaml
kubectl delete -f common.yaml
kubectl delete -f crds.yaml
```
Clean up the data on the hosts.
After the Ceph cluster is deleted, the /data/rook/ directory on the nodes that ran Ceph components still holds the old cluster's configuration.
If you deploy a new Ceph cluster later, delete this leftover data first, otherwise the new monitors will fail to start.
```bash
# cat clean-rook-dir.sh
hosts=(
  192.168.1.213
  192.168.1.162
  192.168.1.163
  192.168.1.164
)

for host in ${hosts[@]} ; do
  ssh $host "rm -rf /data/rook/*"
done
```
Wipe the devices:
```bash
#!/usr/bin/env bash
DISK="/dev/nvme0n1p1"

# Zap the disk to a fresh, usable state (zap-all is important, b/c MBR has to be clean)
# You will have to run this step for all disks.
sgdisk --zap-all $DISK

# For HDDs use:
dd if=/dev/zero of="$DISK" bs=1M count=100 oflag=direct,dsync

# For SSDs use:
blkdiscard $DISK

# These steps only have to be run once on each node
# If rook sets up osds using ceph-volume, teardown leaves some devices mapped that lock the disks.
ls /dev/mapper/ceph-* | xargs -I% -- dmsetup remove %

# ceph-volume setup can leave ceph-<UUID> directories in /dev (unnecessary clutter)
rm -rf /dev/ceph-*
```
If deleting the Ceph cluster hangs for some reason, run the following to clear the finalizers first; the deletion will then go through:
```
kubectl -n rook-ceph patch cephclusters.ceph.rook.io rook-ceph -p '{"metadata":{"finalizers": []}}' --type=merge
```
Rook v1.5.0 to Rook v1.5.1
```
git clone --single-branch --branch v1.5.1 https://github.com/rook/rook.git
cd $YOUR_ROOK_REPO/cluster/examples/kubernetes/ceph/
kubectl apply -f common.yaml -f crds.yaml
kubectl -n rook-ceph set image deploy/rook-ceph-operator rook-ceph-operator=rook/ceph:v1.5.1
```
Rook v1.4.x to Rook v1.5.x.
Set the environment variables:
```
# Parameterize the environment
export ROOK_SYSTEM_NAMESPACE="rook-ceph"
export ROOK_NAMESPACE="rook-ceph"
```
Before upgrading, make sure the cluster is healthy.
All pods should be Running:
```
kubectl -n $ROOK_NAMESPACE get pods
```
Use the toolbox to check that the Ceph cluster status is healthy:
```
TOOLS_POD=$(kubectl -n $ROOK_NAMESPACE get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}')
kubectl -n $ROOK_NAMESPACE exec -it $TOOLS_POD -- ceph status
```
```
  cluster:
    id:     194d139f-17e7-4e9c-889d-2426a844c91b
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum a,b,c (age 25h)
    mgr: a(active, since 5h)
    mds: myfs:1 {0=myfs-b=up:active} 1 up:standby-replay
    osd: 4 osds: 4 up (since 25h), 4 in (since 25h)

  task status:
    scrub status:
        mds.myfs-a: idle
        mds.myfs-b: idle

  data:
    pools:   4 pools, 97 pgs
    objects: 2.08k objects, 7.6 GiB
    usage:   26 GiB used, 3.3 TiB / 3.3 TiB avail
    pgs:     97 active+clean

  io:
    client:   1.2 KiB/s rd, 2 op/s rd, 0 op/s wr
```
1. Upgrade common and the CRDs:
```
git clone --single-branch --branch v1.5.1 https://github.com/rook/rook.git
cd rook/cluster/examples/kubernetes/ceph
kubectl apply -f common.yaml -f crds.yaml
```
2. Upgrade the Ceph CSI versions.
You can edit the ConfigMap to pin your own image versions; if you are on the default configuration, nothing needs to change:
```
kubectl -n rook-ceph get configmap rook-ceph-operator-config

ROOK_CSI_CEPH_IMAGE: "harbor.foxchan.com/google_containers/cephcsi/cephcsi:v3.1.1"
ROOK_CSI_REGISTRAR_IMAGE: "harbor.foxchan.com/google_containers/k8scsi/csi-node-driver-registrar:v2.0.1"
ROOK_CSI_PROVISIONER_IMAGE: "harbor.foxchan.com/google_containers/k8scsi/csi-provisioner:v2.0.0"
ROOK_CSI_SNAPSHOTTER_IMAGE: "harbor.foxchan.com/google_containers/k8scsi/csi-snapshotter:v3.0.0"
ROOK_CSI_ATTACHER_IMAGE: "harbor.foxchan.com/google_containers/k8scsi/csi-attacher:v3.0.0"
ROOK_CSI_RESIZER_IMAGE: "harbor.foxchan.com/google_containers/k8scsi/csi-resizer:v1.0.0"
```
3. Upgrade the Rook operator:
```
kubectl -n $ROOK_SYSTEM_NAMESPACE set image deploy/rook-ceph-operator rook-ceph-operator=rook/ceph:v1.5.1
```
4. Wait for the cluster to finish upgrading:
```
watch --exec kubectl -n $ROOK_NAMESPACE get deployments -l rook_cluster=$ROOK_NAMESPACE -o jsonpath='{range .items[*]}{.metadata.name}{" \treq/upd/avl: "}{.spec.replicas}{"/"}{.status.updatedReplicas}{"/"}{.status.readyReplicas}{" \trook-version="}{.metadata.labels.rook-version}{"\n"}{end}'
```
5. Verify that the upgrade is complete:
```
kubectl -n $ROOK_NAMESPACE get deployment -l rook_cluster=$ROOK_NAMESPACE -o jsonpath='{range .items[*]}{"rook-version="}{.metadata.labels.rook-version}{"\n"}{end}' | sort | uniq
```
If the cluster is not healthy, the operator will refuse to upgrade the Ceph version.
1. Upgrade the Ceph image:
```
NEW_CEPH_IMAGE='ceph/ceph:v15.2.5'
CLUSTER_NAME=rook-ceph
kubectl -n rook-ceph patch CephCluster rook-ceph --type=merge -p "{\"spec\": {\"cephVersion\": {\"image\": \"$NEW_CEPH_IMAGE\"}}}"
```
2. Watch the pods roll over to the new version:
```
watch --exec kubectl -n $ROOK_NAMESPACE get deployments -l rook_cluster=$ROOK_NAMESPACE -o jsonpath='{range .items[*]}{.metadata.name}{" \treq/upd/avl: "}{.spec.replicas}{"/"}{.status.updatedReplicas}{"/"}{.status.readyReplicas}{" \tceph-version="}{.metadata.labels.ceph-version}{"\n"}{end}'
```
3. Check that the whole cluster is running the new Ceph version:
```
kubectl -n $ROOK_NAMESPACE get deployment -l rook_cluster=$ROOK_NAMESPACE -o jsonpath='{range .items[*]}{"ceph-version="}{.metadata.labels.ceph-version}{"\n"}{end}' | sort | uniq
```
```yaml
# Define a block storage pool
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: replicapool
  namespace: rook-ceph
spec:
  # Each replica must be spread across different failure domains; "host" guarantees
  # that every replica lands on a different machine
  failureDomain: host
  # Number of replicas
  replicated:
    size: 3
    # Disallow setting pool with replica 1, this could lead to data loss without recovery.
    # Make sure you're *ABSOLUTELY CERTAIN* that is what you want
    requireSafeReplicaSize: true
    # gives a hint (%) to Ceph in terms of expected consumption of the total cluster capacity of a given pool
    # for more info: https://docs.ceph.com/docs/master/rados/operations/placement-groups/#specifying-expected-pool-size
    #targetSizeRatio: .5
---
# Define a StorageClass
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-ceph-block
# Provisioner identifier of this SC; the rook-ceph prefix is the operator namespace
provisioner: rook-ceph.rbd.csi.ceph.com
parameters:
  # clusterID is the namespace the cluster is running in
  # If you change this namespace, also change the namespace below where the secret namespaces are defined
  clusterID: rook-ceph

  # If you want to use erasure coded pool with RBD, you need to create
  # two pools. one erasure coded and one replicated.
  # You need to specify the replicated pool here in the `pool` parameter, it is
  # used for the metadata of the images.
  # The erasure coded pool must be set as the `dataPool` parameter below.
  #dataPool: ec-data-pool

  # Pool in which the RBD images are created
  pool: replicapool

  # RBD image format. Defaults to "2".
  imageFormat: "2"

  # RBD image features; CSI RBD currently supports only layering
  imageFeatures: layering

  # Ceph admin credentials, generated automatically by the operator
  # in the same namespace as the cluster.
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
  csi.storage.k8s.io/controller-expand-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph

  # Filesystem type of the volume, ext4 by default. xfs is not recommended because of a
  # potential deadlock when a volume is mounted on the same node as an OSD (hyperconverged setups)
  csi.storage.k8s.io/fstype: ext4

# uncomment the following to use rbd-nbd as mounter on supported nodes
# **IMPORTANT**: If you are using rbd-nbd as the mounter, during upgrade you will be hit a ceph-csi
# issue that causes the mount to be disconnected. You will need to follow special upgrade steps
# to restart your application pods. Therefore, this option is not recommended.
#mounter: rbd-nbd
allowVolumeExpansion: true
reclaimPolicy: Delete
```
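Assuming the manifest above is saved as storageclass.yaml (the Rook examples ship an equivalent file under the csi/rbd directory), a minimal apply-and-verify sequence looks like this:
```
kubectl apply -f storageclass.yaml
kubectl get storageclass rook-ceph-block
# from the toolbox, confirm the pool exists
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph osd pool ls
```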
I recommend putting the PVC and the application that uses it in the same YAML file:
```yaml
# Create the PVC
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rbd-demo-pvc
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: rook-ceph-block
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: csirbd-demo-pod
  labels:
    test-cephrbd: "true"
spec:
  replicas: 1
  selector:
    matchLabels:
      test-cephrbd: "true"
  template:
    metadata:
      labels:
        test-cephrbd: "true"
    spec:
      containers:
      - name: web-server-rbd
        image: harbor.foxchan.com/sys/nginx:1.19.4-alpine
        volumeMounts:
        - name: mypvc
          mountPath: /usr/share/nginx/html
      volumes:
      - name: mypvc
        persistentVolumeClaim:
          claimName: rbd-demo-pvc
          readOnly: false
```
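A quick functional check, using the PVC, label, and deployment names from the manifest above:
```
kubectl get pvc rbd-demo-pvc            # should become Bound
kubectl get pod -l test-cephrbd=true    # the nginx pod should be Running
# write and read a file on the RBD-backed volume
kubectl exec deploy/csirbd-demo-pod -- sh -c 'echo hello > /usr/share/nginx/html/index.html && cat /usr/share/nginx/html/index.html'
```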
The CephFS CSI driver uses quotas to enforce the size requested by a PVC, and only kernels 4.17 and newer support CephFS quotas.
If your kernel does not support them and you still need quota management, set the operator environment variable CSI_FORCE_CEPHFS_KERNEL_CLIENT: false to switch to the FUSE client.
Note that with the FUSE client, application pods lose their mounts when the Ceph cluster is upgraded and have to be restarted before the PV is usable again.
```yaml
apiVersion: ceph.rook.io/v1
kind: CephFilesystem
metadata:
  name: myfs
  namespace: rook-ceph
spec:
  # The metadata pool spec. Must use replication.
  metadataPool:
    replicated:
      size: 3
      requireSafeReplicaSize: true
    parameters:
      # Inline compression mode for the data pool
      # Further reference: https://docs.ceph.com/docs/nautilus/rados/configuration/bluestore-config-ref/#inline-compression
      compression_mode: none
      # gives a hint (%) to Ceph in terms of expected consumption of the total cluster capacity of a given pool
      # for more info: https://docs.ceph.com/docs/master/rados/operations/placement-groups/#specifying-expected-pool-size
      #target_size_ratio: ".5"
  # The list of data pool specs. Can use replication or erasure coding.
  dataPools:
  - failureDomain: host
    replicated:
      size: 3
      # Disallow setting pool with replica 1, this could lead to data loss without recovery.
      # Make sure you're *ABSOLUTELY CERTAIN* that is what you want
      requireSafeReplicaSize: true
    parameters:
      # Inline compression mode for the data pool
      # Further reference: https://docs.ceph.com/docs/nautilus/rados/configuration/bluestore-config-ref/#inline-compression
      compression_mode: none
      # gives a hint (%) to Ceph in terms of expected consumption of the total cluster capacity of a given pool
      # for more info: https://docs.ceph.com/docs/master/rados/operations/placement-groups/#specifying-expected-pool-size
      #target_size_ratio: ".5"
  # Whether to preserve filesystem after CephFilesystem CRD deletion
  preserveFilesystemOnDelete: true
  # The metadata service (mds) configuration
  metadataServer:
    # The number of active MDS instances
    activeCount: 1
    # Whether each active MDS instance will have an active standby with a warm metadata cache for faster failover.
    # If false, standbys will be available, but will not have a warm cache.
    activeStandby: true
    # The affinity rules to apply to the mds deployment
    placement:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: app.storage
              operator: In
              values:
              - rook-ceph
      # topologySpreadConstraints:
      # tolerations:
      # - key: mds-node
      #   operator: Exists
      # podAffinity:
      podAntiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
            - key: ceph-mds
              operator: In
              values:
              - enabled
          # topologyKey: kubernetes.io/hostname will place MDS across different hosts
          topologyKey: kubernetes.io/hostname
        preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchExpressions:
              - key: ceph-mds
                operator: In
                values:
                - enabled
            # topologyKey: */zone can be used to spread MDS across different AZ
            # Use <topologyKey: failure-domain.beta.kubernetes.io/zone> in k8s cluster if your cluster is v1.16 or lower
            # Use <topologyKey: topology.kubernetes.io/zone> in k8s cluster is v1.17 or upper
            topologyKey: topology.kubernetes.io/zone
    # A key/value list of annotations
    annotations:
    #  key: value
    # A key/value list of labels
    labels:
    #  key: value
    resources:
    # The requests and limits set here, allow the filesystem MDS Pod(s) to use half of one CPU core and 1 gigabyte of memory
    #  limits:
    #    cpu: "500m"
    #    memory: "1024Mi"
    #  requests:
    #    cpu: "500m"
    #    memory: "1024Mi"
    # priorityClassName: my-priority-class
```
```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-cephfs
provisioner: rook-ceph.cephfs.csi.ceph.com
parameters:
  # clusterID is the namespace where operator is deployed.
  clusterID: rook-ceph

  # CephFS filesystem name into which the volume shall be created
  fsName: myfs

  # Ceph pool into which the volume shall be created
  # Required for provisionVolume: "true"
  pool: myfs-data0

  # Root path of an existing CephFS volume
  # Required for provisionVolume: "false"
  # rootPath: /absolute/path

  # The secrets contain Ceph admin credentials. These are generated automatically by the operator
  # in the same namespace as the cluster.
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-cephfs-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
  csi.storage.k8s.io/controller-expand-secret-name: rook-csi-cephfs-provisioner
  csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-cephfs-node
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph

  # (optional) The driver can use either ceph-fuse (fuse) or ceph kernel client (kernel)
  # If omitted, default volume mounter will be used - this is determined by probing for ceph-fuse
  # or by setting the default mounter explicitly via --volumemounter command-line argument.
  # Use the kernel client
  mounter: kernel
reclaimPolicy: Delete
allowVolumeExpansion: true
mountOptions:
  # uncomment the following line for debugging
  #- debug
```
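Assuming the two manifests above are saved as filesystem.yaml and storageclass-cephfs.yaml (the file names are only illustrative), apply them and confirm that the MDS pods come up and the filesystem exists:
```
kubectl apply -f filesystem.yaml -f storageclass-cephfs.yaml
kubectl -n rook-ceph get pod -l app=rook-ceph-mds
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph fs ls
```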
When creating a CephFS PVC I found that it stayed in Pending forever. People in the community suspect it comes down to differences between network plugins: in my case it did not work with Calico and I had to switch the cluster network to host mode, while flannel reportedly works.
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cephfs-pvc
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
  storageClassName: rook-cephfs
```
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cephfs-demo-pvc
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
  storageClassName: rook-cephfs
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: csicephfs-demo-pod
  labels:
    test-cephfs: "true"
spec:
  replicas: 2
  selector:
    matchLabels:
      test-cephfs: "true"
  template:
    metadata:
      labels:
        test-cephfs: "true"
    spec:
      containers:
      - name: web-server
        image: harbor.foxchan.com/sys/nginx:1.19.4-alpine
        imagePullPolicy: Always
        volumeMounts:
        - name: mypvc
          mountPath: /usr/share/nginx/html
      volumes:
      - name: mypvc
        persistentVolumeClaim:
          claimName: cephfs-demo-pvc
          readOnly: false
```
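Because the PVC is ReadWriteMany and the deployment runs two replicas, a simple way to confirm the shared mount works is to write from one pod and read from the other (a sketch; pod names are looked up by label):
```
PODS=$(kubectl get pod -l test-cephfs=true -o jsonpath='{.items[*].metadata.name}')
kubectl exec ${PODS%% *} -- sh -c 'echo shared > /usr/share/nginx/html/index.html'
kubectl exec ${PODS##* } -- cat /usr/share/nginx/html/index.html   # should print "shared"
```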
Related upstream issues for the LVM-as-OSD limitation:
https://github.com/rook/rook/issues/5751
https://github.com/rook/rook/issues/2047
Workaround:
You can create local PVs by hand and hand the LVM volumes to the OSDs that way; if doing that manually is too much trouble, you can use local-path-provisioner.
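A sketch of the PVC-based approach: instead of listing raw devices under nodes, the CephCluster spec can use storageClassDeviceSets backed by a local/LVM storage class (the storage class name local-path and the sizes below are assumptions; adapt them to your environment):
```yaml
storage:
  storageClassDeviceSets:
  - name: lvm-set
    count: 3                  # number of OSDs to create
    portable: false           # local volumes cannot follow the OSD to another node
    volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        resources:
          requests:
            storage: 100Gi    # size of each LVM-backed volume
        storageClassName: local-path
        volumeMode: Block     # the OSD consumes the volume as a raw block device
        accessModes:
        - ReadWriteOnce
```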
Related upstream issues for the CephFS PVC stuck in Pending:
https://github.com/rook/rook/issues/6183
https://github.com/rook/rook/issues/4006
Workaround:
Switch to a different Kubernetes network plugin, or run the Ceph cluster network in host mode.