CephFS is a POSIX-compliant file system that sits on top of the object-based Ceph storage cluster; files in it are mapped to objects in the Ceph storage cluster. Clients can mount the file system either through the kernel client or through a userspace file system (FUSE). Directories and other metadata are stored in Ceph's RADOS, while the MDS caches metadata and directory information.
Its main characteristics are as follows:
The CephFS architecture is shown in the diagram below:
1. Multi-client mounting
Ceph RBD supports mounting by multiple pods on the same node;
CephFS supports mounting by multiple pods across nodes, which enables shared storage;
2. Performance
Ceph RBD has low read and write latency and good I/O bandwidth;
CephFS has low read latency, slightly higher write latency, and good I/O bandwidth, especially for files with larger block sizes; performance bottlenecks may appear once storage usage grows very large;
3. Quota management
Ceph RBD supports quotas;
CephFS supports them with conditions (see the example below);
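To make "with conditions" concrete: CephFS quotas are set as extended attributes on directories and are only enforced by quota-aware clients (ceph-fuse, or a sufficiently recent kernel client), and setting them requires the attr package on the client. A minimal sketch, assuming CephFS is already mounted at /mnt and /mnt/somedir is just an example directory:

# setfattr -n ceph.quota.max_bytes -v 10737418240 /mnt/somedir   # cap the directory at 10 GiB
# setfattr -n ceph.quota.max_files -v 10000 /mnt/somedir         # cap it at 10000 files
# getfattr -n ceph.quota.max_bytes /mnt/somedir                  # inspect the current limit
# setfattr -n ceph.quota.max_bytes -v 0 /mnt/somedir             # a value of 0 removes the quota

See the quota documentation linked at the end of this post for the caveats (client cooperation, enforcement delay, and so on).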
A Ceph cluster must already be deployed; see: https://blog.51cto.com/leejia/2499684
CephFS needs at least one MDS (Ceph Metadata Server) service to hold the metadata that CephFS depends on; if resources allow, create two so that they automatically become active and standby. Create the MDS services on ceph1 and ceph2:
# ceph-deploy mds create ceph1 ceph2
A CephFS file system requires at least two RADOS pools, one for data and one for metadata. When configuring these pools, consider using a higher replication level for the metadata pool (any data loss there can make the whole file system inaccessible) and placing it on lower-latency storage such as SSDs (metadata latency is directly visible to clients). Create the pools and the file system:
ceph osd pool create cephfs-data 128 128
ceph osd pool create cephfs-metadata 128 128
ceph fs new cephfs cephfs-metadata cephfs-data
After creation, check the status of the MDS and of the file system:
# ceph mds stat
e6: 1/1/1 up {0=ceph2=up:active}, 1 up:standby
# ceph fs ls
name: cephfs, metadata pool: cephfs-metadata, data pools: [cephfs-data ]
Create a user on ceph1 that can access CephFS:
# ceph auth get-or-create client.cephfs mon "allow r" mds "allow rw" osd "allow rw pool=cephfs-data, allow rw pool=cephfs-metadata"
[client.cephfs]
        key = AQD1LfVffTlMHBAAove2EgMyJ8flMNYZG9VbTA==
# ceph auth get client.cephfs
exported keyring for client.cephfs
[client.cephfs]
        key = AQD1LfVffTlMHBAAove2EgMyJ8flMNYZG9VbTA==
        caps mds = "allow rw"
        caps mon = "allow r"
        caps osd = "allow rw pool=cephfs-data, allow rw pool=cephfs-metadata"
Write the key of the cephfs user into a file and mount with the mount command; to unmount, simply run umount:
# echo "AQD1LfVffTlMHBAAove2EgMyJ8flMNYZG9VbTA==" >> /tmp/lee.secret # mount -t ceph 172.18.2.172:6789,172.18.2.178:6789,172.18.2.189:6789:/ /mnt -o name=admin,secretfile=/tmp/lee.secret # df -h|grep mnt 172.18.2.172:6789,172.18.2.178:6789,172.18.2.189:6789:/ 586G 98G 488G 17% /mnt
CephFS is now deployed, so the next question is how Kubernetes consumes it. When Kubernetes uses CephFS for data persistence, there are three main approaches: mounting CephFS directly in the pod spec through the cephfs volume plugin; creating PVs backed by CephFS by hand and binding them with PVCs; and dynamically provisioning PVs through a storageclass.
Below we walk through dynamic PV provisioning with the storageclass resource.
A storageclass is usually created by an administrator. It acts as an abstract definition of a storage resource: it hides the backend storage details behind the PVCs that users submit, so users no longer need to care about storage internals, and administrators no longer need to manage PVs by hand, because the system creates and binds PVs automatically according to the spec, giving dynamic resource provisioning. In addition, a storageclass is not restricted to a namespace.
The key parts of a storageclass are the provisioner, which names the volume plugin that will create PVs, and the parameters handed to that provisioner (backend-specific settings; for CephFS these include the monitor addresses and the admin credentials).
The dynamic provisioning flow (diagram from Kubernetes in Action):
Since Kubernetes has no built-in provisioner for CephFS, a third-party one must be installed. First, a quick look at this provisioner's architecture:
It has two main parts: cephfs-provisioner, the controller that watches for PVCs referencing the cephfs storageclass and issues the provision/delete requests; and cephfs_provisioner.py, the helper it calls to talk to the Ceph cluster and create (or remove) a directory and a dedicated cephfs user for each volume.
Installation
# git clone https://github.com/kubernetes-retired/external-storage.git
# cd external-storage/ceph/cephfs/deploy/
# NAMESPACE=kube-system
# sed -r -i "s/namespace: [^ ]+/namespace: $NAMESPACE/g" ./rbac/*.yaml
# sed -i "/PROVISIONER_SECRET_NAMESPACE/{n;s/value:.*/value: $NAMESPACE/;}" rbac/deployment.yaml
# kubectl -n $NAMESPACE apply -f ./rbac
After a few minutes, check whether the installation succeeded:
# kubectl get pods -n kube-system|grep 'cephfs-provisioner'
cephfs-provisioner-6c4dc5f646-swncq           1/1     Running   0          1h
We reuse the secret from the Ceph RBD storage setup as the CephFS secret:
# vim sc.yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: cephfs
provisioner: ceph.com/cephfs
parameters:
  monitors: 172.18.2.172:6789,172.18.2.178:6789,172.18.2.189:6789
  adminId: admin
  adminSecretNamespace: "kube-system"
  adminSecretName: ceph-secret

# kubectl apply -f sc.yaml
# kubectl get storageclass
NAME                 PROVISIONER       RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
ceph-rbd (default)   ceph.com/rbd      Delete          Immediate           false                  216d
cephfs               ceph.com/cephfs   Delete          Immediate           false                  2h
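For reference, if no reusable secret existed, a secret holding the admin key could be created roughly like this (a sketch, not part of the original setup; it assumes the ceph CLI is available on the machine where kubectl runs, and the secret name/namespace must match adminSecretName/adminSecretNamespace above):

# kubectl create secret generic ceph-secret -n kube-system \
    --from-literal=key="$(ceph auth get-key client.admin)"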
Create a PVC that references the storageclass and make sure its status becomes Bound, which means the storageclass created and bound a PV successfully:
# vim pvc.yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: claim-local
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
  storageClassName: "cephfs"

# kubectl apply -f pvc.yaml
# kubectl get pvc|grep claim-local
NAME          STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
claim-local   Bound    pvc-d30fda86-acfd-48e2-b7bc-568f6148332f   1Gi        RWX            cephfs         25s
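Behind the Bound status, the provisioner created a PV object and a backing directory on CephFS; the PV can be inspected directly to see which storageclass and claim it belongs to:

# kubectl get pv pvc-d30fda86-acfd-48e2-b7bc-568f6148332f
# kubectl describe pv pvc-d30fda86-acfd-48e2-b7bc-568f6148332f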
Create a pod named cephfs-pv-pod1 that uses this PVC:
# vim pod.yaml
kind: Pod
apiVersion: v1
metadata:
  name: cephfs-pv-pod1
spec:
  containers:
  - name: cephfs-pv-busybox1
    image: busybox
    command: ["sleep", "60000"]
    volumeMounts:
    - mountPath: "/mnt/cephfs"
      name: cephfs-vol1
      readOnly: false
  volumes:
  - name: cephfs-vol1
    persistentVolumeClaim:
      claimName: claim-local

# kubectl apply -f pod.yaml
# kubectl get pods -o wide
NAME             READY   STATUS    RESTARTS   AGE   IP             NODE    NOMINATED NODE   READINESS GATES
cephfs-pv-pod1   1/1     Running   0          33s   10.101.26.40   work4   <none>           <none>
cephfs-pv-pod1 was scheduled to work4, so we add a label to work1 and then create cephfs-pv-pod2 with a nodeSelector so that it gets scheduled to work1:
# kubectl label nodes work1 type=test
# kubectl get nodes --show-labels|grep work1
work1   Ready    <none>   237d   v1.18.2   app=dashboard,beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=work1,kubernetes.io/os=linux,type=test

# vim pod.yaml
kind: Pod
apiVersion: v1
metadata:
  name: cephfs-pv-pod2
spec:
  containers:
  - name: cephfs-pv-busybox1
    image: busybox
    command: ["sleep", "60000"]
    volumeMounts:
    - mountPath: "/mnt/cephfs"
      name: cephfs-vol1
      readOnly: false
  volumes:
  - name: cephfs-vol1
    persistentVolumeClaim:
      claimName: claim-local
  nodeSelector:
    type: test

# kubectl apply -f pod.yaml
# kubectl get pods -o wide|grep cephfs
cephfs-pv-pod1   1/1     Running   0          8m39s   10.101.26.40   work4   <none>           <none>
cephfs-pv-pod2   1/1     Running   0          34s     10.99.1.167    work1   <none>           <none>
The two pods, scheduled to different nodes, are both running. Now we write data into cephfs-pv-pod1's volume and check whether it appears in cephfs-pv-pod2's volume:
# kubectl exec -it cephfs-pv-pod1 sh
/ # echo "test" >> /mnt/cephfs/1.txt
# kubectl exec -it cephfs-pv-pod2 sh
/ # cat /mnt/cephfs/1.txt
test
CephFS is mounted and working correctly, which completes the integration of Kubernetes with CephFS.
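One last note: the cephfs storageclass above uses the Delete reclaim policy, so removing the PVC also removes the dynamically provisioned PV together with its data. If this test is no longer needed, a possible cleanup looks like:

# kubectl delete pod cephfs-pv-pod1 cephfs-pv-pod2
# kubectl delete pvc claim-local
# kubectl get pv|grep claim-local    # should eventually return nothing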
References:
https://docs.ceph.com/en/latest/cephfs/quota/
https://www.twblogs.net/a/5baf8cda2b7177781a0f2989
https://github.com/kubernetes-retired/external-storage/tree/master/ceph/cephfs