As a unified distributed storage system, Ceph offers high performance, high availability, and high scalability. "Unified" means it can provide file system, block storage, and object storage from a single cluster; in cloud environments, Ceph is commonly used as the backend storage to keep data highly available.
Ceph was first published in 2004 and later open-sourced to the community. After more than a decade of development it is now supported and widely used by many cloud platforms, such as OpenStack, Kubernetes, and virtualization systems.
Ceph architecture diagram: (image not reproduced here)

There are two main ways to deploy Ceph for Kubernetes:

1. Deploy it on bare metal as a standalone storage cluster that provides storage services to Kubernetes (recommended for production environments).
2. Deploy it on top of the Kubernetes cluster and manage it with Rook. Rook is an Operator that provides Ceph cluster management; it uses CRD controllers to deploy and manage Ceph resources. Compared with a bare-metal deployment this is more Kubernetes-native, but it is also relatively new, so its stability and the difficulty of troubleshooting are uncertain; evaluate it yourself before using it in production.
3. Since this is only a test, this article uses Rook to deploy the Ceph cluster.
First, a look at the Rook architecture diagrams. (Two official diagrams, not reproduced here.)

From those two official diagrams we can see that:

The Rook Operator is the core component: it bootstraps the storage cluster and monitors the storage daemons to ensure the cluster stays healthy.

A Rook Agent runs on every storage node and configures the FlexVolume plugin to integrate with Kubernetes' storage volume framework (CSI).

Rook deploys Ceph's MON, OSD, and MGR daemons as Kubernetes Pods.
4. Before deploying Ceph, make sure your servers have spare disks for the Ceph cluster to use: usually three or more, though a single disk is enough for a pure test.
As shown below, sdb is the disk set aside for Ceph:
```
fdisk -l

Disk /dev/sdb: 107.4 GB, 107374182400 bytes, 209715200 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

Disk /dev/sda: 107.4 GB, 107374182400 bytes, 209715200 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0x0001ce60

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *        2048     2099199     1048576   83  Linux
/dev/sda2         2099200   209715199   103808000   8e  Linux LVM
```
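Alternatively, `lsblk -f` gives a quicker overview: a disk that shows no partitions and an empty FSTYPE column is unused and safe to hand over to Ceph.

```bash
# Disks with no child partitions and no FSTYPE are eligible for Ceph
lsblk -f
```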
Install the Rook operator. In this article everything is deployed into the namespace rook.

1. Deploy the common resources
```
[root@k8s-master001 rook]# kubectl apply -f common.yaml
namespace/rook created
Warning: apiextensions.k8s.io/v1beta1 CustomResourceDefinition is deprecated in v1.16+, unavailable in v1.22+; use apiextensions.k8s.io/v1 CustomResourceDefinition
...(many lines omitted)...
clusterrolebinding.rbac.authorization.k8s.io/rbd-csi-provisioner-role created
```
2. Deploy the operator resources
```
[root@k8s-master001 rook]# kubectl label node k8s-master003 app.storage=rook-ceph
node/k8s-master003 labeled
[root@k8s-master001 rook]# kubectl label node k8s-master002 app.storage=rook-ceph
node/k8s-master002 labeled
[root@k8s-master001 rook]# kubectl label node k8s-master001 app.storage=rook-ceph
node/k8s-master001 labeled
[root@k8s-master001 rook]# kubectl apply -f operator.yaml
configmap/rook-ceph-operator-config created
deployment.apps/rook-ceph-operator created
[root@k8s-master001 rook]# kubectl get po -n rook
NAME                                 READY   STATUS    RESTARTS   AGE
rook-ceph-operator-87f875bbc-zz9lb   0/1     Pending   0          106s
```

Keep checking until all Pods are Running, which means the installation succeeded. If a Pod is not Running, inspect it with, for example, `kubectl describe po rook-discover-5qrc6 -n rook`. The most common cause is a failed image download; for anything else, troubleshoot according to your actual situation.

```
[root@k8s-master001 rook]# kubectl get po -n rook
NAME                                 READY   STATUS    RESTARTS   AGE
rook-ceph-operator-87f875bbc-zz9lb   1/1     Running   3          27m
rook-discover-5qrc6                  1/1     Running   0          3m42s
rook-discover-fzfz5                  1/1     Running   0          3m52s
rook-discover-fzg7r                  1/1     Running   0          20m
```
3. Create the Ceph cluster

cluster.yaml needs to be modified to fit your environment.

Specify the storage nodes and the disks Ceph is allowed to use; otherwise Rook will format every available disk in the system. Configure it as follows:
```yaml
storage: # cluster level storage configuration and selection
  useAllNodes: false
  useAllDevices: false
  #deviceFilter:
  config:
    # metadataDevice: "md0" # specify a non-rotational storage so ceph-volume will use it as block db device of bluestore.
    # databaseSizeMB: "1024" # uncomment if the disks are smaller than 100 GB
    # journalSizeMB: "1024"  # uncomment if the disks are 20 GB or smaller
    # osdsPerDevice: "1"     # this value can be overridden at the node or device level
    # encryptedDevice: "true" # the default value for this option is "false"
  # Individual nodes and their config can be specified as well, but 'useAllNodes' above must be set to false. Then, only the named
  # nodes below will be used as storage resources. Each node's 'name' field should match their 'kubernetes.io/hostname' label.
  nodes:
  - name: "10.26.25.20" # better to use the hostname here
    devices:
    - name: "sdb"
  - name: "10.26.25.21"
    devices:
    - name: "sdb"
  - name: "10.26.25.22"
    devices:
    - name: "sdb"
```
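As the comment above notes, each node's `name` should match its `kubernetes.io/hostname` label. A quick way to list those label values (a small sketch; any recent kubectl should accept it):

```bash
# Show each node's name alongside its kubernetes.io/hostname label
kubectl get nodes -o custom-columns='NAME:.metadata.name,HOSTNAME:.metadata.labels.kubernetes\.io/hostname'
```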
Then modify the node affinity so that Ceph is installed only on nodes carrying a specific label; here we use the label app.storage=rook-ceph.
```yaml
placement:
  all:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: app.storage
            operator: In
            values:
            - rook-ceph
```
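Before applying, you can double-check which nodes carry the label and are therefore eligible:

```bash
# Should list exactly the nodes labeled earlier with app.storage=rook-ceph
kubectl get nodes -l app.storage=rook-ceph
```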
Run the deployment command. This step downloads several Ceph images, so depending on your network it may take quite a while.
```
[root@k8s-master001 rook]# kubectl apply -f cluster.yaml
cephcluster.ceph.rook.io/rook-ceph created
[root@k8s-master001 rook]# kubectl get po -n rook
NAME                                                      READY   STATUS    RESTARTS   AGE
csi-cephfsplugin-2fsl9                                    3/3     Running   0          6m54s
csi-cephfsplugin-4r5cg                                    3/3     Running   0          6m55s
csi-cephfsplugin-htdjs                                    3/3     Running   0          6m54s
csi-cephfsplugin-provisioner-7646976d94-9kfd6             5/5     Running   1          6m53s
csi-cephfsplugin-provisioner-7646976d94-rbztr             5/5     Running   0          6m53s
csi-rbdplugin-56jpj                                       3/3     Running   0          6m59s
csi-rbdplugin-8h25h                                       3/3     Running   0          6m59s
csi-rbdplugin-provisioner-55c946c8c-d25g4                 6/6     Running   2          6m58s
csi-rbdplugin-provisioner-55c946c8c-g77s8                 6/6     Running   1          6m57s
csi-rbdplugin-z4qpw                                       3/3     Running   0          6m59s
rook-ceph-crashcollector-k8s-master001-6975bdf888-bpm7r   1/1     Running   0          2m6s
rook-ceph-crashcollector-k8s-master002-746b76cd87-5xzz4   1/1     Running   0          3m18s
rook-ceph-crashcollector-k8s-master003-5b54f4496-hntgb    1/1     Running   0          2m34s
rook-ceph-mgr-a-58594cfb7d-l7wjg                          1/1     Running   0          2m7s
rook-ceph-mon-a-84b755686-c6cxr                           1/1     Running   0          3m18s
rook-ceph-mon-b-776469c655-d5jb7                          1/1     Running   0          3m1s
rook-ceph-mon-c-64648fbd69-n5jh4                          1/1     Running   0          2m35s
rook-ceph-operator-87f875bbc-cgvwm                        1/1     Running   3          7m35s
rook-discover-d9fpp                                       1/1     Running   0          7m31s
rook-discover-kxmdx                                       1/1     Running   0          7m31s
rook-discover-z9kzt                                       1/1     Running   0          7m31s
```
The output above shows that no OSD pods are running at all.
The rook-discover-kxmdx logs showed that the disk sdb was found, but nothing was ever done with it.
Then it occurred to me that Ceph prepares disks through LVM; a check of the nodes confirmed that lvm2 was indeed not installed. Time to tear down and start over:
```
kubectl delete -f cluster.yaml
kubectl delete -f operator.yaml
kubectl delete -f common.yaml
```

Then, on every node, delete the leftover data:

```
rm -rf /var/lib/rook/*
```
Install lvm2:

```
yum install -y lvm2
```

Then deploy again:
```
[root@k8s-master001 rook]# kubectl get po -n rook
NAME                                                      READY   STATUS      RESTARTS   AGE
csi-cephfsplugin-9l55s                                    3/3     Running     0          10m
csi-cephfsplugin-czwlx                                    3/3     Running     0          10m
csi-cephfsplugin-np7n7                                    3/3     Running     0          10m
csi-cephfsplugin-provisioner-7646976d94-579qz             5/5     Running     3          10m
csi-cephfsplugin-provisioner-7646976d94-v68wg             5/5     Running     0          10m
csi-rbdplugin-9q82d                                       3/3     Running     0          10m
csi-rbdplugin-l55zq                                       3/3     Running     0          10m
csi-rbdplugin-provisioner-55c946c8c-ft4xl                 6/6     Running     0          10m
csi-rbdplugin-provisioner-55c946c8c-zkzh7                 6/6     Running     1          10m
csi-rbdplugin-wk7cw                                       3/3     Running     0          10m
rook-ceph-crashcollector-k8s-master001-6c4c78b6cd-gcfvn   1/1     Running     0          6m17s
rook-ceph-crashcollector-k8s-master002-746b76cd87-47k84   1/1     Running     0          9m7s
rook-ceph-crashcollector-k8s-master003-5b54f4496-ts64m    1/1     Running     0          8m43s
rook-ceph-mgr-a-66779c74c5-cnxbm                          1/1     Running     0          8m16s
rook-ceph-mon-a-5b7bcd77ff-sb4fz                          1/1     Running     0          9m25s
rook-ceph-mon-b-779c8467d4-bfd4g                          1/1     Running     0          9m7s
rook-ceph-mon-c-574fd97c79-v5qcd                          1/1     Running     0          8m44s
rook-ceph-operator-87f875bbc-z7rwn                        1/1     Running     1          11m
rook-ceph-osd-0-66775549dc-g2ttv                          1/1     Running     0          6m11s
rook-ceph-osd-2-6c5b4fc67-gtqjf                           1/1     Running     0          6m20s
rook-ceph-osd-prepare-k8s-master001-jbpgg                 0/1     Completed   0          8m13s
rook-ceph-osd-prepare-k8s-master002-vfvnp                 0/1     Completed   0          8m12s
rook-ceph-osd-prepare-k8s-master003-ffd6r                 0/1     Completed   0          6m28s
rook-discover-74qf2                                       1/1     Running     0          10m
rook-discover-fk4wn                                       1/1     Running     0          10m
rook-discover-fvbcf                                       1/1     Running     0          10m
```
At last the rook-ceph-osd-* Pods are running. Without running OSDs, Ceph cannot provide any storage.

4. Create the ceph-dashboard
```
[root@k8s-master001 rook]# kubectl apply -f dashboard-external-https.yaml
service/rook-ceph-mgr-dashboard-external-https created
```

Retrieve the dashboard's admin password with the following commands:

```bash
MGR_POD=`kubectl get pod -n rook | grep mgr | awk '{print $1}'`
kubectl -n rook logs $MGR_POD | grep password
```
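In the Rook example manifest this is a NodePort service, so the dashboard should be reachable at https://&lt;any-node-ip&gt;:&lt;node-port&gt; with the user admin and the password retrieved above; the port can be looked up like this:

```bash
# Shows the NodePort mapped to the dashboard's HTTPS port
kubectl -n rook get svc rook-ceph-mgr-dashboard-external-https
```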
5. Install the Ceph toolbox, a Ceph client pod you can use to manage the cluster with the ceph command
```
[root@k8s-master001 rook]# kubectl apply -f toolbox.yaml
```
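Once the toolbox pod is up, you can run Ceph commands from inside it. A sketch, assuming the default deployment name rook-ceph-tools from Rook's example toolbox.yaml:

```bash
# Open a shell in the toolbox pod
kubectl -n rook exec -it deploy/rook-ceph-tools -- bash
# Inside the pod: overall cluster health, then the OSD layout
ceph status
ceph osd tree
```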
6. Create the Kubernetes StorageClass. Here the reclaimPolicy is changed from the default Delete to Retain; adjust it to your own needs.
```
[root@k8s-master001 rook]# kubectl apply -f storageclass.yaml
cephblockpool.ceph.rook.io/k8spool created
storageclass.storage.k8s.io/rook-ceph created
```
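For reference, a minimal sketch of what this storageclass.yaml plausibly contains, inferred from the output above (pool k8spool, StorageClass rook-ceph, provisioner rook.rbd.csi.ceph.com); the replica count is an assumption and the CSI secret parameters are omitted:

```yaml
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: k8spool
  namespace: rook
spec:
  failureDomain: host
  replicated:
    size: 3                      # assumption: one replica per storage node
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-ceph
provisioner: rook.rbd.csi.ceph.com
reclaimPolicy: Retain            # changed from the default Delete, as noted above
allowVolumeExpansion: true
parameters:
  clusterID: rook                # the namespace the CephCluster runs in
  pool: k8spool
  # csi.storage.k8s.io/* secret parameters omitted; see the Rook examples
```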
To verify that the StorageClass is usable, deploy a test workload. A nodeSelector is used here to pin the pod to a specific node; you can leave it out.
```yaml
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: demo001
  labels:
    app: demo001
spec:
  serviceName: demo001
  replicas: 1
  selector:
    matchLabels:
      app: demo001
  template:
    metadata:
      labels:
        app: demo001
    spec:
      terminationGracePeriodSeconds: 180
      nodeSelector:
        kubernetes.io/hostname: k8s-master001
      containers:
      - name: demo001
        image: nginx
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 80
          name: port
        volumeMounts:
        - name: volume
          mountPath: /var/www/html
  volumeClaimTemplates:
  - metadata:
      name: volume
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: rook-ceph
      resources:
        requests:
          storage: 1Gi
```
Deploy it with `kubectl apply -f demo.yaml`:
```
[root@k8s-master001 rook]# kubectl get po
NAME        READY   STATUS    RESTARTS   AGE
demo001-0   1/1     Running   0          78s
```

Check the StorageClasses available:

```
[root@k8s-master001 rook]# kubectl get sc
NAME        PROVISIONER             RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
rook-ceph   rook.rbd.csi.ceph.com   Retain          Immediate           true                   8m15s
```

Look at the volumes that have been created:

```
[root@k8s-master001 rook]# kubectl get pv,pvc
NAME                                                        CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                      STORAGECLASS   REASON   AGE
persistentvolume/pvc-e96e54cb-88bb-44b0-a07d-19cbb36fe739   1Gi        RWO            Retain           Bound    default/volume-demo001-0   rook-ceph               104s

NAME                                     STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
persistentvolumeclaim/volume-demo001-0   Bound    pvc-e96e54cb-88bb-44b0-a07d-19cbb36fe739   1Gi        RWO            rook-ceph      110s
```
The output above shows that Kubernetes used the StorageClass to create the PV pvc-e96e54cb-88bb-44b0-a07d-19cbb36fe739 and bound it to the PVC volume-demo001-0.
Now exec into the nginx pod and check the mounted disk:
```
[root@k8s-master001 rook]# kubectl exec -ti demo001-0 /bin/sh
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
# df -h
Filesystem      Size  Used Avail Use% Mounted on
overlay          50G  5.6G   45G  12% /
/dev/rbd0       976M  2.6M  958M   1% /var/www/html
```
Here /dev/rbd0 is the backend storage that the Ceph cluster provides to nginx; its size is 1G, as requested in the demo.yaml manifest.
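The same volume can also be verified from the Ceph side: the RBD image backing the PV should show up in the k8spool pool created by the StorageClass (again assuming the toolbox from step 5 with its default deployment name rook-ceph-tools):

```bash
# List RBD images in the pool backing the StorageClass
kubectl -n rook exec -it deploy/rook-ceph-tools -- rbd ls k8spool
# Per-pool capacity and usage
kubectl -n rook exec -it deploy/rook-ceph-tools -- ceph df
```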
Finally, a few notes:
1. Ceph needs the system tool lvm2 when it creates OSDs, so install it on all storage nodes before deploying.
2. When specifying disk information in cluster.yaml, it is best to use each node's hostname, or at least make sure DNS resolves the names correctly:
```yaml
nodes:
- name: "10.26.25.20" # better to use the hostname here
  devices:
  - name: "sdb"
```
3. Do not manually partition the disks that are handed to Ceph.
4. If you redeploy, delete the /var/lib/rook/ directory on every node before the new deployment, so no stale cluster information is left behind (see the cleanup sketch after this list).
5. In production, use labels to pin Ceph to specific nodes, and avoid installing it on master nodes.
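A cleanup sketch for note 4; the disk-wiping step is an assumption based on common Rook teardown practice, and it destroys all data on the device:

```bash
# On every storage node: remove leftover cluster state
rm -rf /var/lib/rook/*
# Optionally wipe the data disk so Rook sees it as empty (assumes /dev/sdb; DESTROYS ALL DATA on it)
wipefs -a /dev/sdb
```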
Note: the images in this article come from the internet; if there is any infringement, please contact me and I will remove them promptly.