Rook is a distributed storage system based on Ceph. It can be deployed with the kubectl command, or managed and deployed with Helm.
Rook is a file, block, and object storage service built for Cloud-Native environments. It implements a self-managing, self-scaling, self-healing distributed storage service.
Rook supports automated deployment, bootstrapping, configuration, provisioning, scaling up/down, upgrades, migration, disaster recovery, monitoring, and resource management. To provide all of these capabilities, Rook relies on the underlying container orchestration platform.
Rook is currently still in Alpha, with an initial focus on Kubernetes + Ceph. Ceph is a distributed storage system that supports file, block, and object storage and is widely used in production.
The interaction between Rook and Kubernetes is shown in the figure below:
Notes:
The Rook daemons (Mons, OSDs, MGR, RGW, MDS) are compiled into a single rook binary and packaged into a very small container. That container also contains the Ceph daemons, as well as the tools needed to manage and store data.
Rook hides most Ceph details and instead exposes concepts such as physical resources, pools, volumes, filesystems, and buckets to its users.
Kubernetes 1.6+ is required. Rook needs permission to manage Kubernetes storage, and you also need to allow Kubernetes to load the Rook storage plugin.
Note: Kubernetes 1.10 already supports container storage through the CSI interface, so the plugin-loading steps below are not required there.
Rook integrates with the Kubernetes volume framework through FlexVolume. A storage driver implemented on FlexVolume must be placed in the dedicated storage-plugin directory on every Kubernetes node.
The default directory is /usr/libexec/kubernetes/kubelet-plugins/volume/exec/. On some operating systems, such as CoreOS, this directory is read-only after Kubernetes is deployed. You can point the kubelet at a different directory with a startup flag:
--volume-plugin-dir=/var/lib/kubelet/volumeplugins
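How this flag is passed depends on how the kubelet is launched. A minimal sketch for a systemd-managed kubelet, assuming a kubeadm-style unit that expands $KUBELET_EXTRA_ARGS (the drop-in path and variable name are assumptions and vary by distribution):

# /etc/systemd/system/kubelet.service.d/20-volume-plugin-dir.conf  (hypothetical drop-in)
[Service]
Environment="KUBELET_EXTRA_ARGS=--volume-plugin-dir=/var/lib/kubelet/volumeplugins"

# Apply the change
systemctl daemon-reload && systemctl restart kubelet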
After the Rook resources are created with the YAML shown later, the Rook storage plugin is automatically installed into the storage-plugin directory on every Kubernetes node:
ls /usr/libexec/kubernetes/kubelet-plugins/volume/exec/rook.io~rook/
# Output (a single executable file)
# rook
On Kubernetes 1.9.x you also need to change the corresponding environment variable in the rook-operator configuration file:
- name: FLEXVOLUME_DIR_PATH
  value: "/var/lib/kubelet/volumeplugins"
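For context, this entry belongs in the env: list of the rook-operator container in rook-operator.yaml (the full Deployment is listed later in this document); a minimal sketch, with an illustrative image reference:

containers:
- name: rook-operator
  image: rook/rook:master
  args: ["operator"]
  env:
  - name: FLEXVOLUME_DIR_PATH
    value: "/var/lib/kubelet/volumeplugins"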
Rook provides official sample resource manifests, which can be downloaded here:
git clone https://github.com/rook/rook.git
Run the following command to deploy:
kubectl apply -f /home/alex/Go/src/rook/cluster/examples/kubernetes/rook-operator.yaml
Wait for rook-operator and all rook-agent pods to reach the Running state:
kubectl -n rook-system get pod
# NAME                            READY     STATUS    RESTARTS   AGE
# rook-agent-5ttnt                1/1       Running   0          39m
# rook-agent-bmnwn                1/1       Running   0          39m
# rook-agent-n8nwd                1/1       Running   0          39m
# rook-agent-s6b7r                1/1       Running   0          39m
# rook-agent-x5p5n                1/1       Running   0          39m
# rook-agent-x6mnj                1/1       Running   0          39m
# rook-operator-6f8bbf9b8-4fd29   1/1       Running   0          22m
At this point the Rook Operator and the Agents on every node should be running, and you can create a Rook Cluster. You must configure dataDirHostPath correctly so that the cluster configuration survives restarts.
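For reference, a minimal sketch of what rook-cluster.yaml declares; the fully commented specification is documented later in this article, and dataDirHostPath is the field that must point to a persistent path on the hosts:

apiVersion: rook.io/v1alpha1
kind: Cluster
metadata:
  name: rook
  namespace: rook
spec:
  backend: ceph
  dataDirHostPath: /var/lib/rook
  monCount: 3
  storage:
    useAllNodes: true
    useAllDevices: false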
Run the following command to deploy:
kubectl apply -f /home/alex/Go/src/rook/cluster/examples/kubernetes/rook-cluster.yaml
Wait for all pods in the rook namespace to reach the Running state:
kubectl -n rook get pod
# NAME                             READY     STATUS    RESTARTS   AGE
# rook-api-848df956bf-blskc        1/1       Running   0          39m
# rook-ceph-mgr0-cfccfd6b8-x597n   1/1       Running   0          39m
# rook-ceph-mon0-fj4mx             1/1       Running   0          39m
# rook-ceph-mon1-7gjjq             1/1       Running   0          39m
# rook-ceph-mon2-tc4t4             1/1       Running   0          39m
# rook-ceph-osd-6rkbt              1/1       Running   0          39m
# rook-ceph-osd-f6x62              1/1       Running   1          39m
# rook-ceph-osd-k4rmm              1/1       Running   0          39m
# rook-ceph-osd-mtfv5              1/1       Running   1          39m
# rook-ceph-osd-sllbh              1/1       Running   2          39m
# rook-ceph-osd-wttj4              1/1       Running   2          39m
Block storage can be mounted into the filesystem of a single Pod.
Before provisioning block storage, a StorageClass and a storage pool need to be created. Kubernetes requires these two resources in order to interact with Rook and provision persistent volumes (PVs).
Run the following command to create the StorageClass and the pool:
kubectl apply -f /home/alex/Go/src/rook/cluster/examples/kubernetes/rook-storageclass.yaml
Confirm the state of the Kubernetes resources:
kubectl get storageclass
# NAME         PROVISIONER     AGE
# rook-block   rook.io/block   5m

kubectl get pool --all-namespaces
# NAMESPACE   NAME          AGE
# rook        replicapool   5m
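For reference, a minimal sketch of what rook-storageclass.yaml defines, inferred from the output above; the replica count and any extra StorageClass parameters are assumptions, so consult the sample file in the Rook repository for the exact content:

apiVersion: rook.io/v1alpha1
kind: Pool
metadata:
  name: replicapool
  namespace: rook
spec:
  replicated:
    size: 1        # assumed value
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-block
provisioner: rook.io/block
parameters:
  pool: replicapool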
First declare a PVC by creating a manifest pvc-test.yml:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-pvc
  namespace: dev
spec:
  storageClassName: rook-block
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 128Mi
Save the file and deploy it:
kubectl create -f pvc-test.yml
Confirm that a PV has been provisioned:
kubectl -n dev get pvc
# Bound to a PV
# NAME           STATUS    VOLUME    CAPACITY   ACCESS MODES   STORAGECLASS   AGE
# pvc/test-pvc   Bound     pvc-e0    128Mi      RWO            rook-block     10s

kubectl -n dev get pv
# Bound to the PVC
# NAME        CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS    CLAIM          STORAGECLASS   REASON    AGE
# pv/pvc-e0   128Mi      RWO            Delete           Bound     dev/test-pvc   rook-block               10s
Then create a Pod manifest pv-consumer.yml that consumes the PV:
apiVersion: v1
kind: Pod
metadata:
  name: test
  namespace: dev
spec:
  restartPolicy: OnFailure
  containers:
  - name: test-container
    image: busybox
    volumeMounts:
    - name: test-pv
      mountPath: /var/test
    command: ['sh', '-c', 'echo Hello > /var/test/data; exit 0']
  volumes:
  - name: test-pv
    persistentVolumeClaim:
      claimName: test-pvc
Deploy pv-consumer.yml:
kubectl apply -f pv-consumer.yml
This Pod should complete very quickly:
kubectl -n dev get pod test
# NAME      READY     STATUS      RESTARTS   AGE
# test      0/1       Completed   0          17s
Delete this Pod: kubectl -n dev delete pod test.
Notice that the persistent volume still exists. Mount the PV into another Pod (pv-consumer2.yml):
apiVersion: v1
kind: Pod
metadata:
  name: test
  namespace: dev
spec:
  restartPolicy: OnFailure
  containers:
  - name: test-container
    image: busybox
    volumeMounts:
    - name: test-pv
      mountPath: /var/test
    command: ['sh', '-c', 'cat /var/test/data; exit 0']
  volumes:
  - name: test-pv
    persistentVolumeClaim:
      claimName: test-pvc
Deploy this Pod:
kubectl apply -f pv-consumer2.yml
Check the log output of the second Pod:
kubectl -n dev logs test test-container
# Hello
As you can see, reads and writes against the PV work as expected.
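Before removing the StorageClass and pool below, you would normally delete the test resources created above first; for example:

kubectl -n dev delete pod test
kubectl -n dev delete pvc test-pvc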
Run the following commands to remove block storage support:
kubectl delete -n rook pool replicapool
kubectl delete storageclass rook-block
If you no longer use Rook, or you want to rebuild the Rook cluster, you need to clean up all Rook resources.
Example commands:
kubectl delete -n rook pool replicapool
kubectl delete storageclass rook-block
kubectl delete -n kube-system secret rook-admin
kubectl delete -f kube-registry.yaml

# Delete the Cluster CRD
kubectl delete -n rook cluster rook

# After the Cluster CRD has been deleted, remove the Rook Operator and Agents
kubectl delete thirdpartyresources cluster.rook.io pool.rook.io objectstore.rook.io filesystem.rook.io volumeattachment.rook.io # ignore errors if on K8s 1.7+
kubectl delete crd clusters.rook.io pools.rook.io objectstores.rook.io filesystems.rook.io volumeattachments.rook.io # ignore errors if on K8s 1.5 and 1.6
kubectl delete -n rook-system daemonset rook-agent
kubectl delete -f rook-operator.yaml
kubectl delete clusterroles rook-agent
kubectl delete clusterrolebindings rook-agent

# Delete the namespace
kubectl delete namespace rook

# Clean up the dataDirHostPath directory on every node (default /var/lib/rook)
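The last step, cleaning dataDirHostPath, has to be executed on every node; a minimal sketch assuming the default path:

# run on every storage node
sudo rm -rf /var/lib/rook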
Rook-related Kubernetes resources are usually placed in the following namespaces:
apiVersion: v1
kind: Namespace
metadata:
  name: rook-system
---
apiVersion: v1
kind: Namespace
metadata:
  name: rook
apiVersion: v1
kind: ServiceAccount
metadata:
  name: rook-operator
  namespace: rook-system
imagePullSecrets:
- name: gmemregsecret
Define a set of access permissions (a ClusterRole).
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: rook-operator
rules:
- apiGroups:
  - ""
  resources:
  - namespaces
  - serviceaccounts
  - secrets
  - pods
  - services
  - nodes
  - nodes/proxy
  - configmaps
  - events
  - persistentvolumes
  - persistentvolumeclaims
  verbs:
  - get
  - list
  - watch
  - patch
  - create
  - update
  - delete
- apiGroups:
  - extensions
  resources:
  - thirdpartyresources
  - deployments
  - daemonsets
  - replicasets
  verbs:
  - get
  - list
  - watch
  - create
  - update
  - delete
- apiGroups:
  - apiextensions.k8s.io
  resources:
  - customresourcedefinitions
  verbs:
  - get
  - list
  - watch
  - create
  - delete
- apiGroups:
  - rbac.authorization.k8s.io
  resources:
  - clusterroles
  - clusterrolebindings
  - roles
  - rolebindings
  verbs:
  - get
  - list
  - watch
  - create
  - update
  - delete
- apiGroups:
  - storage.k8s.io
  resources:
  - storageclasses
  verbs:
  - get
  - list
  - watch
  - delete
- apiGroups:
  - rook.io
  resources:
  - "*"
  verbs:
  - "*"
Grant the role above to the rook-operator service account.
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: rook-operator
  namespace: rook-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: rook-operator
subjects:
- kind: ServiceAccount
  name: rook-operator
  namespace: rook-system
This is a Deployment that runs the Rook Operator in a container. Once started, the Operator automatically deploys the Rook Agents; the Operator and the Agents use the same image.
With a configuration like the one below that pulls from a private registry, you need to configure imagePullSecrets not only for rook-operator, but also for the service account rook-agent that runs the Rook Agents:
kubectl --namespace=rook-system create secret docker-registry gmemregsecret \
  --docker-server=docker.gmem.cc --docker-username=alex \
  --docker-password=lavender --docker-email=k8s@gmem.cc

kubectl --namespace=rook-system patch serviceaccount default -p '{"imagePullSecrets": [{"name": "gmemregsecret"}]}'
kubectl --namespace=rook-system patch serviceaccount rook-agent -p '{"imagePullSecrets": [{"name": "gmemregsecret"}]}'
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: rook-operator
  namespace: rook-system
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: rook-operator
    spec:
      serviceAccountName: rook-operator
      containers:
      - name: rook-operator
        image: docker.gmem.cc/rook/rook:master
        args: ["operator"]
        env:
        - name: ROOK_MON_HEALTHCHECK_INTERVAL
          value: "45s"
        - name: ROOK_MON_OUT_TIMEOUT
          value: "300s"
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
Custom Resource Definitions (CRDs) describe the configuration of the Rook-related resources; each resource type has its own CRD.
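To see which Rook CRDs have been registered in the cluster (the names match those used in the teardown section above), a quick check:

kubectl get crd | grep rook.io
# Expected entries include clusters.rook.io, pools.rook.io, objectstores.rook.io,
# filesystems.rook.io and volumeattachments.rook.io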
This resource corresponds to a Rook-based storage cluster.
apiVersion: rook.io/v1alpha1
kind: Cluster
metadata:
  name: rook
  namespace: rook
spec:
  # Storage backend; currently only Ceph is supported
  backend: ceph
  # Host directory where configuration files are stored
  dataDirHostPath: /var/lib/rook
  # If true, use the host network instead of the container SDN (software-defined network)
  hostNetwork: false
  # Number of mons to start; must be an odd number between 1 and 9
  monCount: 3
  # Controls how the various Rook services are scheduled by Kubernetes
  placement:
    # Global rules; per-service rules (api, mgr, mon, osd) override them
    all:
      # Which nodes Rook pods may be scheduled onto
      nodeAffinity:
        # Hard constraint; pods that are already running are not affected by configuration changes
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          # Nodes must carry the label role=storage-node
          - matchExpressions:
            - key: role
              operator: In
              values:
              - storage-node
      # Which topology domains Rook may be scheduled onto, based on the other pods running there
      podAffinity:
      podAntiAffinity:
      # Which node taints are tolerated
      tolerations:
      - key: storage-node
        operator: Exists
    api:
      nodeAffinity:
      podAffinity:
      podAntiAffinity:
      tolerations:
    mgr:
      nodeAffinity:
      podAffinity:
      podAntiAffinity:
      tolerations:
    mon:
      nodeAffinity:
      tolerations:
    osd:
      nodeAffinity:
      podAffinity:
      podAntiAffinity:
      tolerations:
  # Resource requirements of the various services
  resources:
    api:
      limits:
        cpu: "500m"
        memory: "1024Mi"
      requests:
        cpu: "500m"
        memory: "1024Mi"
    mgr:
    mon:
    osd:
  # Cluster-level storage configuration; each node can override it
  storage:
    # Whether all nodes are used for storage. Must be set to false if the nodes section is specified
    useAllNodes: true
    # Whether every device discovered on a node is automatically consumed by OSDs
    useAllDevices: false
    # Regular expression selecting which devices may be consumed by OSDs, for example:
    #   sdb       use only /dev/sdb
    #   ^sd.      use all /dev/sd* devices
    #   ^sd[a-d]  use sda, sdb, sdc, sdd
    # Raw devices can be specified; Rook partitions them automatically but does not mount them
    deviceFilter: ^vd[b-c]
    # Device on each node used to store OSD metadata. A low-read-latency device such as SSD/NVMe improves performance
    metadataDevice:
    # Location of the cluster, e.g. region or data center; passed directly to the Ceph CRUSH map
    location:
    # Configuration of the OSD storage format
    storeConfig:
      # Either filestore or bluestore; defaults to the latter, Ceph's newer storage engine.
      # bluestore manages raw devices directly, bypassing local filesystems such as ext4/xfs,
      # and performs IO on the raw devices from user space via Linux AIO
      storeType: bluestore
      # bluestore database size; can be omitted for normally sized disks, e.g. 100GB+
      databaseSizeMB: 1024
      # filestore journal size; can be omitted for normally sized disks, e.g. 20GB+
      journalSizeMB: 1024
    # Host directories used for storage. Using two directories on the same physical device hurts performance
    directories:
    - path: /rook/storage-dir
    # Per-node configuration
    nodes:
    # Configuration for node A
    - name: "172.17.4.101"
      directories:
      - path: "/rook/storage-dir"
      resources:
        limits:
          cpu: "500m"
          memory: "1024Mi"
        requests:
          cpu: "500m"
          memory: "1024Mi"
    # Configuration for node B
    - name: "172.17.4.201"
      devices:
      - name: "sdb"
      - name: "sdc"
      storeConfig:
        storeType: bluestore
    - name: "172.17.4.301"
      deviceFilter: "^sd."
apiVersion: rook.io/v1alpha1
kind: Pool
metadata:
  name: ecpool
  namespace: rook
spec:
  # Whether data in the pool is replicated
  replicated:
    # Number of replicas
    size: 3
  # Ceph erasure-coded pools consume less storage space; replicated must be disabled to use them
  erasureCoded:
    # Number of data chunks per object
    dataChunks: 2
    # Number of coding chunks per object
    codingChunks: 1
  crushRoot: default
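As a worked example of the space trade-off: with dataChunks: 2 and codingChunks: 1, each object is split into 2 data chunks plus 1 coding chunk, so raw usage is (2+1)/2 = 1.5x the logical data size and the pool tolerates the loss of one chunk; a replicated pool with size: 3 stores 3x the data and tolerates the loss of two copies.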
apiVersion: rook.io/v1alpha1
kind: ObjectStore
metadata:
  name: my-store
  namespace: rook
spec:
  # Metadata pool; only replication is supported
  metadataPool:
    replicated:
      size: 3
  # Data pool; supports replication or erasure coding
  dataPool:
    erasureCoded:
      dataChunks: 2
      codingChunks: 1
  # RGW daemon settings
  gateway:
    # S3 is supported
    type: s3
    # Reference to a Kubernetes secret containing the TLS certificate
    sslCertificateRef:
    # Ports the RGW pods and service listen on
    port: 80
    securePort:
    # Number of RGW pods load-balancing this object store
    instances: 1
    # Whether to start RGW on all nodes. If false, instances must be set
    allNodes: false
    placement:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: role
              operator: In
              values:
              - rgw-node
      tolerations:
      - key: rgw-node
        operator: Exists
      podAffinity:
      podAntiAffinity:
    resources:
      limits:
        cpu: "500m"
        memory: "1024Mi"
      requests:
        cpu: "500m"
        memory: "1024Mi"
apiVersion: rook.io/v1alpha1
kind: Filesystem
metadata:
  name: myfs
  namespace: rook
spec:
  # Metadata pool
  metadataPool:
    replicated:
      size: 3
  # Data pools
  dataPools:
  - erasureCoded:
      dataChunks: 2
      codingChunks: 1
  # MDS daemon settings
  metadataServer:
    # Number of active MDS instances
    activeCount: 1
    # If true, the extra MDS instances run in active standby, keeping a hot cache of the filesystem metadata
    # If false, the extra MDS instances run in passive standby
    activeStandby: true
    placement:
    resources:
Rook provides a toolbox container whose commands can be used to debug and test Rook (rook-tools.yaml):
apiVersion: v1
kind: Pod
metadata:
  name: rook-tools
  namespace: rook
spec:
  dnsPolicy: ClusterFirstWithHostNet
  containers:
  - name: rook-tools
    image: rook/toolbox:master
    imagePullPolicy: IfNotPresent
    env:
    - name: ROOK_ADMIN_SECRET
      valueFrom:
        secretKeyRef:
          name: rook-ceph-mon
          key: admin-secret
    securityContext:
      privileged: true
    volumeMounts:
    - mountPath: /dev
      name: dev
    - mountPath: /sys/bus
      name: sysbus
    - mountPath: /lib/modules
      name: libmodules
    - name: mon-endpoint-volume
      mountPath: /etc/rook
  hostNetwork: false
  volumes:
  - name: dev
    # Expose host directories to the Pod as volumes
    hostPath:
      path: /dev
  - name: sysbus
    hostPath:
      path: /sys/bus
  - name: libmodules
    hostPath:
      path: /lib/modules
  - name: mon-endpoint-volume
    configMap:
      # This ConfigMap already exists
      name: rook-ceph-mon-endpoints
      items:
      - key: data
        path: mon-endpoints
After creating the Pod above, connect to it with:
kubectl -n rook exec -it rook-tools bash
The toolbox ships with rookctl, ceph, and rados out of the box, and you can install any other tools you need.
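For example, a few standard commands that can be run from inside the toolbox to inspect the cluster (outputs omitted):

rookctl status
ceph status
ceph osd tree
ceph df
rados df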
Note: rookctl has been deprecated; use the CRDs to configure the cluster instead.
Rook's client tool rookctl can be used to manage the cluster's block, object, and file storage.
Subcommand | Description |
block | Manage block devices and images in the cluster, for example:

rookctl block list
# RBD = RADOS Block Device
# RADOS = Reliable, Autonomic Distributed Object Store
# RADOS is one of the cores of Ceph: it provides a stable, scalable, high-performance single logical
# object-storage interface on top of a dynamic, heterogeneous cluster of storage devices, with
# self-adapting, self-managing nodes
NAME                                       POOL          SIZE         DEVICE   MOUNT
pvc-006bea14-23bd-11e8-9763-deadbeef00a0   replicapool   256.00 MiB   rbd
pvc-145bc143-23bf-11e8-9763-deadbeef00a0   replicapool   256.00 MiB   rbd
pvc-274e4a9c-23c1-11e8-9763-deadbeef00a0   replicapool   256.00 MiB   rbd
pvc-2aaaf126-23bf-11e8-9763-deadbeef00a0   replicapool   256.00 MiB   rbd
pvc-3b70adcb-23bf-11e8-9763-deadbeef00a0   replicapool   256.00 MiB   rbd
pvc-9e6bb98a-23be-11e8-9763-deadbeef00a0   replicapool   256.00 MiB   rbd
pvc-c0d5e89b-23be-11e8-9763-deadbeef00a0   replicapool   256.00 MiB   rbd
pvc-d6f4938d-23be-11e8-9763-deadbeef00a0   replicapool   256.00 MiB   rbd
pvc-e643cf67-23be-11e8-9763-deadbeef00a0   replicapool   256.00 MiB   rbd
pvc-f9a64409-23be-11e8-9763-deadbeef00a0   replicapool   256.00 MiB   rbd
filesystem | Manage shared filesystems in the cluster |
object | Manage object storage in the cluster |
node | Manage nodes in the cluster |
pool | Manage storage pools in the cluster |
status | Print the cluster status, for example:

rookctl status
# Overall status report
OVERALL STATUS: WARNING

SUMMARY:
SEVERITY   NAME             MESSAGE
WARNING    TOO_FEW_PGS      too few PGs per OSD (8 < min 30)
WARNING    MON_CLOCK_SKEW   clock skew detected on mon.rook-ceph-mon1, mon.rook-ceph-mon3

# Total storage capacity and usage
USAGE:
TOTAL        USED         DATA         AVAILABLE
377.24 GiB   252.17 GiB   105.11 MiB   125.07 GiB

MONITORS:
NAME              ADDRESS                 IN QUORUM   STATUS
rook-ceph-mon18   10.96.33.35:6790/0      true        OK
rook-ceph-mon1    10.97.38.247:6790/0     true        WARNING
rook-ceph-mon3    10.105.193.133:6790/0   true        WARNING

MGRs:
NAME             STATUS
rook-ceph-mgr0   Active

# OSD (Object Storage Daemon) is Ceph's object storage daemon
OSDs:
TOTAL   UP   IN   FULL    NEAR FULL
12      12   12   false   false

PLACEMENT GROUPS (100 total):
STATE          COUNT
active+clean   100