GlusterFS Distributed Storage System

Date: 2019-11-14


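I. Theoretical Foundations of Distributed File Systems

1.1 The Emergence of Distributed File Systems

Computers manage and store data through file systems. In today's era of information explosion, the amount of data people can obtain grows exponentially, and simply adding more hard disks to expand the capacity of a single computer's file system no longer meets demand.
A distributed file system addresses this storage and management problem by extending a file system fixed at one location to any number of locations and file systems, with the many nodes forming a file system network. The nodes can be located in different places and communicate and transfer data over the network. Users do not need to care which node stores the data or which node it is fetched from; they manage and store data just as they would on a local file system.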

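1.2 A Typical Example: NFS

NFS (Network File System) allows computers on a network to share resources over TCP/IP. With NFS, a local client application can transparently read and write files on a remote NFS server as if they were local files. The advantages of NFS are as follows:

(1) Saves disk space

Data that clients use frequently can be stored on one machine and exported through NFS, so that every computer on the network can access it without keeping its own copy.

(2) Saves hardware resources

NFS can also share storage devices such as floppy drives, CD-ROM drives, and ZIP drives, reducing the number of removable devices on the network.

(3) User home directories

Special users such as administrators may need to log in to every computer on the network. Keeping a copy of the user's home directory on each client is tedious and cannot guarantee data consistency. Instead, the home directory can be exported through NFS and mounted automatically on each client, so that its files are available on any computer.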

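1.3 Problems Faced

Insufficient storage space: larger capacity is needed.
Mounting storage directly over NFS is risky because the server is a single point of failure.
Some workloads cannot be served well: under heavy access, disk I/O becomes the bottleneck.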

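1.4 GlusterFS Overview

GlusterFS is the core of Gluster, a scale-out storage solution. It is an open-source distributed file system with strong horizontal scalability, able to grow to several petabytes of capacity and serve thousands of clients. GlusterFS aggregates physically distributed storage resources over TCP/IP or InfiniBand RDMA and manages the data under a single global namespace.
GlusterFS supports standard clients running standard applications on any standard IP network, and users can access application data in the global unified namespace through standard protocols such as NFS and CIFS. GlusterFS frees users from standalone, high-cost, closed storage systems: ordinary inexpensive storage devices can be used to deploy a centrally managed, horizontally scalable, virtualized storage pool whose capacity scales to the TB/PB level.
Gluster has been acquired by Red Hat. The official website is http://www.gluster.org/

Very high performance (with 64 nodes, throughput, that is bandwidth, can reach as much as 32 GB/s)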

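1.5 Main Enterprise Use Cases for GlusterFS

Both in theory and in practice, GlusterFS is currently best suited to storing large files. For small files, especially massive numbers of small files (under 1 MB), storage efficiency and access performance are poor. The lots-of-small-files (LOSF) problem is a recognized hard problem in both industry and academia, and GlusterFS, as a general-purpose distributed file system, makes no special optimizations for small files, so the poor performance is understandable.

Media

Documents, images, audio, video

Shared storage

Cloud storage, virtualization storage, HPC (high-performance computing)

Big data

Log files, RFID (radio-frequency identification) data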

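II. Deployment and Installation

2.1 Preparation Before Installing GlusterFS

One computer with at least 4 GB of memory and more than 50 GB of free disk space
VMware Workstation installed
Four CentOS-6-x86_64 virtual machines (any release from 6.2 to 6.8)
Base system: 1 CPU core + 1024 MB memory + 10 GB disk
Network: network address translation (NAT)
iptables and SELinux disabled
glusterfs packages pre-installed

Description	IP	Hostname	Requirement
Linux_node1	10.1.1.136	Glusterfs01	Two extra 10 GB disks, sdb and sdc
Linux_node2	10.1.1.137	Glusterfs02	Two extra 10 GB disks, sdb and sdc
Linux_node3	10.1.1.138	Glusterfs03	Two extra 10 GB disks, sdb and sdc
Linux_node4	10.1.1.139	Glusterfs04	Two extra 10 GB disks, sdb and sdc

# For reproducible results, use the same Linux release as this walkthrough where possible
# and use the rpm packages supplied with the exercise as the yum repository

The steps below are shown on node 01; repeat them on the other three servers.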

[root@Glusterfs01 ~]# cat /etc/redhat-release
CentOS release 6.5 (Final)
[root@Glusterfs01 ~]# uname -r
2.6.32-431.el6.x86_64
[root@Glusterfs01 ~]# cd rpm/
[root@Glusterfs01 rpm]# ls
dbench-4.0-12.el6.x86_64.rpm
glusterfs-3.7.20-1.el6.x86_64.rpm
glusterfs-api-3.7.20-1.el6.x86_64.rpm
glusterfs-api-devel-3.7.20-1.el6.x86_64.rpm
glusterfs-cli-3.7.20-1.el6.x86_64.rpm
glusterfs-client-xlators-3.7.20-1.el6.x86_64.rpm
glusterfs-coreutils-0.0.1-0.1.git0c86f7f.el6.x86_64.rpm
glusterfs-coreutils-0.2.0-1.el6_37.x86_64.rpm
glusterfs-devel-3.7.20-1.el6.x86_64.rpm
glusterfs-extra-xlators-3.7.20-1.el6.x86_64.rpm
glusterfs-fuse-3.7.20-1.el6.x86_64.rpm
glusterfs-ganesha-3.7.20-1.el6.x86_64.rpm
glusterfs-geo-replication-3.7.20-1.el6.x86_64.rpm
glusterfs-libs-3.7.20-1.el6.x86_64.rpm
glusterfs-rdma-3.7.20-1.el6.x86_64.rpm
glusterfs-resource-agents-3.7.20-1.el6.noarch.rpm
glusterfs-server-3.7.20-1.el6.x86_64.rpm
keyutils-1.4-5.el6.x86_64.rpm
keyutils-libs-1.4-5.el6.x86_64.rpm
libaio-0.3.107-10.el6.x86_64.rpm
libevent-1.4.13-4.el6.x86_64.rpm
libgssglue-0.1-11.el6.x86_64.rpm
libntirpc-1.3.1-1.el6.x86_64.rpm
libntirpc-devel-1.3.1-1.el6.x86_64.rpm
libtirpc-0.2.1-13.el6_9.x86_64.rpm
nfs-utils-1.2.3-75.el6_9.x86_64.rpm
nfs-utils-lib-1.1.5-13.el6.x86_64.rpm
python-argparse-1.2.1-2.1.el6.noarch.rpm
python-gluster-3.7.20-1.el6.noarch.rpm
pyxattr-0.5.0-1.el6.x86_64.rpm
rpcbind-0.2.0-13.el6_9.1.x86_64.rpm
rsync-3.0.6-12.el6.x86_64.rpm
userspace-rcu-0.7.16-2.el6.x86_64.rpm
userspace-rcu-0.7.7-1.el6.x86_64.rpm
userspace-rcu-devel-0.7.16-2.el6.x86_64.rpm
userspace-rcu-devel-0.7.7-1.el6.x86_64.rpm
[root@Glusterfs01 rpm]# yum -y install createrepo
[root@Glusterfs01 rpm]# createrepo -v .

2.2 Installing GlusterFS

2.2.1 Set the Hostname

(Omitted.)

2.2.2 Add hosts Entries So the Cluster Nodes Can Resolve Each Other

[root@Glusterfs01 ~]# vim /etc/hosts
[root@Glusterfs01 ~]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
10.1.1.136 glusterfs01
10.1.1.137 glusterfs02
10.1.1.138 glusterfs03
10.1.1.139 glusterfs04

2.2.3 Disable SELinux and the Firewall

[root@Glusterfs01 ~]# service iptables stop

[root@Glusterfs01 ~]# chkconfig iptables off
[root@Glusterfs01 ~]# setenforce 0
setenforce: SELinux is disabled

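Disable SELinux permanently (edit /etc/selinux/config directly; /etc/sysconfig/selinux is a symlink to it, and sed -i would replace the link rather than edit the file): sed -i 's#SELINUX=enforcing#SELINUX=disabled#' /etc/selinux/config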

2.2.4 Use the rpm Packages Supplied with This Tutorial as a Local Custom yum Repository

[root@Glusterfs01 rpm]# cd /etc/yum.repos.d/
[root@Glusterfs01 yum.repos.d]# vim CentOS-Media.repo

[root@Glusterfs01 yum.repos.d]# cat CentOS-Media.repo | grep -v "#"

[c6-media]
name=CentOS-$releasever - Media
baseurl=file:///media/CentOS/
file:///media/cdrom/
file:///media/cdrecorder/
gpgcheck=1
enabled=0
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-6

[rpm]
name=rpm
baseurl=file:///root/rpm
gpgcheck=0
enabled=1
[root@Glusterfs01 yum.repos.d]# yum -y clean all && yum makecache
[root@Glusterfs01 yum.repos.d]# yum -y install glusterfs-server glusterfs-cli glusterfs-geo-replication

2.3 Configuring glusterfs

2.3.1 Check the glusterfs Version

[root@Glusterfs01 ~]# which glusterfs
/usr/sbin/glusterfs
[root@Glusterfs01 ~]# glusterfs -V
glusterfs 3.7.20 built on Jan 30 2017 15:39:27
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2013 Red Hat, Inc. <http://www.redhat.com/>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
It is licensed to you under your choice of the GNU Lesser
General Public License, version 3 or any later version (LGPLv3
or later), or the GNU General Public License, version 2 (GPLv2),
in all cases as published by the Free Software Foundation.

2.3.2 Start and Stop the Service

[root@Glusterfs01 ~]# /etc/init.d/glusterd status
glusterd is stopped
[root@Glusterfs01 ~]# /etc/init.d/glusterd start
Starting glusterd:                                         [ OK ]
[root@Glusterfs01 ~]# /etc/init.d/glusterd stop
Stopping glusterd:                                         [ OK ]
[root@Glusterfs01 ~]# /etc/init.d/glusterd start
Starting glusterd:                                         [ OK ]
[root@Glusterfs01 ~]# chkconfig glusterd on            # start at boot

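2.3.3 Add the Storage Hosts to the Trusted Storage Pool

Add the virtual machines to the trusted storage pool.
Note: the probe commands only need to be run from one virtual machine, and that machine does not need to probe itself.

Make sure the glusterd service is running on every virtual machine, then run the following: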

[root@Glusterfs01 ~]# gluster peer probe glusterfs02
peer probe: success.
[root@Glusterfs01 ~]# gluster peer probe glusterfs03
peer probe: success.
[root@Glusterfs01 ~]# gluster peer probe glusterfs04
peer probe: success.
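
The three probes above can also be written as a loop. The following is a minimal sketch, not part of the original walkthrough: probe_peers is a hypothetical helper name, and the GLUSTER variable is an assumed override that allows a dry run (for example GLUSTER=echo) without touching the cluster.

```shell
#!/bin/sh
# probe_peers HOST...
# Probe every host given as an argument from the current node.
# GLUSTER is a hypothetical override for dry runs; it defaults to the
# real gluster CLI.
probe_peers() (
    for host in "$@"; do
        # Stop at the first failure so a mistyped host name is noticed.
        "${GLUSTER:-gluster}" peer probe "$host" || exit 1
    done
)
```

On glusterfs01 this would be called as probe_peers glusterfs02 glusterfs03 glusterfs04.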

2.3.4 Check the Peer Status

[root@Glusterfs01 ~]# gluster peer status
Number of Peers: 3

Hostname: glusterfs02
Uuid: 7dd28ae9-f31d-4b86-9c82-1e40afac7968
State: Peer in Cluster (Connected)

Hostname: glusterfs03
Uuid: 6cb71933-f9d1-436d-90e0-06d505f9838b
State: Peer in Cluster (Connected)

Hostname: glusterfs04
Uuid: d983c0c1-5f58-4e2b-87ec-9448b3fa49e6
State: Peer in Cluster (Connected)
At this point the peer status can be checked on each virtual machine; every node should now list the other three as peers.
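
When checking several nodes, counting the connected peers is quicker than reading the full listing. This is a small sketch under the assumption that the status text has the same shape as the output shown above; connected_peers is a hypothetical helper name.

```shell
#!/bin/sh
# connected_peers
# Reads the text of `gluster peer status` on standard input and prints
# how many peers report "Peer in Cluster (Connected)".
connected_peers() {
    grep -cF 'State: Peer in Cluster (Connected)'
}
```

Typical use would be gluster peer status | connected_peers, which should print 3 on each node of this four-node pool.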

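2.3.5 Preparation Before Configuration

In a production environment the disks would be partitioned before formatting. That step is skipped here, and the extra 10 GB disks on each virtual machine are formatted directly.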

[root@Glusterfs01 ~]# ll /dev/sd*
brw-rw---- 1 root disk 8, 0 Jan 1 00:55 /dev/sda
brw-rw---- 1 root disk 8, 1 Jan 1 00:55 /dev/sda1
brw-rw---- 1 root disk 8, 2 Jan 1 00:55 /dev/sda2
brw-rw---- 1 root disk 8, 16 Jan 1 00:55 /dev/sdb
brw-rw---- 1 root disk 8, 32 Jan 1 00:55 /dev/sdc
[root@Glusterfs01 ~]# mkfs.ext4 /dev/sdb
mke2fs 1.41.12 (17-May-2010)
/dev/sdb is entire device, not just one partition!
Proceed anyway? (y,n) y
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
655360 inodes, 2621440 blocks
131072 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=2684354560
80 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632

Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 22 mounts or
180 days, whichever comes first. Use tune2fs -c or -i to override.

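Run mkdir -p /gluster/brick1 on all four machines to create the directory on which the block device will be mounted.
Mount the disk on that directory (same steps on all four machines).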

[root@Glusterfs01 ~]# mkdir -p /gluster/brick1
[root@Glusterfs01 ~]# mount /dev/sdb /gluster/brick1
[root@Glusterfs01 ~]# df -h
Filesystem                   Size Used Avail Use% Mounted on
/dev/mapper/vg_mini-lv_root   36G 1.1G   33G   4% /
tmpfs                        931M     0 931M   0% /dev/shm
/dev/sda1                    485M   33M 427M   8% /boot
/dev/sr0                     4.2G 4.2G     0 100% /media/cdrom
/dev/sdb                     9.9G 151M 9.2G   2% /gluster/brick1

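Repeat the steps above to format the second disk, sdc, and mount it on /gluster/brick2 on every virtual machine.

Make the mounts persistent across reboots on all four virtual machines: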

[root@Glusterfs01 ~]# echo "mount /dev/sdb /gluster/brick1" >> /etc/rc.local
[root@Glusterfs01 ~]# echo "mount /dev/sdc /gluster/brick2" >> /etc/rc.local
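
Appending with >> adds a duplicate line every time the step is repeated. The sketch below shows one way to make it repeatable; ensure_line is a hypothetical helper, not something shipped with CentOS or GlusterFS.

```shell
#!/bin/sh
# ensure_line FILE LINE
# Append LINE to FILE only if that exact line is not already present,
# so the step can be re-run safely.
ensure_line() (
    file=$1
    line=$2
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
)
```

For example, ensure_line /etc/rc.local "mount /dev/sdb /gluster/brick1". The same helper could instead add an /etc/fstab entry such as "/dev/sdb /gluster/brick1 ext4 defaults 0 0", which is the more conventional place for persistent mounts.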

2.3.6 Creating a Distributed Volume

Basic volumes:
- Distributed: whole files are spread across the bricks (like RAID 0 in that there is no redundancy)
- Replicated: (comparable to RAID 1)
- Striped: (only useful for large files)
Composite volumes:
- Distributed Replicated: (comparable to RAID 1+0)
- Distributed Striped: (for large files)
- Replicated Striped
- Distributed Replicated Striped: (the safest)
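
As a rough guide to how these types trade capacity for redundancy, usable space can be estimated from the brick count, brick size, and replica count. This is a simplified sketch that ignores file system overhead; usable_gb is a hypothetical helper name.

```shell
#!/bin/sh
# usable_gb BRICKS BRICK_SIZE_GB REPLICA
# Replicated copies do not add space, so the raw total is divided by
# the replica count (pass 1 for volumes without replication).
usable_gb() (
    bricks=$1
    size=$2
    replica=$3
    echo $(( bricks * size / replica ))
)
```

With the 10 GB bricks used in this walkthrough, usable_gb 2 10 1 gives 20 for a two-brick distributed volume, usable_gb 2 10 2 gives 10 for a two-brick replica, and usable_gb 4 10 2 gives 20 for a 2 x 2 distributed replicated volume, which is in line with the df output in the following sections.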

Create a distributed volume (run on glusterfs01)

[root@Glusterfs01 ~]# gluster volume create gs1 glusterfs01:/gluster/brick1 glusterfs02:/gluster/brick1 force
volume create: gs1: success: please start the volume to access data

Start the new volume (run on glusterfs01)

[root@Glusterfs01 ~]# gluster volume start gs1
volume start: gs1: success

All four virtual machines now show the following information (run on any of them)

[root@Glusterfs03 ~]# gluster volume info

Volume Name: gs1    # volume name
Type: Distribute    # distributed
Volume ID: cfca2f8b-a522-49ae-9f46-b1f50ac61f81      # volume ID
Status: Started   # running
Number of Bricks: 2     # two bricks in total
Transport-type: tcp   # TCP transport
Bricks:      # brick list
Brick1: glusterfs01:/gluster/brick1
Brick2: glusterfs02:/gluster/brick1
Options Reconfigured:
performance.readdir-ahead: on

2.3.7 Two Ways to Mount a Volume

(1) Mounting with the glusterfs client

Mount the volume on a directory (run on glusterfs01): mount the local distributed volume gs1 on /mnt

[root@Glusterfs01 ~]# mount -t glusterfs 127.0.0.1:/gs1 /mnt
[root@Glusterfs01 ~]# df -h
Filesystem                   Size Used Avail Use% Mounted on
/dev/mapper/vg_mini-lv_root   36G 1.1G   33G   4% /
tmpfs                        931M     0 931M   0% /dev/shm
/dev/sda1                    485M   33M 427M   8% /boot
/dev/sr0                     4.2G 4.2G     0 100% /media/cdrom
/dev/sdb                     9.9G 151M 9.2G   2% /gluster/brick1
/dev/sdc                     9.9G 151M 9.2G   2% /gluster/brick2
127.0.0.1:/gs1                20G 302M   19G   2% /mnt        # mounted; the space of both bricks is combined

Create test files in the mounted /mnt directory (run on glusterfs01)

[root@Glusterfs01 ~]# touch /mnt/{1..5}
[root@Glusterfs01 ~]# ls /mnt
1 2 3 4 5 lost+found
Mount the distributed volume gs1 on another virtual machine and check that the same files are visible

[root@Glusterfs04 ~]# mount -t glusterfs 127.0.0.1:/gs1 /mnt
[root@Glusterfs04 ~]# df -h
Filesystem                   Size Used Avail Use% Mounted on
/dev/mapper/vg_mini-lv_root   36G 1.1G   33G   4% /
tmpfs                        931M     0 931M   0% /dev/shm
/dev/sda1                    485M   33M 427M   8% /boot
/dev/sr0                     4.2G 4.2G     0 100% /media/cdrom
/dev/sdb                     9.9G 151M 9.2G   2% /gluster/brick1
/dev/sdc                     9.9G 151M 9.2G   2% /gluster/brick2
127.0.0.1:/gs1                20G 302M   19G   2% /mnt
[root@Glusterfs04 ~]# ls /mnt
1 2 3 4 5 lost+found
Check on the brick nodes where the five files actually live

[root@Glusterfs01 ~]# ll -d /gluster/brick1
drwxr-xr-x 5 root root 4096 Jan 1 04:25 /gluster/brick1
[root@Glusterfs01 ~]# ls /gluster/brick1
1 5 lost+found
[root@Glusterfs02 ~]# ls /gluster/brick1
2 3 4 lost+found
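The files are spread across the bricks without duplication, so different files can be written concurrently. The drawback is that there is no redundancy: in this respect a distributed volume resembles RAID 0, and losing one brick loses every file stored on it. With five bricks, files would be spread over all five disks and aggregate throughput across many files could approach five times that of one disk, although each individual file is still stored whole on a single brick. That is a distributed volume.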

(2) Mounting over NFS

Before mounting, first see how to enable the NFS export in glusterfs.

Run the following on glusterfs01

[root@Glusterfs01 ~]# gluster volume status     # check the status of the distributed volume
Status of volume: gs1
Gluster process                             TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick glusterfs01:/gluster/brick1           49152     0          Y       2815
Brick glusterfs02:/gluster/brick1           49152     0          Y       2035
NFS Server on localhost                     N/A       N/A        N       N/A    # NFS export not enabled on this node
NFS Server on glusterfs02                   N/A       N/A        N       N/A
NFS Server on glusterfs03                   N/A       N/A        N       N/A
NFS Server on glusterfs04                   N/A       N/A        N       N/A

Task Status of Volume gs1
------------------------------------------------------------------------------
There are no active volume tasks

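Why does the output look like this?
A port of N/A for NFS Server means the NFS export is not running. NFS mounting requires two packages, rpcbind and nfs-utils.
Even with both packages installed, the rpcbind service must be started and glusterd restarted before NFS mounts will work.
The NFS export on glusterfs01 is enabled as follows:

Run the following on glusterfs01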

[root@Glusterfs01 ~]# rpm -qa nfs-utils   # check whether nfs-utils is installed
nfs-utils-1.2.3-75.el6_9.x86_64
[root@Glusterfs01 ~]# rpm -qa rpcbind    # check whether rpcbind is installed
rpcbind-0.2.0-13.el6_9.1.x86_64
[root@Glusterfs01 ~]# /etc/init.d/rpcbind status   # check the rpcbind service status
rpcbind is stopped
[root@Glusterfs01 ~]# /etc/init.d/rpcbind start     # start the rpcbind service
Starting rpcbind:                                          [ OK ]
[root@Glusterfs01 ~]# /etc/init.d/glusterd stop    # stop the glusterd service
Stopping glusterd:                                         [ OK ]
[root@Glusterfs01 ~]# /etc/init.d/glusterd start     # start the glusterd service
Starting glusterd:                                         [ OK ]

Wait a few seconds and check again; the NFS export is now enabled

[root@Glusterfs01 ~]# gluster volume status
Status of volume: gs1
Gluster process                             TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick glusterfs01:/gluster/brick1           49152     0          Y       2815
Brick glusterfs02:/gluster/brick1           49152     0          Y       2035
NFS Server on localhost                     2049      0          Y       3352      # now online
NFS Server on glusterfs03                   N/A       N/A        N       N/A
NFS Server on glusterfs04                   N/A       N/A        N       N/A
NFS Server on glusterfs02                   N/A       N/A        N       N/A

Task Status of Volume gs1
------------------------------------------------------------------------------
There are no active volume tasks

To verify, start one more virtual machine to act as a real server (web node):
WebClient 10.1.1.140

Next, try an NFS mount from this fifth virtual machine, WebClient.

Run the following on WebClient (the client needs nfs-utils)

[root@WebClient ~]# rpm -qa nfs-utils
[root@WebClient ~]# mount /dev/sr0 /media/cdrom/
mount: block device /dev/sr0 is write-protected, mounting read-only
[root@WebClient ~]# yum -y install nfs-utils
[root@WebClient ~]# mount -t nfs 10.1.1.136:/gs1 /mnt    # mount the distributed volume remotely over NFS
mount.nfs: rpc.statd is not running but is required for remote locking.
mount.nfs: Either use '-o nolock' to keep locks local, or start statd.
mount.nfs: an incorrect mount option was specified
[root@WebClient ~]# mount -o nolock -t nfs 10.1.1.136:/gs1 /mnt # add -o nolock as the message suggests
[root@WebClient ~]# df -hT
Filesystem                  Type     Size Used Avail Use% Mounted on
/dev/mapper/vg_mini-lv_root ext4      36G 1.1G   33G   4% /
tmpfs                       tmpfs    931M     0 931M   0% /dev/shm
/dev/sda1                   ext4     485M   33M 427M   8% /boot
/dev/sr0                    iso9660 4.2G 4.2G     0 100% /media/cdrom
10.1.1.136:/gs1             nfs       20G 301M   19G   2% /mnt
[root@WebClient ~]# ls /mnt   # mounted successfully
1 2 3 4 5 lost+found

2.3.8 Creating a Distributed Replicated Volume

Run the following on any glusterfs virtual machine

[root@Glusterfs01 ~]# gluster volume create gs2 replica 2 glusterfs03:/gluster/brick1 glusterfs04:/gluster/brick1 force
volume create: gs2: success: please start the volume to access data
[root@Glusterfs01 ~]# gluster volume info gs2

Volume Name: gs2
Type: Replicate      # replicated volume
Volume ID: 44a22292-faa6-4dd1-8ed5-9038e8e01d1f
Status: Created
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: glusterfs03:/gluster/brick1
Brick2: glusterfs04:/gluster/brick1
Options Reconfigured:
performance.readdir-ahead: on
[root@Glusterfs01 ~]# gluster volume start gs2    # start the volume
volume start: gs2: success
[root@Glusterfs01 ~]# umount /mnt
[root@Glusterfs01 ~]# df -hT
Filesystem                  Type     Size Used Avail Use% Mounted on
/dev/mapper/vg_mini-lv_root ext4      36G 1.1G   33G   4% /
tmpfs                       tmpfs    931M     0 931M   0% /dev/shm
/dev/sda1                   ext4     485M   33M 427M   8% /boot
/dev/sr0                    iso9660 4.2G 4.2G     0 100% /media/cdrom
/dev/sdb                    ext4     9.9G 151M 9.2G   2% /gluster/brick1
/dev/sdc                    ext4     9.9G 151M 9.2G   2% /gluster/brick2
[root@Glusterfs01 ~]# mount -t glusterfs 127.0.0.1:gs2 /mnt
[root@Glusterfs01 ~]# df -hT
Filesystem                  Type            Size Used Avail Use% Mounted on
/dev/mapper/vg_mini-lv_root ext4             36G 1.1G   33G   4% /
tmpfs                       tmpfs           931M     0 931M   0% /dev/shm
/dev/sda1                   ext4            485M   33M 427M   8% /boot
/dev/sr0                    iso9660         4.2G 4.2G     0 100% /media/cdrom
/dev/sdb                    ext4            9.9G 151M 9.2G   2% /gluster/brick1
/dev/sdc                    ext4            9.9G 151M 9.2G   2% /gluster/brick2
127.0.0.1:gs2               fuse.glusterfs 9.9G 151M 9.2G   2% /mnt
Create test files
[root@Glusterfs01 ~]# touch /mnt/{1..6}
[root@Glusterfs01 ~]# ls /mnt/
1 2 3 4 5 6 lost+found
Now see how they are stored
[root@Glusterfs03 ~]# ls /gluster/brick1
1 2 3 4 5 6 lost+found
[root@Glusterfs04 ~]# ls /gluster/brick1
1 2 3 4 5 6 lost+found
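This is a replicated volume, comparable to RAID 1: it is safe, but usable capacity is halved.

Only two bricks were used in this test. With four bricks they are grouped into pairs: the two bricks within a pair mirror each other (like RAID 1), and files are distributed across the pairs (like RAID 0). That is a distributed replicated volume.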

2.3.9 Creating a Distributed Striped Volume

[root@Glusterfs01 ~]# gluster volume create gs3 stripe 2 glusterfs01:/gluster/brick2 glusterfs02:/gluster/brick2 force
volume create: gs3: success: please start the volume to access data
[root@Glusterfs01 ~]# gluster volume start gs3
volume start: gs3: success
[root@Glusterfs01 ~]# gluster volume info gs3

Volume Name: gs3
Type: Stripe # striped volume
Volume ID: d3f57ff2-6968-46eb-8ab4-e0bfef747551
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: glusterfs01:/gluster/brick2
Brick2: glusterfs02:/gluster/brick2
Options Reconfigured:
performance.readdir-ahead: on

III. Write Test on the Striped Volume

[root@Glusterfs01 ~]# umount /mnt
[root@Glusterfs01 ~]# mount -t glusterfs 127.0.0.1:/gs3 /mnt
[root@Glusterfs01 ~]# ls /mnt/
lost+found
[root@Glusterfs01 ~]# df -Th
Filesystem                  Type            Size Used Avail Use% Mounted on
/dev/mapper/vg_mini-lv_root ext4             36G 1.1G   33G   4% /
tmpfs                       tmpfs           931M     0 931M   0% /dev/shm
/dev/sda1                   ext4            485M   33M 427M   8% /boot
/dev/sr0                    iso9660         4.2G 4.2G     0 100% /media/cdrom
/dev/sdb                    ext4            9.9G 151M 9.2G   2% /gluster/brick1
/dev/sdc                    ext4            9.9G 151M 9.2G   2% /gluster/brick2
127.0.0.1:/gs3          fuse.glusterfs   20G 302M   19G   2% /mnt

Create a 256 MB file
[root@Glusterfs01 ~]# dd if=/dev/zero of=/mnt/test bs=1024 count=262144
262144+0 records in
262144+0 records out
268435456 bytes (268 MB) copied, 22.295 s, 12.0 MB/s
[root@Glusterfs01 ~]# ls /mnt
lost+found test
[root@Glusterfs01 ~]# du -sh /mnt/test
256M /mnt/test
[root@Glusterfs01 ~]# ls /gluster/brick2
lost+found test
[root@Glusterfs01 ~]# du -sh /gluster/brick2/test
129M /gluster/brick2/test
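A striped volume splits a large file in half across the two bricks. The distributed volume earlier placed whole files and never split a single file, whereas a striped volume splits each file, so striped volumes are intended for large files and large data sets.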
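
The halving can be checked with a little arithmetic. The sketch below assumes that blocks are handed out round-robin across the stripe bricks and that the block size is the 128 KB default listed for cluster.stripe-block-size later in this article; stripe_share is a hypothetical helper, not a gluster command.

```shell
#!/bin/sh
# stripe_share FILE_BYTES STRIPE_COUNT BLOCK_BYTES BRICK_INDEX
# Prints how many bytes of one file land on the given brick
# (index starts at 0) under round-robin block placement.
stripe_share() (
    size=$1
    count=$2
    block=$3
    idx=$4
    full=$(( size / block ))      # number of complete blocks
    rem=$(( size % block ))       # bytes in the trailing partial block
    share=$(( full / count * block ))
    extra=$(( full % count ))     # bricks 0..extra-1 hold one more full block
    if [ "$idx" -lt "$extra" ]; then
        share=$(( share + block ))
    fi
    if [ "$idx" -eq "$extra" ]; then
        share=$(( share + rem ))  # the partial block goes to the next brick in turn
    fi
    echo "$share"
)
```

For the 268435456-byte test file, stripe_share 268435456 2 131072 0 prints 134217728, that is 128 MiB per brick, close to the 129M that du reports on the brick.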

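To summarize:

A distributed volume spreads whole files across the bricks. Like RAID 0 it writes quickly, but it has no redundancy, so a failed disk cannot be recovered from.

A distributed replicated volume writes the same data to every brick in a replica set. Like RAID 1 it is very safe and reads well, at the cost of disk capacity.

A distributed striped volume divides each file's data evenly across the bricks, which greatly improves concurrent read access to large files.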

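IV. Expanding the Bricks in a Storage Volume

4.1 Expanding a Distributed Replicated Volume

First unmount the volume on every virtual machine so that nothing is using it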
[root@Glusterfs01 ~]# umount /mnt
[root@Glusterfs01 ~]# gluster volume add-brick gs2 replica 2 glusterfs03:/gluster/brick2 glusterfs04:/gluster/brick2 force # add two bricks
volume add-brick: success
[root@Glusterfs01 ~]# gluster volume info gs2

Volume Name: gs2
Type: Distributed-Replicate
Volume ID: 44a22292-faa6-4dd1-8ed5-9038e8e01d1f
Status: Started
Number of Bricks: 2 x 2 = 4 # expanded
Transport-type: tcp
Bricks:
Brick1: glusterfs03:/gluster/brick1
Brick2: glusterfs04:/gluster/brick1
Brick3: glusterfs03:/gluster/brick2
Brick4: glusterfs04:/gluster/brick2
Options Reconfigured:
performance.readdir-ahead: on

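Important:
When expanding a distributed replicated or distributed striped volume, if the volume was created with replica 2 or stripe 2, bricks must be added two at a time or in multiples of two.
For example, for a distributed replicated volume with replica 2, the number of bricks added must be 2, 4, 6, 8, and so on.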
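
The rule can be checked before calling add-brick. A minimal sketch, where bricks_ok is a hypothetical helper and the first argument is the replica or stripe count chosen when the volume was created:

```shell
#!/bin/sh
# bricks_ok GROUP_SIZE NEW_BRICKS
# Succeeds (exit status 0) when NEW_BRICKS is a positive multiple of the
# replica or stripe count, and fails otherwise.
bricks_ok() {
    [ "$2" -gt 0 ] && [ $(( $2 % $1 )) -eq 0 ]
}
```

For example, bricks_ok 2 4 succeeds, while bricks_ok 2 3 fails because three bricks cannot be grouped into replica pairs.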

4.2 Check the Capacity After Expansion and Run a Write Test

Mount gs2 on WebClient and check the capacity of the mount point

[root@WebClient ~]# mount -o nolock -t nfs 10.1.1.136:/gs2 /mnt
[root@WebClient ~]# df -Th
Filesystem                  Type     Size Used Avail Use% Mounted on
/dev/mapper/vg_mini-lv_root ext4      36G 1.1G   33G   4% /
tmpfs                       tmpfs    931M     0 931M   0% /dev/shm
/dev/sda1                   ext4     485M   33M 427M   8% /boot
/dev/sr0                    iso9660 4.2G 4.2G     0 100% /media/cdrom
10.1.1.136:/gs2             nfs       20G 301M   19G   2% /mnt

Write more data

Perform the write test from WebClient

[root@Glusterfs01 ~]# gluster volume info gs2

Volume Name: gs2
Type: Distributed-Replicate
Volume ID: 44a22292-faa6-4dd1-8ed5-9038e8e01d1f
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: glusterfs03:/gluster/brick1
Brick2: glusterfs04:/gluster/brick1
Brick3: glusterfs03:/gluster/brick2
Brick4: glusterfs04:/gluster/brick2
Options Reconfigured:
performance.readdir-ahead: on
Look at the actual storage

[root@Glusterfs03 ~]# ls /gluster/brick1
1 2 3 4 5 6 lost+found
[root@Glusterfs03 ~]# ls /gluster/brick2
lost+found
[root@Glusterfs04 ~]# ls /gluster/brick1
1 2 3 4 5 6 lost+found
[root@Glusterfs04 ~]# ls /gluster/brick2
lost+found
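brick2 is empty on both nodes; that is, nothing has been copied to the newly added bricks.
Now write files named 11 through 20 and see where they are stored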

[root@WebClient ~]# touch /mnt/{11..20}
[root@WebClient ~]# ls /mnt
1 11 12 13 14 15 16 17 18 19 2 20 3 4 5 6 lost+found

See where they were stored

[root@Glusterfs03 ~]# ls /gluster/brick1
1 11 12 13 14 15 16 17 18 19 2 20 3 4 5 6 lost+found
[root@Glusterfs03 ~]# ls /gluster/brick2
lost+found
[root@Glusterfs04 ~]# ls /gluster/brick1
1 11 12 13 14 15 16 17 18 19 2 20 3 4 5 6 lost+found
[root@Glusterfs04 ~]# ls /gluster/brick2
lost+found
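Nothing was written to brick2. The new bricks have been added, but gs2 does not use them yet because the volume has not been rebalanced.

The write test on the expanded gs2 shows that no data reached the new bricks. Why?
After bricks are added to a volume, a rebalance must be run once before the new bricks can be used normally.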

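4.3 Rebalancing the Storage

Note: rebalancing the layout is necessary because the layout is static. When new bricks join an existing volume, newly created files are still placed on the old bricks, so the layout must be rebalanced for the new bricks to take effect. A layout rebalance only makes the new layout effective; it does not move existing data. To redistribute existing data after the new layout takes effect, the data in the volume must be rebalanced as well.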
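
The two kinds of rebalance described above correspond to two forms of the rebalance command. The wrapper below is a sketch: rebalance_volume is a hypothetical helper name, and GLUSTER is an assumed override that permits a dry run (for example GLUSTER=echo).

```shell
#!/bin/sh
# rebalance_volume VOLUME [fix-layout]
# With "fix-layout" only the layout is updated, so new files can land on
# the new bricks; without it existing data is migrated as well.
rebalance_volume() (
    vol=$1
    mode=${2:-}
    if [ "$mode" = "fix-layout" ]; then
        "${GLUSTER:-gluster}" volume rebalance "$vol" fix-layout start
    else
        "${GLUSTER:-gluster}" volume rebalance "$vol" start
    fi
)
```

Progress can then be followed with gluster volume rebalance gs2 status.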

Rebalance gs2

[root@Glusterfs01 ~]# gluster volume rebalance gs2 start
volume rebalance: gs2: success: Rebalance on gs2 has been started successfully. Use rebalance status command to check status of the rebalance process.
ID: 94c7cd2b-4a1f-4609-b371-06e18e077b1f

Check the result of the rebalance on the gs2 bricks

[root@Glusterfs03 ~]# ls /gluster/brick1
12 14 15 16 17 2 3 4 6 lost+found
[root@Glusterfs03 ~]# ls /gluster/brick2
1 11 13 18 19 20 5 lost+found
[root@Glusterfs04 ~]# ls /gluster/brick1
12 14 15 16 17 2 3 4 6 lost+found
[root@Glusterfs04 ~]# ls /gluster/brick2
1 11 13 18 19 20 5 lost+found
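After the rebalance the files are distributed across the two replica pairs: each file still has two copies, one on each node, stored either in brick1 or in brick2.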

Test again

[root@WebClient ~]# touch /mnt/{30..35}
[root@WebClient ~]# ls /mnt
1 12 14 16 18 2 3 31 33 35 5 lost+found
11 13 15 17 19 20 30 32 34 4 6
[root@Glusterfs03 ~]# ls /gluster/brick1
12 14 15 16 17 2 3 30 31 32 33 34 35 4 6 lost+found
[root@Glusterfs03 ~]# ls /gluster/brick2
1 11 13 18 19 20 5 lost+found
[root@Glusterfs04 ~]# ls /gluster/brick1
12 14 15 16 17 2 3 30 31 32 33 34 35 4 6 lost+found
[root@Glusterfs04 ~]# ls /gluster/brick2
1 11 13 18 19 20 5 lost+found
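The existing data was balanced by the earlier rebalance, but the newly written files are not: nothing was written to brick2. Why?
The client had mounted gs2 before the rebalance. A mount ultimately talks to the bricks port to port, so the earlier mount presumably knew only the ports of brick1 on 03 and 04, and because the volume was not remounted after the rebalance it kept using those two ports. (This is a guess; no other explanation seems to fit.)

Solution: remount gs2 after the rebalance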

[root@WebClient ~]# umount /mnt
[root@WebClient ~]# mount -t nfs -o nolock 10.1.1.136:/gs2 /mnt
[root@WebClient ~]# rm -rf /mnt/*
[root@WebClient ~]# ls /mnt
[root@WebClient ~]# touch /mnt/{1..10}
[root@Glusterfs03 ~]# ls /gluster/brick1
10 2 3 4 6
[root@Glusterfs03 ~]# ls /gluster/brick2
1 5 7 8 9
[root@Glusterfs04 ~]# ls /gluster/brick1
10 2 3 4 6
[root@Glusterfs04 ~]# ls /gluster/brick2
1 5 7 8 9
Success: the expanded volume works normally.

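VI. Building Enterprise-Grade Distributed Storage

6.1 Hardware Requirements

A 2U chassis with 4 TB SATA disks is the usual choice; SSDs can be purchased if I/O requirements are high. To ensure stability and performance, all glusterfs servers should have hardware configurations that are as similar as possible, especially the number and size of disks. The RAID controller should have a battery, and the larger its cache the better the performance. RAID 10 is generally recommended; if space requirements call for RAID 5, one or two hot-spare disks are advisable.

6.2 System Requirements and Partitioning

Use CentOS 6.x and update to the latest release after installation. Do not use LVM during installation. Recommended layout: 200 MB for /boot, 100 GB for the root partition, swap equal to the size of memory, and the remaining space set aside as separate disk space for gluster. There are no special software requirements; apart from development tools and basic management software, it is best to install nothing else.

6.3 Network Environment

The network should be gigabit throughout. Each gluster server needs at least two network cards: one dedicated to gluster traffic and the other assigned a management-network IP for system administration. If the budget allows for 10-gigabit switches and network cards, storage performance will be better. Where network reliability requirements are high, multiple network cards can be bonded.

6.4 Server Placement

Primary and backup servers should be placed in different racks and connected to different switches, so that if one rack fails, a copy of the data remains accessible.

6.5 Building High-Performance, Highly Available Storage

Enterprises generally use distributed replicated volumes, because the data is backed up and therefore relatively safe. Distributed striped volumes are not yet fully mature in glusterfs and carry some risk to data safety.

6.5.1 Open the Firewall Ports

In enterprise deployments the Linux firewall is usually enabled, so open the ports used for traffic between the servers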

iptables -I INPUT -p tcp --dport 24007:24011 -j ACCEPT

iptables -I INPUT -p tcp --dport 49152:49162 -j ACCEPT

[root@Glusterfs01 ~]# cat /etc/glusterfs/glusterd.vol
volume management
    type mgmt/glusterd
    option working-directory /var/lib/glusterd
    option transport-type socket,rdma
    option transport.socket.keepalive-time 10
    option transport.socket.keepalive-interval 2
    option transport.socket.read-fail-log off
    option ping-timeout 0
    option event-threads 1
#   option base-port 49152    # the default base port can be changed here, since it may conflict with ports used by KVM

end-volume

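6.5.2 Tuning the Glusterfs File System

Option	Description	Default	Valid values
auth.allow	IP access authorization	* (allow all)	IP address
cluster.min-free-disk	Free disk space threshold	10%	Percentage
cluster.stripe-block-size	Stripe block size	128KB	Bytes
network.frame-timeout	Request wait time	1800s	0-1800
network.ping-timeout	Client wait time	42s	0-42
nfs.disable	Disable the NFS service	off	off | on
performance.io-thread-count	Number of I/O threads	16	0-65
performance.cache-refresh-timeout	Cache validation interval	1s	0-61
performance.cache-size	Read cache size	32MB	Bytes

performance.quick-read: improves read performance for small files.
performance.read-ahead: prefetches data to improve read performance; this helps applications that read files frequently and sequentially, because the next block is already available when the application finishes reading the current one.
performance.write-behind: writes go to a cache first and then to disk, improving write performance.
performance.io-cache: caches data that has already been read.

How to adjust:

gluster volume set <volume> <option> <value>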
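
When several options are tuned together, it can help to keep them in a small list and apply them in one pass. The sketch below is illustrative only: apply_volume_options is a hypothetical helper, GLUSTER is an assumed override for dry runs, and the option values in the usage note are examples rather than recommendations.

```shell
#!/bin/sh
# apply_volume_options VOLUME
# Reads "option value" pairs from standard input, one per line, and
# applies each with `gluster volume set`. Blank lines and lines that
# start with # are skipped.
apply_volume_options() (
    vol=$1
    while read -r key value; do
        case $key in
            ''|'#'*) continue ;;
        esac
        # Redirect stdin so the CLI cannot consume the remaining list.
        "${GLUSTER:-gluster}" volume set "$vol" "$key" "$value" </dev/null || exit 1
    done
)
```

For example, printf 'performance.cache-size 256MB\nnetwork.ping-timeout 10\n' | apply_volume_options gs2 would set both options on gs2; running it with GLUSTER=echo first prints the commands instead of executing them.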

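6.5.3 Monitoring and Routine Maintenance

The templates bundled with Zabbix are sufficient: CPU, memory, host availability, disk space, uptime, and system load. Check the monitored values routinely and handle alerts promptly.

The heal commands below are only available for replicated volumes

gluster volume status gs2 # check whether each node's NFS server is online (whether the port is open)

gluster volume heal gs2 full # start a full heal

gluster volume heal gs2 info # list the files that need healing

gluster volume heal gs2 info healed # list the files that were healed successfully

gluster volume heal gs2 info heal-failed # list the files that failed to heal

gluster volume heal gs2 info split-brain # list the files in split-brain

gluster volume quota gs2 enable # enable the quota feature

gluster volume quota gs2 disable # disable the quota feature

gluster volume quota gs2 limit-usage /data 10GB # limit the /data directory of gs2 to 10 GB

gluster volume quota gs2 list # list quota information

gluster volume quota gs2 list /data # show the quota information for the limited directory

gluster volume set gs2 features.quota-timeout 5 # set the timeout for cached quota information

gluster volume quota gs2 remove /data # remove the quota setting for a directory

Note: the quota feature limits the space of a directory under the mount point, for example /mnt/glusterfs/data; it does not limit the space of the bricks that make up the volume.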

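VII. Handling Common Failures in Production

7.1 Disk Failure

Because RAID (RAID 5) is configured underneath, a failed disk can simply be replaced and the data resynchronizes automatically.

7.2 Failure of One Host

A node failure includes the following cases:

1. Physical failure
2. Several disks fail at the same time, causing data loss
3. The system is damaged beyond repair

Solution:

Find an identical machine, with at least the same number and size of disks. Install the operating system, configure the same IP as the failed machine, install the gluster software, and make sure the configuration is the same. Then run gluster peer status on one of the healthy nodes to find the UUID of the failed server.

For example: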

[root@Glusterfs03 ~]# gluster peer status
Number of Peers: 3

Hostname: glusterfs01
Uuid: 6ccc0f39-79e1-4e68-9574-62d3e68ddc9c
State: Peer in Cluster (Connected)

Hostname: glusterfs02
Uuid: 7dd28ae9-f31d-4b86-9c82-1e40afac7968
State: Peer in Cluster (Connected)

Hostname: glusterfs04
Uuid: d983c0c1-5f58-4e2b-87ec-9448b3fa49e6
State: Peer in Cluster (Connected)

Edit /var/lib/glusterd/glusterd.info on the replacement machine so that it matches the failed machine's

[root@Glusterfs04 ~]# cat /var/lib/glusterd/glusterd.info
UUID=d983c0c1-5f58-4e2b-87ec-9448b3fa49e6
operating-version=30712
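
The edit itself amounts to replacing the UUID= line with the value reported by gluster peer status on a healthy node. The helper below is a sketch: set_glusterd_uuid is a hypothetical name, and it is assumed here that glusterd is stopped on the replacement machine while the file is changed.

```shell
#!/bin/sh
# set_glusterd_uuid FILE UUID
# Rewrites the UUID= line of a glusterd.info file and leaves the other
# lines (such as operating-version) untouched.
set_glusterd_uuid() (
    file=$1
    uuid=$2
    tmp=$(mktemp) || exit 1
    # Write to a temporary file first, then copy back into place.
    sed "s/^UUID=.*/UUID=$uuid/" "$file" > "$tmp" && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    exit "$status"
)
```

For the example above the call would be set_glusterd_uuid /var/lib/glusterd/glusterd.info d983c0c1-5f58-4e2b-87ec-9448b3fa49e6, followed by starting glusterd again.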

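With the brick directories mounted on the replacement machine, run the heal operation used for disk failures (from any node)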

[root@Glusterfs04 ~]# gluster volume heal gs2 full
Launching heal operation to perform full self heal on volume gs2 has been successful

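Synchronization then starts automatically, but it affects the performance of the whole system while it runs.
The status can be checked with: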

[root@Glusterfs04 ~]# gluster volume heal gs2 info
Brick glusterfs03:/gluster/brick1
Status: Connected
Number of entries: 0

Brick glusterfs04:/gluster/brick1
Status: Connected
Number of entries: 0

Brick glusterfs03:/gluster/brick2
Status: Connected
Number of entries: 0

Brick glusterfs04:/gluster/brick2
Status: Connected
Number of entries: 0
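
To watch a heal converge, the per-brick counts can be added up. This is a small sketch that assumes the output keeps the "Number of entries: N" shape shown above; pending_heals is a hypothetical helper name.

```shell
#!/bin/sh
# pending_heals
# Reads `gluster volume heal <volume> info` output on standard input and
# prints the sum of all "Number of entries" values.
pending_heals() {
    awk -F': *' '/^Number of entries:/ { total += $2 } END { print total + 0 }'
}
```

Typical use would be gluster volume heal gs2 info | pending_heals; a result of 0 means no brick reports files waiting to be healed.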