MooseFS Distributed File System

There are many distributed file systems on the market, with new ones appearing all the time. Here are some of the major ones:

mogileFS: a key-value style meta file system. It does not support FUSE, so applications must access it through its API. It is mainly used on the web to handle huge numbers of small images, and is considerably more efficient than MooseFS.

fastDFS: a key-value file system developed in China as an improvement on mogileFS. It likewise does not support FUSE, but offers better performance than mogileFS.

mooseFS: supports FUSE and is relatively lightweight. It has a single point of dependence on the master server. It is written in C (not Perl, as is sometimes claimed), its performance is comparatively modest, and it has many users in China.

glusterFS: supports FUSE; considerably heavier than MooseFS.

ceph: supports FUSE, and its client has been merged into the linux-2.6.34 kernel, which means Ceph can be chosen as a file system just like ext3/ReiserFS. It is fully distributed with no single point of dependence, written in C, and performs well. However, it is built on the still-immature btrfs and is itself far from mature.

lustre: an enterprise-grade product from Oracle. It is very large and depends heavily on the kernel and ext3.

NFS: the veteran network file system. I don't know it in detail, but NFS has barely evolved in recent years, so it was clearly not an option.

Originally I planned to use mogileFS, since it has the most users and my needs are mainly web-oriented.

But after studying its API I found that a key-value file system has no directory structure: you cannot list all the files in a subdirectory, and you cannot operate on it like a local file system. Everything requires an API call, which is quite annoying.

mogileFS probably works this way either under the influence of memcached, the famous listen-port-plus-API product from the same development team, or because FUSE was not yet popular when mogileFS was first designed.

In short, I was determined to find a FUSE-capable distributed file system, which narrowed the choice to mooseFS, glusterFS, and ceph. Technically Ceph is clearly the best: written in C, merged into the linux-2.6.34 kernel, and built on the btrfs file system, all of which promise high performance, while its multi-master architecture removes the single point of dependence entirely and delivers high availability. But Ceph is far too immature: the btrfs it builds on is itself immature, and Ceph's official website explicitly warns against using it in production.

Moreover, few people in China use it, and among Linux distributions Ubuntu 10.04 still ships kernel 2.6.32, so Ceph cannot be used directly.

glusterFS is better suited to large deployments and has a relatively poor reputation, so it was ruled out as well.

In the end I chose MooseFS, whose strengths and weaknesses are equally obvious. It has a single point of dependence, and its master consumes a great deal of memory, but for my needs MooseFS is more than sufficient. Many people in China use MooseFS, and quite a few run it in production, which reinforced my choice.

The plan is to use one high-performance server (dual Xeon 5500, 24 GB RAM) as the master and two HP DL360 G4 machines (6 x 146 GB SCSI disks each) as chunk servers, building a distributed file system with a redundancy of 2 to serve every machine in the web tier.
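As a quick sanity check on this plan, the raw and usable capacity at a redundancy (goal) of 2 can be estimated with shell arithmetic; the figures below come from the hardware list above and ignore file-system overhead:

```shell
# 2 chunk servers x 6 disks x 146 GB each; goal=2 keeps two copies of every chunk
servers=2; disks_per_server=6; per_disk_gb=146; goal=2
raw=$((servers * disks_per_server * per_disk_gb))
echo "raw capacity: ${raw} GB"                      # prints: raw capacity: 1752 GB
echo "usable at goal=${goal}: $((raw / goal)) GB"   # prints: usable at goal=2: 876 GB
```

So roughly 1.75 TB raw, and a little under 900 GB of usable space once every chunk is stored twice.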




I. MooseFS Overview:

MooseFS is a fault-tolerant network distributed file system.

Distinctive features of MooseFS:

* High reliability: data is stored in several copies on different machines.
* Capacity can be grown dynamically by adding new machines or disks.
* Files deleted within a configurable retention period remain recoverable.
* File snapshots: a copy consistent with the whole original file, even while the original is being accessed or written.

II. MooseFS Architecture (as shown in the figure):

The architecture involves four types of machines:

*Managing server(master server)
*Data servers(chunk servers)
*Metadata backup servers(metalogger server)
*Client
III. Supported Platforms:

*Linux (Linux 2.6.14 and up have FUSE support included in the official kernel)
*FreeBSD
*NetBSD
*OpenSolaris
*MacOS X
IV. Environment:

Managing server(master server):                     OS: Centos5.4           IP:192.168.2.241
Metadata backup server (metalogger server):         OS: Centos5.4           IP:192.168.2.242
Data servers(chunk servers):                        OS: Centos5.4           IP:192.168.2.243
                                                    OS: Centos5.4           IP:192.168.2.244
Client(mfsmount):                                   OS: Ubuntu9.10          IP:192.168.2.66
V. Installation and Configuration

1. On the master server:

* Download and install

wget  http://moosefs.com/tl_files/mfscode/mfs-1.6.13.tar.gz

groupadd mfs
useradd mfs -g mfs
tar -zxvf mfs-1.6.13.tar.gz
cd mfs-1.6.13
./configure --prefix=/usr/local/mfs --disable-mfschunkserver --disable-mfsmount --with-default-group=mfs --with-default-user=mfs
make && make install
* Adjust the configuration:

cd /usr/local/mfs/etc

mv mfsexports.cfg.dist    mfsexports.cfg
mv mfsmaster.cfg.dist     mfsmaster.cfg

cd /usr/local/mfs/var/mfs

mv metadata.mfs.empty     metadata.mfs    (otherwise startup fails with the following error)
if this is new instalation then rename metadata.mfs.empty as metadata.mfs
init: file system manager failed !!! error occured during initialization - exiting
* Edit mfsexports.cfg as follows (this allows 192.168.2.66 to mount):

#*            /    ro
#192.168.1.0/24        /    rw
#192.168.1.0/24        /    rw,alldirs,maproot=0,password=passcode
#10.0.0.0-10.0.0.5    /test    rw,maproot=nobody,password=test
#*            .    rw
192.168.2.66            /    rw,alldirs,maproot=0
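A small helper can confirm which export entries are actually active before the master is restarted; this is only a sketch (list_exports is a made-up name, not part of MooseFS), and it simply prints the non-comment lines field by field:

```shell
#!/bin/sh
# Print the active (non-comment, non-blank) lines of an mfsexports.cfg-style
# file as host/path/options fields, so a stray edit is easy to spot.
list_exports() {
  grep -Ev '^[[:space:]]*(#|$)' "$1" | while read -r host path opts; do
    echo "host=$host path=$path options=$opts"
  done
}
# Example: list_exports /usr/local/mfs/etc/mfsexports.cfg
```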
* Add to /etc/hosts:

192.168.2.241          mfsmaster
* All other configuration files were left at their defaults.

* Start the master:

/usr/local/mfs/sbin/mfsmaster  start
* Stop the master:

/usr/local/mfs/sbin/mfsmaster  stop

Start the CGI monitoring server: /usr/local/mfs/sbin/mfscgiserv
Stop it: kill takes a PID rather than a path, e.g. kill $(pgrep -f mfscgiserv)
2. On the metalogger server:

* Download and install:

wget  http://moosefs.com/tl_files/mfscode/mfs-1.6.13.tar.gz

groupadd mfs
useradd mfs -g mfs
tar -zxvf mfs-1.6.13.tar.gz
cd mfs-1.6.13
./configure --prefix=/usr/local/mfs --disable-mfschunkserver --disable-mfsmount --with-default-group=mfs --with-default-user=mfs
make && make install
* Adjust the configuration:

cd  /usr/local/mfs/etc
mv mfsmetalogger.cfg.dist    mfsmetalogger.cfg
* Add to /etc/hosts:

192.168.2.241          mfsmaster
* Everything else keeps the default configuration.

* Start the metalogger:

/usr/local/mfs/sbin/mfsmetalogger start
* Stop the metalogger:

/usr/local/mfs/sbin/mfsmetalogger stop
Note: the metalogger connects to port 9419 on the master, so open that port in the firewall (I simply disabled iptables during testing). When a failed master server needs to be recovered, copy both metadata.mfs.back and the most recent changelog from the metalogger server; both are required.

3. On the chunk servers:

* Download and install:

wget  http://moosefs.com/tl_files/mfscode/mfs-1.6.13.tar.gz

groupadd mfs
useradd mfs -g mfs
tar -zxvf mfs-1.6.13.tar.gz
cd mfs-1.6.13
./configure --prefix=/usr/local/mfs --disable-mfsmaster --disable-mfsmount --with-default-user=mfs --with-default-group=mfs
make && make install
* Adjust the configuration:

cd  /usr/local/mfs/etc
mv  mfschunkserver.cfg.dist         mfschunkserver.cfg
mv  mfshdd.cfg.dist                 mfshdd.cfg
* Edit mfshdd.cfg so it contains:

/store-data
Note: mfshdd.cfg lists the locations MooseFS may use for storage; here that is the /store-data partition. Delete the unused example entries, and if you have more partitions, add them one per line.
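Before starting the chunkserver it may be worth checking that every listed path exists and is writable; a rough sketch (check_hdd_cfg is a hypothetical helper, not part of MooseFS):

```shell
#!/bin/sh
# Flag any path in an mfshdd.cfg-style file that is missing or not writable.
check_hdd_cfg() {
  grep -Ev '^[[:space:]]*(#|$)' "$1" | while read -r dir _; do
    if [ -d "$dir" ] && [ -w "$dir" ]; then
      echo "OK  $dir"
    else
      echo "BAD $dir"
    fi
  done
}
# Example: check_hdd_cfg /usr/local/mfs/etc/mfshdd.cfg
```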

* Add to /etc/hosts:

192.168.2.241          mfsmaster
I left mfschunkserver.cfg at its default settings here.

* Change the ownership of /store-data:

chown -R mfs:mfs /store-data
* Start the chunk server:

/usr/local/mfs/sbin/mfschunkserver start
* Stop the chunk server:

/usr/local/mfs/sbin/mfschunkserver stop
The other chunk server is configured in exactly the same way, so the detailed steps are not repeated.

4. Client configuration:

FUSE is required; official site: http://fuse.sourceforge.net/

On Ubuntu, the libfuse-dev package is needed.

* Install FUSE

Since my client runs Ubuntu 9.10, I installed FUSE directly with apt-get install libfuse-dev.

On non-Ubuntu systems, FUSE can be built from source as follows:

wget  http://cdnetworks-kr-2.dl.sourceforge.net/project/fuse/fuse-2.X/2.8.3/fuse-2.8.3.tar.gz

tar -zxvf fuse-2.8.3.tar.gz
cd fuse-2.8.3
./configure --prefix=/usr/local/fuse
make
make install
* Install MooseFS

wget  http://moosefs.com/tl_files/mfscode/mfs-1.6.13.tar.gz

groupadd mfs
useradd mfs -g mfs
tar -zxvf mfs-1.6.13.tar.gz
cd mfs-1.6.13
./configure --prefix=/usr/local/mfs --disable-mfsmaster --disable-mfschunkserver --enable-mfsmount --with-default-user=mfs --with-default-group=mfs
make && make install
* Mount MooseFS:

mkdir -p /media/mfs
/usr/local/mfs/bin/mfsmount -H mfsmaster /media/mfs/
mfsmaster accepted connection with parameters:read-write,restricted_ip;root mapped to root:root
* Check the mount:

df -h | grep mfs

mfsmaster:9421  1.8T  139G  1.6T   9%  /media/mfs

* Mount the MFSMETA file system (used later for trash recovery):

(1) Create the mount point: mkdir /mnt/mfsmeta
(2) Mount MFSMETA:
/usr/local/mfs/bin/mfsmount -m /mnt/mfsmeta/ -H mfsmaster

* Unmount a mounted file system:

Use the standard Linux umount command, for example:
[root@www ~]# umount /media/mfs


If you see the following:
[root@www ~]# umount /media/mfs
umount: /media/mfs: device is busy
it means some process on this client is still using the file system. Find out which command is using it and exit that program; it is best not to force the unmount.
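When fuser or lsof is not at hand, one rough way to find the culprit is to scan /proc for processes whose working directory lies under the mount point; who_blocks below is a hypothetical helper sketch, and note that it misses processes that merely hold open file descriptors:

```shell
#!/bin/sh
# List PIDs whose current working directory is at or under the given mount
# point - typical culprits behind "device is busy" on umount.
who_blocks() {
  mnt="$1"
  for p in /proc/[0-9]*; do
    cwd=$(readlink "$p/cwd" 2>/dev/null) || continue
    case "$cwd" in
      "$mnt"|"$mnt"/*) echo "${p#/proc/} $cwd" ;;
    esac
  done
}
# Example: who_blocks /media/mfs
```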

* Operations performed on the client:

* Set the number of copies (goal) kept for each file:

/usr/local/mfs/bin/mfssetgoal -r 3 /media/mfs
Note: -r means recursive.

* Check it:

/usr/local/mfs/bin/mfsgetgoal /media/mfs/ubuntu-9.10-server-i386.iso
 
/media/mfs/ubuntu-9.10-server-i386.iso: 3
* Set the retention time for deleted files to 600 seconds (10 minutes):

/usr/local/mfs/bin/mfssettrashtime -r 600 /media/mfs
* More usage is documented at:

http://www.moosefs.com/reference-guide.html#operations-specific-for-moosefs

VI. Failure Recovery Test

To simulate a crash, I deleted /usr/local/mfs and rebooted the machine, at which point the whole file system is obviously unusable. Reinstall MooseFS and configure it exactly as before.

Copy metadata_ml.mfs.back and the most recent changelog file (here changelog_ml.30.mfs) from the metadata backup server to the freshly installed master, then run the restore command:

/usr/local/mfs/sbin/mfsmetarestore -m metadata_ml.mfs.back -o metadata.mfs changelog_ml.30.mfs

loading objects (files,directories,etc.) ... ok
 
loading names ... ok
 
loading deletion timestamps ... ok
 
checking filesystem consistency ... ok
 
loading chunks data ... ok
 
connecting files and chunks ... ok
 
applying changes from file: changelog_ml.30.mfs
 
meta data version: 4633
 
4765: version mismatch
Then run the following command:

/usr/local/mfs/sbin/mfsmetarestore  -a

file 'metadata.mfs.back' not found - will try 'metadata_ml.mfs.back' instead
 
loading objects (files,directories,etc.) ... ok
 
loading names ... ok
 
loading deletion timestamps ... ok
 
checking filesystem consistency ... ok
 
loading chunks data ... ok
 
connecting files and chunks ... ok
 
applying changes from file: /usr/local/mfs/var/mfs/changelog_ml.30.mfs
 
meta data version: 4633
 
4765: version mismatch
Finally, start the master and check how much of the file system has been recovered.


Stop the MooseFS cluster safely with the following steps:

Unmount the file system on every machine with the umount command (in this setup: umount /media/mfs)

Stop the chunk server processes: /usr/local/mfs/sbin/mfschunkserver stop
Stop the metalogger process: /usr/local/mfs/sbin/mfsmetalogger stop
Stop the master server process: /usr/local/mfs/sbin/mfsmaster stop


File recovery (undelete)
Removed files may be accessed through a separately mounted MFSMETA file system. In particular it contains directories /trash (containing information about deleted files that are still being stored) and /trash/undel (designed for retrieving files). Only the administrator has access to MFSMETA (user with uid 0, usually root).

 

$ mfssettrashtime 3600 /mnt/mfs-test/test1

/mnt/mfs-test/test1: 3600

$ rm /mnt/mfs-test/test1

$ ls /mnt/mfs-test/test1

ls: /mnt/mfs-test/test1: No such file or directory

 

The name of the file that is still visible in the "trash" directory consists of an 8-digit hexadecimal i-node number and a path to the file relative to the mounting point with characters / replaced with the | character. If such a name exceeds the limits of the operating system (usually 255 characters), the initial part of the path is deleted.
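The encoding is easy to reproduce in the shell; the i-node number below (80839, which is 0x13BC7) is made up to match the example entry shown later in this section:

```shell
#!/bin/sh
# Build the trash-entry name MooseFS uses: an 8-digit hex i-node number,
# then the path relative to the mount point with every "/" turned into "|".
trash_name() {
  printf '%08X|%s\n' "$1" "$(printf '%s' "$2" | tr '/' '|')"
}
trash_name 80839 "test1"          # prints: 00013BC7|test1
trash_name 80839 "dir/sub/test1"  # prints: 00013BC7|dir|sub|test1
```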

 

The full path of the file in relation to the mounting point can be read or saved by reading or saving this special file:

# ls -l /mnt/mfs-test-meta/trash/*test1

-rw-r--r-- 1 user users 1 2007-08-09 15:23 /mnt/mfs-test-meta/trash/00013BC7|test1

# cat '/mnt/mfs-test-meta/trash/00013BC7|test1'

test1

# echo 'test/test2' > '/mnt/mfs-test-meta/trash/00013BC7|test1'

# cat '/mnt/mfs-test-meta/trash/00013BC7|test1'

test/test2

 

Moving this file to the trash/undel subdirectory restores the original file into the regular MooseFS file system, either at the path written into the special file as described above, or at the original path (if it was not changed).

# mv /mnt/mfs-test-meta/trash/00013BC7|test1 /mnt/mfs-test-meta/trash/undel/

 

Note: if a new file with the same path already exists, restoring of the file will not succeed.

Similarly, the file cannot be given a different name while being moved to undel.

Startup order and shutdown of the MooseFS cluster
MOOSEFS MAINTENANCE
 
Starting MooseFS cluster
 
The safest way to start MooseFS (avoiding any read or write errors, inaccessible data or similar problems) is to run the following commands in this sequence:

start mfsmaster process
start all mfschunkserver processes
start mfsmetalogger processes (if configured)
when all chunkservers get connected to the MooseFS master, the filesystem can be mounted on any number of clients using mfsmount (you can check if all chunkservers are connected by checking master logs or CGI monitor).
 
Stopping MooseFS cluster
 
To safely stop MooseFS:

unmount MooseFS on all clients (using the umount command or an equivalent)
stop chunkserver processes with the mfschunkserver stop command
stop metalogger processes with the mfsmetalogger stop command
stop master process with the mfsmaster stop command.
 
Maintenance of MooseFS chunkservers
 
Provided that there are no files with a goal lower than 2 and no under-goal files (which can be checked with the mfsgetgoal -r and mfsdirinfo commands), it is possible to stop or restart a single chunkserver at any time. Before stopping or restarting another chunkserver afterwards, make sure the previous one has reconnected and that there are no under-goal chunks.

 
MooseFS metadata backups
 
There are two general parts of metadata:

main metadata file (metadata.mfs, named metadata.mfs.back when the mfsmaster is running), synchronized each hour
metadata changelogs (changelog.*.mfs), stored for last N hours (configured by BACK_LOGS setting)
The main metadata file needs regular backups with the frequency depending on how many hourly changelogs are stored. Metadata changelogs should be automatically replicated in real time. Since MooseFS 1.6.5, both tasks are done by mfsmetalogger daemon.
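A minimal hourly backup job along these lines could complement the metalogger; the source path follows this install, while backup_meta and the destination directory are made up for illustration:

```shell
#!/bin/sh
# Copy the main metadata file (timestamped) plus the current changelogs
# from the master's data directory to a backup location.
backup_meta() {
  src="$1"; dst="$2"
  mkdir -p "$dst"
  cp "$src/metadata.mfs.back" "$dst/metadata.mfs.back.$(date +%Y%m%d%H%M)"
  cp "$src"/changelog.*.mfs "$dst"/ 2>/dev/null || true
}
# Example (e.g. from cron): backup_meta /usr/local/mfs/var/mfs /backup/mfs-metadata
```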

 
MooseFS master recovery
 
In case of mfsmaster crash (due to e.g. host or power failure) last metadata changelog needs to be merged into the main metadata file. It can be done with the mfsmetarestore utility; the simplest way to use it is:

$ mfsmetarestore -a

If master data are stored in location other than the specified during MooseFS compilation, the actual path needs to be specified using the -d option, e.g.:

$ mfsmetarestore -a -d /storage/mfsmaster

 
MooseFS master recovery from a backup
 
In order to restore the master host from a backup:

install mfsmaster in normal way
configure it using the same settings (e.g. by retrieving mfsmaster.cfg file from the backup)
retrieve metadata.mfs.back file from the backup or metalogger host, place it in mfsmaster data directory
copy last metadata changelogs from any metalogger running just before master failure into mfsmaster data directory
merge metadata changelogs using mfsmetarestore command as specified before - either using mfsmetarestore -a, or by specifying actual file names using non-automatic mfsmetarestore syntax, e.g.
$ mfsmetarestore -m metadata.mfs.back -o metadata.mfs changelog.*.mfs

 

Please also read a mini howto about preparing a fail proof solution in case of outage of the master server. In that document we present a solution using CARP and in which metalogger takes over functionality of the broken master server.


