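MongoDB lives up to its reputation as a fairly full-featured NoSQL database, and high availability is one of its stronger points.
Master-slave replication is simple to set up: the key is the --master, --slave, and --source options. The commands to start the master and slave processes are as follows: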
    XXXXX@XXXXX-asus:~$ sudo mongod --dbpath /var/lib/mongodb --logpath /var/log/mongodb/mongodb.log --port 10000 --master --rest --nojournal
    all output going to: /var/log/mongodb/mongodb.log
    log file [/var/log/mongodb/mongodb.log] exists; copied to temporary file [/var/log/mongodb/mongodb.log.2014-03-15T02-46-11]
    XXXXX@XXXXX-asus:~$ sudo mongod --dbpath /var/lib/mongodb --logpath /var/log/mongodb/mongodb.log --port 10000 --master --rest --nojournal --fork
    forked process: 5969
    all output going to: /var/log/mongodb/mongodb.log
    log file [/var/log/mongodb/mongodb.log] exists; copied to temporary file [/var/log/mongodb/mongodb.log.2014-03-15T02-46-58]
    child process started successfully, parent exiting
    XXXXX@XXXXX-asus:~$ sudo mongod --dbpath /var/lib/mongodb_slave --logpath /var/log/mongodb/mongodb_slave.log --port 10001 --slave --source localhost:10000 --rest --nojournal --fork
    forked process: 5987
    all output going to: /var/log/mongodb/mongodb_slave.log
    log file [/var/log/mongodb/mongodb_slave.log] exists; copied to temporary file [/var/log/mongodb/mongodb_slave.log.2014-03-15T02-47-12]
    child process started successfully, parent exiting
    XXXXX@XXXXX-asus:~$ ps -ef | grep mongod
    root 5969 1 0 10:46 ? 00:00:00 mongod --dbpath /var/lib/mongodb --logpath /var/log/mongodb/mongodb.log --port 10000 --master --rest --nojournal --fork
    root 5987 1 0 10:47 ? 00:00:00 mongod --dbpath /var/lib/mongodb_slave --logpath /var/log/mongodb/mongodb_slave.log --port 10001 --slave --source localhost:10000 --rest --nojournal --fork
    XXXXX 6050 2732 0 10:48 pts/0 00:00:00 grep --color=auto mongod
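As shown, the master process was started with --master and listens on port 10000. The slave process was then started with --slave --source localhost:10000, so it replicates from the master, and it listens on port 10001. With this configuration in place, changes made on the master are replicated to the slave almost immediately, as shown below: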
    XXXXX@XXXXX-asus:~$ mongo localhost:10000
    MongoDB shell version: 2.2.4
    connecting to: localhost:10000/test
    > use test
    switched to db test
    > db.master_slave.insert({"abc":123})
    > db.master_slave.find()
    { "_id" : ObjectId("5323c1594678819d9c4323a2"), "abc" : 123 }
    > exit
    bye
    XXXXX@XXXXX-asus:~$ mongo localhost:10001
    MongoDB shell version: 2.2.4
    connecting to: localhost:10001/test
    > use test
    switched to db test
    > db.master_slave.find()
    { "_id" : ObjectId("5323c1594678819d9c4323a2"), "abc" : 123 }
    > exit
    bye
    XXXXX@XXXXX-asus:~$ clear
After logging in to the master and inserting a document, the change is visible on the slave right away.
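The idea behind this behaviour can be sketched with a toy model: the master records every write in an ordered operation log, and the slave repeatedly pulls the entries it has not applied yet. This is only an illustration of the principle, not MongoDB's actual implementation; the class names `Master` and `Slave` and their methods are hypothetical.

```python
class Master:
    """Toy master: applies writes locally and records them in an operation log."""

    def __init__(self):
        self.data = []
        self.oplog = []  # ordered list of operations, oldest first

    def insert(self, doc):
        self.data.append(doc)
        self.oplog.append(("insert", doc))


class Slave:
    """Toy slave: remembers how far into the master's log it has applied."""

    def __init__(self, source):
        self.source = source  # plays the role of the --source option
        self.data = []
        self.applied = 0      # number of log entries already applied

    def sync(self):
        # Pull and apply only the entries that have not been seen yet.
        for op, doc in self.source.oplog[self.applied:]:
            if op == "insert":
                self.data.append(doc)
            self.applied += 1


master = Master()
slave = Slave(source=master)
master.insert({"abc": 123})
slave.sync()
print(slave.data)  # the slave now holds the same document as the master
```

Because the slave only applies what it pulls, there is always a short window in which it lags behind the master; in the session above that window is simply too small to notice.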
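A MongoDB replica set consists of one primary and several secondaries that continuously replicate data from it. In this state only the primary serves client operations; the secondaries reject writes and, by default, reads as well. If the primary goes offline unexpectedly, the remaining members elect a new primary to take over, provided a majority of the members can still reach one another.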
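The majority requirement can be made concrete with a small calculation. The sketch below assumes every member has exactly one vote (the default); the function names are hypothetical helpers, not part of any MongoDB API.

```python
def majority(total_members):
    """Smallest number of votes that is more than half of the set."""
    return total_members // 2 + 1


def can_elect_primary(total_members, reachable_members):
    """True if the members that can see each other form a majority."""
    return reachable_members >= majority(total_members)


# A three-member set survives the loss of one member, but not two.
for total, reachable in [(3, 3), (3, 2), (3, 1)]:
    print(total, reachable, can_elect_primary(total, reachable))
```

This is why a three-member set, like the one built in this article, is the smallest configuration that can lose a member and still elect a primary.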
Configuring a replica set is slightly more involved. First, start several mongod processes as members of the replica set:
    XXXXX@XXXXX-asus:~$ sudo mongod --dbpath /var/lib/mongodb --logpath /var/log/mongodb/mongodb.log --port 10000 --replSet testrep --nojournal --fork
    forked process: 6862
    all output going to: /var/log/mongodb/mongodb.log
    log file [/var/log/mongodb/mongodb.log] exists; copied to temporary file [/var/log/mongodb/mongodb.log.2014-03-15T03-07-53]
    child process started successfully, parent exiting
    XXXXX@XXXXX-asus:~$ sudo mongod --dbpath /var/lib/mongodb1 --logpath /var/log/mongodb/mongodb1.log --port 10001 --replSet testrep --nojournal --fork
    forked process: 6915
    all output going to: /var/log/mongodb/mongodb1.log
    child process started successfully, parent exiting
    XXXXX@XXXXX-asus:~$ sudo mongod --dbpath /var/lib/mongodb2 --logpath /var/log/mongodb/mongodb2.log --port 10002 --replSet testrep --nojournal --fork
    forked process: 6964
    all output going to: /var/log/mongodb/mongodb2.log
    child process started successfully, parent exiting
    XXXXX@XXXXX-asus:~$
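The steps above start three mongod processes listening on ports 10000, 10001, and 10002, which together form the replica set testrep. The key is the --replSet option.
The configuration is not complete yet, however. You also need to connect to one of the mongod instances and initialize the replica set: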
    XXXXX@XXXXX-asus:~$ mongo localhost:10000
    MongoDB shell version: 2.2.4
    connecting to: localhost:10000/test
    > rs.initiate({"_id":"testrep","members":[
    ... {"_id":1, "host":"localhost:10000"},
    ... {"_id":2, "host":"localhost:10001"},
    ... {"_id":3, "host":"localhost:10002"}
    ... ]})
    {
        "info" : "Config now saved locally. Should come online in about a minute.",
        "ok" : 1
    }
    >
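The configuration document passed to rs.initiate has a regular shape, so it can be assembled from the set name and the list of hosts rather than typed by hand. A minimal sketch follows; `build_replset_config` is a hypothetical helper, and it only prints the shell command rather than talking to a server.

```python
import json


def build_replset_config(set_name, hosts, first_id=1):
    """Build a replica set config document.

    set_name must equal the value given to --replSet, and each host
    must be the host:port a mongod member is actually listening on.
    """
    members = [
        {"_id": first_id + index, "host": host}
        for index, host in enumerate(hosts)
    ]
    return {"_id": set_name, "members": members}


config = build_replset_config(
    "testrep",
    ["localhost:10000", "localhost:10001", "localhost:10002"],
)
# Print a command that could be pasted into the mongo shell.
print("rs.initiate(%s)" % json.dumps(config))
```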
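rs.initiate initializes the replica set; its argument must match the options used when starting mongod (the set name, plus each member's host and port). Initialization takes a little while. Once it finishes, the mongod listening on port 10000 becomes the primary and the other two become secondaries. The state changes are visible in the logs. The primary's log looks like this: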
    Sat Mar 15 11:15:43 [rsStart] replSet can't get local.system.replset config from self or any seed (EMPTYCONFIG)
    Sat Mar 15 11:15:48 [initandlisten] connection accepted from 127.0.0.1:59140 #3 (1 connection now open)
    Sat Mar 15 11:15:52 [conn3] replSet replSetInitiate admin command received from client
    Sat Mar 15 11:15:52 [conn3] replSet replSetInitiate config object parses ok, 3 members specified
    Sat Mar 15 11:15:52 [conn3] replSet replSetInitiate all members seem up
    Sat Mar 15 11:15:52 [conn3] ******
    Sat Mar 15 11:15:52 [conn3] creating replication oplog of size: 1810MB...
    Sat Mar 15 11:15:52 [FileAllocator] allocating new datafile /var/lib/mongodb/local.1, filling with zeroes...
    Sat Mar 15 11:15:52 [FileAllocator] creating directory /var/lib/mongodb/_tmp
    Sat Mar 15 11:16:37 [FileAllocator] done allocating datafile /var/lib/mongodb/local.1, size: 2047MB, took 44.99 secs
    Sat Mar 15 11:16:41 [conn3] ******
    Sat Mar 15 11:16:41 [conn3] replSet info saving a newer config version to local.system.replset
    Sat Mar 15 11:16:41 [conn3] replSet saveConfigLocally done
    Sat Mar 15 11:16:41 [conn3] replSet replSetInitiate config now saved locally. Should come online in about a minute.
    Sat Mar 15 11:16:41 [conn3] command admin.$cmd command: { replSetInitiate: { _id: "testrep", members: [ { _id: 1.0, host: "localhost:10000" }, { _id: 2.0, host: "localhost:10001" }, { _id: 3.0, host: "localhost:10002" } ] } } ntoreturn:1 keyUpdates:0 locks(micros) W:49581556 reslen:112 49587ms
    Sat Mar 15 11:16:41 [rsStart] replSet I am localhost:10000
    Sat Mar 15 11:16:41 [rsStart] replSet STARTUP2
    Sat Mar 15 11:16:41 [rsHealthPoll] replSet member localhost:10001 is up
    Sat Mar 15 11:16:42 [rsSync] replSet SECONDARY
    Sat Mar 15 11:16:43 [rsHealthPoll] replSet member localhost:10002 is up
    Sat Mar 15 11:16:43 [rsMgr] replSet info electSelf 1
    Sat Mar 15 11:16:43 [rsMgr] replSet couldn't elect self, only received 1 votes
    Sat Mar 15 11:16:45 [initandlisten] connection accepted from 127.0.0.1:59173 #4 (2 connections now open)
    Sat Mar 15 11:16:47 [rsHealthPoll] replSet member localhost:10002 is now in state STARTUP2
    Sat Mar 15 11:16:47 [rsMgr] not electing self, localhost:10002 would veto
    Sat Mar 15 11:16:49 [initandlisten] connection accepted from 127.0.0.1:59177 #5 (3 connections now open)
    Sat Mar 15 11:17:01 [conn4] end connection 127.0.0.1:59173 (2 connections now open)
    Sat Mar 15 11:17:01 [initandlisten] connection accepted from 127.0.0.1:59185 #6 (3 connections now open)
    Sat Mar 15 11:17:01 [rsHealthPoll] DBClientCursor::init call() failed
    Sat Mar 15 11:17:02 [rsHealthPoll] replSet info localhost:10001 is down (or slow to respond): DBClientBase::findN: transport error: localhost:10001 ns: admin.$cmd query: { replSetHeartbeat: "testrep", v: 1, pv: 1, checkEmpty: false, from: "localhost:10000", $auth: {} }
    Sat Mar 15 11:17:02 [rsHealthPoll] replSet member localhost:10001 is now in state DOWN
The log on a secondary looks like this:
    Sat Mar 15 11:16:39 [rsStart] replSet can't get local.system.replset config from self or any seed (EMPTYCONFIG)
    Sat Mar 15 11:16:47 [initandlisten] connection accepted from 127.0.0.1:59521 #2 (2 connections now open)
    Sat Mar 15 11:16:49 [rsStart] trying to contact localhost:10000
    Sat Mar 15 11:16:52 [rsStart] trying to contact localhost:10002
    Sat Mar 15 11:17:02 [initandlisten] connection accepted from 127.0.0.1:59532 #3 (3 connections now open)
    Sat Mar 15 11:17:03 [rsStart] DBClientCursor::init call() failed
    Sat Mar 15 11:17:03 [conn2] command admin.$cmd command: { replSetHeartbeat: "testrep", v: 1, pv: 1, checkEmpty: false, from: "localhost:10002", $auth: {} } ntoreturn:1 keyUpdates:0 reslen:72 11356ms
    Sat Mar 15 11:17:03 [conn1] command admin.$cmd command: { replSetHeartbeat: "testrep", v: 1, pv: 1, checkEmpty: false, from: "localhost:10000", $auth: {} } ntoreturn:1 keyUpdates:0 reslen:72 11177ms
    Sat Mar 15 11:17:03 [conn1] end connection 127.0.0.1:59489 (2 connections now open)
    Sat Mar 15 11:17:03 [rsStart] replSet I am localhost:10001
    Sat Mar 15 11:17:03 [rsStart] replSet got config version 1 from a remote, saving locally
    Sat Mar 15 11:17:03 [rsStart] replSet info saving a newer config version to local.system.replset
    Sat Mar 15 11:17:03 [FileAllocator] allocating new datafile /var/lib/mongodb1/local.ns, filling with zeroes...
    Sat Mar 15 11:17:03 [FileAllocator] creating directory /var/lib/mongodb1/_tmp
    Sat Mar 15 11:17:04 [conn2] end connection 127.0.0.1:59521 (1 connection now open)
    Sat Mar 15 11:17:04 [initandlisten] connection accepted from 127.0.0.1:59535 #4 (2 connections now open)
    Sat Mar 15 11:17:06 [conn3] end connection 127.0.0.1:59532 (1 connection now open)
    Sat Mar 15 11:17:06 [initandlisten] connection accepted from 127.0.0.1:59537 #5 (3 connections now open)
    Sat Mar 15 11:17:07 [FileAllocator] done allocating datafile /var/lib/mongodb1/local.ns, size: 16MB, took 1.438 secs
    Sat Mar 15 11:17:07 [FileAllocator] allocating new datafile /var/lib/mongodb1/local.0, filling with zeroes...
    Sat Mar 15 11:17:13 [FileAllocator] done allocating datafile /var/lib/mongodb1/local.0, size: 64MB, took 5.174 secs
    Sat Mar 15 11:17:13 [FileAllocator] allocating new datafile /var/lib/mongodb1/local.1, filling with zeroes...
    Sat Mar 15 11:17:15 [rsStart] replSet saveConfigLocally done
    Sat Mar 15 11:17:16 [rsStart] replSet STARTUP2
    Sat Mar 15 11:17:16 [rsSync] ******
    Sat Mar 15 11:17:16 [rsSync] creating replication oplog of size: 1631MB...
    Sat Mar 15 11:17:17 [rsHealthPoll] replSet member localhost:10000 is up
    Sat Mar 15 11:17:17 [rsHealthPoll] replSet member localhost:10000 is now in state SECONDARY
    Sat Mar 15 11:17:17 [rsHealthPoll] replSet member localhost:10002 is up
    Sat Mar 15 11:17:17 [rsHealthPoll] replSet member localhost:10002 is now in state STARTUP2
    Sat Mar 15 11:17:21 [FileAllocator] done allocating datafile /var/lib/mongodb1/local.1, size: 128MB, took 7.663 secs
    Sat Mar 15 11:17:21 [FileAllocator] allocating new datafile /var/lib/mongodb1/local.2, filling with zeroes...
    Sat Mar 15 11:17:26 [conn4] end connection 127.0.0.1:59535 (1 connection now open)
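Logs like these are long, and the interesting part is usually just the state each member was last reported in. The sketch below pulls that out with a regular expression; it assumes the "is now in state" wording shown in the 2.2.4 logs in this article, which may differ in other versions, and `member_states` is a hypothetical helper.

```python
import re

# Matches e.g. "replSet member localhost:10000 is now in state SECONDARY"
STATE_CHANGE = re.compile(r"replSet member (\S+) is now in state (\w+)")


def member_states(log_text):
    """Return {host: last reported state} for every member mentioned."""
    states = {}
    for match in STATE_CHANGE.finditer(log_text):
        host, state = match.groups()
        states[host] = state  # later entries overwrite earlier ones
    return states


sample = """\
Sat Mar 15 11:17:17 [rsHealthPoll] replSet member localhost:10000 is up
Sat Mar 15 11:17:17 [rsHealthPoll] replSet member localhost:10000 is now in state SECONDARY
Sat Mar 15 11:17:17 [rsHealthPoll] replSet member localhost:10002 is up
Sat Mar 15 11:17:17 [rsHealthPoll] replSet member localhost:10002 is now in state STARTUP2
"""
print(member_states(sample))
```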
You can also log in to the primary and check the state directly with rs.status():
    XXXXX@XXXXX-asus:~$ mongo localhost:10000
    MongoDB shell version: 2.2.4
    connecting to: localhost:10000/test
    testrep:PRIMARY> rs.status()
    {
        "set" : "testrep",
        "date" : ISODate("2014-03-15T05:18:36Z"),
        "myState" : 1,
        "members" : [
            {
                "_id" : 1,
                "name" : "localhost:10000",
                "health" : 1,
                "state" : 1,
                "stateStr" : "PRIMARY",
                "uptime" : 7843,
                "optime" : Timestamp(1394853401000, 1),
                "optimeDate" : ISODate("2014-03-15T03:16:41Z"),
                "self" : true
            },
            {
                "_id" : 2,
                "name" : "localhost:10001",
                "health" : 1,
                "state" : 2,
                "stateStr" : "SECONDARY",
                "uptime" : 7292,
                "optime" : Timestamp(1394853401000, 1),
                "optimeDate" : ISODate("2014-03-15T03:16:41Z"),
                "lastHeartbeat" : ISODate("2014-03-15T05:18:35Z"),
                "pingMs" : 0
            },
            {
                "_id" : 3,
                "name" : "localhost:10002",
                "health" : 1,
                "state" : 2,
                "stateStr" : "SECONDARY",
                "uptime" : 7313,
                "optime" : Timestamp(1394853401000, 1),
                "optimeDate" : ISODate("2014-03-15T03:16:41Z"),
                "lastHeartbeat" : ISODate("2014-03-15T05:18:34Z"),
                "pingMs" : 0
            }
        ],
        "ok" : 1
    }
    testrep:PRIMARY>
As shown, on logging in to the primary again the prompt has changed to testrep:PRIMARY>, indicating that the current connection is to the replica set's primary.
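When the status document is available as plain data, finding the primary and any unhealthy members takes only a few lines. The sketch below works on a hand-copied, trimmed-down version of the rs.status() output (only the name, health, and stateStr fields are kept); the helper names are hypothetical.

```python
def find_primary(status):
    """Return the name of the member reporting PRIMARY, or None."""
    for member in status["members"]:
        if member["stateStr"] == "PRIMARY":
            return member["name"]
    return None


def unhealthy_members(status):
    """Return the names of members whose health flag is not 1."""
    return [m["name"] for m in status["members"] if m["health"] != 1]


# Trimmed copy of the rs.status() output for the set testrep.
status = {
    "set": "testrep",
    "members": [
        {"_id": 1, "name": "localhost:10000", "health": 1, "stateStr": "PRIMARY"},
        {"_id": 2, "name": "localhost:10001", "health": 1, "stateStr": "SECONDARY"},
        {"_id": 3, "name": "localhost:10002", "health": 1, "stateStr": "SECONDARY"},
    ],
}
print(find_primary(status))
print(unhealthy_members(status))
```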
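At this point both reads and writes work on the primary, but a secondary rejects both: writes fail with "not master", and reads fail because slaveOk is false by default (running rs.slaveOk() in the shell enables reads on a secondary), as shown below: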
    XXXXX@XXXXX-asus:~$ mongo localhost:10000
    MongoDB shell version: 2.2.4
    connecting to: localhost:10000/test
    testrep:PRIMARY> use test
    switched to db test
    testrep:PRIMARY> db.testrep.insert({"abcd":1234})
    testrep:PRIMARY> db.testrep.find()
    { "_id" : ObjectId("5323e46faeb9bd3d8d02e2b6"), "abcd" : 1234 }
    testrep:PRIMARY> exit
    bye
    XXXXX@XXXXX-asus:~$ mongo localhost:10001
    MongoDB shell version: 2.2.4
    connecting to: localhost:10001/test
    testrep:SECONDARY> use test
    switched to db test
    testrep:SECONDARY> db.testrep.find()
    error: { "$err" : "not master and slaveOk=false", "code" : 13435 }
    testrep:SECONDARY> db.testrep.insert({"efgh":5678})
    not master
    testrep:SECONDARY> exit
    bye
    XXXXX@XXXXX-asus:~$
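The rule behind the two error messages can be written down as a small decision function. This is a simplified model of the behaviour seen in the session, not server code; `is_allowed` and its parameters are hypothetical names, and the `slave_ok` flag stands in for what rs.slaveOk() sets in the shell.

```python
def is_allowed(operation, member_state, slave_ok=False):
    """Decide whether a member accepts an operation.

    operation:    "read" or "write"
    member_state: e.g. "PRIMARY", "SECONDARY", "STARTUP2"
    slave_ok:     whether the client opted in to reading from secondaries
    """
    if member_state == "PRIMARY":
        return True  # the primary serves both reads and writes
    if member_state == "SECONDARY":
        # Writes are never accepted; reads only when the client opted in.
        return operation == "read" and slave_ok
    return False  # members in any other state serve nothing


print(is_allowed("write", "PRIMARY"))                  # True
print(is_allowed("read", "SECONDARY"))                 # False
print(is_allowed("read", "SECONDARY", slave_ok=True))  # True
print(is_allowed("write", "SECONDARY", slave_ok=True)) # False
```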
If the primary goes offline now, one of the two secondaries is elected as the new primary. Here kill -9 is used to simulate the primary going offline:
    XXXXX@XXXXX-asus:~$ ps -ef | grep mongo
    root 6862 1 1 11:07 ? 00:01:30 mongod --dbpath /var/lib/mongodb --logpath /var/log/mongodb/mongodb.log --port 10000 --replSet testrep --nojournal --fork
    root 6915 1 0 11:08 ? 00:01:26 mongod --dbpath /var/lib/mongodb1 --logpath /var/log/mongodb/mongodb1.log --port 10001 --replSet testrep --nojournal --fork
    root 6964 1 1 11:08 ? 00:01:27 mongod --dbpath /var/lib/mongodb2 --logpath /var/log/mongodb/mongodb2.log --port 10002 --replSet testrep --nojournal --fork
    XXXXX 13047 5380 0 13:33 pts/2 00:00:00 grep --color=auto mongo
    XXXXX@XXXXX-asus:~$ sudo kill -9 6862
    XXXXX@XXXXX-asus:~$ ps -ef | grep mongo
    root 6915 1 0 11:08 ? 00:01:26 mongod --dbpath /var/lib/mongodb1 --logpath /var/log/mongodb/mongodb1.log --port 10001 --replSet testrep --nojournal --fork
    root 6964 1 1 11:08 ? 00:01:27 mongod --dbpath /var/lib/mongodb2 --logpath /var/log/mongodb/mongodb2.log --port 10002 --replSet testrep --nojournal --fork
    XXXXX 13058 5380 0 13:34 pts/2 00:00:00 grep --color=auto mongo
    XXXXX@XXXXX-asus:~$
The secondary's log now shows that it has replaced the original primary. The new primary is the mongod process listening on port 10001:
    Sat Mar 15 13:33:59 [initandlisten] connection accepted from 127.0.0.1:35927 #556 (2 connections now open)
    Sat Mar 15 13:34:00 [conn555] end connection 127.0.0.1:35925 (1 connection now open)
    Sat Mar 15 13:34:00 [rsBackgroundSync] replSet db exception in producer: 10278 dbclient error communicating with server: localhost:10000
    Sat Mar 15 13:34:01 [rsHealthPoll] DBClientCursor::init call() failed
    Sat Mar 15 13:34:01 [rsHealthPoll] replSet info localhost:10000 is down (or slow to respond): DBClientBase::findN: transport error: localhost:10000 ns: admin.$cmd query: { replSetHeartbeat: "testrep", v: 1, pv: 1, checkEmpty: false, from: "localhost:10001", $auth: {} }
    Sat Mar 15 13:34:01 [rsHealthPoll] replSet member localhost:10000 is now in state DOWN
    Sat Mar 15 13:34:02 [rsMgr] replSet info electSelf 2
    Sat Mar 15 13:34:02 [rsMgr] replSet PRIMARY
    Sat Mar 15 13:34:03 [rsHealthPoll] couldn't connect to localhost:10000: couldn't connect to server localhost:10000
    Sat Mar 15 13:34:03 [rsHealthPoll] couldn't connect to localhost:10000: couldn't connect to server localhost:10000
    Sat Mar 15 13:34:05 [rsHealthPoll] couldn't connect to localhost:10000: couldn't connect to server localhost:10000
    Sat Mar 15 13:34:07 [rsHealthPoll] couldn't connect to localhost:10000: couldn't connect to server localhost:10000
    Sat Mar 15 13:34:09 [rsHealthPoll] couldn't connect to localhost:10000: couldn't connect to server localhost:10000
    Sat Mar 15 13:34:10 [initandlisten] connection accepted from 127.0.0.1:35945 #557 (2 connections now open)
    Sat Mar 15 13:34:11 [rsHealthPoll] couldn't connect to localhost:10000: couldn't connect to server localhost:10000
    Sat Mar 15 13:34:13 [rsHealthPoll] couldn't connect to localhost:10000: couldn't connect to server localhost:10000
You can also log in to the new primary and check the replica set's state directly:
    XXXXX@XXXXX-asus:~$ mongo localhost:10001
    MongoDB shell version: 2.2.4
    connecting to: localhost:10001/test
    testrep:PRIMARY> rs.status()
    {
        "set" : "testrep",
        "date" : ISODate("2014-03-15T05:39:24Z"),
        "myState" : 1,
        "members" : [
            {
                "_id" : 1,
                "name" : "localhost:10000",
                "health" : 0,
                "state" : 8,
                "stateStr" : "(not reachable/healthy)",
                "uptime" : 0,
                "optime" : Timestamp(1394861167000, 1),
                "optimeDate" : ISODate("2014-03-15T05:26:07Z"),
                "lastHeartbeat" : ISODate("2014-03-15T05:33:59Z"),
                "pingMs" : 0,
                "errmsg" : "socket exception [CONNECT_ERROR] for localhost:10000"
            },
            {
                "_id" : 2,
                "name" : "localhost:10001",
                "health" : 1,
                "state" : 1,
                "stateStr" : "PRIMARY",
                "uptime" : 9065,
                "optime" : Timestamp(1394861167000, 1),
                "optimeDate" : ISODate("2014-03-15T05:26:07Z"),
                "self" : true
            },
            {
                "_id" : 3,
                "name" : "localhost:10002",
                "health" : 1,
                "state" : 2,
                "stateStr" : "SECONDARY",
                "uptime" : 8527,
                "optime" : Timestamp(1394861167000, 1),
                "optimeDate" : ISODate("2014-03-15T05:26:07Z"),
                "lastHeartbeat" : ISODate("2014-03-15T05:39:23Z"),
                "pingMs" : 0
            }
        ],
        "ok" : 1
    }
    testrep:PRIMARY>
As shown, the member that was listening on port 10000 now has the state "(not reachable/healthy)", while the new primary accepts both queries and inserts:
    XXXXX@XXXXX-asus:~$ mongo localhost:10001
    MongoDB shell version: 2.2.4
    connecting to: localhost:10001/test
    testrep:PRIMARY> use test
    switched to db test
    testrep:PRIMARY> db.testrep.find()
    { "_id" : ObjectId("5323e46faeb9bd3d8d02e2b6"), "abcd" : 1234 }
    testrep:PRIMARY> db.testrep.insert({"efgh":5678})
    testrep:PRIMARY> db.testrep.find()
    { "_id" : ObjectId("5323e46faeb9bd3d8d02e2b6"), "abcd" : 1234 }
    { "_id" : ObjectId("5323e8d10a61cc7b258aac5a"), "efgh" : 5678 }
    testrep:PRIMARY>
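If you now try to restart the process that was listening on port 10000, it will most likely fail to start: the database exited abnormally, so its lock file (mongod.lock) was never cleaned up. Because journaling is disabled here (--nojournal), you need to run a repair on the database first and then start it: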
    XXXXX@XXXXX-asus:~$ sudo mongod --dbpath /var/lib/mongodb --logpath /var/log/mongodb/mongodb.log --port 10000 --replSet testrep --nojournal --repair --fork
    forked process: 16902
    all output going to: /var/log/mongodb/mongodb.log
    log file [/var/log/mongodb/mongodb.log] exists; copied to temporary file [/var/log/mongodb/mongodb.log.2014-03-15T06-00-10]
    child process started successfully, parent exiting
    XXXXX@XXXXX-asus:~$ sudo mongod --dbpath /var/lib/mongodb --logpath /var/log/mongodb/mongodb.log --port 10000 --replSet testrep --nojournal --fork
    forked process: 17059
    all output going to: /var/log/mongodb/mongodb.log
    log file [/var/log/mongodb/mongodb.log] exists; copied to temporary file [/var/log/mongodb/mongodb.log.2014-03-15T06-02-04]
    child process started successfully, parent exiting
    XXXXX@XXXXX-asus:~$ ps -ef | grep mongo
    root 6915 1 0 11:08 ? 00:01:44 mongod --dbpath /var/lib/mongodb1 --logpath /var/log/mongodb/mongodb1.log --port 10001 --replSet testrep --nojournal --fork
    root 6964 1 1 11:08 ? 00:01:44 mongod --dbpath /var/lib/mongodb2 --logpath /var/log/mongodb/mongodb2.log --port 10002 --replSet testrep --nojournal --fork
    XXXXX 13123 5712 0 13:34 pts/3 00:00:00 vim /var/log/mongodb/mongodb1.log
    XXXXX 15728 5086 0 13:51 pts/1 00:00:00 mongo localhost:10001
    root 17059 1 3 14:02 ? 00:00:00 mongod --dbpath /var/lib/mongodb --logpath /var/log/mongodb/mongodb.log --port 10000 --replSet testrep --nojournal --fork
    XXXXX 17131 5380 0 14:02 pts/2 00:00:00 grep --color=auto mongo
    XXXXX@XXXXX-asus:~$
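Before deciding whether a repair is needed, it can help to check whether the killed process left a lock file behind. The sketch below rests on an assumption about this MongoDB generation: a cleanly stopped mongod leaves mongod.lock empty, while a killed one leaves its PID in the file. The check is only meaningful when no mongod is currently running on that dbpath, and `has_stale_lock` is a hypothetical helper.

```python
import os
import tempfile


def has_stale_lock(dbpath):
    """True if dbpath contains a non-empty mongod.lock file.

    Assumption: a clean shutdown leaves the file empty; a crash or
    kill -9 leaves the old PID in it. A running mongod also keeps the
    file non-empty, so stop the process before relying on this check.
    """
    lock_path = os.path.join(dbpath, "mongod.lock")
    return os.path.isfile(lock_path) and os.path.getsize(lock_path) > 0


# Demonstration on a throwaway directory instead of a real dbpath.
with tempfile.TemporaryDirectory() as dbpath:
    lock_path = os.path.join(dbpath, "mongod.lock")
    open(lock_path, "w").close()      # empty file: looks like a clean stop
    print(has_stale_lock(dbpath))
    with open(lock_path, "w") as lock_file:
        lock_file.write("6862\n")     # leftover PID: looks like a crash
    print(has_stale_lock(dbpath))
```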
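Running rs.status() again now shows output almost identical to the initial state. The only difference is that the process on port 10000 is now a secondary and the process on port 10001 is the primary.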
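Note also that MongoDB officially recommends replica sets rather than master-slave replication for high availability. Other commonly used replica set commands include rs.add("localhost:10000") and rs.remove("localhost:10000"), which add a member to and remove a member from the replica set respectively; they will be covered in a later article.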
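As a preview, the effect of adding or removing a member can be modelled as an edit to the replica set configuration document: the member list changes and the config version goes up by one. This is a rough sketch of that idea on plain dictionaries, not what the shell helpers literally execute; `add_member` and `remove_member` are hypothetical names, and the way the next `_id` is chosen here is an assumption.

```python
import copy


def add_member(config, host):
    """Return a new config with host appended and the version bumped."""
    new_config = copy.deepcopy(config)
    used_ids = [member["_id"] for member in new_config["members"]]
    next_id = max(used_ids, default=-1) + 1  # assumed id policy
    new_config["members"].append({"_id": next_id, "host": host})
    new_config["version"] = new_config.get("version", 1) + 1
    return new_config


def remove_member(config, host):
    """Return a new config without host and with the version bumped."""
    new_config = copy.deepcopy(config)
    new_config["members"] = [
        member for member in new_config["members"] if member["host"] != host
    ]
    new_config["version"] = new_config.get("version", 1) + 1
    return new_config


config = {
    "_id": "testrep",
    "version": 1,
    "members": [
        {"_id": 1, "host": "localhost:10000"},
        {"_id": 2, "host": "localhost:10001"},
        {"_id": 3, "host": "localhost:10002"},
    ],
}
bigger = add_member(config, "localhost:10003")
smaller = remove_member(config, "localhost:10000")
print(len(bigger["members"]), bigger["version"])
print(len(smaller["members"]), smaller["version"])
```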