High Availability and PyMongo
************************************
PyMongo makes it easy to write highly available applications whether you use a single replica set or a large sharded cluster.
Connecting to a Replica Set
============================
PyMongo makes working with replica sets easy. Here we’ll launch a new replica set and show how to handle both initialization and normal connections with PyMongo.
Note
Replica sets require server version >= 1.6.0. Support for connecting to replica sets also requires PyMongo version >= 1.8.0.
See the general MongoDB documentation on replica sets: http://dochub.mongodb.org/core/rs
Starting a Replica Set
============================
The main replica set documentation contains extensive information about setting up a new replica set or migrating an existing MongoDB setup; be sure to check it out. Here, we’ll do just the bare minimum to get a three-node replica set running locally.
Warning
Replica sets should always use multiple nodes in production. Putting all set members on the same physical node is recommended only for testing and development.
We start three mongod processes, each on a different port and with a different dbpath, but all using the same replica set name “foo”. In the example we use the hostname “morton.local”, so replace that with your own hostname when running:
$ hostname
morton.local
$ mongod --replSet foo/morton.local:27018,morton.local:27019 --rest
$ mongod --port 27018 --dbpath /data/db1 --replSet foo/morton.local:27017 --rest
$ mongod --port 27019 --dbpath /data/db2 --replSet foo/morton.local:27017 --rest
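If you prefer to script this local setup, the three commands can be generated from Python. This is only a convenience sketch for testing: the helper `mongod_argv` is hypothetical, it reproduces the commands above verbatim (including the early-release `--replSet name/seed,...` form and the `--rest` flag), and for the first member it spells out mongod’s default port 27017 and dbpath /data/db, which the original command leaves implicit.

```python
import shlex


def mongod_argv(set_name, port, dbpath, seeds):
    """Build the argument list for one member of a local test replica set.

    Uses the legacy "--replSet name/host:port,..." seed syntax shown above.
    """
    return [
        "mongod",
        "--port", str(port),
        "--dbpath", dbpath,
        "--replSet", "%s/%s" % (set_name, ",".join(seeds)),
        "--rest",
    ]


HOST = "morton.local"  # replace with the output of `hostname`

# (port, dbpath, seed ports) for each of the three local members.
MEMBERS = [
    (27017, "/data/db", [27018, 27019]),
    (27018, "/data/db1", [27017]),
    (27019, "/data/db2", [27017]),
]

commands = [
    mongod_argv("foo", port, dbpath, ["%s:%d" % (HOST, p) for p in seeds])
    for port, dbpath, seeds in MEMBERS
]

for argv in commands:
    # Print a shell-safe version of each command line.
    print(" ".join(shlex.quote(arg) for arg in argv))
    # To actually start the member (requires mongod on PATH):
    # subprocess.Popen(argv)
```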
Initializing the Set
============================
At this point all of our nodes are up and running, but the set has yet to be initialized. Until the set is initialized, no node will become the primary, and the set is essentially “offline”.
To initialize the set we need to connect to a single node and run the initiate command. Since we don’t have a primary yet, we’ll need to tell PyMongo that it’s okay to connect to a slave/secondary:
>>> from pymongo import MongoClient, ReadPreference
>>> c = MongoClient("morton.local:27017",
...                 read_preference=ReadPreference.SECONDARY)
Note
We could have connected to any of the other nodes instead, but only the node we initiate from is allowed to contain any initial data.
After connecting, we run the initiate command to get things started. Here we just use an implicit configuration; for more advanced configuration options, see the replica set documentation:
>>> c.admin.command("replSetInitiate")
{u'info': u'Config now saved locally. Should come online in about a minute.',
u'info2': u'no configuration explicitly specified -- making one', u'ok': 1.0}
The three mongod servers we started earlier will now coordinate and come online as a replica set.
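The command above lets the server generate a configuration on its own. When you want to control the membership yourself, replSetInitiate also accepts an explicit configuration document. A minimal sketch of building one follows; the helper name `replset_config` is hypothetical, and only the `_id`, `members`, and `host` fields are shown.

```python
def replset_config(set_name, hosts):
    """Build an explicit replica set configuration document.

    Each member gets a unique integer _id and a "host:port" string.
    """
    return {
        "_id": set_name,
        "members": [{"_id": i, "host": host} for i, host in enumerate(hosts)],
    }


config = replset_config(
    "foo", ["morton.local:27017", "morton.local:27018", "morton.local:27019"]
)

# With PyMongo and the connection ``c`` from above, the document would be
# passed as the command's value, sending {"replSetInitiate": config}:
# c.admin.command("replSetInitiate", config)
```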
Connecting to a Replica Set
============================
The connection made above is a special case needed only for an uninitialized replica set. Normally we’ll want to connect differently. A connection to a replica set can be made using the normal MongoClient() constructor, specifying one or more members of the set. For example, any of the following will create a connection to the set we just created:
>>> MongoClient("morton.local", replicaset='foo')
MongoClient([u'morton.local:27019', 'morton.local:27017', u'morton.local:27018'])
>>> MongoClient("morton.local:27018", replicaset='foo')
MongoClient([u'morton.local:27019', u'morton.local:27017', 'morton.local:27018'])
>>> MongoClient("morton.local", 27019, replicaset='foo')
MongoClient(['morton.local:27019', u'morton.local:27017', u'morton.local:27018'])
>>> MongoClient(["morton.local:27018", "morton.local:27019"])
MongoClient(['morton.local:27019', u'morton.local:27017', 'morton.local:27018'])
>>> MongoClient("mongodb://morton.local:27017,morton.local:27018,morton.local:27019")
MongoClient(['morton.local:27019', 'morton.local:27017', 'morton.local:27018'])
The nodes passed to MongoClient() are called the seeds. If only one host is specified, the replicaset parameter must be used to indicate that this isn’t a connection to a single node. As long as at least one of the seeds is online, the driver will be able to “discover” all of the nodes in the set and make a connection to the current primary.
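The seed list can also be written as a single connection string, as in the last example above. A small sketch of assembling one; the helper `replica_set_uri` is hypothetical, while `replicaSet` is the standard MongoDB URI option that carries the set name.

```python
from urllib.parse import quote


def replica_set_uri(seeds, replica_set=None):
    """Build a mongodb:// URI from a list of "host:port" seed strings.

    The optional replicaSet option names the set, which matters when
    only a single seed is listed.
    """
    uri = "mongodb://" + ",".join(seeds) + "/"
    if replica_set is not None:
        uri += "?replicaSet=" + quote(replica_set)
    return uri
```

With PyMongo installed, `MongoClient(replica_set_uri(["morton.local:27017"], "foo"))` should behave like `MongoClient("morton.local", replicaset='foo')`.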
Handling Failover
============================
When a failover occurs, PyMongo will automatically attempt to find the new primary node and perform subsequent operations on that node. This can’t happen completely transparently, however. Here we’ll perform an example failover to illustrate how everything behaves. First, we’ll connect to the replica set and perform a couple of basic operations:
>>> db = MongoClient("morton.local", replicaSet='foo').test
>>> db.test.save({"x": 1})
ObjectId('...')
>>> db.test.find_one()
{u'x': 1, u'_id': ObjectId('...')}
By checking the host and port, we can see that we’re connected to morton.local:27017, which is the current primary:
>>> db.connection.host
'morton.local'
>>> db.connection.port
27017
Now let’s bring down that node and see what happens when we run our query again:
>>> db.test.find_one()
Traceback (most recent call last):
pymongo.errors.AutoReconnect: ...
We get an AutoReconnect exception. This means that the driver was not able to connect to the old primary (which makes sense, as we killed the server), but that it will attempt to automatically reconnect on subsequent operations. When this exception is raised our application code needs to decide whether to retry the operation or to simply continue, accepting the fact that the operation might have failed.
On subsequent attempts to run the query we might continue to see this exception. Eventually, however, the replica set will failover and elect a new primary (this should take a couple of seconds in general). At that point the driver will connect to the new primary and the operation will succeed:
>>> db.test.find_one()
{u'x': 1, u'_id': ObjectId('...')}
>>> db.connection.host
'morton.local'
>>> db.connection.port
27018
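Because AutoReconnect can keep appearing until the election finishes, application code often wraps operations in a bounded retry loop. The sketch below assumes the operation is safe to repeat (a read such as find_one): a write that raised AutoReconnect may or may not have been applied. The helper name, attempt count, and backoff delays are illustrative choices, not PyMongo defaults, and a stand-in exception class lets the sketch run without PyMongo installed.

```python
import time

try:
    from pymongo.errors import AutoReconnect
except ImportError:  # lets the sketch run without PyMongo installed
    class AutoReconnect(Exception):
        """Stand-in for pymongo.errors.AutoReconnect."""


def retry_on_autoreconnect(operation, attempts=5, delay=0.5, sleep=time.sleep):
    """Call operation(), retrying on AutoReconnect with exponential backoff.

    Waits delay, 2*delay, 4*delay, ... between attempts and re-raises the
    last AutoReconnect once all attempts are used up. Only use this for
    operations that are safe to repeat.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except AutoReconnect:
            if attempt == attempts - 1:
                raise
            sleep(delay * (2 ** attempt))
```

Usage with the database from the example: `doc = retry_on_autoreconnect(db.test.find_one)`.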
MongoReplicaSetClient
============================
Using a MongoReplicaSetClient instead of a simple MongoClient offers two key features: secondary reads and replica set health monitoring. To connect using MongoReplicaSetClient, just provide a host:port pair and the name of the replica set:
>>> from pymongo import MongoReplicaSetClient
>>> MongoReplicaSetClient("morton.local:27017", replicaSet='foo')
MongoReplicaSetClient([u'morton.local:27019', u'morton.local:27017', u'morton.local:27018'])
Secondary Reads
------------------
By default an instance of MongoReplicaSetClient will only send queries to the primary member of the replica set. To use secondaries for queries we have to change the ReadPreference:
>>> db = MongoReplicaSetClient("morton.local:27017", replicaSet='foo').test
>>> from pymongo.read_preferences import ReadPreference
>>> db.read_preference = ReadPreference.SECONDARY_PREFERRED
Now all queries will be sent to the secondary members of the set. If there are no secondary members, the primary will be used as a fallback. If you have queries you would prefer never to send to the primary, you can specify that using the SECONDARY read preference:
>>> db.read_preference = ReadPreference.SECONDARY
Read preference can be set on a client, database, collection, or on a per-query basis, e.g.:
>>> db.collection.find_one(read_preference=ReadPreference.PRIMARY)
Reads are configured using three options: read_preference, tag_sets, and secondary_acceptable_latency_ms.
read_preference:
- - - - - - - - -
* PRIMARY:
Read from the primary. This is the default, and provides the strongest consistency. If no primary is available, raise AutoReconnect.
* PRIMARY_PREFERRED:
Read from the primary if available, or if there is none, read from a secondary matching your choice of tag_sets and secondary_acceptable_latency_ms.
* SECONDARY:
Read from a secondary matching your choice of tag_sets and secondary_acceptable_latency_ms. If no matching secondary is available, raise AutoReconnect.
* SECONDARY_PREFERRED:
Read from a secondary matching your choice of tag_sets and secondary_acceptable_latency_ms if available, otherwise from primary (regardless of the primary’s tags and latency).
* NEAREST:
Read from any member matching your choice of tag_sets and secondary_acceptable_latency_ms.
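The five modes can be summarized as a small selection function. This is a simplified model, not PyMongo’s implementation: modes are plain strings rather than the ReadPreference constants, each member is a hypothetical dict with a precomputed `matches` flag standing in for the tag_sets and secondary_acceptable_latency_ms checks, and `NoMatchingMember` plays the role of AutoReconnect.

```python
class NoMatchingMember(Exception):
    """Stand-in for the AutoReconnect error described above."""


def candidates(mode, members):
    """Return the names of members a read may go to under ``mode``.

    Each member is a dict with "name", "is_primary", and "matches", where
    "matches" means the member passes tag_sets and the latency window.
    """
    primary = [m["name"] for m in members if m["is_primary"]]
    secondaries = [m["name"] for m in members
                   if not m["is_primary"] and m["matches"]]
    if mode == "PRIMARY":
        result = primary
    elif mode == "PRIMARY_PREFERRED":
        result = primary or secondaries
    elif mode == "SECONDARY":
        result = secondaries
    elif mode == "SECONDARY_PREFERRED":
        # The primary fallback ignores the primary's tags and latency.
        result = secondaries or primary
    elif mode == "NEAREST":
        result = [m["name"] for m in members if m["matches"]]
    else:
        raise ValueError("unknown mode: %r" % (mode,))
    if not result:
        raise NoMatchingMember(mode)
    return result
```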
tag_sets:
- - - - - -
Replica-set members can be tagged according to any criteria you choose. By default, MongoReplicaSetClient ignores tags when choosing a member to read from, but it can be configured with the tag_sets parameter. tag_sets must be a list of dictionaries, each dict providing tag values that the replica set member must match. MongoReplicaSetClient tries each set of tags in turn until it finds a set of tags with at least one matching member. For example, to prefer reads from the New York data center, but fall back to the San Francisco data center, tag your replica set members according to their location and create a MongoReplicaSetClient like so:
>>> rsc = MongoReplicaSetClient(
... "morton.local:27017",
... replicaSet='foo',
... read_preference=ReadPreference.SECONDARY,
... tag_sets=[{'dc': 'ny'}, {'dc': 'sf'}]
... )
MongoReplicaSetClient tries to find secondaries in New York, then in San Francisco, and raises AutoReconnect if none are available. As an additional fallback, specify a final, empty tag set, {}, which means “read from any member that matches the mode, ignoring tags.”
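The tag set rule can be modeled in a few lines. This is a simplified illustration rather than PyMongo’s code: `select_by_tag_sets` is a hypothetical helper, members are given as a plain mapping from name to tags, and the read preference mode and latency filtering are left out.

```python
def select_by_tag_sets(members, tag_sets):
    """Return the members matching the first tag set that matches anyone.

    ``members`` maps a member name to its tags dict. A member matches a tag
    set when every key/value pair in the set appears in its tags, so the
    empty tag set {} matches every member.
    """
    for tag_set in tag_sets:
        matched = [
            name for name, tags in members.items()
            if all(key in tags and tags[key] == value
                   for key, value in tag_set.items())
        ]
        if matched:
            return matched
    return []  # no tag set matched: the driver would raise AutoReconnect
```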
secondary_acceptable_latency_ms:
- - - - - - - - - - - - - - - - -
If multiple members match the mode and tag sets, MongoReplicaSetClient reads from among the nearest members, chosen according to ping time. By default, only members whose ping times are within 15 milliseconds of the nearest are used for queries. You can choose to distribute reads among members with higher latencies by setting secondary_acceptable_latency_ms to a larger number. In that case, MongoReplicaSetClient distributes reads among matching members within secondary_acceptable_latency_ms of the closest member’s ping time.
Note
secondary_acceptable_latency_ms is ignored when talking to a replica set through a mongos. The equivalent is the localThreshold command line option.
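The latency window described for secondary_acceptable_latency_ms can be sketched as a filter over per-member ping times. `within_latency_window` is a hypothetical helper; treating the boundary as inclusive is an assumption, and choosing among the surviving members (for example with `random.choice`) is left to the caller.

```python
def within_latency_window(ping_ms, acceptable_latency_ms=15):
    """Return the members whose ping is within the window of the fastest.

    ``ping_ms`` maps a member name to its (averaged) ping time in
    milliseconds; results are sorted by name.
    """
    if not ping_ms:
        return []
    fastest = min(ping_ms.values())
    return sorted(name for name, ping in ping_ms.items()
                  if ping <= fastest + acceptable_latency_ms)
```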
Health Monitoring
------------------------
When MongoReplicaSetClient is initialized it launches a background task to monitor the replica set for changes in:
* Health: detect when a member goes down or comes up, or if a different member becomes primary
* Configuration: detect changes in tags
* Latency: track a moving average of each member’s ping time
Replica-set monitoring ensures queries are continually routed to the proper members as the state of the replica set changes.
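To illustrate the kind of moving average the monitor keeps per member, here is a sketch of a fixed-window average over recent ping samples. The class name and the window size of 5 are assumptions for illustration; PyMongo’s internal bookkeeping may differ.

```python
from collections import deque


class MovingAverage:
    """Average of the most recent ``size`` ping samples, in milliseconds."""

    def __init__(self, size=5):
        # Older samples fall off automatically once the deque is full.
        self.samples = deque(maxlen=size)

    def add(self, sample_ms):
        self.samples.append(sample_ms)

    def get(self):
        """Return the current average, or None before any sample arrives."""
        if not self.samples:
            return None
        return sum(self.samples) / len(self.samples)
```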
It is critical to call close() to terminate the monitoring task before your process exits.
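One way to guarantee that close() runs, even when an exception escapes, is the standard library’s `contextlib.closing`. The helper `with_client` below is hypothetical; it only assumes the client object has a close() method, as MongoReplicaSetClient does.

```python
from contextlib import closing


def with_client(make_client, work):
    """Create a client, run work(client), and always close the client.

    closing() calls client.close() on exit, even if work() raises, which
    is what stops MongoReplicaSetClient's background monitoring task.
    """
    with closing(make_client()) as client:
        return work(client)


# Example (requires PyMongo and a running replica set):
# from pymongo import MongoReplicaSetClient
# doc = with_client(
#     lambda: MongoReplicaSetClient("morton.local:27017", replicaSet="foo"),
#     lambda client: client.test.test.find_one(),
# )
```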
High Availability and mongos
============================
An instance of MongoClient can be configured to automatically connect to a different mongos if the instance it is currently connected to fails. If a failure occurs, PyMongo will attempt to find the nearest mongos to perform subsequent operations. As with a replica set, this can’t happen completely transparently. Here we’ll perform an example failover to illustrate how everything behaves. First, we’ll connect to a sharded cluster using a seed list and perform a couple of basic operations:
>>> db = MongoClient('morton.local:30000,morton.local:30001,morton.local:30002').test
>>> db.test.save({"x": 1})
ObjectId('...')
>>> db.test.find_one()
{u'x': 1, u'_id': ObjectId('...')}
Each member of the seed list passed to MongoClient must be a mongos. By checking the host, port, and is_mongos attributes, we can see that we’re connected to morton.local:30001, a mongos:
>>> db.connection.host
'morton.local'
>>> db.connection.port
30001
>>> db.connection.is_mongos
True
Now let’s shut down that mongos instance and see what happens when we run our query again:
>>> db.test.find_one()
Traceback (most recent call last):
pymongo.errors.AutoReconnect: ...
As in the replica set example earlier in this document, we get an AutoReconnect exception. This means that the driver was not able to connect to the original mongos at port 30001 (which makes sense, since we shut it down), but that it will attempt to connect to a new mongos on subsequent operations. When this exception is raised our application code needs to decide whether to retry the operation or to simply continue, accepting the fact that the operation might have failed.
As long as one of the seed list members is still available the next operation will succeed:
>>> db.test.find_one()
{u'x': 1, u'_id': ObjectId('...')}
>>> db.connection.host
'morton.local'
>>> db.connection.port
30002
>>> db.connection.is_mongos
True