MongoDB水平分片集羣學習筆記

時間 2019-12-06

標籤 mongodb 水平分片集羣學習筆記欄目 MongoDB 简体版

原文原文鏈接

爲什麼須要水平分片

1 減小單機請求數，將單機負載，提升總負載 mongodb

2 減小單機的存儲空間，提升總存空間。 shell

下圖一目瞭然：數據庫

mongodb sharding 服務器架構

簡單註解：服務器

1 mongos 路由進程，應用程序接入mongos再查詢到具體分片。架構

2 config server 路由表服務。每一臺都具備所有chunk的路由信息。 app

3 shard爲數據存儲分片。每一片均可以是複製集（replica set)。 ui

如何部署分片集羣

step 1 啓動config server this

mkdir /data/configdb
mongod --configsvr --dbpath /data/configdb --port 27019

正式生產環境通常啓動3個config server。啓動3個是爲了作熱備。

step 2 啓動mongos spa

mongos --configdb cfg0.example.net:27019,cfg1.example.net:27019,cfg2.example.net:27019

step3 啓動分片mongod

分片就是普通的mongod .net

mongod --dbpath <path> --port <port>

step4 在mongos添加分片

用mongo 鏈接上mongos，而後經過Mongo命令行輸入：

添加非replica set做爲分片：

sh.addShard( "mongodb0.example.net:27017" )

添加replica set做爲分片：

sh.addShard( "rs1/mongodb0.example.net:27017" )

step5 對某個數據庫啓用分片

sh.enableSharding("<database>")

這裏只是標識這個數據庫能夠啓用分片，但實際上並無進行分片。

step6 對collection進行分片

分片時須要指定分片的key，語法爲

sh.shardCollection("<database>.<collection>", shard-key-pattern)

例子爲：

sh.shardCollection("records.people", { "zipcode": 1, "name": 1 } )

sh.shardCollection("people.addresses", { "state": 1, "_id": 1 } )

sh.shardCollection("assets.chairs", { "type": 1, "_id": 1 } )

db.alerts.ensureIndex( { _id : "hashed" } )
sh.shardCollection("events.alerts", { "_id": "hashed" } )

最後一個爲hash sharded key。 hash sharded key是爲了解決某些狀況下sharded key的 write scaling的問題。

如何選擇shard key

1 shard key須要有高的cardinality 。也就是shard key須要擁有不少不一樣的值。便於數據的切分和遷移。

2 儘可能與應用程序融合。讓mongos面對查詢時能夠直接定位到某個shard。

3 具備隨機性。這是爲了避免會讓某段時間內的insert請求所有集中到某個單獨的分片上，形成單片的寫速度成爲整個集羣的瓶頸。用objectId做爲shard key時會發生隨機性差狀況。 ObjectId實際上由進程ID+TIMESTAMP + 其餘因素組成，因此一段時間內的timestamp會相對集中。

不過隨機性高會有一個反作用，就是query isolation性比較差。

可用hash key增長隨機性。

如何查看shard信息

登上mongos

sh.status()或者須要看詳細一點

sh.status({verbose:true})

Sharding Status ---
  sharding version: { "_id" : 1, "version" : 3 }
  shards:
    {  "_id" : "shard0000",  "host" : "m0.example.net:30001" }
    {  "_id" : "shard0001",  "host" : "m3.example2.net:50000" }
  databases:
    {  "_id" : "admin",  "partitioned" : false,  "primary" : "config" }
    {  "_id" : "contacts",  "partitioned" : true,  "primary" : "shard0000" }
        foo.contacts
            shard key: { "zip" : 1 }
            chunks:
                shard0001    2
                shard0002    3
                shard0000    2
            { "zip" : { "$minKey" : 1 } } -->> { "zip" : 56000 } on : shard0001 { "t" : 2, "i" : 0 }
            { "zip" : 56000 } -->> { "zip" : 56800 } on : shard0002 { "t" : 3, "i" : 4 }
            { "zip" : 56800 } -->> { "zip" : 57088 } on : shard0002 { "t" : 4, "i" : 2 }
            { "zip" : 57088 } -->> { "zip" : 57500 } on : shard0002 { "t" : 4, "i" : 3 }
            { "zip" : 57500 } -->> { "zip" : 58140 } on : shard0001 { "t" : 4, "i" : 0 }
            { "zip" : 58140 } -->> { "zip" : 59000 } on : shard0000 { "t" : 4, "i" : 1 }
            { "zip" : 59000 } -->> { "zip" : { "$maxKey" : 1 } } on : shard0000 { "t" : 3, "i" : 3 }
    {  "_id" : "test",  "partitioned" : false,  "primary" : "shard0000" }

備份cluster meta information

Step1 disable balance process. 鏈接上Mongos

sh.setBalancerState(false)

Step2 關閉config server

Step3 備份數據文件夾

Step4 重啓config server

Step5 enable balance process.

sh.setBalancerState(false)

查看balance 狀態

能夠經過下面的命令來查看當前的balance進程狀態。先鏈接到任意一臺mongos

use config
db.locks.find( { _id : "balancer" } ).pretty()
{   "_id" : "balancer",
"process" : "mongos0.example.net:1292810611:1804289383",
  "state" : 2,
     "ts" : ObjectId("4d0f872630c42d1978be8a2e"),
   "when" : "Mon Dec 20 2010 11:41:10 GMT-0500 (EST)",
    "who" : "mongos0.example.net:1292810611:1804289383:Balancer:846930886",
    "why" : "doing balance round" }

state=2 表示正在進行balance, 在2.0版本以前這個值是1

配置balance時間窗口

能夠經過balance時間窗口指定在一天以內的某段時間以內能夠進行balance，其餘時間不得進行balance。

先鏈接到任意一臺mongos

use config
db.settings.update({ _id : "balancer" }, { $set : { activeWindow : { start : "23:00", stop : "6:00" } } }, true )

這個設置讓只有從23:00到6:00之間能夠進行balance。

也能夠取消時間窗口設置:

use config
db.settings.update({ _id : "balancer" }, { $unset : { activeWindow : true } })

修改chunk size

這是一個全局的參數。默認是64MB。

小的chunk會讓不一樣的shard數據量更均衡。但會致使更多的Migration。

大的chunk會減小migration。不一樣的shard數據量不均衡。

這樣修改chunk size。先鏈接上任意mongos

db.settings.save( { _id:"chunksize", value: <size> } )

單位是MB

什麼時候會自動balance

每一個mongos進程均可能發動balance。

一次只會有一個balance跑。這是由於須要競爭這個鎖:

db.locks.find( { _id : "balancer" } )

balance一次只會遷移一個chunk。

只有chunk最多的shard的chunk數目減去chunk最少的shard的chunk數目超過treshhold時纔開始migration。

Number of Chunks	Migration Threshold
Fewer than 20	2
21-80	4
Greater than 80	8

上面的treshhold從2.2版本開始生效。

一旦balancer開始行動起來，只有當任意兩個shard的chunk數量小於2或者是migration失敗纔會中止。

設置分片上最大的存儲容量

有兩種方式，第一種在添加分片時候用maxSize參數指定:

db.runCommand( { addshard : "example.net:34008", maxSize : 125 } )

第二種方式能夠在運行中修改設定：

use config
db.shards.update( { _id : "shard0000" }, { $set : { maxSize : 250 } } )

刪除分片

鏈接上任意一臺mongos

STEP1 確認balancer已經打開。

STEP2 運行命令:

db.runCommand( { removeShard: "mongodb0" } )

mongodb0是須要刪除的分片的名字。這時balancer進程會開始把要刪除掉的分片上的數據往別的分片上遷移。

STEP3 查看是否刪除完

仍是運行上面那條removeShard命令

若是還未刪除完數據則返回：

{ msg: "draining ongoing" , state: "ongoing" , remaining: { chunks: NumberLong(42), dbs : NumberLong(1) }, ok: 1 }

STEP4 刪除unsharded data

有一些分片上保存上一些unsharded data，須要遷移到其餘分片上：

能夠用sh.status()查看分片上是否有unsharded data。

若是有則顯示：

{ "_id" : "products", "partitioned" : true, "primary" : "mongodb0" }

用下面的命令遷移：

db.runCommand( { movePrimary: "products", to: "mongodb1" })

只有所有遷移完上面的命令纔會返回：

{ "primary" : "mongodb1", "ok" : 1 }

STEP5 最後運行命令

db.runCommand( { removeShard: "mongodb0" } )

手動遷移分片

通常狀況下你不須要這麼作，只有當一些特殊狀況發生時，好比：

1 預分配空的集合時

2 在balancing時間窗以外

手動遷移的方法：

chunks:
                                shard0000       2
                                shard0001       2
                        { "zipcode" : { "$minKey" : 1 } } -->> { "zipcode" : 10001 } on : shard0000 Timestamp(6, 0) 
                        { "zipcode" : 10001 } -->> { "zipcode" : 23772 } on : shard0001 Timestamp(6, 1) 
                        { "zipcode" : 23772 } -->> { "zipcode" : 588377 } on : shard0001 Timestamp(3, 2) 
                        { "zipcode" : 588377 } -->> { "zipcode" : { "$maxKey" : 1 } } on : shard0000 Timestamp(5, 1)
mongos> db.adminCommand({moveChunk: "contact.people", find:{zipcode:10003}, to:"192.168.1.135:20002"})
{ "millis" : 2207, "ok" : 1 }
mongos> sh.status()
--- Sharding Status --- 
  sharding version: {
        "_id" : 1,
        "version" : 3,
        "minCompatibleVersion" : 3,
        "currentVersion" : 4,
        "clusterId" : ObjectId("52ece49ae6ab22400d937891")
}
  shards:
        {  "_id" : "shard0000",  "host" : "192.168.1.135:20002" }
        {  "_id" : "shard0001",  "host" : "192.168.1.135:20003" }
  databases:
        {  "_id" : "admin",  "partitioned" : false,  "primary" : "config" }
        {  "_id" : "test",  "partitioned" : false,  "primary" : "shard0000" }
        {  "_id" : "contact",  "partitioned" : true,  "primary" : "shard0000" }
                contact.people
                        shard key: { "zipcode" : 1 }
                        chunks:
                                shard0000       3
                                shard0001       1
                        { "zipcode" : { "$minKey" : 1 } } -->> { "zipcode" : 10001 } on : shard0000 Timestamp(6, 0) 
                        { "zipcode" : 10001 } -->> { "zipcode" : 23772 } on : shard0000 Timestamp(7, 0) 
                        { "zipcode" : 23772 } -->> { "zipcode" : 588377 } on : shard0001 Timestamp(7, 1) 
                        { "zipcode" : 588377 } -->> { "zipcode" : { "$maxKey" : 1 } } on : shard0000 Timestamp(5, 1) 

mongos>

預分配空chunk

這是一種提升寫效率的方法。至關於在寫入真實數據以前，就分配好了數據桶，而後再對號入座。省去了建立chunk和split的時間。

實際上使用的是split命令:

db.runCommand( { split : "myapp.users" , middle : { email : prefix } } );

myapp.users 是 collection的名字。

middle參數是split的點。

split命令以下：

db.adminCommand( { split: <database>.<collection>, <find|middle|bounds> } )

find 表示查找到的記錄進行分裂

bounds是指定[low, up]分裂

middle是指定分裂的點。

一個預分配chunk的例子以下：

for ( var x=97; x<97+26; x++ ){
  for( var y=97; y<97+26; y+=6 ) {
    var prefix = String.fromCharCode(x) + String.fromCharCode(y);
    db.runCommand( { split : "myapp.users" , middle : { email : prefix } } );
  }
}

這個預分配的目的是字母順序有必定間隔的email，分配到不一樣的chunk裏。

例如aa-ag到一個chunk

ag-am到一個chunk

預分配的結果以下：

{ "email" : { "$minKey" : 1 } } -->> { "email" : "aa" } on : shard0001 Timestamp(2, 0) 
{ "email" : "aa" } -->> { "email" : "ag" } on : shard0001 Timestamp(3, 0) 
{ "email" : "ag" } -->> { "email" : "am" } on : shard0001 Timestamp(4, 0) 
{ "email" : "am" } -->> { "email" : "as" } on : shard0001 Timestamp(5, 0) 
{ "email" : "as" } -->> { "email" : "ay" } on : shard0001 Timestamp(6, 0)
...

{ "email" : "zm" } -->> { "email" : "zs" } on : shard0000 Timestamp(1, 257) 
{ "email" : "zs" } -->> { "email" : "zy" } on : shard0000 Timestamp(1, 259) 
{ "email" : "zy" } -->> { "email" : { "$maxKey" : 1 } } on : shard0000 Timestamp(1, 260)

如何刪除在sharding中刪除複製集replica set的成員

假設sharding的分片是複製集，須要刪除某個複製集的某個成員。

只要在複製集的設置中刪除該成員便可，不須要在mongos中刪除。mongos會自動同步這個配置。

例如 sharding cluster中有這個分片:

{ "_id" : "rs3", "host" : "rs3/192.168.1.5:30003,192.168.1.6:30003" }

須要刪除192.168.1.6:30003這個成員。

只須要:

step 1: 在192.168.1.6:30003上運行db.shutdownServer()關閉mongod

step 2:在rs3的primary的成員192.168.1.5:30003上執行

rs.remove("192.168.1.6:30003")

如何關閉/打開balancer

關閉

sh.setBalancerState(false)

打開

sh.setBalancerState(true)

查看是否打開：

sh.getBalancerState()

新添加的分片始終不進行數據同步的問題

1 若是sharding cluster中新添加的分片始終不進行數據migration, 並出現相似日誌:

migrate commit waiting for 2 slaves for

則須要重啓該分片的mongod進程。

特別須要注意的是，若是某mongod進程是一個replica set的primary，而且該replica set上只有一個mongod，那麼不能用db.shutdownServer()的方法關閉。會報下面的錯誤:

no secondary is within 10 seconds of the primary,

須要用下面的命令關閉：

db.adminCommand({shutdown : 1, force : true})

2 另一個新的分片始終部進行更新的問題：

日誌裏出現這樣的錯誤：

secondaryThrottle on, but doc insert timed out after 60 seconds, continuing

經過1 將全部分片的secondary和arbitary刪除掉，2 重啓同步的分片解決。

找到這個問題的解決方法是看到mongo/s/d_migration.cpp裏有這樣一段代碼

if ( secondaryThrottle && thisTime > 0 ) {
   if ( ! waitForReplication( cc().getLastOp(), 2, 60 /* seconds to wait */ ) ) {
       warning() << "secondaryThrottle on, but doc insert timed out after 60 seconds, continuing" << endl;
   }
}

這段代碼含義是，要進行同步的chunk所在的分片的從服務的secondary的optime和主分片不一致，因此須要等待60秒鐘的時間。

因此將要進行同步的chunk所在分片的複製集secondary和arbiter都刪除掉，再重啓新分片的mongod以後解決。