mongodb的sharding(分片)實驗

其實sharding也不是什麼很新的數據庫功能,更不是mongodb獨有的,相似的oracle中的功能叫作分區表(partition table),目前多數互聯網公司的數據庫應該都使用了相似的技術,口頭交流中也有人喜歡叫「分庫分表」。其功能是,當一個集合(或者說oracle中的表)變的很大以後,能夠把這個集合的數據分到2個數據庫(即shards)中,這樣不會致使數據庫變得大到難以維護,同時自動實現讀寫和負載均衡,這樣在集合持續變大的過程當中,只須要增大數據庫的個數便可。html

mongodb的sharding須要好幾個服務,配置服務器(config server),路由進程(route process),片數據庫(shards)。config server用來保存sharding的配置信息,在整個sharding中是單點,通常會有多個備份;route process,mongodb中叫mongos,負責與client鏈接並將數據分佈到各個shards上;shards就是存儲分片以後的數據。固然,從數據的安全和運行的業務連續性角度來看,shards和route process也應該有多個備份。mongodb

實驗中,使用1個config server,1個mongos,2個shards,爲了啓動進程加快,試驗中基本都使用了--nojournal選項。shell

首先創建好存儲數據文件的目錄,並啓動config server和mongos,分別監聽10000和10100端口數據庫

XXXXX@XXXXX-asus:/var/lib$ sudo mkdir -p mongodb_sharding/config
XXXXX@XXXXX-asus:/var/lib$ sudo mkdir -p mongodb_sharding/shard1
XXXXX@XXXXX-asus:/var/lib$ sudo mkdir -p mongodb_sharding/shard2
XXXXX@XXXXX-asus:/var/lib$ sudo mkdir -p mongodb_sharding/config_log
XXXXX@XXXXX-asus:/var/lib$ sudo mkdir -p mongodb_sharding/shard1_log
XXXXX@XXXXX-asus:/var/lib$ sudo mkdir -p mongodb_sharding/shard2_log
XXXXX@XXXXX-asus:/var/lib$ sudo mongod --dbpath /var/lib/mongodb_sharding/config --logpath /var/lib/mongodb_sharding/config_log/mongodb.log --port 10000 --nojournal --fork
forked process: 8430
all output going to: /var/lib/mongodb_sharding/config_log/mongodb.log
log file [/var/lib/mongodb_sharding/config_log/mongodb.log] exists; copied to temporary file [/var/lib/mongodb_sharding/config_log/mongodb.log.2014-03-21T08-03-50]
child process started successfully, parent exiting
XXXXX@XXXXX-asus:/var/lib$ sudo mkdir -p mongodb_sharding/mongos_log
XXXXX@XXXXX-asus:/var/lib$ sudo mongos --bind_ip localhost --port 10100 --logpath /var/lib/mongodb_sharding/mongos_log/mongos.log --configdb localhost:10000 --fork
forked process: 8709
all output going to: /var/lib/mongodb_sharding/mongos_log/mongos.log
log file [/var/lib/mongodb_sharding/mongos_log/mongos.log] exists; copied to temporary file [/var/lib/mongodb_sharding/mongos_log/mongos.log.2014-03-21T08-12-03]
child process started successfully, parent exiting

而後啓動2個shard,分別監聽20000和30000端口。安全

XXXXX@XXXXX-asus:/var/lib$ sudo mongod --dbpath /var/lib/mongodb_sharding/shard1 --logpath /var/lib/mongodb_sharding/shard1_log/mongodb.log --port 20000 --nojournal --fork
forked process: 8834
all output going to: /var/lib/mongodb_sharding/shard1_log/mongodb.log
log file [/var/lib/mongodb_sharding/shard1_log/mongodb.log] exists; copied to temporary file [/var/lib/mongodb_sharding/shard1_log/mongodb.log.2014-03-21T08-15-43]
child process started successfully, parent exiting
XXXXX@XXXXX-asus:/var/lib$ sudo mongod --dbpath /var/lib/mongodb_sharding/shard2 --logpath /var/lib/mongodb_sharding/shard2_log/mongodb.log --port 30000 --nojournal --fork
forked process: 8846
all output going to: /var/lib/mongodb_sharding/shard2_log/mongodb.log
child process started successfully, parent exiting
XXXXX@XXXXX-asus:/var/lib$ 

而後用admin用戶登錄mongos,並將2個shard加入到整個分片體系中,ruby

XXXXX@XXXXX-asus:/var/lib$ mongo localhost:10100
MongoDB shell version: 2.2.4
connecting to: localhost:10100/test
mongos> use admin
switched to db admin
mongos> db.printShardingStatus()
--- Sharding Status --- 
  sharding version: { "_id" : 1, "version" : 3 }
  shards:
  databases:
    {  "_id" : "admin",  "partitioned" : false,  "primary" : "config" }

mongos> db.runCommand({addshard:"localhost:20000", allowLocal:true})
{ "shardAdded" : "shard0000", "ok" : 1 }
mongos> db.runCommand({addshard:"localhost:30000", allowLocal:true})
{ "shardAdded" : "shard0001", "ok" : 1 }
mongos> db.printShardingStatus()
--- Sharding Status --- 
  sharding version: { "_id" : 1, "version" : 3 }
  shards:
    {  "_id" : "shard0000",  "host" : "localhost:20000" }
    {  "_id" : "shard0001",  "host" : "localhost:30000" }
  databases:
    {  "_id" : "admin",  "partitioned" : false,  "primary" : "config" }

mongos> 

通常狀況下,若是要作sharding的話,會對某個表的「_id」這個列來作片鍵(shard key),由於這個是每一個集合都有的元素。固然,也能夠用其餘的列來作片鍵,不過要對這個列新建索引,不然對集合作shardcollection的時候會報錯。本實驗中仍然是用"_id"來作片鍵,實驗用的數據庫是test_sharding,集合是test_sharding.test。服務器

mongos> use admin
switched to db admin
mongos> db.runCommand({"enablesharding":"test_sharding"})
{ "ok" : 1 }
mongos> db.runCommand({"shardcollection":"test_sharding.test","key":{"_id":1}})
{ "collectionsharded" : "test_sharding.test", "ok" : 1 }
mongos> 

在沒有insert任何數據的狀況下,test集合的統計信息以下oracle

mongos> use test_sharding
switched to db test_sharding
mongos> db.test.stats()
{
    "sharded" : true,
    "ns" : "test_sharding.test",
    "count" : 0,
    "numExtents" : 1,
    "size" : 0,
    "storageSize" : 8192,
    "totalIndexSize" : 8176,
    "indexSizes" : {
        "_id_" : 8176
    },
    "avgObjSize" : 0,
    "nindexes" : 1,
    "nchunks" : 1,
    "shards" : {
        "shard0000" : {
            "ns" : "test_sharding.test",
            "count" : 0,
            "size" : 0,
            "storageSize" : 8192,
            "numExtents" : 1,
            "nindexes" : 1,
            "lastExtentSize" : 8192,
            "paddingFactor" : 1,
            "systemFlags" : 1,
            "userFlags" : 0,
            "totalIndexSize" : 8176,
            "indexSizes" : {
                "_id_" : 8176
            },
            "ok" : 1
        }
    },
    "ok" : 1
}
mongos> 

能夠看到sharding已經被啓用,可是此時只有一個shard在使用中,而後用腳本插入不少條數據,我是用ruby負載均衡

#!/usr/bin/env ruby
# 20140321, sharding_test.rb
###
# test sharding
###


require "rubygems"
require "mongo"

class MongoConnection
  def initialize(host, port)
    @mongoconn = Mongo::MongoClient.new(host, port)
    @db = @mongoconn.db("test_sharding")
  end

  def test()
    @coll = @db.collection("test")

    @coll.insert({"num"=>0})
    (1..600000).each { |i| @coll.insert({ num:i, num_org:i, create_at:Time.now, txt:"The quick brown fox jumps over the lazy dog." } ) }
    puts @coll.find({num:5000}).explain

  end

end

mongo_conn = MongoConnection.new("localhost", 10100)

puts "======TEST WITHOUT INDEX========================"
mongo_conn.test()

插入600000條數據以後,集合的統計信息發生了明顯的變化測試

mongos> db.test.stats()
{
    "sharded" : true,
    "ns" : "test_sharding.test",
    "count" : 600001,
    "numExtents" : 18,
    "size" : 72000032,
    "storageSize" : 116883456,
    "totalIndexSize" : 19491584,
    "indexSizes" : {
        "_id_" : 19491584
    },
    "avgObjSize" : 119.99985333357778,
    "nindexes" : 1,
    "nchunks" : 6,
    "shards" : {
        "shard0000" : {
            "ns" : "test_sharding.test",
            "count" : 319678,
            "size" : 38361360,
            "avgObjSize" : 120,
            "storageSize" : 58441728,
            "numExtents" : 9,
            "nindexes" : 1,
            "lastExtentSize" : 20643840,
            "paddingFactor" : 1,
            "systemFlags" : 1,
            "userFlags" : 0,
            "totalIndexSize" : 10383520,
            "indexSizes" : {
                "_id_" : 10383520
            },
            "ok" : 1
        },
        "shard0001" : {
            "ns" : "test_sharding.test",
            "count" : 280323,
            "size" : 33638672,
            "avgObjSize" : 119.99968607641898,
            "storageSize" : 58441728,
            "numExtents" : 9,
            "nindexes" : 1,
            "lastExtentSize" : 20643840,
            "paddingFactor" : 1,
            "systemFlags" : 1,
            "userFlags" : 0,
            "totalIndexSize" : 9108064,
            "indexSizes" : {
                "_id_" : 9108064
            },
            "ok" : 1
        }
    },
    "ok" : 1
}
mongos> 

可見,2個shard上都被分配到了數據,分配的還算近似均勻。試驗中,若是數據少的話,基本看不到sharding的效果,數據都放在shard0000上,可是數據多了以後,sharding的效果就有了。引用別人博客上的一段話,不知道對不對,來自這裏(http://blog.sina.com.cn/s/blog_502c8cc40100pdiv.html

爲了測試Sharding的balance效果,我陸續插入了大約200M的數據,插入過程當中使用db.stats() 查詢數據分佈狀況。發如今數據量較小,30M如下時,全部trunk都存儲在了shard0000上,但繼續插入後,數據開始平均分佈,而且mongos會對多個shard之間的數據進行rebalance 。在插入數據達到200M,剛插入結束時,shard0000上大約有135M數據,而shard0001上大約有65M數據,但過一段時間以後,shard0000上的數據量減小到了115M,shard0001上的數據量達到了85M。

實驗中發現,mongodb對shard的選擇有某種記憶性,好比,寫入10000條數據,最後第10000條數據是寫在shard0001上的,此時用remove刪除全部數據,再次寫數據的時候,發現mongodb是從shard0001開始寫,而不是shard0000。還有一個比較噁心的地方,mongodb彷佛不容許更改shard key,即時是空集合也不行,至少我沒有找到相關的方法。

insert數據以後能夠發現sharding status發生變化

mongos> db.printShardingStatus()
--- Sharding Status --- 
  sharding version: { "_id" : 1, "version" : 3 }
  shards:
    {  "_id" : "shard0000",  "host" : "localhost:20000" }
    {  "_id" : "shard0001",  "host" : "localhost:30000" }
  databases:
    {  "_id" : "admin",  "partitioned" : false,  "primary" : "config" }
    {  "_id" : "test",  "partitioned" : false,  "primary" : "shard0000" }
    {  "_id" : "test_sharding",  "partitioned" : true,  "primary" : "shard0000" }
        test_sharding.test chunks:
                shard0000    3
                shard0001    3
            { "_id" : { $minKey : 1 } } -->> { "_id" : ObjectId("532c3064cf7c7c2f53000001") } on : shard0000 Timestamp(5000, 0) 
            { "_id" : ObjectId("532c3064cf7c7c2f53000001") } -->> { "_id" : ObjectId("532c306acf7c7c2f530012d9") } on : shard0001 Timestamp(5000, 1) 
            { "_id" : ObjectId("532c306acf7c7c2f530012d9") } -->> { "_id" : ObjectId("532c310acf7c7c2f530290e0") } on : shard0000 Timestamp(4000, 1) 
            { "_id" : ObjectId("532c310acf7c7c2f530290e0") } -->> { "_id" : ObjectId("532c31a1cf7c7c2f5304f397") } on : shard0000 Timestamp(3000, 2) 
            { "_id" : ObjectId("532c31a1cf7c7c2f5304f397") } -->> { "_id" : ObjectId("532c3234cf7c7c2f53075090") } on : shard0001 Timestamp(4000, 2) 
            { "_id" : ObjectId("532c3234cf7c7c2f53075090") } -->> { "_id" : { $maxKey : 1 } } on : shard0001 Timestamp(4000, 3) 

mongos> 

此時,還能夠再增長一個shard,操做和以前相似

XXXXX@XXXXX-asus:/var/lib$ sudo mkdir -p mongodb_sharding/shard3
XXXXX@XXXXX-asus:/var/lib$ sudo mkdir -p mongodb_sharding/shard3_log
XXXXX@XXXXX-asus:/var/lib$ sudo mongod --dbpath /var/lib/mongodb_sharding/shard3 --logpath /var/lib/mongodb_sharding/shard3_log/mongodb.log --port 40000 --nojournal --fork
forked process: 12525
all output going to: /var/lib/mongodb_sharding/shard3_log/mongodb.log
child process started successfully, parent exiting
XXXXX@XXXXX-asus:/var/lib$ mongo localhost:10100
MongoDB shell version: 2.2.4
connecting to: localhost:10100/test
mongos> use admin
switched to db admin
mongos> db.runCommand({addshard:"localhost:40000", allowLocal:true})
{ "shardAdded" : "shard0002", "ok" : 1 }
mongos> 

加入新的shard以後,數據會自動飄移到新的shard上,能夠查看集合的統計信息

mongos> db.test.stats()
{
    "sharded" : true,
    "ns" : "test_sharding.test",
    "count" : 600001,
    "numExtents" : 22,
    "size" : 72000032,
    "storageSize" : 117579776,
    "totalIndexSize" : 19499760,
    "indexSizes" : {
        "_id_" : 19499760
    },
    "avgObjSize" : 119.99985333357778,
    "nindexes" : 1,
    "nchunks" : 6,
    "shards" : {
        "shard0000" : {
            "ns" : "test_sharding.test",
            "count" : 319678,
            "size" : 38361360,
            "avgObjSize" : 120,
            "storageSize" : 58441728,
            "numExtents" : 9,
            "nindexes" : 1,
            "lastExtentSize" : 20643840,
            "paddingFactor" : 1,
            "systemFlags" : 1,
            "userFlags" : 0,
            "totalIndexSize" : 10383520,
            "indexSizes" : {
                "_id_" : 10383520
            },
            "ok" : 1
        },
        "shard0001" : {
            "ns" : "test_sharding.test",
            "count" : 275499,
            "size" : 33059880,
            "avgObjSize" : 120,
            "storageSize" : 58441728,
            "numExtents" : 9,
            "nindexes" : 1,
            "lastExtentSize" : 20643840,
            "paddingFactor" : 1,
            "systemFlags" : 1,
            "userFlags" : 0,
            "totalIndexSize" : 8952720,
            "indexSizes" : {
                "_id_" : 8952720
            },
            "ok" : 1
        },
        "shard0002" : {
            "ns" : "test_sharding.test",
            "count" : 4824,
            "size" : 578792,
            "avgObjSize" : 119.98175787728026,
            "storageSize" : 696320,
            "numExtents" : 4,
            "nindexes" : 1,
            "lastExtentSize" : 524288,
            "paddingFactor" : 1,
            "systemFlags" : 1,
            "userFlags" : 0,
            "totalIndexSize" : 163520,
            "indexSizes" : {
                "_id_" : 163520
            },
            "ok" : 1
        }
    },
    "ok" : 1
}
mongos> 

用admin帳戶能夠看新的sharding status

mongos> db.printShardingStatus()
--- Sharding Status --- 
  sharding version: { "_id" : 1, "version" : 3 }
  shards:
    {  "_id" : "shard0000",  "host" : "localhost:20000" }
    {  "_id" : "shard0001",  "host" : "localhost:30000" }
    {  "_id" : "shard0002",  "host" : "localhost:40000" }
  databases:
    {  "_id" : "admin",  "partitioned" : false,  "primary" : "config" }
    {  "_id" : "test",  "partitioned" : false,  "primary" : "shard0000" }
    {  "_id" : "test_sharding",  "partitioned" : true,  "primary" : "shard0000" }
        test_sharding.test chunks:
                shard0002    2
                shard0000    2
                shard0001    2
            { "_id" : { $minKey : 1 } } -->> { "_id" : ObjectId("532c3064cf7c7c2f53000001") } on : shard0002 Timestamp(6000, 0) 
            { "_id" : ObjectId("532c3064cf7c7c2f53000001") } -->> { "_id" : ObjectId("532c306acf7c7c2f530012d9") } on : shard0002 Timestamp(7000, 0) 
            { "_id" : ObjectId("532c306acf7c7c2f530012d9") } -->> { "_id" : ObjectId("532c310acf7c7c2f530290e0") } on : shard0000 Timestamp(6000, 1) 
            { "_id" : ObjectId("532c310acf7c7c2f530290e0") } -->> { "_id" : ObjectId("532c31a1cf7c7c2f5304f397") } on : shard0000 Timestamp(3000, 2) 
            { "_id" : ObjectId("532c31a1cf7c7c2f5304f397") } -->> { "_id" : ObjectId("532c3234cf7c7c2f53075090") } on : shard0001 Timestamp(7000, 1) 
            { "_id" : ObjectId("532c3234cf7c7c2f53075090") } -->> { "_id" : { $maxKey : 1 } } on : shard0001 Timestamp(4000, 3) 

mongos> 

能夠看到,新增的shard沒有很好的去平均全部的數據,而是隻有不多的一部分數據飄移到了新的shard中,固然,這可能也是由於數據不足夠多形成的。

相關文章
相關標籤/搜索