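A document's data model describes how its data is organized, and a well-chosen model makes the application easier to support. MongoDB offers two ways to model document data: embedding and references.
MongoDB documents are schemaless, so they can hold all kinds of data structures. The embedded model is also known as the denormalized model. In MongoDB, a group of related data can be a document of its own or a part of another document, as a diagram in the MongoDB documentation illustrates.
Embedding stores a group of related data in a single document. The benefit is that the application needs fewer queries and updates to carry out routine reads and writes.
According to the MongoDB documentation, embedding is worth considering in the following situations:
For a one-to-one relationship, such as a user and their contact details, embedding makes the data easy to query and update.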
{
  "_id": <ObjectId0>,
  "name": "Wilber",
  "contact": {
    "phone": "12345678",
    "email": "wilber@shanghai.com"
  }
}
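In this case, where one user has many posts, the application will often look up a user's blog posts by user name. Embedding posts as a field is then a good choice, because it saves many queries.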
{
  "_id": <ObjectId1>,
  "name": "Wilber",
  "contact": {
    "phone": "12345678",
    "email": "wilber@shanghai.com"
  },
  "posts": [
    { "title": "Indexes in MongoDB", "created": "12/01/2014", "link": "www.blog.com" },
    { "title": "Replication in MongoDB", "created": "12/02/2014", "link": "www.blog.com" },
    { "title": "Sharding in MongoDB", "created": "12/03/2014", "link": "www.blog.com" }
  ]
}
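As the examples above show, the embedded model gives the application good read performance, because a single database operation returns all the related data. It also lets related data be updated in one atomic write, since a write to a single document is atomic in MongoDB.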
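A minimal sketch of the single-read point, written in plain JavaScript rather than against a live MongoDB server. The array standing in for the collection and the `findOne` helper are assumptions made for illustration; they are not the shell or driver API.

```javascript
// In-memory stand-in for a "user" collection (illustration only, no server involved).
const users = [
  {
    _id: 1,
    name: "Wilber",
    contact: { phone: "12345678", email: "wilber@shanghai.com" },
    posts: [
      { title: "Indexes in MongoDB", created: "12/01/2014", link: "www.blog.com" },
      { title: "Replication in MongoDB", created: "12/02/2014", link: "www.blog.com" },
      { title: "Sharding in MongoDB", created: "12/03/2014", link: "www.blog.com" },
    ],
  },
];

// Hypothetical helper: return the first document whose fields equal those in `query`.
function findOne(collection, query) {
  return (
    collection.find((doc) =>
      Object.entries(query).every(([key, value]) => doc[key] === value)
    ) || null
  );
}

// A single lookup yields the user, the contact details and every post.
const user = findOne(users, { name: "Wilber" });
const titles = user.posts.map((post) => post.title);
console.log(user.contact.email, titles);
```

Because the posts live inside the user document, no second lookup keyed on the user is needed.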
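However, embedding can also cause problems. Documents may keep growing, which can hurt write performance and lead to data fragmentation. In other words, document growth has to be considered when embedding; the MongoDB documentation's description of document growth is quoted below. MongoDB also limits the maximum document size (16 MB per BSON document), which is another thing to keep in mind when embedding.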
Some updates to documents can increase the size of documents. These updates include pushing elements to an array (i.e. $push) and adding new fields to a document. If the document size exceeds the allocated space for that document, MongoDB will relocate the document on disk. Relocating documents takes longer than in place updates and can lead to fragmented storage. Although MongoDB automatically adds padding to document allocations to minimize the likelihood of relocation, data models should avoid document growth when possible.
For instance, if your applications require updates that will cause document growth, you may want to refactor your data model to use references between data in distinct documents rather than a denormalized data model.
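A rough sketch of how an embedded array makes a document grow. The size is estimated from the UTF-8 length of the JSON text, which is only a proxy for the real BSON size (an assumption made to keep the example self-contained), and the post titles are made up for the loop.

```javascript
// MongoDB's maximum BSON document size is 16 MB.
const MAX_DOCUMENT_BYTES = 16 * 1024 * 1024;

// Rough proxy: the UTF-8 length of the JSON text, NOT the exact BSON size.
function approxSize(doc) {
  return Buffer.byteLength(JSON.stringify(doc), "utf8");
}

const user = { name: "Wilber", posts: [] };
const sizeBefore = approxSize(user);

// Simulate repeated appends to the embedded array (what $push would do).
for (let i = 0; i < 1000; i++) {
  user.posts.push({ title: `Post ${i}`, created: "12/01/2014", link: "www.blog.com" });
}
const sizeAfter = approxSize(user);

console.log(`before: ${sizeBefore} bytes, after: ${sizeAfter} bytes`);
console.log(`share of the limit used: ${((sizeAfter / MAX_DOCUMENT_BYTES) * 100).toFixed(3)}%`);
```

If the array can grow without bound, moving its elements into their own documents and referencing the user keeps each document small.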
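In contrast to the embedded model, the reference model, also called the normalized data model, expresses relationships between data through references.
Another diagram from the MongoDB documentation shows this model: contact and access are moved out of the user document and linked back to it through a user_id field.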
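The MongoDB documentation also lists situations in which the reference model is worth considering.
One interesting example from that documentation is a tree of categories.
The intuitive way to represent such a tree is with parent-child relationships, where each node stores a reference to its parent: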
db.categories.insert( { _id: "MongoDB", parent: "Databases" } )
db.categories.insert( { _id: "dbm", parent: "Databases" } )
db.categories.insert( { _id: "Databases", parent: "Programming" } )
db.categories.insert( { _id: "Languages", parent: "Programming" } )
db.categories.insert( { _id: "Programming", parent: "Books" } )
db.categories.insert( { _id: "Books", parent: null } )
The same tree can also be stored the other way round, with each node listing its children:
db.categories.insert( { _id: "MongoDB", children: [] } )
db.categories.insert( { _id: "dbm", children: [] } )
db.categories.insert( { _id: "Databases", children: [ "MongoDB", "dbm" ] } )
db.categories.insert( { _id: "Languages", children: [] } )
db.categories.insert( { _id: "Programming", children: [ "Databases", "Languages" ] } )
db.categories.insert( { _id: "Books", children: [ "Programming" ] } )
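A short sketch of how parent references can be walked, using plain JavaScript objects in place of the categories collection. The `ancestors` and `children` helpers are hypothetical names introduced for illustration, not MongoDB functions.

```javascript
// The category tree as plain objects, in parent-reference form.
const categories = [
  { _id: "MongoDB", parent: "Databases" },
  { _id: "dbm", parent: "Databases" },
  { _id: "Databases", parent: "Programming" },
  { _id: "Languages", parent: "Programming" },
  { _id: "Programming", parent: "Books" },
  { _id: "Books", parent: null },
];

// Index nodes by _id so each step up the tree is a single lookup.
const byId = new Map(categories.map((node) => [node._id, node]));

// Follow parent references from a node up to the root.
function ancestors(id) {
  const path = [];
  let current = byId.get(id);
  while (current && current.parent !== null) {
    path.push(current.parent);
    current = byId.get(current.parent);
  }
  return path;
}

// Derive the direct children of a node by scanning for matching parents.
function children(id) {
  return categories.filter((node) => node.parent === id).map((node) => node._id);
}

console.log(ancestors("MongoDB")); // [ 'Databases', 'Programming', 'Books' ]
console.log(children("Databases")); // [ 'MongoDB', 'dbm' ]
```

With parent references, finding a node's parent is one lookup but finding its children needs a scan; storing child references reverses that trade-off.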
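MongoDB supports two ways of implementing references: manual references and DBRefs.
With a manual reference, as in the earlier one-to-many example, we can store the name field of the user in each post document to link the two, and then fetch the data we want with more than one query. This approach is simple and covers most needs.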
user document:
{
  "name": "Wilber",
  "gender": "Male",
  "birthday": "1987-09",
  "contact": { "phone": "12345678", "email": "wilber@shanghai.com" }
}

post documents:
{ "title": "Indexes in MongoDB", "created": "12/01/2014", "link": "www.blog.com", "author": "Wilber" }
{ "title": "Replication in MongoDB", "created": "12/02/2014", "link": "www.blog.com", "author": "Wilber" }
{ "title": "Sharding in MongoDB", "created": "12/03/2014", "link": "www.blog.com", "author": "Wilber" }
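The two-step lookup behind a manual reference can be sketched in plain JavaScript, with arrays standing in for the user and post collections (an assumption made for illustration; no MongoDB server or driver is involved).

```javascript
// In-memory stand-ins for the two collections.
const users = [
  {
    name: "Wilber",
    gender: "Male",
    birthday: "1987-09",
    contact: { phone: "12345678", email: "wilber@shanghai.com" },
  },
];
const posts = [
  { title: "Indexes in MongoDB", created: "12/01/2014", link: "www.blog.com", author: "Wilber" },
  { title: "Replication in MongoDB", created: "12/02/2014", link: "www.blog.com", author: "Wilber" },
  { title: "Sharding in MongoDB", created: "12/03/2014", link: "www.blog.com", author: "Wilber" },
];

// Step 1: look up the user by name.
const user = users.find((u) => u.name === "Wilber");

// Step 2: use the stored name as the reference to find that user's posts.
const userPosts = posts.filter((p) => p.author === user.name);

console.log(userPosts.map((p) => p.title));
```

The reference is just a value copied into each post; the application performs the join itself and has to know which collection to look in.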
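Note that the one drawback of a manual reference is that it does not say which database or collection it points to. If documents in one collection reference documents in several other collections, DBRefs may be worth considering.
For example, suppose a user can publish posts on several blogging platforms, and each platform's data is kept in its own collection. DBRefs are convenient in this situation.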
user document:
{
  "name": "Wilber",
  "gender": "Male",
  "birthday": "1987-09",
  "contact": { "phone": "12345678", "email": "wilber@shanghai.com" }
}

Post4CNblog documents:
{ "title": "Indexes in MongoDB", "created": "12/01/2014", "link": "www.blog.com", "author": "Wilber" }
{ "title": "Replication in MongoDB", "created": "12/02/2014", "link": "www.blog.com", "author": "Wilber" }

Post4CSDN document:
{ "title": "Sharding in MongoDB", "created": "12/03/2014", "link": "www.blog.com", "author": "Wilber" }

Post4ITeye document:
{ "title": "Notepad++ configuration", "created": "12/05/2014", "link": "www.blog.com", "author": "Wilber" }
To get the details of the user who published "Replication in MongoDB" on CNblog, we can run the statements below, which take two queries:
> db.Post4CNblog.find({"title": "Replication in MongoDB"})
{ "_id" : ObjectId("548fe8100c3e84a00806a48f"), "title" : "Replication in MongoDB", "created" : "12/02/2014", "link" : "www.blog.com", "author" : "Wilber" }
> db.user.find({"name":"Wilber"}).toArray()
[
  {
    "_id" : ObjectId("548fe8100c3e84a00806a48d"),
    "name" : "Wilber",
    "gender" : "Male",
    "birthday" : "1987-09",
    "contact" : {
      "phone" : "12345678",
      "email" : "wilber@shanghai.com"
    }
  }
]
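A DBRef links documents through an _id, a collection name and, optionally, a database name. This way, documents can be linked conveniently even when they are spread across several collections.
A DBRef has a fixed format made up of these fields: $ref (the collection name), $id (the _id of the referenced document) and, optionally, $db (the database name).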
As an example, let's implement the scenario above with DBRefs. Note that the user name now has to serve as the _id field of the user document.
user document:
{
  "_id": "Wilber",
  "gender": "Male",
  "birthday": "1987-09",
  "contact": { "phone": "12345678", "email": "wilber@shanghai.com" }
}

Post4CNblog documents:
{ "title": "Indexes in MongoDB", "created": "12/01/2014", "link": "www.blog.com", "author": {"$ref": "user", "$id": "Wilber"} }
{ "title": "Replication in MongoDB", "created": "12/02/2014", "link": "www.blog.com", "author": {"$ref": "user", "$id": "Wilber"} }

Post4CSDN document:
{ "title": "Sharding in MongoDB", "created": "12/03/2014", "link": "www.blog.com", "author": {"$ref": "user", "$id": "Wilber"} }

Post4ITeye document:
{ "title": "Notepad++ configuration", "created": "12/05/2014", "link": "www.blog.com", "author": {"$ref": "user", "$id": "Wilber"} }
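To make the $ref and $id fields in the author values above concrete, here is a sketch of what resolving such a reference amounts to. It uses plain JavaScript objects; the `collections` map and the `resolveRef` helper are hypothetical names introduced for illustration and are not the shell's actual implementation.

```javascript
// In-memory stand-ins for collections, keyed by collection name.
const collections = {
  user: [
    {
      _id: "Wilber",
      gender: "Male",
      birthday: "1987-09",
      contact: { phone: "12345678", email: "wilber@shanghai.com" },
    },
  ],
  Post4CNblog: [
    {
      title: "Replication in MongoDB",
      created: "12/02/2014",
      link: "www.blog.com",
      author: { $ref: "user", $id: "Wilber" },
    },
  ],
};

// Hypothetical resolver: open the collection named by $ref and
// return the document whose _id equals $id (or null if none matches).
function resolveRef(ref) {
  const target = collections[ref.$ref] || [];
  return target.find((doc) => doc._id === ref.$id) || null;
}

const post = collections.Post4CNblog.find((p) => p.title === "Replication in MongoDB");
const author = resolveRef(post.author);
console.log(author);
```

Because the reference carries the collection name, the same resolver works for posts stored in any of the platform collections.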
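To look up the same user details for the author of "Replication in MongoDB" on CNblog, a single statement is now enough; the shell's fetch() helper follows the DBRef and performs the second lookup for us:
> db.Post4CNblog.findOne({"title":"Replication in MongoDB"}).author.fetch()
{
  "_id" : "Wilber",
  "gender" : "Male",
  "birthday" : "1987-09",
  "contact" : {
    "phone" : "12345678",
    "email" : "wilber@shanghai.com"
  }
}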
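This article gave a rough overview of data models in MongoDB. Neither the embedded model nor the reference model is better in general; what matters is the use case.
One more thing: when using the embedded model, always keep document growth and the maximum document size in mind.
PS: all the commands used in the examples can be found at the following link: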
http://files.cnblogs.com/wilber2013/data_modeling.js