Understanding and Using MongoDB Indexes Correctly

Date: 2019-11-29


In typical MongoDB query workloads, indexes play a very important role. Without an index, MongoDB has to scan the entire collection to find matching documents, which is very expensive.

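MongoDB indexes use a B-tree data structure, which lets MongoDB efficiently locate the data a query asks for. The official documentation illustrates this with an index on a score field: such an index not only supports range queries efficiently, it also lets MongoDB return results in sorted order efficiently.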

MongoDB indexes are similar to indexes in other database systems. They are defined at the collection level and can be created on any single field or sub-field of a document.

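The default `_id` index

When a collection is created, MongoDB automatically builds a unique index on _id, which serves as the primary key of each document. This index cannot be dropped.

MongoDB supports several ways of creating indexes; for details see the official documentation: https://docs.mongodb.com/manual/indexes/#create-an-index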

Single field index

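The single field index is MongoDB's simplest index type. Unlike MySQL before version 8.0, where the index direction was parsed but ignored, a MongoDB index has a direction: ascending or descending.

For a single field index, however, the direction does not matter, because MongoDB can traverse a single field index in either direction.

Consider a records collection containing documents like this: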

{
  "_id": ObjectId("570c04a4ad233577f97dc459"),
  "score": 1034,
  "location": { state: "NY", city: "New York" }
}

Then create a single field index:

db.records.createIndex( { score: 1 } )

The statement above creates an ascending index on the score field of the collection. This index supports queries such as:

db.records.find( { score: 2 } )
db.records.find( { score: { $gt: 10 } } )

You can use MongoDB's explain to analyze the two queries above:

db.records.find({score:2}).explain('executionStats')

Single field index on an embedded field

MongoDB also supports creating an index on an embedded field:

db.records.createIndex( { "location.state": 1 } )

The embedded field index above supports queries such as:

db.records.find( { "location.state": "CA" } )
db.records.find( { "location.city": "Albany", "location.state": "NY" } )

Sort on a single field index

Because a MongoDB index can itself be traversed in order, a single field index can serve sorts such as the following:

db.records.find().sort( { score: 1 } )
db.records.find().sort( { score: -1 } )
db.records.find({score:{$lte:100}}).sort( { score: -1 } )

All of these queries can use the index.
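Why either sort direction works can be illustrated with a toy model. The sketch below is plain JavaScript with no database involved; the sorted array `scoreIndex` and the helper `scanIndex` are hypothetical stand-ins for an index on score, not MongoDB internals.

```javascript
// Toy model: an ascending index on score is the score values kept in sorted order.
// (Hypothetical data; a real index also stores a reference to each document.)
const scoreIndex = [12, 45, 80, 100, 250, 1034];

// Walk the sorted entries forward (direction 1) or backward (direction -1),
// keeping only the entries accepted by the predicate.
function scanIndex(index, direction, predicate = () => true) {
  const ordered = direction === 1 ? index : [...index].reverse();
  return ordered.filter(predicate);
}

// sort({ score: 1 }) and sort({ score: -1 }) read the same entries in opposite directions.
console.log(scanIndex(scoreIndex, 1));  // [ 12, 45, 80, 100, 250, 1034 ]
console.log(scanIndex(scoreIndex, -1)); // [ 1034, 250, 100, 80, 45, 12 ]

// find({ score: { $lte: 100 } }).sort({ score: -1 }): bounded range, read backward.
console.log(scanIndex(scoreIndex, -1, (s) => s <= 100)); // [ 100, 80, 45, 12 ]
```

In this model no separate sorting step is needed for any of the three queries, which is the property the text describes.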

Compound index

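MongoDB supports indexes on multiple fields, called compound indexes. The order of the fields in a compound index has a critical effect on its performance. For example, the index {userid:1, score:-1} orders entries first by userid and then, within each userid, by score in descending order.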
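The ordering can be made concrete with a few sample entries. This is a sketch in plain JavaScript; the entries and the helper `compareBy` are hypothetical and only mimic how a key pattern orders its entries.

```javascript
// Hypothetical index entries for the key pattern { userid: 1, score: -1 }.
const sampleEntries = [
  { userid: "bob", score: 70 },
  { userid: "alice", score: 55 },
  { userid: "bob", score: 95 },
  { userid: "alice", score: 90 },
];

// Build a comparator from a key pattern: compare field by field,
// where direction 1 means ascending and -1 means descending.
function compareBy(keyPattern) {
  return (x, y) => {
    for (const [field, direction] of Object.entries(keyPattern)) {
      if (x[field] < y[field]) return -direction;
      if (x[field] > y[field]) return direction;
    }
    return 0;
  };
}

const orderedEntries = [...sampleEntries].sort(compareBy({ userid: 1, score: -1 }));
console.log(orderedEntries.map((e) => `${e.userid}:${e.score}`).join(" "));
// alice:90 alice:55 bob:95 bob:70
```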

Creating a compound index

Consider a products collection containing documents like this:

{
 "_id": ObjectId(...),
 "item": "Banana",
 "category": ["food", "produce", "grocery"],
 "location": "4th Street Store",
 "stock": 4,
 "type": "cases"
}

Then create a compound index:

db.products.createIndex( { "item": 1, "stock": 1 } )

The documents referenced by this index are ordered first by item and then, within each item value, by stock. Both of the following queries can use the index:

db.products.find( { item: "Banana" } )
db.products.find( { item: "Banana", stock: { $gt: 5 } } )

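The condition {item: "Banana"} qualifies because the query satisfies the prefix rule.

Queries on a compound index must satisfy the prefix rule

An index prefix is a leading (left-most) subset of the indexed fields. Consider the following index: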

{ "item": 1, "location": 1, "stock": 1 }

This index has the following index prefixes:

{ item: 1 }
{ item: 1, location: 1 }

So any query that satisfies the index prefix rule can use the compound index:

db.products.find( { item: "Banana" } )
db.products.find( { item: "Banana",location:"4th Street Store"} )
db.products.find( { item: "Banana",location:"4th Street Store",stock:4})

Conversely, a query that does not match any index prefix cannot be supported by the index, for example a query on only the following fields:

the location field
the stock field
the location and stock fields

Because of index prefixes, if a collection has both an {a:1, b:1} index and an {a:1} index, and neither index has a sparse or unique constraint, the single field index can be dropped.
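The prefix rule is mechanical enough to sketch in code. The helpers below are plain JavaScript and entirely hypothetical (`indexPrefixes`, `usablePrefix`, and `isPrefixOf` are not MongoDB functions); they only restate the rule and ignore index options such as sparse or unique.

```javascript
// List the index prefixes of a compound key pattern.
function indexPrefixes(keyPattern) {
  const fields = Object.entries(keyPattern);
  const prefixes = [];
  for (let n = 1; n < fields.length; n++) {
    prefixes.push(Object.fromEntries(fields.slice(0, n)));
  }
  return prefixes;
}

// Return the leading index fields that a query's filter fields cover.
// An empty result means the query matches no prefix of the index.
function usablePrefix(queryFields, keyPattern) {
  const prefix = [];
  for (const field of Object.keys(keyPattern)) {
    if (!queryFields.includes(field)) break;
    prefix.push(field);
  }
  return prefix;
}

// True when `shorter` is a proper prefix of `longer` (same fields, same directions),
// which makes `shorter` a candidate for removal.
function isPrefixOf(shorter, longer) {
  const s = Object.entries(shorter);
  const l = Object.entries(longer);
  return s.length < l.length && s.every(([f, d], i) => l[i][0] === f && l[i][1] === d);
}

const productIndex = { item: 1, location: 1, stock: 1 };
console.log(indexPrefixes(productIndex));                     // [ { item: 1 }, { item: 1, location: 1 } ]
console.log(usablePrefix(["item", "location"], productIndex)); // [ 'item', 'location' ]
console.log(usablePrefix(["location", "stock"], productIndex)); // []
console.log(isPrefixOf({ a: 1 }, { a: 1, b: 1 }));              // true
```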

Sort on a compound index

As noted earlier, the sort direction does not matter for a single field index, but a compound index is entirely different.

Consider the following scenario:

db.events.find().sort( { username: 1, date: -1 } )

The events collection is queried as above: results are sorted first by username ascending and then by date descending. Alternatively, consider the query below:

db.events.find().sort( { username: -1, date: 1 } )

which sorts by username descending and then by date ascending. The index:

db.events.createIndex( { "username" : 1, "date" : -1 } )

supports both of these queries, but not the following one:

db.events.find().sort( { username: 1, date: 1 })

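In other words, the sort directions must be consistent with the directions in the index. Consistent does not have to mean identical, as the following summary shows:

Sort	{ "username" : 1, "date" : -1 }	{ "username" : 1, "date" : 1 }
sort( { username: 1, date: -1 } )	supported	not supported
sort( { username: -1, date: 1 } )	supported	not supported
sort( { username: 1, date: 1 } )	not supported	supported
sort( { username: -1, date: -1 } )	not supported	supported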

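That is, the sort order must match the index order, either directly or after every direction is reversed. The table below lists queries whose sort an index can satisfy: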

query	index
db.data.find().sort( { a: 1 } )	{ a: 1 }
db.data.find().sort( { a: -1 } )	{ a: 1 }
db.data.find().sort( { a: 1, b: 1 } )	{ a: 1, b: 1 }
db.data.find().sort( { a: -1, b: -1 } )	{ a: 1, b: 1 }
db.data.find().sort( { a: 1, b: 1, c: 1 } )	{ a: 1, b: 1, c: 1 }
db.data.find( { a: { $gt: 4 } } ).sort( { a: 1, b: 1 } )	{ a: 1, b: 1 }
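The direction rule can be expressed as a small check. This is a sketch in plain JavaScript of the simplified rule described here, not MongoDB's query planner; the function name `indexSupportsSort` is hypothetical.

```javascript
// Can an index with the given key pattern return documents already ordered for
// this sort? Simplified rule: the sort fields must be a prefix of the index
// fields, and the directions must either all match the index or all be the
// inverse of it (a backward traversal).
function indexSupportsSort(sortSpec, keyPattern) {
  const sortKeys = Object.entries(sortSpec);
  const indexKeys = Object.entries(keyPattern);
  if (sortKeys.length === 0 || sortKeys.length > indexKeys.length) return false;

  const sameFields = sortKeys.every(([field], i) => indexKeys[i][0] === field);
  if (!sameFields) return false;

  const forward = sortKeys.every(([, dir], i) => dir === indexKeys[i][1]);
  const backward = sortKeys.every(([, dir], i) => dir === -indexKeys[i][1]);
  return forward || backward;
}

const eventsIndex = { username: 1, date: -1 };
console.log(indexSupportsSort({ username: 1, date: -1 }, eventsIndex)); // true
console.log(indexSupportsSort({ username: -1, date: 1 }, eventsIndex)); // true
console.log(indexSupportsSort({ username: 1, date: 1 }, eventsIndex));  // false
```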

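Sorting on a non-prefix subset of an index

Consider the index { a: 1, b: 1, c: 1, d: 1 }. A sort on fields that do not form an index prefix can still use the index, provided the query has equality conditions on all the index fields that precede the sort fields: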

	Example	Index Prefix
r1	db.data.find( { a: 5 } ).sort( { b: 1, c: 1 } )	{ a: 1 , b: 1, c: 1 }
r2	db.data.find( { b: 3, a: 4 } ).sort( { c: 1 } )	{ a: 1, b: 1, c: 1 }
r3	db.data.find( { a: 5, b: { $lt: 3} } ).sort( { b: 1 } )	{ a: 1, b: 1 }

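In row r1 of the table, the sort fields are b and c; a is an index field that precedes b and c and has an equality condition, so the index can be used. In r3, b has a range condition in the query, but a, which precedes b, still has an equality condition. In other words, as long as the index fields preceding the sort fields have equality conditions, the other fields may carry any condition.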
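The equality requirement can also be written as a check. The sketch below is plain JavaScript and only encodes the rule as stated in this section; `sortSupportedWithEquality` is a hypothetical helper, and a real query planner considers more than this.

```javascript
// equalityFields: fields the query pins to a single value, e.g. ["a"] for { a: 5 }.
function sortSupportedWithEquality(equalityFields, sortSpec, keyPattern) {
  const indexKeys = Object.entries(keyPattern);
  const sortKeys = Object.entries(sortSpec);
  if (sortKeys.length === 0) return false;

  // Locate the first sort field inside the index key pattern.
  const start = indexKeys.findIndex(([field]) => field === sortKeys[0][0]);
  if (start === -1 || start + sortKeys.length > indexKeys.length) return false;

  // Every index field before the sort fields needs an equality condition.
  const pinned = indexKeys.slice(0, start).every(([field]) => equalityFields.includes(field));
  if (!pinned) return false;

  // The sort fields must follow the index order, all forward or all backward.
  const segment = indexKeys.slice(start, start + sortKeys.length);
  const sameFields = sortKeys.every(([field], i) => segment[i][0] === field);
  const forward = sortKeys.every(([, dir], i) => dir === segment[i][1]);
  const backward = sortKeys.every(([, dir], i) => dir === -segment[i][1]);
  return sameFields && (forward || backward);
}

const abcdIndex = { a: 1, b: 1, c: 1, d: 1 };
console.log(sortSupportedWithEquality(["a"], { b: 1, c: 1 }, abcdIndex)); // r1: true
console.log(sortSupportedWithEquality(["a", "b"], { c: 1 }, abcdIndex));  // r2: true
console.log(sortSupportedWithEquality(["a"], { b: 1 }, abcdIndex));       // r3: true
console.log(sortSupportedWithEquality([], { c: 1 }, abcdIndex));          // false
```

The last call models a query with no equality condition on a or b, so the sort on c cannot rely on the index order.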

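How to build the right indexes

The sections above cover most of the index knowledge needed for everyday MongoDB use, but how do you build the right index?

Use explain to analyze queries

MongoDB provides an explain facility, similar to EXPLAIN in MySQL, for analyzing queries, which helps in building the right indexes. When creating an index, check the explain output for each of the query conditions you need to support.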
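When reading explain output, the stages of the winning plan and the examined-versus-returned counts are usually the first things to check. The sketch below is plain JavaScript that inspects a hand-written sample object; the sample values are made up, the helpers `planStages` and `summarizeExplain` are hypothetical, and the shape assumed here (a single chain of inputStage links) does not cover every plan or server version.

```javascript
// Hand-written sample shaped like the output of
// db.records.find({ score: 2 }).explain('executionStats').
// Real output has many more fields; the numbers are illustrative only.
const sampleExplain = {
  queryPlanner: {
    winningPlan: {
      stage: "FETCH",
      inputStage: { stage: "IXSCAN", indexName: "score_1", keyPattern: { score: 1 } },
    },
  },
  executionStats: { nReturned: 1, totalKeysExamined: 1, totalDocsExamined: 1 },
};

// Collect the stage names of a plan by following inputStage links.
function planStages(plan) {
  const stages = [];
  for (let node = plan; node; node = node.inputStage) stages.push(node.stage);
  return stages;
}

// Summarize whether an index was used and how much work the query did.
function summarizeExplain(explain) {
  const stages = planStages(explain.queryPlanner.winningPlan);
  const { nReturned, totalDocsExamined } = explain.executionStats;
  return {
    stages,
    usesIndex: stages.includes("IXSCAN"),
    fullCollectionScan: stages.includes("COLLSCAN"),
    docsExaminedPerReturned: nReturned > 0 ? totalDocsExamined / nReturned : null,
  };
}

console.log(summarizeExplain(sampleExplain));
```

A COLLSCAN stage, or a number of examined documents far above the number returned, suggests that the query is not well served by any existing index.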

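Understand how field order affects an index

The real job of an index is to narrow the range of data that has to be examined. When deciding the field order of a compound index, put first the field that narrows the search range the most: if the first field quickly narrows the range, far fewer entries remain to be matched against the later fields. Consider the query: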

{'online_time': {'$lte': present}, 'offline_time': {'$gt': present}, 'online': 1, 'orientation': 'quality', 'id': {'$gt': max_id}}

Consider the following indexes:

	Index	nscanned
r1	{start_time:1, end_time: 1, origin: 1, id: 1, orientation: 1}	12959
r2	{start_time:1, end_time: 1, origin: 1, orientation: 1, id: 1}	2700

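Merely swapping the order of the id and orientation fields changes the number of documents scanned dramatically, which shows that the two fields differ greatly in how much they restrict the data. Prefer the field order that restricts the data range the most.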
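The effect can be reproduced with a deliberately simplified cost model. The sketch below is plain JavaScript over made-up data; it assumes that an index scan reads one contiguous run of entries bounded by leading equality conditions plus the first range condition, which is an approximation for illustration and not a description of MongoDB's actual scan strategy. The names `modelDocs` and `keysScanned` are hypothetical.

```javascript
// Hypothetical data: 1000 documents with ids 1..1000; every fifth one has
// orientation "quality".
const modelDocs = [];
for (let id = 1; id <= 1000; id++) {
  modelDocs.push({ id, orientation: id % 5 === 0 ? "quality" : "other" });
}

// Count the entries read under the simplified model for a given field order.
function keysScanned(docs, fieldOrder, equalityConds, rangeCond) {
  let bounded = docs;
  for (const field of fieldOrder) {
    if (field in equalityConds) {
      bounded = bounded.filter((d) => d[field] === equalityConds[field]);
    } else if (field === rangeCond.field) {
      bounded = bounded.filter((d) => d[field] > rangeCond.gt);
      break; // fields after the first range no longer narrow the run
    } else {
      break; // an unconstrained field ends the usable prefix
    }
  }
  return bounded.length;
}

const equalityConds = { orientation: "quality" };
const rangeCond = { field: "id", gt: 500 };

console.log(keysScanned(modelDocs, ["id", "orientation"], equalityConds, rangeCond)); // 500
console.log(keysScanned(modelDocs, ["orientation", "id"], equalityConds, rangeCond)); // 100
```

Under this model, placing the equality field orientation before the range field id reads a fifth as many entries, which mirrors the direction of the difference in the table above.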

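Monitor slow queries

Always analyze slow queries from the production environment as soon as they appear, so that problems are found and fixed early.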
