MongoDB Compound Indexes Explained

Summary: For MongoDB queries that match on multiple keys, creating a compound index can improve performance significantly.

What is a compound index?

A compound index combines multiple keys into a single index, which speeds up queries that match on several keys at once. A simple example shows how it works.

The students collection looks like this:

db.students.find().pretty()
{
	"_id" : ObjectId("5aa7390ca5be7272a99b042a"),
	"name" : "zhang",
	"age" : "15"
}
{
	"_id" : ObjectId("5aa7393ba5be7272a99b042b"),
	"name" : "wang",
	"age" : "15"
}
{
	"_id" : ObjectId("5aa7393ba5be7272a99b042c"),
	"name" : "zhang",
	"age" : "14"
}

Separate indexes have been created on the name and age keys (_id is indexed by default):

db.students.getIndexes()
[
	{
		"v" : 1,
		"key" : {
			"name" : 1
		},
		"name" : "name_1",
		"ns" : "test.students"
	},
	{
		"v" : 1,
		"key" : {
			"age" : 1
		},
		"name" : "age_1",
		"ns" : "test.students"
	}
]

For a query on multiple keys, explain() shows how it is executed (only winningPlan is kept in the output):

db.students.find({name:"zhang",age:"14"}).explain()
"winningPlan":
{
    "stage": "FETCH",
    "filter":
    {
        "name":
        {
            "$eq": "zhang"
        }
    },
    "inputStage":
    {
        "stage": "IXSCAN",
        "keyPattern":
        {
            "age": 1
        },
        "indexName": "age_1",
        "isMultiKey": false,
        "isUnique": false,
        "isSparse": false,
        "isPartial": false,
        "indexVersion": 1,
        "direction": "forward",
        "indexBounds":
        {
            "age": [
                "[\"14\", \"14\"]"
            ]
        }
    }
}

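The winningPlan shows that the query runs in two stages, IXSCAN followed by FETCH. IXSCAN is the index scan, which uses the age index; FETCH retrieves the documents that the index entries point to and has to filter them by name.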
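
To read a nested plan like this in execution order, the inputStage chain can be walked from the innermost stage outwards. The following is a minimal sketch in plain JavaScript; the helper name planStages is hypothetical and not part of the MongoDB shell, the plan object is a trimmed copy of the output above, and plans that branch through inputStages (plural) are not handled.

```javascript
// Hypothetical helper (not a MongoDB API): list the stages of a winningPlan
// in execution order, innermost inputStage first.
function planStages(plan) {
  const stages = [];
  for (let node = plan; node; node = node.inputStage) {
    // Record the stage name plus the index it uses, if any.
    stages.unshift(node.indexName ? `${node.stage}(${node.indexName})` : node.stage);
  }
  return stages;
}

// Trimmed copy of the winningPlan from the explain() output.
const winningPlan = {
  stage: "FETCH",
  filter: { name: { $eq: "zhang" } },
  inputStage: { stage: "IXSCAN", keyPattern: { age: 1 }, indexName: "age_1" },
};

console.log(planStages(winningPlan).join(" -> ")); // IXSCAN(age_1) -> FETCH
```

A filter on the FETCH stage, as in this plan, is the sign that the index alone could not satisfy every condition of the query.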

Create a compound index on name and age:

db.students.createIndex({name:1,age:1})

db.students.getIndexes()
[
	{
		"v" : 1,
		"key" : {
			"name" : 1,
			"age" : 1
		},
		"name" : "name_1_age_1",
		"ns" : "test.students"
	}
]

With the compound index in place, the same query is executed differently:

db.students.find({name:"zhang",age:"14"}).explain()
"winningPlan":
{
    "stage": "FETCH",
    "inputStage":
    {
        "stage": "IXSCAN",
        "keyPattern":
        {
            "name": 1,
            "age": 1
        },
        "indexName": "name_1_age_1",
        "isMultiKey": false,
        "isUnique": false,
        "isSparse": false,
        "isPartial": false,
        "indexVersion": 1,
        "direction": "forward",
        "indexBounds":
        {
            "name": [
                "[\"zhang\", \"zhang\"]"
            ],
            "age": [
                "[\"14\", \"14\"]"
            ]
        }
    }
}

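The winningPlan shows that the order of the stages is unchanged: IXSCAN followed by FETCH. However, IXSCAN now uses the compound index on name and age, so FETCH only retrieves the documents the index points to and needs no filter.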
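
Why the missing filter matters can be seen in a toy model. The sketch below is plain JavaScript and does not reflect MongoDB internals: arrays stand in for the collection and its indexes, the function names are hypothetical, the _id values are simplified to integers, and the query uses age "15" instead of "14" so that the two strategies differ on this tiny data set.

```javascript
// Toy model only: plain arrays stand in for the collection and its indexes.
const students = [
  { _id: 1, name: "zhang", age: "15" },
  { _id: 2, name: "wang", age: "15" },
  { _id: 3, name: "zhang", age: "14" },
];

// Single-field index on age: the index can only narrow by age, so every
// document with a matching age is fetched and then filtered by name.
function withAgeIndex(docs, query) {
  const fetched = docs.filter((d) => d.age === query.age);
  const returned = fetched.filter((d) => d.name === query.name);
  return { docsFetched: fetched.length, returned: returned.length };
}

// Compound index on (name, age): both conditions are resolved in the index,
// so only documents that actually match are fetched.
function withCompoundIndex(docs, query) {
  const fetched = docs.filter((d) => d.name === query.name && d.age === query.age);
  return { docsFetched: fetched.length, returned: fetched.length };
}

const query = { name: "zhang", age: "15" };
console.log(withAgeIndex(students, query));      // 2 fetched, 1 returned
console.log(withCompoundIndex(students, query)); // 1 fetched, 1 returned
```

The gap between documents fetched and documents returned is the wasted work, and it grows with the number of index entries that match only one of the conditions.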

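The data set in this example is too small to reveal any problem. In practice, when the collection is large and IXSCAN returns many index entries, filtering during FETCH becomes very time-consuming. The next section walks through a real case.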

Locating a MongoDB performance problem

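As the volume of incoming error data keeps growing, we at Fundebug have processed 350 million error events in total. This keeps raising performance challenges for our service, especially for the MongoDB cluster.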

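On a production database, the profiler can be configured to record MongoDB performance data. After running the following command, every database read or write that takes longer than 1 s is recorded.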

db.setProfilingLevel(1,1000)

Querying the data recorded by the profiler shows that one query on the events collection is very slow:

db.system.profile.find().pretty()
{
	"op" : "command",
	"ns" : "fundebug.events",
	"command" : {
		"count" : "events",
		"query" : {
			"createAt" : {
				"$lt" : ISODate("2018-02-05T20:30:00.073Z")
			},
			"projectId" : ObjectId("58211791ea2640000c7a3fe6")
		}
	},
	"keyUpdates" : 0,
	"writeConflicts" : 0,
	"numYield" : 1414,
	"locks" : {
		"Global" : {
			"acquireCount" : {
				"r" : NumberLong(2830)
			}
		},
		"Database" : {
			"acquireCount" : {
				"r" : NumberLong(1415)
			}
		},
		"Collection" : {
			"acquireCount" : {
				"r" : NumberLong(1415)
			}
		}
	},
	"responseLength" : 62,
	"protocol" : "op_query",
	"millis" : 28521,
	"execStats" : {

	},
	"ts" : ISODate("2018-03-07T20:30:59.440Z"),
	"client" : "192.168.59.226",
	"allUsers" : [ ],
	"user" : ""
}

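The events collection holds hundreds of millions of documents, so a somewhat slow count is not a big surprise. But according to the profile data this query took 28.5 s, which is unreasonably long. In addition, numYield is as high as 1414, which is probably the direct cause of the slowness. The MongoDB documentation describes numYield as follows: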

The number of times the operation yielded to allow other operations to complete. Typically, operations yield when they need access to data that MongoDB has not yet fully read into memory. This allows other operations that have data in memory to complete while MongoDB reads in data for the yielding operation.

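This means that a large share of the time was spent reading from disk, and that the disk was read many times. A reasonable guess is that an index problem is the cause.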

Let's analyze the query with explain() (only executionStats is kept):

db.events.explain("executionStats").count({"projectId" : ObjectId("58211791ea2640000c7a3fe6"),createAt:{"$lt" : ISODate("2018-02-05T20:30:00.073Z")}})
"executionStats":
{
    "executionSuccess": true,
    "nReturned": 20853,
    "executionTimeMillis": 28055,
    "totalKeysExamined": 28338,
    "totalDocsExamined": 28338,
    "executionStages":
    {
        "stage": "FETCH",
        "filter":
        {
            "createAt":
            {
                "$lt": ISODate("2018-02-05T20:30:00.073Z")
            }
        },
        "nReturned": 20853,
        "executionTimeMillisEstimate": 27815,
        "works": 28339,
        "advanced": 20853,
        "needTime": 7485,
        "needYield": 0,
        "saveState": 1387,
        "restoreState": 1387,
        "isEOF": 1,
        "invalidates": 0,
        "docsExamined": 28338,
        "alreadyHasObj": 0,
        "inputStage":
        {
            "stage": "IXSCAN",
            "nReturned": 28338,
            "executionTimeMillisEstimate": 30,
            "works": 28339,
            "advanced": 28338,
            "needTime": 0,
            "needYield": 0,
            "saveState": 1387,
            "restoreState": 1387,
            "isEOF": 1,
            "invalidates": 0,
            "keyPattern":
            {
                "projectId": 1
            },
            "indexName": "projectId_1",
            "isMultiKey": false,
            "isUnique": false,
            "isSparse": false,
            "isPartial": false,
            "indexVersion": 1,
            "direction": "forward",
            "indexBounds":
            {
                "projectId": [
                    "[ObjectId('58211791ea2640000c7a3fe6'), ObjectId('58211791ea2640000c7a3fe6')]"
                ]
            },
            "keysExamined": 28338,
            "dupsTested": 0,
            "dupsDropped": 0,
            "seenInvalidated": 0
        }
    }
}

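This shows that the events collection has no compound index on projectId and createAt, so the IXSCAN stage uses the projectId index and its nReturned is 28338. The FETCH stage has to filter by createAt; its nReturned is 20853, so 7485 documents are filtered out. The executionTimeMillisEstimate of IXSCAN and FETCH is 30 ms and 27815 ms respectively, so almost all of the time is spent in the FETCH stage, most likely on disk reads.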
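
The figures in this analysis can be derived mechanically from the explain output. A minimal sketch in plain JavaScript, with the numbers copied from the executionStats above; the helper summarize and the field names fetchMillis and totalMillis are hypothetical shorthand, not MongoDB fields.

```javascript
// Numbers copied from the executionStats output before the compound index.
const before = {
  nReturned: 20853,
  totalDocsExamined: 28338,
  fetchMillis: 27815, // executionTimeMillisEstimate reported for FETCH
  totalMillis: 28055, // executionTimeMillis of the whole query
};

// Hypothetical helper: derive the figures discussed in the text.
function summarize(stats) {
  return {
    // Documents fetched from the collection and then rejected by the filter.
    filteredOut: stats.totalDocsExamined - stats.nReturned,
    // Documents examined per returned result; 1 means no wasted fetches.
    docsPerResult: stats.totalDocsExamined / stats.nReturned,
    // Share of the total execution time reported for the FETCH stage.
    fetchShare: stats.fetchMillis / stats.totalMillis,
  };
}

const summary = summarize(before);
console.log(summary.filteredOut);              // 7485
console.log(summary.docsPerResult.toFixed(2)); // 1.36
console.log(summary.fetchShare.toFixed(3));    // 0.991
```

A docsPerResult value above 1 together with a dominant FETCH share is a hint that the index in use does not cover all of the query conditions.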

Creating the compound index

Not having a compound index on projectId and createAt was an embarrassing mistake, so let's fix it right away:

db.events.createIndex({projectId:1,createAt:-1},{background: true})

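Building an index in production is best done at night; this command took about 7 hours in total. Setting background to true means the build does not block other database operations, which keeps the database available. However, the command occupies the terminal the whole time, and CTRL + C must not be used, since it would abort the index build.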

Once the compound index had been created successfully, the earlier query became much faster (only executionStats is kept):

db.events.explain("executionStats").count({"projectId" : ObjectId("58211791ea2640000c7a3fe6"),createAt:{"$lt" : ISODate("2018-02-05T20:30:00.073Z")}})
"executionStats":
{
    "executionSuccess": true,
    "nReturned": 0,
    "executionTimeMillis": 47,
    "totalKeysExamined": 20854,
    "totalDocsExamined": 0,
    "executionStages":
    {
        "stage": "COUNT",
        "nReturned": 0,
        "executionTimeMillisEstimate": 50,
        "works": 20854,
        "advanced": 0,
        "needTime": 20853,
        "needYield": 0,
        "saveState": 162,
        "restoreState": 162,
        "isEOF": 1,
        "invalidates": 0,
        "nCounted": 20853,
        "nSkipped": 0,
        "inputStage":
        {
            "stage": "COUNT_SCAN",
            "nReturned": 20853,
            "executionTimeMillisEstimate": 50,
            "works": 20854,
            "advanced": 20853,
            "needTime": 0,
            "needYield": 0,
            "saveState": 162,
            "restoreState": 162,
            "isEOF": 1,
            "invalidates": 0,
            "keysExamined": 20854,
            "keyPattern":
            {
                "projectId": 1,
                "createAt": -1
            },
            "indexName": "projectId_1_createAt_-1",
            "isMultiKey": false,
            "isUnique": false,
            "isSparse": false,
            "isPartial": false,
            "indexVersion": 1
        }
    }
}

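This shows that the count operation uses the compound index on projectId and createAt and is therefore very fast: it took only 47 ms, nearly 600 times faster than before. Comparing the results before and after, totalDocsExamined dropped from 28338 to 0, which means that with the compound index the query no longer needs to fetch any documents. Scanning the index is enough, so the documents never have to be read from disk, which naturally makes the query much faster.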
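
The speedup can be checked with a few lines of arithmetic. A small sketch in plain JavaScript, using only the executionTimeMillis and totalDocsExamined values from the two explain outputs; the variable names are illustrative.

```javascript
// Values copied from the executionStats before and after the compound index.
const beforeStats = { executionTimeMillis: 28055, totalDocsExamined: 28338 };
const afterStats = { executionTimeMillis: 47, totalDocsExamined: 0 };

// Ratio of the two execution times.
const speedup = beforeStats.executionTimeMillis / afterStats.executionTimeMillis;

// totalDocsExamined of 0 means the count was answered from the index alone.
const indexOnly = afterStats.totalDocsExamined === 0;

console.log(Math.round(speedup), indexOnly); // 597 true
```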
