技術分享 | MongoDB 一次排序超過內存限制的排查

時間 2019-11-06

標籤技術分享 mongodb 一次排序超過內存限制排查欄目 MongoDB 简体版

原文原文鏈接

做者：任仲禹html

本文目錄：mongodb

1、背景shell

1. 配置參數檢查測試

2. 排序字段是否存在索引優化

2、測試環境模擬索引對排序的影響this

1. 測試環境信息spa

2. 報錯語句的執行計劃解釋 3. 創建新的組合索引進行測試rest

3、引伸的組合索引問題日誌

1. 查詢語句中，排序字段 _id 使用降序code

2. 查詢語句中，排序字段 Num 和 _id 所有使用降序

4、引伸的聚合查詢問題

1.Sort stage 使用內存排序

5、結論

1. 排序內存限制的問題

2. 使排序操做使用到索引

1) 爲查詢語句建立合適的索引

2) 注意前綴索引的使用

3.聚合查詢添加allowDiskUse選項

6、參考文獻

1、背景

某次在客戶現場處理一塊兒APP業務中頁面訪問異常的問題，該頁面直接是返回一行行碩大的報錯代碼，錯誤大概以下所示：

MongoDB.Driver.MongoQueryException: QueryFailure flag was Executor error: OperationFailed: Sort operation used more than the maximum 33554432 bytes of RAM. Add an index, or specify a smaller limit

報錯頁面很明顯告知了問題排查的方向：

Sort operation 該頁面涉及的MongoDB查詢語句使用了排序。
more than the maximum 33554432 排序操做超過了MongoDB單個Session排序可以使用的最大內存限制。

檢索MongoDB的日誌確實存在大量的查詢報錯，跟APP頁面報錯可以對應上；而且日誌中排序使用的字段爲DT 和 _id ，升序排序。

涉及業務敏感字，全文會略過、改寫或使用'xxx'代替
2019-XX-XXTXX:XX:XX.XXX+0800 E QUERY [conn3644666] Plan executor error during find: FAILURE, ·········· sortPattern: {DT: 1, _id: 1 }, memUsage: 33555513, memLimit: 33554432, ·············· }
2019-XX-XXTXX:XX:XX.XXX+0800 I QUERY [conn3644666] assertion 17144 Executor error: OperationFailed: Sort operation used more than the maximum 33554432 bytes of RAM. Add an index, or specify a smaller limit. ns:XXXXX query:{ $query:········ $orderby: { DT: 1, _id: 1 }, $hint: { CID: 1, CVX: 1 } }

1. 配置參數檢查

MongoDB Server中確認了對於Sort排序可以支持的最大內存限制爲32M。

> use admin
switched to db admin
> db.runCommand({ getParameter : 1, "internalQueryExecMaxBlockingSortBytes" : 1 } )
{ "internalQueryExecMaxBlockingSortBytes" : 33554432, "ok" : 1 }

2. 排序字段是否存在索引

根據報錯信息的建議，查看官方文檔的解釋：

In MongoDB, sort operations can obtain the sort order by retrieving documents based on the ordering in an index. If the query planner cannot obtain the sort order from an index, it will sort the results in memory. Sort operations that use an index often have better performance than those that do not use an index. In addition, sort operations that do not use an index will abort when they use 32 megabytes of memory.

文檔中意思大概是：在排序字段未利用到索引的狀況下，若超過32M內存則會被Abort，語句直接返回報錯。

那麼如今方向基本能夠鎖定在排序操做是否使用到索引了；查看該集合狀態，排序字段 DT 和 _id確實存在索引_id_、 DT_1 、 DT_1_CID_1_id_1 ，爲啥還會報錯？帶着疑問咱們下文在測試環境進行模擬。

> db.xxx.getIndexes()
[
·········
{
"v" : 1,
"key" : {
"_id" : 1
},
"name" : "_id_",
"ns" : "xxx.xxx"
},
{
"v" : 1,
"key" : {
"DT" : 1
},
"name" : "DT_1",
"ns" : "xxx.xxx"
},
{
"v" : 1,
"key" : {
"DT" : 1,
"CID" : 1,
"_id" : 1
},
"name" : "DT_1_CID_1_id_1",
"ns" : "xxx.xxx"
}
···········

> db.xxx.getIndexes()
[
·········
{
"v" : 1,
"key" : {
"_id" : 1
},
"name" : "_id_",
"ns" : "xxx.xxx"
},
{
"v" : 1,
"key" : {
"DT" : 1
},
"name" : "DT_1",
"ns" : "xxx.xxx"
},
{
"v" : 1,
"key" : {
"DT" : 1,
"CID" : 1,
"_id" : 1
},
"name" : "DT_1_CID_1_id_1",
"ns" : "xxx.xxx"
}
···········

2、測試環境模擬索引對排序的影響

1.測試環境信息

MongoDB版本	4.0.10
MongoDB 存儲引擎	wiredTiger
數據量	1000000
測試集合名	data_test

集合數據存儲格式

> db.data_test.findOne()
{
"_id" : ObjectId("5d0872dc5f13ad3173457186"),
"Name" : "Edison",
"Num" : 195930,
"loc" : {
"type" : "Point",
"coordinates" : [
118.0222094243601,
36.610739264097646
]
}
}

集合索引信息

> db.data_test.getIndexes()
[
{
"v" : 2,
"key" : {
"_id" : 1
},
"name" : "_id_",
"ns" : "mongobench.data_test"
},
{
"v" : 2,
"key" : {
"Name" : 1
},
"name" : "Name_1",
"ns" : "mongobench.data_test"
},
{
"v" : 2,
"key" : {
"Num" : 1
},
"name" : "Num_1",
"ns" : "mongobench.data_test"
},
{
"v" : 2,
"key" : {
"Num" : 1,
"Name" : 1,
"_id" : 1
},
"name" : "Num_1_Name_1__id_1",
"ns" : "mongobench.data_test"
}
]

查詢語句

爲測試方便，將業務中報錯的聚合查詢按一樣查詢邏輯修改成 Mongo Shell 中的普通 find() 查詢

2. 報錯語句的執行計劃解釋

測試查詢報錯的語句，嘗試查看其查詢計劃以下：

> db.data_test.find({'Num':{"$gt":500000}}).sort({"Num":1,"_id":1}).explain()
2019-06-19T18:21:14.745+0800 E QUERY [js] Error: explain failed: {
"ok" : 0,
"errmsg" : "Sort operation used more than the maximum 33554432 bytes of RAM. Add an index, or specify a smaller limit.",
"code" : 96,
"codeName" : "OperationFailed"
}

直接報錯，這裏有個疑問爲啥連執行計劃都看不了？先不急，咱們先刪除對於排序字段的組合索引 Num_1_Name_1_id_1 後，再查看執行計劃：

> db.data_test.dropIndex('Num_1_Name_1__id_1')
{ "nIndexesWas" : 4, "ok" : 1 }
db.data_test.find({'Num':{"$gt":500000}}).sort({"Num":1,"_id":1}).explain('executionStats')
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "mongobench.data_test",
"indexFilterSet" : false,
"parsedQuery" : {
"Num" : {
"$gt" : 500000
}
},
"winningPlan" : {
"stage" : "SORT",
"sortPattern" : {
"Num" : 1,
"_id" : 1
},
"inputStage" : {
"stage" : "SORT_KEY_GENERATOR",
·······
"rejectedPlans" : [ ]
},
"executionStats" : {
"executionSuccess" : false,
"errorMessage" : "Exec error resulting in state FAILURE :: caused by :: Sort operation used more than the maximum 33554432 bytes of RAM. Add an index, or specify a smaller limit.",
"errorCode" : 96,
"nReturned" : 0,
"executionTimeMillis" : 1504,
"totalKeysExamined" : 275037,
"totalDocsExamined" : 275037,
"executionStages" : {
"stage" : "SORT",
"nReturned" : 0,
"executionTimeMillisEstimate" : 188,
····
"memUsage" : 33554514,
"memLimit" : 33554432,
"inputStage" : {
"stage" : "SORT_KEY_GENERATOR",
"nReturned" : 275037,
·····

查詢計劃中關鍵參數的解釋：

1. queryPlanner：explain中三種模式之一，默認模式。表示不會執行查詢語句而是選出最優的查詢計劃即winning plan，剩餘兩種模式分別是 executionStats 和 allPlansExecution

winningPlan：MongoDB優化器選擇的最優執行計劃

[1]stage：包括COLLSCAN 全表掃描、IXSCAN 索引掃描、FETCH 根據索引去檢索指定文檔、SORT 在內存中進行排序（未使用索引）

[2]sortPattern：需排序的字段

[3]inputStage：winningPlan.stage的子階段

rejectedPlans：優化器棄用的執行計劃

2. executionStats：返回執行結果的狀態，如語句成功或失敗等

executionSuccess：語句執行是否成功
errorMessage：錯誤信息
nReturned：返回的記錄數
totalKeysExamined：索引掃描總行數
totalDocsExamined：文檔掃描總行數
memUsage：Sort 使用內存排序操做使用的內存大小
memLimit：MongoDB 內部限制Sort操做的最大內存

上述執行計劃代表查詢語句在未使用索引排序的狀況下若是排序使用的內存超過32M一定會報錯，那麼爲何沒有使用到索引排序，是否是跟組合索引的順序有關？

3. 創建新的組合索引進行測試

直接建立 Num 和 _id 列都爲升序的組合索引，再次查看執行計劃：

> db.data_test.ensureIndex({Num:1,_id:1})
{
"createdCollectionAutomatically" : false,
"numIndexesBefore" : 3,
"numIndexesAfter" : 4,
"ok" : 1
}
> db.data_test.find({'Num':{"$gt":500000}}).sort({"Num":1,"_id":1}).explain('executionStats')
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "mongobench.data_test",
"indexFilterSet" : false,
"parsedQuery" : {
"Num" : {
"$gt" : 500000
}
},
"winningPlan" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"Num" : 1,
"_id" : 1
},
"indexName" : "Num_1__id_1",
·········
"rejectedPlans" : [
{
"stage" : "SORT",
"sortPattern" : {
"Num" : 1,
"_id" : 1
},
"inputStage" : {
"stage" : "SORT_KEY_GENERATOR",
·········
"executionStats" : {
"executionSuccess" : true,
"nReturned" : 499167,
"executionTimeMillis" : 1355,
"totalKeysExamined" : 499167,
"totalDocsExamined" : 499167,
"executionStages" : {
"stage" : "FETCH",
"nReturned" : 499167,
"executionTimeMillisEstimate" : 102,
"works" : 499168,
"advanced" : 499167,
"needTime" : 0,
"needYield" : 0,
"saveState" : 3901,
"restoreState" : 3901,
"isEOF" : 1,
"invalidates" : 0,
"docsExamined" : 499167,
"alreadyHasObj" : 0,
"inputStage" : {
"stage" : "IXSCAN",
"nReturned" : 499167,
"executionTimeMillisEstimate" : 14,
"works" : 499168,
·······

上述執行計劃說明：

winningPlan.stage：優化器選擇了FETCH+IXSCAN的Stage，而不是以前的Sort；這是最優的方式之一，也就是經過索引檢索指定的文檔數據，並在索引中完成排序（」keyPattern」 : {「Num」 : 1,」_id」 : 1}），效率最高
rejectedPlans：Sort 使用內存排序的方式被優化器棄用
executionSuccess：語句執行成功
nReturned：語句返回結果數爲499167

3、引伸的組合索引問題

上文中查詢語句explain()直接報錯，是由於組合索引爲{Num_1_Name_1_id_1}，而查詢語句爲sort({「Num」:1,」_id」:1})，未遵循最左原則，索引沒法被使用到然後優化器選擇Sort Stage觸發了內存限制並Abort。

至於爲啥MongoDB連執行計劃都不返回給你，能夠後續再討論，歡迎評論

建立合適的組合索引後，查詢語句成功執行；那麼若是不按照索引的升降順序執行語句會怎樣？

1.查詢語句中，排序字段 _id 使用降序

當前的組合索引爲{「key」 : {「Num」 : 1, 「_id」 : 1} },也就是都爲升序，而咱們將查詢語句中排序字段 _id使用降序排序時，查詢語句直接報錯，說明該語句也未使用到索引排序，而是使用的Sort Stage。

> db.data_test.find({'Num':{"$gt":500000}}).sort({"Num":1,"_id":-1}).explain('executionStats')
2019-06-19T19:32:30.939+0800 E QUERY [js] Error: explain failed: {
"ok" : 0,
"errmsg" : "Sort operation used more than the maximum 33554432 bytes of RAM. Add an index, or specify a smaller limit.",
"code" : 96,
"codeName" : "OperationFailed"
}

2.查詢語句中，排序字段 Num 和 _id 所有使用降序

咱們如今將查詢語句的排序字段所有使用降序，與組合索引所有相反再測試，執行成功。

> db.data_test.find({'Num':{"$gt":500000}}).sort({"Num":-1,"_id":-1}).explain('executionStats')
{
"queryPlanner" : {
······
"winningPlan" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"Num" : 1,
"_id" : 1
},
"indexName" : "Num_1__id_1",
·······
"rejectedPlans" : [
{
"stage" : "SORT",
·······
"executionStats" : {
"executionSuccess" : true,
·······
"inputStage" : {
"stage" : "IXSCAN",
·······
"indexName" : "Num_1__id_1",
······
"ok" : 1
}

再次作其餘查詢組合測試 sort({「Num」:-1,」_id」:1})，執行依然失敗；說明只有在排序列的升降序只有和組合索引中的方向保持 所有相同 或 所有相反，語句執行才能成功。

4、引伸的聚合查詢問題

上文中的查詢測試語句是在 MongoDB Shell 執行的 find() 查詢方法，可是業務程序中查詢通常都是使用聚合查詢方法 aggregate()，對於聚合查詢中的Sort Stage，官方文檔說明了使用內存排序能使用最大的內存爲100M，若須要避免報錯則須要添加 {allowDiskUse : true} 參數。

The $sort stage has a limit of 100 megabytes of RAM. By default, if the stage exceeds this limit, $sort will produce an error. To allow for the handling of large datasets, set the allowDiskUse option to true to enable $sort operations to write to temporary files. See the allowDiskUse option in db.collection.aggregate() method and the aggregate command for details.

1.Sort stage 使用內存排序

將普通的 find() 方法轉爲 aggregate() 聚合方法，語義不變，特地將排序字段 _id 修改成降序 -1 ，那麼查詢計劃將沒法使用到組合索引只能使用Sort stage。下文中查詢依然報錯，Sort stage操做使用的內存超過100M

> db.data_test.explain('executionStats').aggregate([{ $match : { Num : { $gt : 500000} } },{ $sort : { "Num" : 1, _id: -1 } }])
2019-06-19T20:28:43.859+0800 E QUERY [js] Error: explain failed: {
"ok" : 0,
"errmsg" : "Sort exceeded memory limit of 104857600 bytes, but did not opt in to external sorting. Aborting operation. Pass allowDiskUse:true to opt in.",
"code" : 16819,
"codeName" : "Location16819"
} :
_getErrorWithCode@src/mongo/shell/utils.js:25:13
throwOrReturn@src/mongo/shell/explainable.js:31:1
constructor/this.aggregate@src/mongo/shell/explainable.js:121:1
@(shell):1:1

添加 {allowDiskUse: true} 參數,可使Sort stage操做繞過內存限制而使用磁盤，查詢語句能夠執行成功：

> db.data_test.explain('executionStats').aggregate([{ $match : { Num : { $gt : 500000} } },{ $sort : { "Num" : 1, _id: -1 } }],{allowDiskUse: true})
{
"stages" : [
······
"executionStats" : {
"executionSuccess" : true,
"nReturned" : 499167,
"executionTimeMillis" : 4128,
"totalKeysExamined" : 499167,
"totalDocsExamined" : 499167,
······
{
"$sort" : {
"sortKey" : {
"Num" : 1,
"_id" : -1
}
}
}
],
"ok" : 1
}

5、結論

1.排序內存限制的問題

MongoDB使用內存進行排序的場景只有是Sort stage，官方文檔有說明：

If MongoDB can use an index scan to obtain the requested sort order, the result will not include a SORT stage. Otherwise, if MongoDB cannot use the index to sort, the explain result will include a SORT stage.

意思大概是若是MongoDB可使用索引掃描來進行排序，那麼結果將不包括SORT stage。不然若是MongoDB沒法使用索引進行排序，那麼查詢計劃將包括SORT stage。

使用索引掃描的效率是遠大於直接將結果集放在內存排序的，因此MongoDB爲了使查詢語句更有效率的執行，限制了排序內存的使用，於是規定了只能使用 32M，該種考慮是很是合理的。

但也可經過手工調整參數進行修改（不建議）：

# 好比調大到 128M
## 在線調整
> db.adminCommand({setParameter:1, internalQueryExecMaxBlockingSortBytes:134217728})
## 持久到配置文件
setParameter:
internalQueryExecMaxBlockingSortBytes: 134217728

2.使排序操做使用到索引

1)爲查詢語句建立合適的索引若是查詢中排序是單列排序，如sort({「Num」:1})，那麼只需添加爲 Num 列添加索引便可，排序的順序無影響

## 例如索引爲 {'Num':1},查詢無論升/降序均可使用到索引排序
db.data_test.find().sort({Num:1}) 
db.data_test.find().sort({Num:-1})

若是查詢中排序是使用組合排序，如sort({「Num」:1,」id」:1}),那麼須要創建對應的組合索引，如{「key」 : {「Num」 : 1, 「_id」 : 1} 或者 {「key」 : {「Num」 : -1, 「_id」 : -1}

## 例如索引爲{"Num" : 1, "_id" : 1}，能夠用到索引排序的場景爲
db.data_test.find().sort({Num:1,_id:1})
db.data_test.find().sort({Num:-1,_id:-1})

注意保持查詢中組合排序的升降序和組合索引中的方向保持所有相同或所有相反

2)注意前綴索引的使用

上文查詢報錯的案例分析已說明了組合索引每個鍵的順序很是重要，這將決定該組合索引在查詢過程當中可否被使用到，也將是MongoDB的索引及排序一樣需遵循最左前綴原則。

3. 聚合查詢添加allowDiskUse選項

儘量的保證查詢語句的排序可以使用索引排序，但若是業務須要規避排序內存限制報錯的問題，那麼須要在代碼中添加 {allowDiskUse : true} 參數。

6、參考文獻

https://docs.mongodb.com/manual/tutorial/sort-results-with-indexes/index.html

https://docs.mongodb.com/manual/reference/operator/aggregation/sort/#sort-memory-limit

https://docs.mongodb.com/manual/reference/explain-results/#executionstats

相關標籤/搜索

每日一句

每一个你不满意的现在，都有一个你没有努力的曾经。