MongoDB Essentials Series 2: MongoDB Execution Plan Analysis in Detail (3)

Date: 2019-11-21


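Foreword

MongoDB has been the most talked-about document database of recent years and keeps drawing more attention. Yet MongoDB technical write-ups in China are still scarce, and many friends have complained to me that they do not know where to start.

The "MongoDB Essentials Series" shares hands-on MongoDB material from a practical, real-world angle, covering tuning, troubleshooting, and related topics. I hope it helps.

If you want more background on MongoDB basics, please Google it.

Keeping a database efficient and stable takes more than good hardware, an efficient and highly available architecture, and a data model that fits the business: efficient queries are indispensable too. So how do we read and judge an execution plan? Today we look at execution plan analysis in MongoDB.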

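Introduction

Since MongoDB 3.0, both the output of explain and the way it is used have changed considerably from earlier versions. Given the improvements in 3.0 and later, this article covers explain for MongoDB 3.0+ only.

explain currently has three modes:

queryPlanner
executionStats
allPlansExecution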
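
To make the three modes concrete, here is a minimal sketch that builds the document for the explain database command around this article's example query. buildExplainCommand and EXPLAIN_MODES are hypothetical names introduced for illustration, and the sketch assumes a server version that accepts a find command inside explain.

```javascript
// The three explain modes listed above.
const EXPLAIN_MODES = ["queryPlanner", "executionStats", "allPlansExecution"];

// Hypothetical helper (not part of any driver): wraps a find in the
// `explain` database command with the requested verbosity.
function buildExplainCommand(collection, filter, sort, verbosity) {
  if (!EXPLAIN_MODES.includes(verbosity)) {
    throw new Error("unknown explain mode: " + verbosity);
  }
  return {
    explain: { find: collection, filter: filter, sort: sort },
    verbosity: verbosity,
  };
}

const cmd = buildExplainCommand("d", { a: 1, b: { $lt: 3 } }, { c: -1 }, "executionStats");
console.log(JSON.stringify(cmd, null, 2));
```

In the mongo shell the same request is usually written with the cursor helper, for example db.d.find({a:1,b:{$lt:3}}).sort({c:-1}).explain("executionStats").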

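Because of length, this series is split into three parts:
Part 1
Part 2
Part 3

This article is the final part of the detailed guide to MongoDB execution plan analysis. It explains how to analyze explain output and then walks through a concrete example.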

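Main Text

Analyzing explain output layer by layer

Layer 1: executionTimeMillis

The most direct value in the explain output is executionTimeMillis, the execution time of the statement in milliseconds. The lower, the better.

A timing value appears at every stage level: executionTimeMillis at the top, and executionTimeMillisEstimate inside each stage: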

"executionStats" : {
                "executionSuccess" : true,
                "nReturned" : 29861,
                "executionTimeMillis" : 66948,
                "totalKeysExamined" : 29861,
                "totalDocsExamined" : 29861,
                "executionStages" : {
                        "stage" : "FETCH",
                        "nReturned" : 29861,
                        "executionTimeMillisEstimate" : 66244,
                        "works" : 29862,
                        "advanced" : 29861,
                        "needTime" : 0,
                        "needFetch" : 0,
                        "saveState" : 2934,
                        "restoreState" : 2934,
                        "isEOF" : 1,
                        "invalidates" : 0,
                        "docsExamined" : 29861,
                        "alreadyHasObj" : 0,
                        "inputStage" : {
                                "stage" : "IXSCAN",
                                "nReturned" : 29861,
                                "executionTimeMillisEstimate" : 290,
                                "works" : 29862,
                                "advanced" : 29861,
                                "needTime" : 0,
                                "needFetch" : 0,
                                "saveState" : 2934,
                                "restoreState" : 2934,
                                ...

This output contains three timing values:

executionStats.executionTimeMillis

The overall execution time of the query.

executionStats.executionStages.executionTimeMillisEstimate

The time taken to fetch the 29861 documents located through the index.

executionStats.executionStages.inputStage.executionTimeMillisEstimate

The time taken to scan the 29861 index entries.
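
The nesting can also be read programmatically. The following is a rough sketch that walks executionStages and lists each stage with its time estimate; stageTimings is a hypothetical helper, and the input is a trimmed copy of the output shown above.

```javascript
// Hypothetical helper: walk the stage tree depth-first and record
// each stage's path, name, and executionTimeMillisEstimate.
function stageTimings(stage, path = "executionStages") {
  const rows = [{ path, stage: stage.stage, millis: stage.executionTimeMillisEstimate }];
  // Single-child stages expose inputStage; multi-child stages expose inputStages.
  const children = stage.inputStages || (stage.inputStage ? [stage.inputStage] : []);
  children.forEach((child, i) => {
    const childPath = stage.inputStages ? `${path}.inputStages[${i}]` : `${path}.inputStage`;
    rows.push(...stageTimings(child, childPath));
  });
  return rows;
}

// Trimmed copy of the explain output from this section.
const executionStats = {
  executionTimeMillis: 66948,
  executionStages: {
    stage: "FETCH",
    executionTimeMillisEstimate: 66244,
    inputStage: { stage: "IXSCAN", executionTimeMillisEstimate: 290 },
  },
};

const timings = stageTimings(executionStats.executionStages);
for (const t of timings) {
  console.log(`${t.path} (${t.stage}): ${t.millis} ms`);
}
```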

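We want all three values to be as small as possible. So what affects them?

Setting hardware aside, let us peel back the next layer.

Layer 2: index entries scanned, documents scanned, and results returned

Three fields matter here: nReturned, totalKeysExamined, and totalDocsExamined, which are the number of results the query returns, the number of index entries scanned, and the number of documents scanned.

All three directly affect executionTimeMillis: the less we scan, the faster the query.

For a query, the ideal state is

nReturned=totalKeysExamined & totalDocsExamined=0

(A covered index: only the index is used and no documents are scanned. This is the best case.)

or

nReturned=totalKeysExamined=totalDocsExamined (to be judged case by case)

(Normal index use, with no extra index entries or documents scanned.)

When the query has a sort, keeping the sort out of memory takes priority: as long as nReturned=totalDocsExamined holds, totalKeysExamined may exceed totalDocsExamined and nReturned, because in-memory sorting is very expensive at larger volumes.

We analyze an example of this later.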
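
The relationships above can be turned into a quick check. This is only a sketch: classifyCounters and its labels are names made up here, not anything MongoDB reports.

```javascript
// Hypothetical helper: label the relationship between the three counters
// from executionStats, following the cases described above.
function classifyCounters({ nReturned, totalKeysExamined, totalDocsExamined }) {
  if (totalKeysExamined === 0 && totalDocsExamined > 0) {
    return "no index used";
  }
  if (totalDocsExamined === 0 && nReturned === totalKeysExamined) {
    return "covered query";
  }
  if (nReturned === totalKeysExamined && nReturned === totalDocsExamined) {
    return "exact index use";
  }
  if (nReturned === totalDocsExamined && totalKeysExamined > totalDocsExamined) {
    return "extra keys scanned, no extra documents";
  }
  return "extra documents scanned";
}

// Counters from the first explain output in this article.
console.log(classifyCounters({
  nReturned: 29861,
  totalKeysExamined: 29861,
  totalDocsExamined: 29861,
}));
```

The fourth label is the case tolerated when a sort is involved; whether it is acceptable still depends on whether the plan contains an in-memory SORT stage.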

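Layer 3: stage analysis

What, then, affects totalKeysExamined and totalDocsExamined? The stage types. The previous part explained what each stage means; readers who followed it will see why the stages drive totalKeysExamined and totalDocsExamined, and through them executionTimeMillis.

The stage types were covered earlier; here is a brief list (see the previous part for their meanings):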

COLLSCAN

IXSCAN

FETCH

SHARD_MERGE

SORT

LIMIT

SKIP

IDHACK

SHARDING_FILTER

COUNT

COUNTSCAN

COUNT_SCAN

SUBPLAN

TEXT

PROJECTION

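For ordinary queries, the combinations we most want to see are:

FETCH+IDHACK

FETCH+IXSCAN

LIMIT+(FETCH+IXSCAN)

PROJECTION+IXSCAN

SHARDING_FILTER+IXSCAN

and so on.

Stages we do not want to see:

COLLSCAN (full collection scan), SORT (a sort with no index behind it), an unreasonable SKIP, SUBPLAN ($or that does not use an index)

For count queries, we want to see:

COUNT_SCAN

and do not want to see:

COUNTSCAN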
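
As a rough illustration of scanning a plan for unwanted stages, the sketch below collects every stage name in the tree and reports COLLSCAN and SORT. It deliberately leaves out SKIP and SUBPLAN, since whether those are a problem needs human judgment; collectStages and unwantedStages are hypothetical helpers.

```javascript
// Stages flagged unconditionally in this sketch.
const UNWANTED_STAGES = new Set(["COLLSCAN", "SORT"]);

// Hypothetical helper: list every stage name in the plan tree, top down.
function collectStages(stage) {
  const children = stage.inputStages || (stage.inputStage ? [stage.inputStage] : []);
  return [stage.stage, ...children.flatMap((child) => collectStages(child))];
}

// Hypothetical helper: keep only the stages we never want to see.
function unwantedStages(executionStages) {
  return collectStages(executionStages).filter((name) => UNWANTED_STAGES.has(name));
}

const noIndexPlan = { stage: "SORT", inputStage: { stage: "COLLSCAN" } };
const indexedPlan = { stage: "FETCH", inputStage: { stage: "IXSCAN" } };

console.log(unwantedStages(noIndexPlan));
console.log(unwantedStages(indexedPlan));
```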

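A worked explain example

The collection holds the following data (a simple test case with only 10 documents, meant mainly to show the reasoning behind explain analysis):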

{ "_id" : ObjectId("55b86d6bd7e3f4ccaaf20d70"), "a" : 1, "b" : 1, "c" : 1 }
{ "_id" : ObjectId("55b86d6fd7e3f4ccaaf20d71"), "a" : 1, "b" : 2, "c" : 2 }
{ "_id" : ObjectId("55b86d72d7e3f4ccaaf20d72"), "a" : 1, "b" : 3, "c" : 3 }
{ "_id" : ObjectId("55b86d74d7e3f4ccaaf20d73"), "a" : 4, "b" : 2, "c" : 3 }
{ "_id" : ObjectId("55b86d75d7e3f4ccaaf20d74"), "a" : 4, "b" : 2, "c" : 5 }
{ "_id" : ObjectId("55b86d77d7e3f4ccaaf20d75"), "a" : 4, "b" : 2, "c" : 5 }
{ "_id" : ObjectId("55b879b442bfd1a462bd8990"), "a" : 2, "b" : 1, "c" : 1 }
{ "_id" : ObjectId("55b87fe842bfd1a462bd8991"), "a" : 1, "b" : 9, "c" : 1 }
{ "_id" : ObjectId("55b87fe942bfd1a462bd8992"), "a" : 1, "b" : 9, "c" : 1 }
{ "_id" : ObjectId("55b87fe942bfd1a462bd8993"), "a" : 1, "b" : 9, "c" : 1 }

Query:

db.d.find({a:1,b:{$lt:3}}).sort({c:-1})

First, the execution plan when there is no index:

"executionStats" : {
                "executionSuccess" : true,
                "nReturned" : 2,
                "executionTimeMillis" : 0,
                "totalKeysExamined" : 0,
                "totalDocsExamined" : 10,
                "executionStages" : {
                        "stage" : "SORT",
                        "nReturned" : 2,
                        ...
                        "sortPattern" : {
                                "c" : -1
                        },
                        "memUsage" : 126,
                        "memLimit" : 33554432,
                        "inputStage" : {
                                "stage" : "COLLSCAN",
                                "filter" : {
                                        "$and" : [
                                                {
                                                        "a" : {
                                                                "$eq" : 1
                                                        }
                                                },
                                                {
                                                        "b" : {
                                                                "$lt" : 3
                                                        }
                                                }
                                        ]
                                },
                                "nReturned" : 2,
                               ...
                                "direction" : "forward",
                                "docsExamined" : 10
                        }

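nReturned is 2: two documents match the conditions and are returned.

totalKeysExamined is 0: no index was used.

totalDocsExamined is 10: every document was scanned.

executionStages.stage is SORT, a sort that does not use an index. Its memory usage and memory limit are "memUsage" : 126, "memLimit" : 33554432.

executionStages.inputStage.stage is COLLSCAN, a full collection scan, with the filter

"filter" : {
        "$and" : [
                {
                        "a" : {
                                "$eq" : 1
                        }
                },
                {
                        "b" : {
                                "$lt" : 3
                        }
                }
        ]
},

Clearly, without an index the query scans the whole collection and then sorts in memory. Both the full collection scan and the in-memory sort are undesirable.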
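
To see where these numbers come from, here is a small simulation in plain JavaScript of what a full scan followed by an in-memory sort does with the ten test documents. It only mimics the behavior for illustration and says nothing about MongoDB's actual implementation; the _id fields are omitted.

```javascript
// The ten test documents from this article, without _id.
const docs = [
  { a: 1, b: 1, c: 1 }, { a: 1, b: 2, c: 2 }, { a: 1, b: 3, c: 3 },
  { a: 4, b: 2, c: 3 }, { a: 4, b: 2, c: 5 }, { a: 4, b: 2, c: 5 },
  { a: 2, b: 1, c: 1 }, { a: 1, b: 9, c: 1 }, { a: 1, b: 9, c: 1 },
  { a: 1, b: 9, c: 1 },
];

// "COLLSCAN": every document is examined and tested against the filter.
let docsExamined = 0;
const matched = docs.filter((d) => {
  docsExamined += 1;
  return d.a === 1 && d.b < 3;
});

// "SORT": the matches are ordered by c descending, in memory.
const result = matched.slice().sort((x, y) => y.c - x.c);

console.log("docsExamined:", docsExamined);
console.log("nReturned:", result.length);
console.log(result);
```

With ten documents the cost is invisible, which is why executionTimeMillis is 0 here; the point is that docsExamined grows with the collection while nReturned does not.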

Next, we add an index on the sort field c:

db.d.createIndex({c:1})

Now look at the execution plan again:

"executionStats" : {
                "executionSuccess" : true,
                "nReturned" : 2,
                "executionTimeMillis" : 1,
                "totalKeysExamined" : 10,
                "totalDocsExamined" : 10,
                "executionStages" : {
                        "stage" : "FETCH",
                        "filter" : {
                                "$and" : [
                                        {
                                                "a" : {
                                                        "$eq" : 1
                                                }
                                        },
                                        {
                                                "b" : {
                                                        "$lt" : 3
                                                }
                                        }
                                ]
                        },
                        "nReturned" : 2,
                        ...
                        "inputStage" : {
                                "stage" : "IXSCAN",
                                "nReturned" : 10,
                               ...
                                "keyPattern" : {
                                        "c" : 1
                                },
                                "indexName" : "c_1",
                                "isMultiKey" : false,
                                "direction" : "backward",
                                "indexBounds" : {
                                        "c" : [
                                                "[MaxKey, MinKey]"
                                        ]
                                },

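The SORT stage is gone because the sort field now has an index. The query conditions still have no index, though, so totalDocsExamined is still 10; and because the sort walks the index, totalKeysExamined is 10 as well. Only the sort has been optimized; the query itself is as inefficient as before.

Next, we index the query conditions, trying several index layouts to find the best one.

Our query is still: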

db.d.find({a:1,b:{$lt:3}}).sort({c:-1})

Execution plan with the index db.d.createIndex({b:1,a:1,c:1}):

"executionStats" : {
                "executionSuccess" : true,
                "nReturned" : 2,
                "executionTimeMillis" : 0,
                "totalKeysExamined" : 4,
                "totalDocsExamined" : 2,
                "executionStages" : {
                        "stage" : "SORT",
                        "nReturned" : 2,
                        ...
                        "sortPattern" : {
                                "c" : -1
                        },
                        "memUsage" : 126,
                        "memLimit" : 33554432,
                        "inputStage" : {
                                "stage" : "FETCH",
                                "nReturned" : 2,
                                ...
                                "inputStage" : {
                                        "stage" : "IXSCAN",
                                        "nReturned" : 2,
                                       ...
                                        "keyPattern" : {
                                                "b" : 1,
                                                "a" : 1,
                                                "c" : 1
                                        },
                                        "indexName" : "b_1_a_1_c_1",
                                        "isMultiKey" : false,
                                        "direction" : "forward",
                                        "indexBounds" : {
                                                "b" : [
                                                        "[-inf.0, 3.0)"
                                                ],
                                                "a" : [
                                                        "[1.0, 1.0]"
                                                ],
                                                "c" : [
                                                        "[MinKey, MaxKey]"
                                                ]
                                        },

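We can see that

nReturned is 2: two documents were returned

totalKeysExamined is 4: four index entries were scanned

totalDocsExamined is 2: two documents were scanned

Here nReturned=totalDocsExamined<totalKeysExamined, which is not what we want.

executionStages.stage is also SORT, meaning the sort ran in memory, which is not what we want either.

Execution plan with the index db.d.createIndex({a:1,b:1,c:1}):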

"executionStats" : {
                "executionSuccess" : true,
                "nReturned" : 2,
                "executionTimeMillis" : 0,
                "totalKeysExamined" : 2,
                "totalDocsExamined" : 2,
                "executionStages" : {
                        "stage" : "SORT",
                        "nReturned" : 2,
                        ...
                        "sortPattern" : {
                                "c" : -1
                        },
                        "memUsage" : 126,
                        "memLimit" : 33554432,
                        "inputStage" : {
                                "stage" : "FETCH",
                                "nReturned" : 2,
                                ...
                                "inputStage" : {
                                        "stage" : "IXSCAN",
                                        "nReturned" : 2,
                                        ...
                                        "keyPattern" : {
                                                "a" : 1,
                                                "b" : 1,
                                                "c" : 1
                                        },
                                        "indexName" : "a_1_b_1_c_1",
                                        "isMultiKey" : false,
                                        "direction" : "forward",
                                        "indexBounds" : {
                                                "a" : [
                                                        "[1.0, 1.0]"
                                                ],
                                                "b" : [
                                                        "[-inf.0, 3.0)"
                                                ],
                                                "c" : [
                                                        "[MinKey, MaxKey]"
                                                ]
                                        },

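We can see that

nReturned is 2: two documents were returned

totalKeysExamined is 2: two index entries were scanned

totalDocsExamined is 2: two documents were scanned

Here nReturned=totalDocsExamined=totalKeysExamined, which matches our expectation. Looks great, right?

But, but, but! (Important things get said three times.) executionStages.stage is SORT, so the sort ran in memory. In production, especially with large data volumes, that is very expensive and must not be overlooked. We need to fix it.

Finally, keeping nReturned=totalDocsExamined, we make the sort use the index as well. With the index db.d.createIndex({a:1,c:1,b:1}), the execution plan is: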

"executionStats" : {
                "executionSuccess" : true,
                "nReturned" : 2,
                "executionTimeMillis" : 0,
                "totalKeysExamined" : 4,
                "totalDocsExamined" : 2,
                "executionStages" : {
                        "stage" : "FETCH",
                        "nReturned" : 2,
                         ...
                        "inputStage" : {
                                "stage" : "IXSCAN",
                                "nReturned" : 2,
                                 ...
                                "keyPattern" : {
                                        "a" : 1,
                                        "c" : 1,
                                        "b" : 1
                                },
                                "indexName" : "a_1_c_1_b_1",
                                "isMultiKey" : false,
                                "direction" : "backward",
                                "indexBounds" : {
                                        "a" : [
                                                "[1.0, 1.0]"
                                        ],
                                        "c" : [
                                                "[MaxKey, MinKey]"
                                        ],
                                        "b" : [
                                                "(3.0, -inf.0]"
                                        ]
                                },
                                "keysExamined" : 4,
                                "dupsTested" : 0,
                                "dupsDropped" : 0,
                                "seenInvalidated" : 0,
                                "matchTested" : 0

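We can see that

nReturned is 2: two documents were returned

totalKeysExamined is 4: four index entries were scanned

totalDocsExamined is 2: two documents were scanned

This is not nReturned=totalKeysExamined=totalDocsExamined, but there is no SORT stage: the sort is served by the index rather than done in memory, and that gain outweighs the cost of scanning a few extra index entries.

From this we can draw a small conclusion: when a query combines exact matches, a range condition, and a sort, an index ordered as

{exact-match fields, sort fields, range fields}

is the more efficient choice.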
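
The ordering rule can be written down as a tiny helper. This is a sketch under assumptions: esrIndexSpec is a hypothetical name, and every field is indexed ascending, as in this article's index, where the descending sort was served by a backward scan.

```javascript
// Hypothetical helper: order index fields as exact-match, then sort, then range.
// Assumes ascending keys and plain field names such as "a", "b", "c".
function esrIndexSpec({ equality = [], sort = [], range = [] }) {
  const spec = {};
  const seen = new Set();
  for (const field of [...equality, ...sort, ...range]) {
    if (!seen.has(field)) { // a field appears once, at its first position
      seen.add(field);
      spec[field] = 1;
    }
  }
  return spec;
}

// The query in this article: a is an exact match, c is the sort, b is a range.
const spec = esrIndexSpec({ equality: ["a"], sort: ["c"], range: ["b"] });
console.log(JSON.stringify(spec)); // {"a":1,"c":1,"b":1}
```

The resulting object has the same shape as the key pattern passed to the index creation call earlier in this section.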

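Afterword

That concludes this article on execution plan analysis. I hope it gives you a working understanding of MongoDB execution plans.

About the author

Zhou Liyang (周李洋), known in the community as eshujiushiwo, focuses on MySQL and MongoDB, data architecture, and server architecture. He currently works at DeNA and is the author of mongo-mopre and mongo-mload, moderator of the CSDN mongodb board, founder of the MongoDB Shanghai User Group, a core member of the official MongoDB translation team, a blogger on the MongoDB Chinese site, a MongoDB Contribution Award recipient, and a speaker at MongoDB Days Beijing 2014. Contact: 378013446. MongoDB Shanghai User Group: 192202324. Feel free to get in touch.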