[MongoDB] MongoDB Performance Optimization - BI Query Aggregation

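While analyzing the query and aggregation statements of a BI service to track down slow queries and slow aggregations, I arrived at the following summary: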

  • Locating slow queries:
    Use the profiler to find and analyze slow queries.

  • Query optimization:
    Add the appropriate indexes to speed up queries.

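  • Aggregating large volumes of data:
    The first thing to make clear is that expectations for OLAP-style operations should not be too high. They work on large volumes of data, and the I/O alone far exceeds that of a typical OLTP operation, so demanding OLTP-level speed and concurrency is unrealistic and pointless. That does not mean there is no room for optimization at all.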

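Optimizations such as indexing can be expected to improve query performance somewhat, but they do not solve the underlying problem. For the reason given above, OLAP workloads cannot be held to such high expectations, so the fix should start at the source. Consider:

1) Maintaining the counts every time the eventType or insertTime field is updated or inserted

2) Running a full aggregation at regular intervals, caching the result, and serving it directly to users at query time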

Problem description

Calling the BI service endpoint showed that returning one day of records took about 10 s, which is clearly a problem.

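Locating the problem

Finding the slow query

To find the query, first check the current MongoDB profiling level. The level can be 0, 1, or 2: 0 turns profiling off, 1 records slow operations only, and 2 records all operations.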

db.getProfilingLevel()

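It returns 0, meaning that nothing is recorded by default.

Set the profiling level to slow-operation mode, so that every operation taking longer than 1000 ms is recorded: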

db.setProfilingLevel(1, 1000)

Run the one-day BI query again and check the profiler output. The slow query has indeed been recorded.
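The recorded entries live in the system.profile collection of the profiled database. As a minimal sketch, the helper below (slowOpsQuery is a hypothetical name, not part of MongoDB) builds the filter, sort, and limit for pulling the most recent slow operations on one namespace. The value 5 for the limit is an arbitrary choice.

```javascript
// Hypothetical helper: builds the arguments for a query against system.profile.
// ns, millis and ts are fields of the profiler documents.
function slowOpsQuery(ns, minMillis) {
  return {
    filter: { ns: ns, millis: { $gte: minMillis } }, // only slow operations on this namespace
    sort: { ts: -1 },                                // newest first
    limit: 5                                         // arbitrary page size
  };
}

const q = slowOpsQuery("standalone.application_alert", 1000);
console.log(JSON.stringify(q));
```

In the mongo shell the result would be used roughly as db.system.profile.find(q.filter).sort(q.sort).limit(q.limit).pretty().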

Analyzing the slow query

Open the profile record of the slow query with View Document:

{
    "op" : "command",
    "ns" : "standalone.application_alert",
    "command" : {
        "aggregate" : "application_alert",
        "pipeline" : [ 
            {
                "$match" : {
                    "factoryId" : "10001",
                    "$and" : [ 
                        {
                            "insertTime" : {
                                "$gte" : ISODate("2018-03-25T16:00:00.000Z"),
                                "$lte" : ISODate("2018-03-26T09:04:20.288Z")
                            }
                        }
                    ]
                }
            }, 
            {
                "$project" : {
                    "eventType" : 1,
                    "date" : {
                        "$concat" : [ 
                            {
                                "$substr" : [ 
                                    {
                                        "$year" : [ 
                                            "$insertTime"
                                        ]
                                    }, 
                                    0, 
                                    4
                                ]
                            }, 
                            "-", 
                            {
                                "$substr" : [ 
                                    {
                                        "$month" : [ 
                                            "$insertTime"
                                        ]
                                    }, 
                                    0, 
                                    2
                                ]
                            }, 
                            "-", 
                            {
                                "$substr" : [ 
                                    {
                                        "$dayOfMonth" : [ 
                                            "$insertTime"
                                        ]
                                    }, 
                                    0, 
                                    2
                                ]
                            }
                        ]
                    }
                }
            }, 
            {
                "$group" : {
                    "_id" : {
                        "date" : "$date",
                        "eventType" : "$eventType"
                    },
                    "count" : {
                        "$sum" : 1
                    }
                }
            }
        ]
    },
    "keysExamined" : 0,
    "docsExamined" : 2636052,
    "numYield" : 20651,
    "locks" : {
        "Global" : {
            "acquireCount" : {
                "r" : NumberLong(41310)
            }
        },
        "Database" : {
            "acquireCount" : {
                "r" : NumberLong(20655)
            }
        },
        "Collection" : {
            "acquireCount" : {
                "r" : NumberLong(20654)
            }
        }
    },
    "nreturned" : 0,
    "responseLength" : 196,
    "protocol" : "op_query",
    "millis" : 9484,
    "planSummary" : "COLLSCAN",
    "ts" : ISODate("2018-03-26T08:44:51.322Z"),
    "client" : "10.11.0.118",
    "allUsers" : [ 
        {
            "user" : "standalone",
            "db" : "standalone"
        }
    ],
    "user" : "standalone@standalone"
}

The profile above shows that our BI query endpoint translates into a MongoDB aggregation pipeline:

  • Step 1: $match the records whose factory ID is 10001 within the current day
{
    "$match" : {
        "factoryId" : "10001",
        "$and" : [
            {
                "insertTime" : {
                    "$gte" : ISODate("2018-03-25T16:00:00.000Z"),
                    "$lte" : ISODate("2018-03-26T09:04:20.288Z")
                }
            }
        ]
    }
}
  • Step 2: field mapping with $project:
{
    "$project" : {
        "eventType" : 1,
        "date" : {
            "$concat" : [
                {
                    "$substr" : [
                        {
                            "$year" : [
                                "$insertTime"
                            ]
                        },
                        0,
                        4
                    ]
                },
                "-",
                {
                    "$substr" : [
                        {
                            "$month" : [
                                "$insertTime"
                            ]
                        },
                        0,
                        2
                    ]
                },
                "-",
                {
                    "$substr" : [
                        {
                            "$dayOfMonth" : [
                                "$insertTime"
                            ]
                        },
                        0,
                        2
                    ]
                }
            ]
        }
    }
}

Besides a plain projection of eventType, the insertTime field is concatenated into a yyyy-MM-dd style string and projected as the date field.

  • Step 3: the $group operation
{
    "$group" : {
        "_id" : {
            "date" : "$date",
            "eventType" : "$eventType"
        },
        "count" : {
            "$sum" : 1
        }
    }
}

The date and eventType fields from step 2 are grouped, counting the number of events (count) for each combination of date and event type.
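The three steps can be written more compactly. The sketch below is an assumption about how the pipeline could be rebuilt, not the code the BI service actually runs: it drops the single-element $and and replaces the $concat/$substr chain with $dateToString, which is available on the 3.4 server used here and emits a zero-padded yyyy-MM-dd string. Like $year, $month, and $dayOfMonth, it works on the UTC value of the date. The function name buildDailyCountPipeline is hypothetical.

```javascript
// Hypothetical helper: builds a two-stage pipeline equivalent in intent to the
// profiled one (filter by factory and time window, count per day and event type).
function buildDailyCountPipeline(factoryId, from, to) {
  return [
    // Stage 1: same filter as the original $match, without the redundant $and.
    { $match: { factoryId: factoryId, insertTime: { $gte: from, $lte: to } } },
    // Stage 2: the date string is computed directly inside the group key,
    // so no separate $project stage is needed.
    {
      $group: {
        _id: {
          date: { $dateToString: { format: "%Y-%m-%d", date: "$insertTime" } },
          eventType: "$eventType"
        },
        count: { $sum: 1 }
      }
    }
  ];
}

const pipeline = buildDailyCountPipeline(
  "10001",
  new Date("2018-03-25T16:00:00.000Z"),
  new Date("2018-03-26T09:04:20.288Z")
);
console.log(JSON.stringify(pipeline, null, 2));
```

In the mongo shell this would be passed to db.application_alert.aggregate(pipeline). Shortening the pipeline does not by itself remove the collection scan. That still depends on an index supporting the $match stage.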

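The other fields in the profile record:

  • millis: the operation took 9484 ms to return its result
  • ts: the time at which the command ran
  • command: the content of the command
  • op: the operation type; it is command here, while an ordinary find is recorded as query
  • ns: standalone.application_alert, the database and collection being queried
  • nreturned: the number of documents returned
  • responseLength (reslen in the log output): the size of the result in bytes
  • docsExamined and keysExamined (nscannedObjects and nscanned in older versions): the number of documents and index keys scanned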

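If 9484 ms is considered too long, the query needs to be optimized.

As a rule of thumb, the following metrics are worth checking:

  • If docsExamined (nscanned in older versions) is very large, or close to the total number of documents, the query is probably not using an index.
  • If responseLength (reslen) is very large, unnecessary fields may be being returned.
  • If nreturned is very large, the query may be missing a limit.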
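These rules of thumb can be turned into a small check over a profile document. The following is only a sketch: diagnoseProfileEntry and its two thresholds are hypothetical, and the sample entry copies a few fields from the profile record shown earlier.

```javascript
// Hypothetical helper: returns human-readable hints for one profiler document.
// limits.maxResponseBytes and limits.maxReturned are arbitrary thresholds.
function diagnoseProfileEntry(entry, limits) {
  const hints = [];
  // A COLLSCAN, or documents examined with no index keys examined, means no index was used.
  if (entry.planSummary === "COLLSCAN" ||
      (entry.keysExamined === 0 && entry.docsExamined > 0)) {
    hints.push("no index used: " + entry.docsExamined + " documents scanned");
  }
  if (entry.responseLength > limits.maxResponseBytes) {
    hints.push("large response: consider projecting fewer fields");
  }
  if (entry.nreturned > limits.maxReturned) {
    hints.push("many documents returned: consider a limit or a tighter filter");
  }
  return hints;
}

// Values taken from the profile record of the slow aggregation.
const entry = {
  planSummary: "COLLSCAN",
  keysExamined: 0,
  docsExamined: 2636052,
  nreturned: 0,
  responseLength: 196,
  millis: 9484
};
const hints = diagnoseProfileEntry(entry, { maxResponseBytes: 1048576, maxReturned: 1000 });
console.log(hints);
```

For this record only the first rule fires, which matches the diagnosis in the text: the response is tiny and nothing is wrong with the result size, but the whole collection is scanned.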

Checking the DB/Server/Collection status

  • DB status

  • Server status

The server status contains a great many metrics, so only part of it is listed here.

{
    "host" : "OPASTORMON", # host name
    "version" : "3.4.1", # version
    "process" : "mongod", # process name
    "pid" : NumberLong(1462), # process ID
    "uptime" : 10111875.0, # uptime in seconds
    "uptimeMillis" : NumberLong(10111875602), # uptime in milliseconds
    "uptimeEstimate" : NumberLong(10111875), # uptime estimate in seconds
    "localTime" : ISODate("2018-03-26T09:14:13.679Z"), # current server time
    "asserts" : {
        "regular" : 0,
        "warning" : 0,
        "msg" : 0,
        "user" : 26549,
        "rollovers" : 0
    },
    "connections" : {
        "current" : 104, # current connections
        "available" : 715, # available connections
        "totalCreated" : 11275
    },
    "extra_info" : {
        "note" : "fields vary by platform",
        "page_faults" : 49
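    },
    "globalLock" : {
        "totalTime" : NumberLong(10111875549000), # total time since startup (microseconds)
        "currentQueue" : {
            "total" : 0, # operations currently queued waiting for a lock
            "readers" : 0, # queued reads
            "writers" : 0 # queued writes
        },
        "activeClients" : {
            "total" : 110, # active client connections
            "readers" : 0, # clients performing reads
            "writers" : 0 # clients performing writes
        }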
        }
    },
    "locks" : {
        "Global" : {
            "acquireCount" : {
                "r" : NumberLong(8457368136),
                "w" : NumberLong(1025512487),
                "W" : NumberLong(7)
            },
            "acquireWaitCount" : {
                "r" : NumberLong(2)
            },
            "timeAcquiringMicros" : {
                "r" : NumberLong(94731)
            }
        },
        "Database" : {
            "acquireCount" : {
                "r" : NumberLong(3715927334),
                "w" : NumberLong(1025512452),
                "R" : NumberLong(194),
                "W" : NumberLong(69)
            },
            "acquireWaitCount" : {
                "r" : NumberLong(13),
                "w" : NumberLong(5),
                "R" : NumberLong(6),
                "W" : NumberLong(3)
            },
            "timeAcquiringMicros" : {
                "r" : NumberLong(530972),
                "w" : NumberLong(426173),
                "R" : NumberLong(3207),
                "W" : NumberLong(1321)
            }
        },
        "Collection" : {
            "acquireCount" : {
                "r" : NumberLong(3715046899),
                "w" : NumberLong(1025512453)
            }
        },
        "Metadata" : {
            "acquireCount" : {
                "w" : NumberLong(1),
                "W" : NumberLong(3)
            }
        }
    },
    "network" : {
        "bytesIn" : NumberLong(373939915493), # bytes received
        "bytesOut" : NumberLong(961227224728), # bytes sent
        "physicalBytesIn" : NumberLong(373939915493), # physical bytes received
        "physicalBytesOut" : NumberLong(961054421482), # physical bytes sent
        "numRequests" : NumberLong(3142377739) # number of requests
    },
    "opLatencies" : {
        "reads" : {
            "latency" : NumberLong(3270742192035),
            "ops" : NumberLong(540111914)
        },
        "writes" : {
            "latency" : NumberLong(261946981235),
            "ops" : NumberLong(1024301418)
        },
        "commands" : {
            "latency" : NumberLong(458086641),
            "ops" : NumberLong(6776702)
        }
    },
    "opcounters" : {
        "insert" : 6846448, # insert operations
        "query" : 248443106, # query operations
        "update" : 1018594976, # update operations
        "delete" : 1830, # delete operations
        "getmore" : 162213, # getMore operations
        "command" : 298306448 # other commands
    },
    "opcountersRepl" : {
        "insert" : 0,
        "query" : 0,
        "update" : 0,
        "delete" : 0,
        "getmore" : 0,
        "command" : 0
    },
    "storageEngine" : {
        "name" : "wiredTiger",
        "supportsCommittedReads" : true,
        "readOnly" : false,
        "persistent" : true
    },
    "tcmalloc" : {
        "generic" : {
            "current_allocated_bytes" : NumberLong(3819325752),
            "heap_size" : NumberLong(6959509504)
        },
        "tcmalloc" : {
            "pageheap_free_bytes" : 199692288,
            "pageheap_unmapped_bytes" : NumberLong(2738442240),
            "max_total_thread_cache_bytes" : NumberLong(1073741824),
            "current_total_thread_cache_bytes" : 35895120,
            "total_free_bytes" : 202049224,
            "central_cache_free_bytes" : 165650360,
            "transfer_cache_free_bytes" : 503744,
            "thread_cache_free_bytes" : 35895120,
            "aggressive_memory_decommit" : 0,
            "formattedString" : "------------------------------------------------\nMALLOC:     3819325752 ( 3642.4 MiB) Bytes in use by application\nMALLOC: +    199692288 (  190.4 MiB) Bytes in page heap freelist\nMALLOC: +    165650360 (  158.0 MiB) Bytes in central cache freelist\nMALLOC: +       503744 (    0.5 MiB) Bytes in transfer cache freelist\nMALLOC: +     35895120 (   34.2 MiB) Bytes in thread cache freelists\nMALLOC: +     40001728 (   38.1 MiB) Bytes in malloc metadata\nMALLOC:   ------------\nMALLOC: =   4261068992 ( 4063.7 MiB) Actual memory used (physical + swap)\nMALLOC: +   2738442240 ( 2611.6 MiB) Bytes released to OS (aka unmapped)\nMALLOC:   ------------\nMALLOC: =   6999511232 ( 6675.3 MiB) Virtual address space used\nMALLOC:\nMALLOC:         521339              Spans in use\nMALLOC:            115              Thread heaps in use\nMALLOC:           4096              Tcmalloc page size\n------------------------------------------------\nCall ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).\nBytes released to the OS take up virtual address space but no physical memory.\n"
        }
    },
    "mem" : {
        "bits" : 64, # 64-bit build
        "resident" : 4103, # resident (physical) memory in MB
        "virtual" : 7045, # virtual memory in MB
        "supported" : true, # whether extended memory information is supported
        "mapped" : 0,
        "mappedWithJournal" : 0
    },
    "ok" : 1.0
}
  • Status of the application_alert collection
{
    "ns" : "standalone.application_alert",
    "size" : 783852548,
    "count" : 2638262,
    "avgObjSize" : 297,
    "storageSize" : 189296640,
    "capped" : false,
    "wiredTiger" : {
        "metadata" : {
            "formatVersion" : 1
        },
        "creationString" : "allocation_size=4KB,app_metadata=(formatVersion=1),block_allocation=best,block_compressor=snappy,cache_resident=false,checksum=on,colgroups=,collator=,columns=,dictionary=0,encryption=(keyid=,name=),exclusive=false,extractor=,format=btree,huffman_key=,huffman_value=,ignore_in_memory_cache_size=false,immutable=false,internal_item_max=0,internal_key_max=0,internal_key_truncate=true,internal_page_max=4KB,key_format=q,key_gap=10,leaf_item_max=0,leaf_key_max=0,leaf_page_max=32KB,leaf_value_max=64MB,log=(enabled=true),lsm=(auto_throttle=true,bloom=true,bloom_bit_count=16,bloom_config=,bloom_hash_count=8,bloom_oldest=false,chunk_count_limit=0,chunk_max=5GB,chunk_size=10MB,merge_max=15,merge_min=0),memory_page_max=10m,os_cache_dirty_max=0,os_cache_max=0,prefix_compression=false,prefix_compression_min=4,source=,split_deepen_min_child=0,split_deepen_per_child=0,split_pct=90,type=file,value_format=u",
        "type" : "file",
        "uri" : "statistics:table:collection-4-6040851502998278747",
        "LSM" : {
            "bloom filter false positives" : 0,
            "bloom filter hits" : 0,
            "bloom filter misses" : 0,
            "bloom filter pages evicted from cache" : 0,
            "bloom filter pages read into cache" : 0,
            "bloom filters in the LSM tree" : 0,
            "chunks in the LSM tree" : 0,
            "highest merge generation in the LSM tree" : 0,
            "queries that could have benefited from a Bloom filter that did not exist" : 0,
            "sleep for LSM checkpoint throttle" : 0,
            "sleep for LSM merge throttle" : 0,
            "total size of bloom filters" : 0
        },
        "block-manager" : {
            "allocations requiring file extension" : 31543,
            "blocks allocated" : 346110,
            "blocks freed" : 124238,
            "checkpoint size" : 189259776,
            "file allocation unit size" : 4096,
            "file bytes available for reuse" : 20480,
            "file magic number" : 120897,
            "file major version number" : 1,
            "file size in bytes" : 189296640,
            "minor version number" : 0
        },
        "btree" : {
            "btree checkpoint generation" : 165242,
            "column-store fixed-size leaf pages" : 0,
            "column-store internal pages" : 0,
            "column-store variable-size RLE encoded values" : 0,
            "column-store variable-size deleted values" : 0,
            "column-store variable-size leaf pages" : 0,
            "fixed-record size" : 0,
            "maximum internal page key size" : 368,
            "maximum internal page size" : 4096,
            "maximum leaf page key size" : 2867,
            "maximum leaf page size" : 32768,
            "maximum leaf page value size" : 67108864,
            "maximum tree depth" : 3,
            "number of key/value pairs" : 0,
            "overflow pages" : 0,
            "pages rewritten by compaction" : 0,
            "row-store internal pages" : 0,
            "row-store leaf pages" : 0
        },
        "cache" : {
            "bytes currently in the cache" : 1014702364,
            "bytes read into cache" : 0,
            "bytes written from cache" : 1888143292.0,
            "checkpoint blocked page eviction" : 0,
            "data source pages selected for eviction unable to be evicted" : 0,
            "hazard pointer blocked page eviction" : 0,
            "in-memory page passed criteria to be split" : 224,
            "in-memory page splits" : 112,
            "internal pages evicted" : 0,
            "internal pages split during eviction" : 0,
            "leaf pages split during eviction" : 0,
            "modified pages evicted" : 2,
            "overflow pages read into cache" : 0,
            "overflow values cached in memory" : 0,
            "page split during eviction deepened the tree" : 0,
            "page written requiring lookaside records" : 0,
            "pages read into cache" : 0,
            "pages read into cache requiring lookaside entries" : 0,
            "pages requested from the cache" : 49191856,
            "pages written from cache" : 217176,
            "pages written requiring in-memory restoration" : 0,
            "unmodified pages evicted" : 0
        },
        "cache_walk" : {
            "Average difference between current eviction generation when the page was last considered" : 0,
            "Average on-disk page image size seen" : 0,
            "Clean pages currently in cache" : 0,
            "Current eviction generation" : 0,
            "Dirty pages currently in cache" : 0,
            "Entries in the root page" : 0,
            "Internal pages currently in cache" : 0,
            "Leaf pages currently in cache" : 0,
            "Maximum difference between current eviction generation when the page was last considered" : 0,
            "Maximum page size seen" : 0,
            "Minimum on-disk page image size seen" : 0,
            "On-disk page image sizes smaller than a single allocation unit" : 0,
            "Pages created in memory and never written" : 0,
            "Pages currently queued for eviction" : 0,
            "Pages that could not be queued for eviction" : 0,
            "Refs skipped during cache traversal" : 0,
            "Size of the root page" : 0,
            "Total number of pages currently in cache" : 0
        },
        "compression" : {
            "compressed pages read" : 0,
            "compressed pages written" : 83604,
            "page written failed to compress" : 0,
            "page written was too small to compress" : 133572,
            "raw compression call failed, additional data available" : 0,
            "raw compression call failed, no additional data available" : 0,
            "raw compression call succeeded" : 0
        },
        "cursor" : {
            "bulk-loaded cursor-insert calls" : 0,
            "create calls" : 78758,
            "cursor-insert key and value bytes inserted" : 795578636,
            "cursor-remove key bytes removed" : 8857,
            "cursor-update value bytes updated" : 0,
            "insert calls" : 2642785,
            "next calls" : 5850718215.0,
            "prev calls" : 3,
            "remove calls" : 4460,
            "reset calls" : 48942545,
            "restarted searches" : 0,
            "search calls" : 10229,
            "search near calls" : 46285468,
            "truncate calls" : 0,
            "update calls" : 0
        },
        "reconciliation" : {
            "dictionary matches" : 0,
            "fast-path pages deleted" : 0,
            "internal page key bytes discarded using suffix compression" : 7946666,
            "internal page multi-block writes" : 60010,
            "internal-page overflow keys" : 0,
            "leaf page key bytes discarded using prefix compression" : 0,
            "leaf page multi-block writes" : 64250,
            "leaf-page overflow keys" : 0,
            "maximum blocks required for a page" : 253,
            "overflow values written" : 0,
            "page checksum matches" : 10496129,
            "page reconciliation calls" : 189077,
            "page reconciliation calls for eviction" : 1,
            "pages deleted" : 7
        },
        "session" : {
            "object compaction" : 0,
            "open cursor count" : 35
        },
        "transaction" : {
            "update conflicts" : 0
        }
    },
    "nindexes" : 1,
    "totalIndexSize" : 24420352,
    "indexSizes" : {
        "_id_" : 24420352
    },
    "ok" : 1.0
}

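Performance optimization

Performance optimization - indexes

Given the metrics above, the first thing to check when optimizing is whether the collection has suitable indexes:

  • Check for relevant indexes

  • Add indexes on the relevant fields
    Only the default index on _id exists, so the next step is to create indexes on the eventType, factoryId, and insertTime fields of application_alert: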
// createIndex replaces ensureIndex, which has been deprecated since MongoDB 3.0
db.application_alert.createIndex({"insertTime": 1, "eventType": 1});
db.application_alert.createIndex({"insertTime": 1});
db.application_alert.createIndex({"eventType": 1});
db.application_alert.createIndex({"factoryId": 1});

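After adding the indexes, aggregating one day of data takes 424 ms, which is acceptable.

  • Querying 20 days of data still takes 20 s

  • What adding the indexes achieved
    At this point the effect of the query indexes on the BI endpoint is clear. An index only makes the lookup on the indexed fields efficient; it does nothing for the aggregation of the data once it has been found. For a single day the data volume is small, so the speedup is dramatic, but aggregating a large amount of data is still too slow.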
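A compound index also serves queries on any leading prefix of its fields, so a single-field index that is such a prefix adds write cost without helping reads. The sketch below checks the four index definitions from this article for that kind of redundancy. isIndexPrefix is a hypothetical helper written for illustration and only compares key patterns, not index options.

```javascript
// Hypothetical helper: true when every field of `candidate` appears, in the same
// position and direction, at the start of `index`.
function isIndexPrefix(candidate, index) {
  const c = Object.keys(candidate);
  const i = Object.keys(index);
  if (c.length > i.length) return false;
  return c.every((field, pos) => field === i[pos] && candidate[field] === index[field]);
}

// The four key patterns created on application_alert.
const indexes = [
  { insertTime: 1, eventType: 1 },
  { insertTime: 1 },
  { eventType: 1 },
  { factoryId: 1 }
];

// An index is redundant if it is a prefix of some other index in the list.
const redundant = indexes.filter((a) => indexes.some((b) => a !== b && isIndexPrefix(a, b)));
console.log(JSON.stringify(redundant));
```

Here only the insertTime index is flagged, because the compound index on insertTime and eventType already covers it. Since the $match stage filters on factoryId by equality and on insertTime by range, a compound index leading with factoryId and followed by insertTime might serve it better. That is an untested assumption, not something measured in this article.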

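What problem did adding the indexes solve?

Without an index, how long does it take to find 1 million documents matching {eventType: "abnormal"}? With a full collection scan (COLLSCAN), picking 6 million documents out of 7 million is obviously a very different matter from picking 6 million out of 100 million. With an index hit (IXSCAN) that difference becomes much smaller, almost negligible. An index only makes the lookup on the indexed fields efficient; it does not help with aggregating the data afterwards. Incidentally, to judge whether efficiency differs, compare execution plans rather than execution times, because timings are unreliable.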
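To compare execution plans rather than timings, the winning plan returned by explain() can be walked to see which stages it contains. The sketch below assumes the usual shape of queryPlanner.winningPlan for a find, where each stage has a stage name and an optional inputStage or inputStages. collectStages is a hypothetical helper and the sample plans are made up for illustration.

```javascript
// Hypothetical helper: returns the stage names of a plan tree, outermost first.
function collectStages(plan) {
  if (!plan) return [];
  const children = plan.inputStages || (plan.inputStage ? [plan.inputStage] : []);
  return children.reduce((acc, child) => acc.concat(collectStages(child)), [plan.stage]);
}

// Made-up plans shaped like queryPlanner.winningPlan, before and after indexing.
const withoutIndex = { stage: "COLLSCAN", direction: "forward" };
const withIndex = {
  stage: "FETCH",
  inputStage: { stage: "IXSCAN", indexName: "insertTime_1_eventType_1" }
};

console.log(collectStages(withoutIndex)); // [ 'COLLSCAN' ]
console.log(collectStages(withIndex));    // [ 'FETCH', 'IXSCAN' ]
```

In the mongo shell the input would come from something like db.application_alert.find({...}).explain().queryPlanner.winningPlan. A plan containing IXSCAN instead of COLLSCAN shows that the index is being used, regardless of how the timing of a particular run turns out.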

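Performance optimization - aggregating large volumes of data

So how can this kind of query, which aggregates a large amount of data, be handled?

The first thing to make clear is that expectations for OLAP-style operations should not be too high. They work on large volumes of data, and the I/O alone far exceeds that of a typical OLTP operation, so demanding OLTP-level speed and concurrency is unrealistic and pointless. That does not mean there is no room for optimization at all.

Optimizations such as indexing can be expected to improve query performance somewhat, but they do not solve the problem. As noted above, OLAP workloads cannot be held to such high expectations. If you really have this requirement, start at the source and consider:

  • Maintaining the counts every time the info field is updated or inserted
  • Running a full aggregation at regular intervals, caching the result, and serving it directly to users at query time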
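The first option can be sketched as an upsert into a small summary collection that holds one counter per factory, day, and event type. Everything named here is an assumption for illustration: the helper buildCounterUpsert, the summary collection application_alert_daily, and the choice to key the day on the UTC date of insertTime.

```javascript
// Hypothetical helper: builds the filter/update/options for incrementing the
// pre-aggregated counter that corresponds to one newly inserted alert.
function buildCounterUpsert(event) {
  // UTC day of the event as yyyy-MM-dd, matching the grouping key of the BI query.
  const day = event.insertTime.toISOString().slice(0, 10);
  return {
    filter: { factoryId: event.factoryId, date: day, eventType: event.eventType },
    update: { $inc: { count: 1 } }, // atomic increment on the server
    options: { upsert: true }       // create the counter document on first use
  };
}

const op = buildCounterUpsert({
  factoryId: "10001",
  eventType: "abnormal",
  insertTime: new Date("2018-03-26T08:44:51.322Z")
});
console.log(JSON.stringify(op));
```

In the mongo shell the result would be applied with db.application_alert_daily.updateOne(op.filter, op.update, op.options) alongside each insert into application_alert. The BI endpoint could then read a handful of counter documents instead of aggregating millions of alerts, at the cost of one extra write per event.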