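This post analyzes the query and aggregation statements issued by a BI service in order to locate a slow query/aggregation. In summary:
Locating slow queries: analyze them with the profiler.
Query optimization: add the appropriate indexes to speed up queries.
Aggregating large data sets: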
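First, one thing needs to be made clear: expectations for OLAP-style operations should not be too high. They work on large volumes of data, and the I/O alone far exceeds that of a typical OLTP operation, so demanding OLTP-level speed and concurrency is unrealistic and pointless. That does not mean there is no room for optimization, though.
Optimizations of this kind should improve query performance somewhat, but they do not solve the underlying problem. The reason is the one just given: OLAP cannot be held to such expectations. Instead, tackle the problem at the source and consider:
1) Keeping a running count that is updated every time the eventType or insertTime field is inserted or updated.
2) Running a full aggregation at regular intervals, caching the result, and serving the cached result directly to users at query time.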
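The two options above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the implementation used in the BI service: the helper names `dailyCounterUpsert` and `makeCachedStats`, the counter document shape, and the refresh interval are all hypothetical.

```javascript
// Option 1 (sketch): build the arguments for an upsert that bumps a
// per-day, per-eventType counter whenever an alert is written.
// The counter shape { factoryId, date, eventType, count } is hypothetical.
function dailyCounterUpsert(alert) {
  // UTC calendar day of the alert, e.g. "2018-03-26".
  const date = alert.insertTime.toISOString().slice(0, 10);
  return {
    filter: { factoryId: alert.factoryId, date: date, eventType: alert.eventType },
    update: { $inc: { count: 1 } },
    options: { upsert: true }
  };
}

// Option 2 (sketch): wrap an expensive statistics function so that it is
// recomputed only after maxAgeMs has elapsed; otherwise the cached value
// is returned. `now` is injectable so the behaviour can be checked
// without waiting.
function makeCachedStats(computeStats, maxAgeMs, now) {
  now = now || Date.now;
  let cachedAt = null;
  let cachedValue;
  return function getStats() {
    const t = now();
    if (cachedAt === null || t - cachedAt >= maxAgeMs) {
      cachedValue = computeStats();
      cachedAt = t;
    }
    return cachedValue;
  };
}
```

In the mongo shell, the result of `dailyCounterUpsert` could be passed to `updateOne(filter, update, options)` on a separate counter collection (its name is left open here), and `computeStats` could be the existing aggregation. With counters in place, a dashboard query reads a handful of small documents instead of scanning millions.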
Calling the BI service endpoint, I found that returning one day of records took about 10 s, which clearly indicates a problem.
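To locate the query, first check the current MongoDB profiling level. The level can be 0, 1, or 2: 0 means the profiler is off, 1 means only slow operations are recorded, and 2 means all operations are recorded.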
db.getProfilingLevel()
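The result is 0, meaning that nothing is recorded by default.
Set the profiling level to slow-operation mode, so that every operation taking longer than 1000 ms is recorded: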
db.setProfilingLevel(1, 1000)
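Because profiling adds some overhead, it can be convenient to switch it on only for the duration of an investigation. The helper below is a small sketch of that idea for the mongo shell; `withSlowOpProfiling` is a made-up name, and `db` is assumed to be the shell's database handle.

```javascript
// Sketch for the mongo shell: turn on slow-operation profiling, run a
// callback, then restore whatever settings were active before,
// even if the callback throws.
function withSlowOpProfiling(db, slowMs, fn) {
  // getProfilingStatus() returns the current level and threshold,
  // e.g. { was: 0, slowms: 100 }.
  const previous = db.getProfilingStatus();
  db.setProfilingLevel(1, slowMs);
  try {
    return fn();
  } finally {
    db.setProfilingLevel(previous.was, previous.slowms);
  }
}

// Usage in the shell (not executed here):
// withSlowOpProfiling(db, 1000, function () { /* trigger the BI query */ });
```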
Run the one-day BI query again and check the profile; the slow query has indeed been recorded.
Use "View Document" to inspect the profile record of the slow query:
{ "op" : "command", "ns" : "standalone.application_alert", "command" : { "aggregate" : "application_alert", "pipeline" : [ { "$match" : { "factoryId" : "10001", "$and" : [ { "insertTime" : { "$gte" : ISODate("2018-03-25T16:00:00.000Z"), "$lte" : ISODate("2018-03-26T09:04:20.288Z") } } ] } }, { "$project" : { "eventType" : 1, "date" : { "$concat" : [ { "$substr" : [ { "$year" : [ "$insertTime" ] }, 0, 4 ] }, "-", { "$substr" : [ { "$month" : [ "$insertTime" ] }, 0, 2 ] }, "-", { "$substr" : [ { "$dayOfMonth" : [ "$insertTime" ] }, 0, 2 ] } ] } } }, { "$group" : { "_id" : { "date" : "$date", "eventType" : "$eventType" }, "count" : { "$sum" : 1 } } } ] }, "keysExamined" : 0, "docsExamined" : 2636052, "numYield" : 20651, "locks" : { "Global" : { "acquireCount" : { "r" : NumberLong(41310) } }, "Database" : { "acquireCount" : { "r" : NumberLong(20655) } }, "Collection" : { "acquireCount" : { "r" : NumberLong(20654) } } }, "nreturned" : 0, "responseLength" : 196, "protocol" : "op_query", "millis" : 9484, "planSummary" : "COLLSCAN", "ts" : ISODate("2018-03-26T08:44:51.322Z"), "client" : "10.11.0.118", "allUsers" : [ { "user" : "standalone", "db" : "standalone" } ], "user" : "standalone@standalone" }
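The profiler stores its records in the `system.profile` collection of the profiled database, so a record like the one above can also be fetched from the shell rather than a GUI. The function below is a minimal sketch; `recentSlowOps` is a hypothetical name and `db` is assumed to be the mongo shell's database handle.

```javascript
// Sketch for the mongo shell: return the most recent profiled operations
// that took at least minMillis, newest first.
function recentSlowOps(db, minMillis, limit) {
  return db.system.profile
    .find({ millis: { $gte: minMillis } })
    .sort({ ts: -1 })
    .limit(limit)
    .toArray();
}

// Usage in the shell (not executed here):
// recentSlowOps(db, 1000, 5).forEach(printjson);
```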
From the profile above we can see that the BI query endpoint translates into the following aggregation pipeline in MongoDB:
{ "$match" : { "factoryId" : "10001", "$and" : [ { "insertTime" : { "$gte" : ISODate("2018-03-25T16:00:00.000Z"), "$lte" : ISODate("2018-03-26T09:04:20.288Z") } } ] } },
{ "$project" : { "eventType" : 1, "date" : { "$concat" : [ { "$substr" : [ { "$year" : [ "$insertTime" ] }, 0, 4 ] }, "-", { "$substr" : [ { "$month" : [ "$insertTime" ] }, 0, 2 ] }, "-", { "$substr" : [ { "$dayOfMonth" : [ "$insertTime" ] }, 0, 2 ] } ] } } },
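As you can see, besides a simple projection of eventType, the insertTime field is split into year, month, and day, concatenated into a yyyy-MM-dd style string, and projected as the date field.
{ "$group" : { "_id" : { "date" : "$date", "eventType" : "$eventType" }, "count" : { "$sum" : 1 } } }
This stage groups by the date and eventType fields produced by the $project stage and counts the number of events (count) for each combination of date and event type.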
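To make the $project and $group stages concrete, here is a plain-JavaScript emulation of what they compute. It does not talk to MongoDB; the function names and sample documents are hypothetical. It assumes that `$year`, `$month`, and `$dayOfMonth` work in UTC and that `$substr` turns the number into a string without zero-padding, so treat the exact string format as an assumption to verify against your server version.

```javascript
// Mirrors the $concat/$substr expression: year, month and day are taken
// in UTC and converted to strings without zero-padding (assumed behaviour).
function projectDate(insertTime) {
  return [
    String(insertTime.getUTCFullYear()).substring(0, 4),
    String(insertTime.getUTCMonth() + 1).substring(0, 2),
    String(insertTime.getUTCDate()).substring(0, 2)
  ].join("-");
}

// Mirrors $group with _id { date, eventType } and count: { $sum: 1 }.
function countByDateAndType(docs) {
  const groups = new Map();
  for (const doc of docs) {
    const date = projectDate(doc.insertTime);
    const key = date + "|" + doc.eventType;
    const group = groups.get(key) ||
      { _id: { date: date, eventType: doc.eventType }, count: 0 };
    group.count += 1;
    groups.set(key, group);
  }
  return Array.from(groups.values());
}

// Hypothetical sample documents.
const sampleAlerts = [
  { eventType: "abnormal", insertTime: new Date("2018-03-26T08:44:51.322Z") },
  { eventType: "abnormal", insertTime: new Date("2018-03-26T01:00:00.000Z") },
  { eventType: "offline", insertTime: new Date("2018-03-25T16:00:00.000Z") }
];
console.log(projectDate(sampleAlerts[0].insertTime)); // "2018-3-26"
console.log(countByDateAndType(sampleAlerts));
```

Two details are worth checking under these assumptions. The day boundary used for grouping is the UTC day, while the $match range starts at 16:00 UTC, so a single local day can fall into two date groups. And if zero-padded dates are wanted, `$dateToString` with the format `"%Y-%m-%d"` is an alternative to the `$concat`/`$substr` construction.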
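The profile record also carries a number of other fields, such as keysExamined, docsExamined, planSummary, and millis.
If the 9484 ms reported in millis is considered too long, the query needs to be optimized.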
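As a small aid for reading such a record, the sketch below pulls out the fields that matter most for a first diagnosis. `summarizeProfileEntry` is a hypothetical helper, and the input values are copied from the profile record shown earlier.

```javascript
// Reduce a system.profile entry to a few diagnostic flags.
function summarizeProfileEntry(entry, slowMs) {
  return {
    millis: entry.millis,
    isSlow: entry.millis >= slowMs,
    // COLLSCAN in planSummary means the whole collection was scanned.
    isCollectionScan: /COLLSCAN/.test(entry.planSummary || ""),
    // keysExamined counts index keys scanned; 0 means no index was read.
    usedIndexKeys: entry.keysExamined > 0,
    docsExamined: entry.docsExamined
  };
}

// Values copied from the profile record above.
const profiledAggregate = {
  millis: 9484,
  planSummary: "COLLSCAN",
  keysExamined: 0,
  docsExamined: 2636052
};
console.log(summarizeProfileEntry(profiledAggregate, 1000));
```

For this record the summary flags a slow operation that scanned the collection and read no index keys, examining about 2.6 million documents to answer a one-day query.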
一般來講,經驗上能夠對這些指標作參考:
因爲server 狀態指標衆多,我這邊只列出來一部分。
{
    "host" : "OPASTORMON",  # hostname
    "version" : "3.4.1",  # version
    "process" : "mongod",  # process name
    "pid" : NumberLong(1462),  # process ID
    "uptime" : 10111875.0,  # uptime (s)
    "uptimeMillis" : NumberLong(10111875602),  # uptime (ms)
    "uptimeEstimate" : NumberLong(10111875),  # uptime estimate (s)
    "localTime" : ISODate("2018-03-26T09:14:13.679Z"),  # current server time
    "asserts" : { "regular" : 0, "warning" : 0, "msg" : 0, "user" : 26549, "rollovers" : 0 },
    "connections" : {
        "current" : 104,  # current connections
        "available" : 715,  # available connections
        "totalCreated" : 11275
    },
    "extra_info" : { "note" : "fields vary by platform", "page_faults" : 49 },
    "globalLock" : {
        "totalTime" : NumberLong(10111875549000),  # total running time (microseconds)
        "currentQueue" : {
            "total" : 0,  # operations currently queued
            "readers" : 0,  # queued reads
            "writers" : 0  # queued writes
        },
        "activeClients" : {
            "total" : 110,  # active client connections
            "readers" : 0,  # clients performing reads
            "writers" : 0  # clients performing writes
        }
    },
    "locks" : {
        "Global" : { "acquireCount" : { "r" : NumberLong(8457368136), "w" : NumberLong(1025512487), "W" : NumberLong(7) }, "acquireWaitCount" : { "r" : NumberLong(2) }, "timeAcquiringMicros" : { "r" : NumberLong(94731) } },
        "Database" : { "acquireCount" : { "r" : NumberLong(3715927334), "w" : NumberLong(1025512452), "R" : NumberLong(194), "W" : NumberLong(69) }, "acquireWaitCount" : { "r" : NumberLong(13), "w" : NumberLong(5), "R" : NumberLong(6), "W" : NumberLong(3) }, "timeAcquiringMicros" : { "r" : NumberLong(530972), "w" : NumberLong(426173), "R" : NumberLong(3207), "W" : NumberLong(1321) } },
        "Collection" : { "acquireCount" : { "r" : NumberLong(3715046899), "w" : NumberLong(1025512453) } },
        "Metadata" : { "acquireCount" : { "w" : NumberLong(1), "W" : NumberLong(3) } }
    },
    "network" : {
        "bytesIn" : NumberLong(373939915493),  # bytes received
        "bytesOut" : NumberLong(961227224728),  # bytes sent
        "physicalBytesIn" : NumberLong(373939915493),  # physical bytes received
        "physicalBytesOut" : NumberLong(961054421482),  # physical bytes sent
        "numRequests" : NumberLong(3142377739)  # number of requests
    },
    "opLatencies" : {
        "reads" : { "latency" : NumberLong(3270742192035), "ops" : NumberLong(540111914) },
        "writes" : { "latency" : NumberLong(261946981235), "ops" : NumberLong(1024301418) },
        "commands" : { "latency" : NumberLong(458086641), "ops" : NumberLong(6776702) }
    },
    "opcounters" : {
        "insert" : 6846448,  # insert operations
        "query" : 248443106,  # query operations
        "update" : 1018594976,  # update operations
        "delete" : 1830,  # delete operations
        "getmore" : 162213,  # getMore operations
        "command" : 298306448  # other commands
    },
    "opcountersRepl" : { "insert" : 0, "query" : 0, "update" : 0, "delete" : 0, "getmore" : 0, "command" : 0 },
    "storageEngine" : { "name" : "wiredTiger", "supportsCommittedReads" : true, "readOnly" : false, "persistent" : true },
    "tcmalloc" : {
        "generic" : { "current_allocated_bytes" : NumberLong(3819325752), "heap_size" : NumberLong(6959509504) },
        "tcmalloc" : { "pageheap_free_bytes" : 199692288, "pageheap_unmapped_bytes" : NumberLong(2738442240), "max_total_thread_cache_bytes" : NumberLong(1073741824), "current_total_thread_cache_bytes" : 35895120, "total_free_bytes" : 202049224, "central_cache_free_bytes" : 165650360, "transfer_cache_free_bytes" : 503744, "thread_cache_free_bytes" : 35895120, "aggressive_memory_decommit" : 0, "formattedString" : "------------------------------------------------\nMALLOC: 3819325752 ( 3642.4 MiB) Bytes in use by application\nMALLOC: + 199692288 ( 190.4 MiB) Bytes in page heap freelist\nMALLOC: + 165650360 ( 158.0 MiB) Bytes in central cache freelist\nMALLOC: + 503744 ( 0.5 MiB) Bytes in transfer cache freelist\nMALLOC: + 35895120 ( 34.2 MiB) Bytes in thread cache freelists\nMALLOC: + 40001728 ( 38.1 MiB) Bytes in malloc metadata\nMALLOC: ------------\nMALLOC: = 4261068992 ( 4063.7 MiB) Actual memory used (physical + swap)\nMALLOC: + 2738442240 ( 2611.6 MiB) Bytes released to OS (aka unmapped)\nMALLOC: ------------\nMALLOC: = 6999511232 ( 6675.3 MiB) Virtual address space used\nMALLOC:\nMALLOC: 521339 Spans in use\nMALLOC: 115 Thread heaps in use\nMALLOC: 4096 Tcmalloc page size\n------------------------------------------------\nCall ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).\nBytes released to the OS take up virtual address space but no physical memory.\n" }
    },
    "mem" : {
        "bits" : 64,  # 64-bit build
        "resident" : 4103,  # resident (physical) memory, in MB
        "virtual" : 7045,  # virtual memory, in MB
        "supported" : true,  # whether extended memory information is supported
        "mapped" : 0,
        "mappedWithJournal" : 0
    },
    "ok" : 1.0
}
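A few ratios are easier to read than the raw counters. The following sketch derives them from a server status document; `summarizeServerStatus` and `toNum` are hypothetical helpers, and the sample input uses plain numbers copied from the output above. In the shell, `NumberLong` values may need converting, which `toNum` attempts via `toNumber()`.

```javascript
// Convert shell number wrappers (e.g. NumberLong) or plain values to a number.
function toNum(v) {
  return v !== null && typeof v === "object" && typeof v.toNumber === "function"
    ? v.toNumber()
    : Number(v);
}

// Derive a few ratios from a db.serverStatus() document.
// opLatencies.*.latency is a cumulative total in microseconds.
function summarizeServerStatus(status) {
  const current = toNum(status.connections.current);
  const available = toNum(status.connections.available);
  const avgMicros = function (bucket) {
    const ops = toNum(bucket.ops);
    return ops > 0 ? toNum(bucket.latency) / ops : 0;
  };
  return {
    connectionUse: current / (current + available),
    queuedOperations: toNum(status.globalLock.currentQueue.total),
    avgReadMicros: avgMicros(status.opLatencies.reads),
    avgWriteMicros: avgMicros(status.opLatencies.writes)
  };
}

// Values copied from the server status above.
const sampleStatus = {
  connections: { current: 104, available: 715 },
  globalLock: { currentQueue: { total: 0 } },
  opLatencies: {
    reads: { latency: 3270742192035, ops: 540111914 },
    writes: { latency: 261946981235, ops: 1024301418 }
  }
};
console.log(summarizeServerStatus(sampleStatus));
```

For this server the figures suggest no queueing and modest connection use (about 13%), with an average read latency of roughly 6 ms against about 0.26 ms for writes. That is consistent with reads being the expensive side of the workload.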
{ "ns" : "standalone.application_alert", "size" : 783852548, "count" : 2638262, "avgObjSize" : 297, "storageSize" : 189296640, "capped" : false, "wiredTiger" : { "metadata" : { "formatVersion" : 1 }, "creationString" : "allocation_size=4KB,app_metadata=(formatVersion=1),block_allocation=best,block_compressor=snappy,cache_resident=false,checksum=on,colgroups=,collator=,columns=,dictionary=0,encryption=(keyid=,name=),exclusive=false,extractor=,format=btree,huffman_key=,huffman_value=,ignore_in_memory_cache_size=false,immutable=false,internal_item_max=0,internal_key_max=0,internal_key_truncate=true,internal_page_max=4KB,key_format=q,key_gap=10,leaf_item_max=0,leaf_key_max=0,leaf_page_max=32KB,leaf_value_max=64MB,log=(enabled=true),lsm=(auto_throttle=true,bloom=true,bloom_bit_count=16,bloom_config=,bloom_hash_count=8,bloom_oldest=false,chunk_count_limit=0,chunk_max=5GB,chunk_size=10MB,merge_max=15,merge_min=0),memory_page_max=10m,os_cache_dirty_max=0,os_cache_max=0,prefix_compression=false,prefix_compression_min=4,source=,split_deepen_min_child=0,split_deepen_per_child=0,split_pct=90,type=file,value_format=u", "type" : "file", "uri" : "statistics:table:collection-4-6040851502998278747", "LSM" : { "bloom filter false positives" : 0, "bloom filter hits" : 0, "bloom filter misses" : 0, "bloom filter pages evicted from cache" : 0, "bloom filter pages read into cache" : 0, "bloom filters in the LSM tree" : 0, "chunks in the LSM tree" : 0, "highest merge generation in the LSM tree" : 0, "queries that could have benefited from a Bloom filter that did not exist" : 0, "sleep for LSM checkpoint throttle" : 0, "sleep for LSM merge throttle" : 0, "total size of bloom filters" : 0 }, "block-manager" : { "allocations requiring file extension" : 31543, "blocks allocated" : 346110, "blocks freed" : 124238, "checkpoint size" : 189259776, "file allocation unit size" : 4096, "file bytes available for reuse" : 20480, "file magic number" : 120897, "file major version number" : 1, "file 
size in bytes" : 189296640, "minor version number" : 0 }, "btree" : { "btree checkpoint generation" : 165242, "column-store fixed-size leaf pages" : 0, "column-store internal pages" : 0, "column-store variable-size RLE encoded values" : 0, "column-store variable-size deleted values" : 0, "column-store variable-size leaf pages" : 0, "fixed-record size" : 0, "maximum internal page key size" : 368, "maximum internal page size" : 4096, "maximum leaf page key size" : 2867, "maximum leaf page size" : 32768, "maximum leaf page value size" : 67108864, "maximum tree depth" : 3, "number of key/value pairs" : 0, "overflow pages" : 0, "pages rewritten by compaction" : 0, "row-store internal pages" : 0, "row-store leaf pages" : 0 }, "cache" : { "bytes currently in the cache" : 1014702364, "bytes read into cache" : 0, "bytes written from cache" : 1888143292.0, "checkpoint blocked page eviction" : 0, "data source pages selected for eviction unable to be evicted" : 0, "hazard pointer blocked page eviction" : 0, "in-memory page passed criteria to be split" : 224, "in-memory page splits" : 112, "internal pages evicted" : 0, "internal pages split during eviction" : 0, "leaf pages split during eviction" : 0, "modified pages evicted" : 2, "overflow pages read into cache" : 0, "overflow values cached in memory" : 0, "page split during eviction deepened the tree" : 0, "page written requiring lookaside records" : 0, "pages read into cache" : 0, "pages read into cache requiring lookaside entries" : 0, "pages requested from the cache" : 49191856, "pages written from cache" : 217176, "pages written requiring in-memory restoration" : 0, "unmodified pages evicted" : 0 }, "cache_walk" : { "Average difference between current eviction generation when the page was last considered" : 0, "Average on-disk page image size seen" : 0, "Clean pages currently in cache" : 0, "Current eviction generation" : 0, "Dirty pages currently in cache" : 0, "Entries in the root page" : 0, "Internal pages currently in 
cache" : 0, "Leaf pages currently in cache" : 0, "Maximum difference between current eviction generation when the page was last considered" : 0, "Maximum page size seen" : 0, "Minimum on-disk page image size seen" : 0, "On-disk page image sizes smaller than a single allocation unit" : 0, "Pages created in memory and never written" : 0, "Pages currently queued for eviction" : 0, "Pages that could not be queued for eviction" : 0, "Refs skipped during cache traversal" : 0, "Size of the root page" : 0, "Total number of pages currently in cache" : 0 }, "compression" : { "compressed pages read" : 0, "compressed pages written" : 83604, "page written failed to compress" : 0, "page written was too small to compress" : 133572, "raw compression call failed, additional data available" : 0, "raw compression call failed, no additional data available" : 0, "raw compression call succeeded" : 0 }, "cursor" : { "bulk-loaded cursor-insert calls" : 0, "create calls" : 78758, "cursor-insert key and value bytes inserted" : 795578636, "cursor-remove key bytes removed" : 8857, "cursor-update value bytes updated" : 0, "insert calls" : 2642785, "next calls" : 5850718215.0, "prev calls" : 3, "remove calls" : 4460, "reset calls" : 48942545, "restarted searches" : 0, "search calls" : 10229, "search near calls" : 46285468, "truncate calls" : 0, "update calls" : 0 }, "reconciliation" : { "dictionary matches" : 0, "fast-path pages deleted" : 0, "internal page key bytes discarded using suffix compression" : 7946666, "internal page multi-block writes" : 60010, "internal-page overflow keys" : 0, "leaf page key bytes discarded using prefix compression" : 0, "leaf page multi-block writes" : 64250, "leaf-page overflow keys" : 0, "maximum blocks required for a page" : 253, "overflow values written" : 0, "page checksum matches" : 10496129, "page reconciliation calls" : 189077, "page reconciliation calls for eviction" : 1, "pages deleted" : 7 }, "session" : { "object compaction" : 0, "open cursor count" : 
35 }, "transaction" : { "update conflicts" : 0 } }, "nindexes" : 1, "totalIndexSize" : 24420352, "indexSizes" : { "_id_" : 24420352 }, "ok" : 1.0 }
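The collection statistics are long, but only a handful of fields matter for this investigation. The sketch below extracts them; `summarizeCollStats` is a hypothetical helper, and the sample values are copied from the statistics above.

```javascript
// Reduce a db.collection.stats() document to the fields relevant here.
function summarizeCollStats(stats) {
  const indexNames = Object.keys(stats.indexSizes);
  return {
    documents: stats.count,
    avgObjSize: stats.avgObjSize,
    // Uncompressed data size divided by the on-disk size.
    compressionRatio: stats.size / stats.storageSize,
    indexNames: indexNames,
    // True when the collection has nothing but the default _id index.
    onlyDefaultIndex: indexNames.length === 1 && indexNames[0] === "_id_"
  };
}

// Values copied from the collection statistics above.
const alertStats = {
  count: 2638262,
  avgObjSize: 297,
  size: 783852548,
  storageSize: 189296640,
  indexSizes: { _id_: 24420352 }
};
console.log(summarizeCollStats(alertStats));
```

The summary shows about 2.6 million documents and `onlyDefaultIndex: true`, so any filter on insertTime, eventType, or factoryId has to scan the whole collection.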
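Given the metrics above, the first thing to check when optimizing is whether the collection has suitable indexes. The statistics report "nindexes" : 1, that is, only the default _id_ index, which matches the COLLSCAN in the profile record. So I created indexes on the queried fields: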
// createIndex() replaces ensureIndex(), which has been deprecated since MongoDB 3.0
db.application_alert.createIndex({"insertTime": 1, "eventType": 1});
db.application_alert.createIndex({"insertTime": 1});
db.application_alert.createIndex({"eventType": 1});
db.application_alert.createIndex({"factoryId": 1});
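After adding the indexes, aggregating one day of data takes 424 ms, which is basically acceptable.
What problem did adding the indexes actually solve?
Without an index, how long does it take to find 1 million documents matching {eventType: "abnormal"}? With a full collection scan (COLLSCAN), finding 6 million documents among 7 million is clearly a different matter from finding 6 million among 100 million. With an index scan (IXSCAN), that difference becomes much smaller, almost negligible. Adding an index only improves the efficiency of queries on the indexed fields; it does nothing for the cost of aggregating the data once it has been matched. Incidentally, to judge whether two approaches differ in efficiency, look at the execution plan rather than the execution time, because timings are not reliable.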
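Checking the execution plan can be partly automated. The sketch below walks a plan tree and lists its stages; it assumes the usual shape of the `winningPlan` object in explain output, where child stages are nested under `inputStage` or `inputStages`. The function names and the two sample plans are hand-written illustrations, not output captured from the server.

```javascript
// Collect the stage names of a query plan, outermost stage first.
function planStages(plan) {
  if (!plan) return [];
  const children = plan.inputStages || (plan.inputStage ? [plan.inputStage] : []);
  return children.reduce(function (acc, child) {
    return acc.concat(planStages(child));
  }, [plan.stage]);
}

// True when any stage of the plan is a full collection scan.
function usesCollectionScan(plan) {
  return planStages(plan).indexOf("COLLSCAN") !== -1;
}

// Hypothetical plans before and after adding the index.
const planBefore = { stage: "COLLSCAN" };
const planAfter = {
  stage: "FETCH",
  inputStage: { stage: "IXSCAN", indexName: "insertTime_1_eventType_1" }
};
console.log(planStages(planBefore), usesCollectionScan(planBefore)); // [ 'COLLSCAN' ] true
console.log(planStages(planAfter), usesCollectionScan(planAfter));   // [ 'FETCH', 'IXSCAN' ] false
```

A check like this gives a stable yes/no answer about index use, whereas a timing such as 424 ms can vary with cache state and server load.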
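So how can we handle queries that aggregate large volumes of data?
First, one thing needs to be made clear: expectations for OLAP-style operations should not be too high. They work on large volumes of data, and the I/O alone far exceeds that of a typical OLTP operation, so demanding OLTP-level speed and concurrency is unrealistic and pointless. That does not mean there is no room for optimization, though.
Optimizations of this kind should improve query performance somewhat, but they do not solve the underlying problem. The reason is the one just given: OLAP cannot be held to such expectations. If you really do have this kind of requirement, tackle the problem at the source and consider: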