An analysis of a group by + order by performance optimization
Recently, while generating a ranking from a log table, I found the query was extremely slow. The problem was eventually solved, and this post organizes what I learned about indexes and MySQL's execution process along the way. That said, five puzzles remained unsolved at the end, and I'd appreciate help answering them.

We need to compute the TOP 10 most-viewed articles for the current month and for the current week. The log table is as follows:
CREATE TABLE `article_rank` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `aid` int(11) unsigned NOT NULL,
  `pv` int(11) unsigned NOT NULL DEFAULT '1',
  `day` int(11) NOT NULL COMMENT 'date, e.g. 20171016',
  PRIMARY KEY (`id`),
  KEY `idx_day_aid_pv` (`day`,`aid`,`pv`),
  KEY `idx_aid_day_pv` (`aid`,`day`,`pv`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
To verify my guesses clearly, I installed a debug build of MySQL in a virtual machine and enabled slow-log collection in order to count scanned rows.

Edit the configuration file and add the following under the [mysqld] block:

slow_query_log=1
slow_query_log_file=xxx
long_query_time=0
log_queries_not_using_indexes=1
Finding the problem
Suppose I need to find the 10 most-viewed articles over the 5 days 2018-12-20 ~ 2018-12-24. The SQL is shown below; first, let's look at the explain result:
mysql> explain select aid,sum(pv) as num from article_rank where day>=20181220 and day<=20181224 group by aid order by num desc limit 10;
+----+-------------+--------------+------------+-------+-------------------------------+----------------+---------+------+--------+----------+-----------------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------------+------------+-------+-------------------------------+----------------+---------+------+--------+----------+-----------------------------------------------------------+
| 1 | SIMPLE | article_rank | NULL | range | idx_day_aid_pv,idx_aid_day_pv | idx_day_aid_pv | 4 | NULL | 404607 | 100.00 | Using where; Using index; Using temporary; Using filesort |
+----+-------------+--------------+------------+-------+-------------------------------+----------------+---------+------+--------+----------+-----------------------------------------------------------+
By default the optimizer picks the idx_day_aid_pv index. From the Extra column we can see that with idx_day_aid_pv the query is served by a covering index, but it needs a temporary table and a sort (Using temporary; Using filesort).

Let's look at the corresponding record in the slow log:
# Time: 2019-03-17T03:02:27.984091Z
# User@Host: root[root] @ localhost [] Id: 6
# Query_time: 56.959484 Lock_time: 0.000195 Rows_sent: 10 Rows_examined: 1337315
SET timestamp=1552791747;
select aid,sum(pv) as num from article_rank where day>=20181220 and day<=20181224 group by aid order by num desc limit 10;
Now let's query two numbers: the count of rows matching the condition, and the count of rows left after the group by.
mysql> select count(*) from article_rank where day>=20181220 and day<=20181224;
+----------+
| count(*) |
+----------+
| 785102 |
+----------+
mysql> select count(distinct aid) from article_rank where day>=20181220 and day<=20181224;
+---------------------+
| count(distinct aid) |
+---------------------+
| 552203 |
+---------------------+
Notice that: total rows matching the condition (785102) + total rows after group by (552203) + the limit value (10) = Rows_examined in the slow log (1337315).
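As a quick sanity check, here is that identity in plain Python, with the numbers taken from the measurements above:

```python
# Rows matching the WHERE range, rows left after GROUP BY, and the LIMIT,
# as measured above; their sum should equal Rows_examined from the slow log.
matching_rows = 785102   # select count(*) ... in the day range
grouped_rows = 552203    # select count(distinct aid) ... in the same range
limit = 10
rows_examined = matching_rows + grouped_rows + limit
print(rows_examined)  # 1337315
```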
To explain this identity, we have to work out how the SQL above actually executes.
To make this easier to follow, I'll simulate a small slice of the idx_day_aid_pv index according to its ordering rules:
day | aid | pv | id |
---|---|---|---|
20181220 | 1 | 23 | 1234 |
20181220 | 3 | 2 | 1231 |
20181220 | 4 | 1 | 1212 |
20181220 | 7 | 2 | 1221 |
20181221 | 1 | 5 | 1257 |
20181221 | 10 | 1 | 1251 |
20181221 | 11 | 8 | 1258 |
Because the leftmost column of the idx_day_aid_pv index is day, when we want the total pv of each article between 20181220 and 20181224, we have to traverse the index entries for that whole range.
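The execution steps this leads to can be modelled with a toy Python sketch (the sample data and the dict-based temporary table are illustrative assumptions, not MySQL internals):

```python
# Toy model of how SQL1 runs over idx_day_aid_pv: range-scan the covering
# index, build a temporary table of per-aid sums, then sort that temporary
# table by the sum and take the top 10.
index = [  # (day, aid, pv) entries, ordered as the index stores them
    (20181219, 1, 9),                                       # outside the range
    (20181220, 1, 23), (20181220, 3, 2), (20181220, 4, 1), (20181220, 7, 2),
    (20181221, 1, 5), (20181221, 10, 1), (20181221, 11, 8),
    (20181225, 3, 4),                                       # outside the range
]

tmp_table = {}  # temporary table: aid -> running sum(pv)
for day, aid, pv in index:
    if 20181220 <= day <= 20181224:  # WHERE day>=20181220 and day<=20181224
        tmp_table[aid] = tmp_table.get(aid, 0) + pv

# order by num desc limit 10, applied to the temporary table
top10 = sorted(tmp_table.items(), key=lambda r: r[1], reverse=True)[:10]
print(top10)  # [(1, 28), (11, 8), (3, 2), (7, 2), (4, 1), (10, 1)]
```

Every index entry in the day range is touched once (785102 rows in the real table), every temporary-table row is read again for the sort (552203 rows), and the limit rows are sent, which matches the Rows_examined breakdown observed in the slow log.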
# enable optimizer_trace
set optimizer_trace='enabled=on';

# run the sql
select aid,sum(pv) as num from article_rank where day>=20181220 and day<=20181224 group by aid order by num desc limit 10;

# view the trace
select trace from `information_schema`.`optimizer_trace`\G;

The relevant tail of the trace output is as follows:
{
  "join_execution": {
    "select#": 1,
    "steps": [
      {
        "creating_tmp_table": {
          "tmp_table_info": {
            "table": "intermediate_tmp_table",
            "row_length": 20,
            "key_length": 4,
            "unique_constraint": false,
            "location": "memory (heap)",
            "row_limit_estimate": 838860
          }
        }
      },
      {
        "converting_tmp_table_to_ondisk": {
          "cause": "memory_table_size_exceeded",
          "tmp_table_info": {
            "table": "intermediate_tmp_table",
            "row_length": 20,
            "key_length": 4,
            "unique_constraint": false,
            "location": "disk (InnoDB)",
            "record_format": "fixed"
          }
        }
      },
      {
        "filesort_information": [
          {
            "direction": "desc",
            "table": "intermediate_tmp_table",
            "field": "num"
          }
        ],
        "filesort_priority_queue_optimization": {
          "limit": 10,
          "rows_estimate": 1057,
          "row_size": 36,
          "memory_available": 262144,
          "chosen": true
        },
        "filesort_execution": [],
        "filesort_summary": {
          "rows": 11,
          "examined_rows": 552203,
          "number_of_tmp_files": 0,
          "sort_buffer_size": 488,
          "sort_mode": "<sort_key, additional_fields>"
        }
      }
    ]
  }
}
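The filesort_priority_queue_optimization step above (limit 10, final queue of 11 rows) can be sketched roughly like this; a hypothetical Python model with made-up rows, not MySQL's actual code:

```python
import heapq

def top_n_desc(rows, limit):
    """Scan (aid, num) rows once, keeping only the current top `limit`
    rows in a min-heap, instead of sorting all 552203 temp-table rows."""
    heap = []  # min-heap of (num, aid); the weakest kept row sits on top
    for aid, num in rows:
        if len(heap) < limit:
            heapq.heappush(heap, (num, aid))
        elif num > heap[0][0]:
            heapq.heapreplace(heap, (num, aid))  # evict the current weakest
    return sorted(heap, reverse=True)  # final pass: order the winners descending

rows = [(aid, (aid * 37) % 991) for aid in range(500)]  # fake (aid, num) rows
print(top_n_desc(rows, 10))
```

This is why filesort_summary reports examined_rows of 552203 but a sort buffer of only a few hundred bytes: the queue never holds more than roughly limit + 1 rows at a time.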
Using gdb, I confirmed that the fields on the temporary table are aid and num:

Breakpoint 1, trace_tmp_table (trace=0x7eff94003088, table=0x7eff94937200)
    at /root/newdb/mysql-server/sql/sql_tmp_table.cc:2306
warning: Source file is more recent than executable.
2306        trace_tmp.add("row_length",table->s->reclength).
(gdb) p table->s->reclength
$1 = 20
(gdb) p table->s->fields
$2 = 2
(gdb) p (*(table->field+0))->field_name
$3 = 0x7eff94010b0c "aid"
(gdb) p (*(table->field+1))->field_name
$4 = 0x7eff94007518 "num"
(gdb) p (*(table->field+0))->row_pack_length()
$5 = 4
(gdb) p (*(table->field+1))->row_pack_length()
$6 = 15
(gdb) p (*(table->field+0))->type()
$7 = MYSQL_TYPE_LONG
(gdb) p (*(table->field+1))->type()
$8 = MYSQL_TYPE_NEWDECIMAL
(gdb)
The output above confirms the field types: aid is MYSQL_TYPE_LONG, taking 4 bytes, and num is MYSQL_TYPE_NEWDECIMAL, taking 15 bytes.

However, those two field lengths add up to only 19, while the row_length reported for tmp_table_info in the optimizer_trace is 20. Other experiments likewise show that table->s->reclength is the sum of the lengths of all fields in the table->field array plus 1.
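That 20-byte row length also appears to explain the trace's row_limit_estimate of 838860; a quick check, assuming the estimate is simply the 16MB in-memory temp-table cap divided by the row length:

```python
tmp_table_size = 16 * 1024 * 1024  # default in-memory temp-table cap, 16777216 bytes
row_length = 20                    # 4 (aid, LONG) + 15 (num, NEWDECIMAL) + 1
print(tmp_table_size // row_length)  # 838860, matching row_limit_estimate
```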
Next, force the other index, idx_aid_day_pv, and look at the slow log again (this is SQL2):

# Query_time: 4.406927  Lock_time: 0.000200  Rows_sent: 10  Rows_examined: 1337315
SET timestamp=1552791804;
select aid,sum(pv) as num from article_rank force index(idx_aid_day_pv) where day>=20181220 and day<=20181224 group by aid order by num desc limit 10;

The scanned row count is 1337315 in both cases, so why is this execution roughly 12 times faster?
Why is it so much faster than SQL1? One reason is that in the idx_aid_day_pv index, aid is guaranteed to be in order, so executing the group by does not require creating a temporary table; a temporary table is only needed for the sort. How do we verify this? We can see it from the execution plans below.

Effect of the idx_day_aid_pv index:
mysql> explain select aid,sum(pv) as num from article_rank force index(idx_day_aid_pv) where day>=20181220 and day<=20181224 group by aid order by null limit 10;
+----+-------------+--------------+------------+-------+-------------------------------+----------------+---------+------+--------+----------+-------------------------------------------+
| id | select_type | table        | partitions | type  | possible_keys                 | key            | key_len | ref  | rows   | filtered | Extra                                     |
+----+-------------+--------------+------------+-------+-------------------------------+----------------+---------+------+--------+----------+-------------------------------------------+
|  1 | SIMPLE      | article_rank | NULL       | range | idx_day_aid_pv,idx_aid_day_pv | idx_day_aid_pv | 4       | NULL | 404607 |   100.00 | Using where; Using index; Using temporary |
+----+-------------+--------------+------------+-------+-------------------------------+----------------+---------+------+--------+----------+-------------------------------------------+
Note that I used order by null above, which forces MySQL not to sort the group by result. Without order by null, the SQL above would also show Using filesort.
Effect of the idx_aid_day_pv index:
mysql> explain select aid,sum(pv) as num from article_rank force index(idx_aid_day_pv) where day>=20181220 and day<=20181224 group by aid order by null limit 10;
+----+-------------+--------------+------------+-------+-------------------------------+----------------+---------+------+------+----------+--------------------------+
| id | select_type | table        | partitions | type  | possible_keys                 | key            | key_len | ref  | rows | filtered | Extra                    |
+----+-------------+--------------+------------+-------+-------------------------------+----------------+---------+------+------+----------+--------------------------+
|  1 | SIMPLE      | article_rank | NULL       | index | idx_day_aid_pv,idx_aid_day_pv | idx_aid_day_pv | 12      | NULL |   10 |    11.11 | Using where; Using index |
+----+-------------+--------------+------------+-------+-------------------------------+----------------+---------+------+------+----------+--------------------------+
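The contrast between the two plans can be sketched in Python (a toy model with assumed data, not MySQL internals): when the index already returns rows ordered by aid, each group can be summed and emitted the moment the aid changes, with no group-by temporary table at all:

```python
# Toy contrast: idx_aid_day_pv hands back rows already ordered by aid,
# so group sums can be streamed out as each aid group ends.
index = [  # (aid, day, pv), ordered by aid as in idx_aid_day_pv
    (1, 20181220, 23), (1, 20181221, 5),
    (3, 20181220, 2), (3, 20181225, 4),   # the 20181225 row fails the WHERE
    (4, 20181220, 1),
    (7, 20181220, 2),
]

def grouped_sums(index_rows):
    current_aid, total = None, 0
    for aid, day, pv in index_rows:
        if not (20181220 <= day <= 20181224):  # WHERE filter
            continue
        if aid != current_aid:
            if current_aid is not None:
                yield current_aid, total  # group complete: emit immediately
            current_aid, total = aid, 0
        total += pv
    if current_aid is not None:
        yield current_aid, total  # flush the last group

print(list(grouped_sums(index)))  # [(1, 28), (3, 2), (4, 1), (7, 2)]
```

Only the later order by num still needs a temporary structure, which is what the two explain outputs above show: Using temporary disappears once the group by can ride the index order.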
Now trace SQL2 the same way:

# enable optimizer_trace
set optimizer_trace='enabled=on';

# run the sql
select aid,sum(pv) as num from article_rank force index(idx_aid_day_pv) where day>=20181220 and day<=20181224 group by aid order by num desc limit 10;

# view the trace
select trace from `information_schema`.`optimizer_trace`\G;

The relevant tail of the trace output is as follows:
{
  "join_execution": {
    "select#": 1,
    "steps": [
      {
        "creating_tmp_table": {
          "tmp_table_info": {
            "table": "intermediate_tmp_table",
            "row_length": 20,
            "key_length": 0,
            "unique_constraint": false,
            "location": "memory (heap)",
            "row_limit_estimate": 838860
          }
        }
      },
      {
        "filesort_information": [
          {
            "direction": "desc",
            "table": "intermediate_tmp_table",
            "field": "num"
          }
        ],
        "filesort_priority_queue_optimization": {
          "limit": 10,
          "rows_estimate": 552213,
          "row_size": 24,
          "memory_available": 262144,
          "chosen": true
        },
        "filesort_execution": [],
        "filesort_summary": {
          "rows": 11,
          "examined_rows": 552203,
          "number_of_tmp_files": 0,
          "sort_buffer_size": 352,
          "sort_mode": "<sort_key, rowid>"
        }
      }
    ]
  }
}
But what if the table holds not 5 but 10 days of data, or even 365 days? Would this plan still hold up? Let's simulate 10 days of data by appending 5 more days after the existing range, with the same number of rows as before, 785102.
drop procedure if exists idata;
delimiter ;;
create procedure idata()
begin
  declare i int;
  declare aid int;
  declare pv int;
  declare post_day int;
  set i=1;
  while(i<=785102)do
    set aid = round(rand()*500000);
    set pv = round(rand()*100);
    set post_day = 20181225 + i%5;
    insert into article_rank (`aid`,`pv`,`day`) values(aid, pv, post_day);
    set i=i+1;
  end while;
end;;
delimiter ;
call idata();

# Query_time: 9.151270  Lock_time: 0.000508  Rows_sent: 10  Rows_examined: 2122417
SET timestamp=1552889936;
select aid,sum(pv) as num from article_rank force index(idx_aid_day_pv) where day>=20181220 and day<=20181224 group by aid order by num desc limit 10;
Rows_examined is 2122417 here because scanning via this index means traversing the whole index, and the whole index has as many entries as the table has rows; I had just inserted another 785102 rows.

Once the data volume doubles, the query time clearly doubles too, so this optimization is not stable.
Plan 2 is to enlarge the in-memory temporary table. The default size limit is 16MB:

mysql> show global variables like '%table_size';
+---------------------+----------+
| Variable_name       | Value    |
+---------------------+----------+
| max_heap_table_size | 16777216 |
| tmp_table_size      | 16777216 |
+---------------------+----------+
https://dev.mysql.com/doc/refman/5.7/en/server-system-variables.html#sysvar_max_heap_table_size
https://dev.mysql.com/doc/refman/5.7/en/server-system-variables.html#sysvar_tmp_table_size

max_heap_table_size
This variable sets the maximum size to which user-created MEMORY tables are permitted to grow. The value of the variable is used to calculate MEMORY table MAX_ROWS values. Setting this variable has no effect on any existing MEMORY table, unless the table is re-created with a statement such as CREATE TABLE or altered with ALTER TABLE or TRUNCATE TABLE. A server restart also sets the maximum size of existing MEMORY tables to the global max_heap_table_size value.

tmp_table_size
The maximum size of internal in-memory temporary tables. This variable does not apply to user-created MEMORY tables. The actual limit is determined from whichever of the values of tmp_table_size and max_heap_table_size is smaller. If an in-memory temporary table exceeds the limit, MySQL automatically converts it to an on-disk temporary table. The internal_tmp_disk_storage_engine option defines the storage engine used for on-disk temporary tables.
In other words, the in-memory temporary table here is capped at 16MB, and the effective limit is whichever of tmp_table_size and max_heap_table_size is smaller.

So let's raise both to 32MB and re-run the original SQL:
set tmp_table_size=33554432;
set max_heap_table_size=33554432;

# Query_time: 5.910553  Lock_time: 0.000210  Rows_sent: 10  Rows_examined: 1337315
SET timestamp=1552803869;
select aid,sum(pv) as num from article_rank where day>=20181220 and day<=20181224 group by aid order by num desc limit 10;
Plan 3 is to tell the optimizer up front that the result set is large, so the temporary table should go straight to disk storage (SQL_BIG_RESULT):

# Query_time: 6.144315  Lock_time: 0.000183  Rows_sent: 10  Rows_examined: 2122417
SET timestamp=1552802804;
select SQL_BIG_RESULT aid,sum(pv) as num from article_rank where day>=20181220 and day<=20181224 group by aid order by num desc limit 10;
This time the scanned row count is 2 × the total rows matching the condition (785102) + the total rows after group by (552203) + the limit value (10).
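The arithmetic again checks out against the slow log:

```python
matching_rows = 785102  # scanned twice under SQL_BIG_RESULT
grouped_rows = 552203
limit = 10
print(2 * matching_rows + grouped_rows + limit)  # 2122417, as in the slow log
```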
Incidentally: when I doubled the data volume, the query time with this approach stayed essentially unchanged, because the number of scanned rows does not change. The measured time was 6.197484s.
To summarize:

Plan 1's effect is unstable. It performs best when the whole table falls inside the queried range and the in-memory temp-table limit is not exceeded; the larger the share of the table outside the queried range, the weaker the optimization.

Plan 2 needs the in-memory temporary table to be enlarged. It works, but if the temporary data grows past 32MB, the limit has to be raised again and again.

Plan 3 simply declares that the temporary table should live on disk. Although the scan adds one extra pass over the matching rows, the overall response time is only about 0.1s slower than Plan 2. Since the data volume here is fairly large, I consider that difference acceptable.

So, comparing the three, Plan 3 is the most suitable.
For reference, the two SQLs discussed above:

# SQL1
select aid,sum(pv) as num from article_rank where day>=20181220 and day<=20181224 group by aid order by num desc limit 10;

# SQL2
select aid,sum(pv) as num from article_rank force index(idx_aid_day_pv) where day>=20181220 and day<=20181224 group by aid order by num desc limit 10;
Source: https://mengkang.net/1355.htm