Scenario to Optimize
The SLOW QUERY LOG contained the following record:
    ...
    # Query_time: 59.503827 Lock_time: 0.000198 Rows_sent: 641227 Rows_examined: 13442472 Rows_affected: 0
    ...
    select uid, sum(power) powerup
    from t1
    where date >= '2017-03-31'
      and UNIX_TIMESTAMP(STR_TO_DATE(concat(date,' ',hour),'%Y-%m-%d %H')) >= 1490965200
      and UNIX_TIMESTAMP(STR_TO_DATE(concat(date,' ',hour),'%Y-%m-%d %H')) < 1492174801
      and aType in (1,6,9)
    group by uid;
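Honestly, seeing this SQL made me want to swear: who on earth designed this?
Splitting the date and hour of a timestamp into two separate columns, then gluing them back together into a new condition at query time, leaves me speechless.
Complaints aside, the work still has to get done. We are DBAs after all, and SQL optimization is supposed to be our specialty.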
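The Road to SQL Optimization
1. SQL Optimization Approach
At the risk of repeating myself, here is the SQL optimization approach once more.
To optimize a SQL statement, you normally start with the execution plan and check whether it uses indexes as much as possible. At the same time, pay attention to the estimated number of rows scanned, whether a temporary table is created (Using temporary), and whether a sort is required (Using filesort), and then look for ways to eliminate these conditions.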
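The checklist above can be sketched as a small helper that flags the usual warning signs in one EXPLAIN row. This is only an illustration: the function name `plan_warnings` and the `max_rows` threshold are assumptions made for this sketch, and the sample row mirrors the EXPLAIN output of the slow query discussed in this article.

```python
def plan_warnings(row, max_rows=1_000_000):
    """Return the red flags found in one EXPLAIN row (dict of column -> value).

    max_rows is an arbitrary threshold chosen for this sketch, not a MySQL setting.
    """
    warnings = []
    extra = row.get("Extra") or ""
    if row.get("type") == "ALL":
        warnings.append("full table scan")
    if row.get("key") is None:
        warnings.append("no index used")
    if int(row.get("rows") or 0) > max_rows:
        warnings.append("large estimated row count")
    if "Using temporary" in extra:
        warnings.append("temporary table")
    if "Using filesort" in extra:
        warnings.append("extra sort")
    return warnings


# Values copied from the EXPLAIN output of the slow query in this article.
example = {
    "type": "ALL",
    "key": None,
    "rows": 25005577,
    "Extra": "Using where; Using temporary; Using filesort",
}
print(plan_warnings(example))
```

An empty list does not prove a plan is good; it only means none of these particular warning signs showed up.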
2. Locating the SQL Performance Bottleneck
No surprise here: before optimizing, look at the table DDL and the execution plan:
    CREATE TABLE `t1` (
      `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
      `date` date NOT NULL DEFAULT '0000-00-00',
      `hour` char(2) NOT NULL DEFAULT '00',
      `kid` int(4) NOT NULL DEFAULT '0',
      `uid` int(11) NOT NULL DEFAULT '0',
      `aType` tinyint(2) NOT NULL DEFAULT '0',
      `src` tinyint(2) NOT NULL DEFAULT '1',
      `aid` int(11) NOT NULL DEFAULT '1',
      `acount` int(11) NOT NULL DEFAULT '1',
      `power` decimal(20,2) DEFAULT '0.00',
      PRIMARY KEY (`id`,`date`),
      UNIQUE KEY `did` (`date`,`hour`,`kid`,`uid`,`aType`,`src`,`aid`)
    ) ENGINE=InnoDB AUTO_INCREMENT=50486620 DEFAULT CHARSET=utf8mb4
    /*!50500 PARTITION BY RANGE COLUMNS(`date`)
    (PARTITION p20170316 VALUES LESS THAN ('2017-03-17') ENGINE = InnoDB,
     PARTITION p20170317 VALUES LESS THAN ('2017-03-18') ENGINE = InnoDB
    ...

    yejr@imysql.com[myDB]> EXPLAIN select uid, sum(power) powerup
        from t1
        where date >= '2017-03-31'
          and UNIX_TIMESTAMP(STR_TO_DATE(concat(date,' ',hour),'%Y-%m-%d %H')) >= 1490965200
          and UNIX_TIMESTAMP(STR_TO_DATE(concat(date,' ',hour),'%Y-%m-%d %H')) < 1492174801
          and aType in (1,6,9)
        group by uid\G
    *************************** 1. row ***************************
               id: 1
      select_type: SIMPLE
            table: t1
       partitions: p20170324,p20170325,....all partition
             type: ALL
    possible_keys: did
              key: NULL
          key_len: NULL
              ref: NULL
             rows: 25005577
         filtered: 15.00
            Extra: Using where; Using temporary; Using filesort
Clearly this SQL is very inefficient: a full table scan, no index used, a temporary table, and an extra sort. Everything that could go wrong did.
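3. Optimization Thinking
This SQL sums the power column for the rows that match the conditions. The date column is indexed (it leads the unique key did), but the WHERE clause wraps date in functions, and as a combined condition on both date and hour at that, so those predicates cannot use the index.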
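To make the point about function-wrapped columns concrete, the two epoch constants can be translated back into (date, hour) bounds that compare the columns directly. The sketch below assumes the session time zone is UTC (MySQL's UNIX_TIMESTAMP() follows the session time zone, so the real bounds shift on a server configured differently) and that hour is always stored zero-padded. The helper names are made up for this illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: session time zone is UTC; adjust TZ for a real server.
TZ = timezone.utc
LOW, HIGH = 1490965200, 1492174801  # constants from the slow query


def bucket_start(date_str, hour_str):
    """Epoch seconds of an hourly bucket, like UNIX_TIMESTAMP(STR_TO_DATE(...))."""
    dt = datetime.strptime(f"{date_str} {hour_str}", "%Y-%m-%d %H")
    return int(dt.replace(tzinfo=TZ).timestamp())


def first_bucket(low):
    """Earliest (date, hour) bucket whose start is >= low."""
    dt = datetime.fromtimestamp(low, TZ)
    floored = dt.replace(minute=0, second=0, microsecond=0)
    if floored < dt:
        floored += timedelta(hours=1)
    return floored.strftime("%Y-%m-%d"), floored.strftime("%H")


def last_bucket(high):
    """Latest (date, hour) bucket whose start is < high."""
    dt = datetime.fromtimestamp(high - 1, TZ)
    floored = dt.replace(minute=0, second=0, microsecond=0)
    return floored.strftime("%Y-%m-%d"), floored.strftime("%H")


lo, hi = first_bucket(LOW), last_bucket(HIGH)


def original_filter(d, h):
    """The function-wrapped condition from the slow query."""
    return LOW <= bucket_start(d, h) < HIGH


def direct_filter(d, h):
    """Same window expressed on the raw columns. In SQL this would read:
    (date > lo_date OR (date = lo_date AND hour >= lo_hour)) AND
    (date < hi_date OR (date = hi_date AND hour <= hi_hour))
    """
    return lo <= (d, h) <= hi


print(lo, hi)
```

Under the UTC assumption the window runs from hour 13 on 2017-03-31 through hour 13 on 2017-04-14 inclusive. A predicate written against the bare date and hour columns gives the optimizer a chance to use the index on date, which the wrapped form cannot.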
Fortunately, a sharp colleague had a flash of inspiration (she is in fact good at SQL optimization to begin with): the SQL can be reworked with CASE WHEN, like this:
    select uid, sum(powerup+powerup1)
    from (
      select uid,
             case when concat(date,' ',hour) >= '2017-03-24 13:00' then power else '0' end as powerup,
             case when concat(date,' ',hour) <  '2017-03-25 13:00' then power else '0' end as powerup1
      from t1
      where date >= '2017-03-24'
        and date <  '2017-03-25'
        and aType in (1,6,9)
    ) a
    group by uid;
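Pretty clever: the condition that could not use the index is moved out of the WHERE clause into CASE WHEN expressions, leaving only a plain range on date for filtering. Here is the new execution plan: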
    *************************** 1. row ***************************
               id: 1
      select_type: SIMPLE
            table: t1
       partitions: p20170324
             type: range
    possible_keys: did
              key: idx2_date_addRedType
          key_len: 4
              ref: NULL
             rows: 876375
         filtered: 30.00
            Extra: Using index condition; Using temporary; Using filesort
Here is the execution cost of this SQL:
    +----------------------------+---------+
    | Variable_name              | Value   |
    +----------------------------+---------+
    | Handler_read_first         | 1       |
    | Handler_read_key           | 1834590 |
    | Handler_read_last          | 0       |
    | Handler_read_next          | 1834589 |
    | Handler_read_prev          | 0       |
    | Handler_read_rnd           | 232276  |
    | Handler_read_rnd_next      | 232277  |
    +----------------------------+---------+
And the corresponding SLOW QUERY LOG record:
    # Query_time: 6.381254 Lock_time: 0.000166 Rows_sent: 232276 Rows_examined: 2299141 Rows_affected: 0
    # Bytes_sent: 4237347 Tmp_tables: 1 Tmp_disk_tables: 0 Tmp_table_sizes: 4187168
    # InnoDB_trx_id: 0
    # QC_Hit: No Full_scan: No Full_join: No Tmp_table: Yes Tmp_table_on_disk: No
    # Filesort: Yes Filesort_on_disk: No Merge_passes: 0
    # InnoDB_IO_r_ops: 0 InnoDB_IO_r_bytes: 0 InnoDB_IO_r_wait: 0.000000
    # InnoDB_rec_lock_wait: 0.000000 InnoDB_queue_wait: 0.000000
    # InnoDB_pages_distinct: 9311
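This still does not look ideal. The query no longer scans the whole table, but it still has a temporary table and an extra sort, so let's find a way to eliminate them and compare again.
You may have noticed one change: the new SLOW QUERY LOG record carries a lot more information. That is because this is only supported by the plugin in the Percona branch. It is a genuinely useful feature that can even record detailed Profiling information, and I strongly recommend it.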
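Separately from cost, it seems worth checking that the CASE WHEN rewrite returns the totals that were intended. The sketch below re-implements the two CASE expressions in Python for one day of hypothetical data (one row per hour, power = 1, a single uid). The intended window, 13:00 on 2017-03-24 up to 13:00 the next day, is an assumption read off the literals in the query, and the function names are invented for this illustration.

```python
from decimal import Decimal


def case_when_total(date, hours, power=Decimal("1")):
    """Mimic sum(powerup + powerup1) from the rewritten query for one uid."""
    total = Decimal("0")
    for hour in hours:
        key = f"{date} {hour}"  # concat(date,' ',hour), e.g. '2017-03-24 13'
        powerup = power if key >= "2017-03-24 13:00" else Decimal("0")
        powerup1 = power if key < "2017-03-25 13:00" else Decimal("0")
        total += powerup + powerup1
    return total


def windowed_total(date, hours, power=Decimal("1")):
    """Count each row once if it falls inside the assumed window."""
    start, end = ("2017-03-24", "13"), ("2017-03-25", "13")
    return sum(
        (power for hour in hours if start <= (date, hour) < end),
        Decimal("0"),
    )


hours = [f"{h:02d}" for h in range(24)]
print(case_when_total("2017-03-24", hours))  # prints 34
print(windowed_total("2017-03-24", hours))   # prints 11
```

With this toy data the two CASE columns add up to 34 rather than 11. Every row of 2017-03-24 satisfies the second CASE, rows after hour 13 satisfy both and are counted twice, and hour 13 itself misses the first CASE because the string '2017-03-24 13' sorts before '2017-03-24 13:00'. Comparing the output of the original and rewritten queries on real data before relying on the rewrite would settle whether this matters here.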
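Let's create an index on the uid column and see what the cost looks like once the temporary table and the sort are gone, and whether the overhead is any lower.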
    yejr@imysql.com[myDB]> ALTER TABLE t1 ADD INDEX idx_uid(uid);

    yejr@imysql.com[myDB]> EXPLAIN select uid, sum(powerup+powerup1)
        from (
          select uid,
                 case when concat(date,' ',hour) >= '2017-03-24 13:00' then power else '0' end as powerup,
                 case when concat(date,' ',hour) <  '2017-03-25 13:00' then power else '0' end as powerup1
          from t1
          where date >= '2017-03-24'
            and date <  '2017-03-25'
            and aType in (1,6,9)
        ) a
        group by uid\G
    *************************** 1. row ***************************
               id: 1
      select_type: SIMPLE
            table: if_date_hour_army_count
       partitions: p20170331,p20170401...
             type: index
    possible_keys: did,idx_uid
              key: idx_uid
          key_len: 4
              ref: NULL
             rows: 12701520
         filtered: 15.00
            Extra: Using where
Here is the execution cost of the SQL after adding the index:
    +----------------------------+---------+
    | Variable_name              | Value   |
    +----------------------------+---------+
    | Handler_read_first         | 1       |
    | Handler_read_key           | 1       |
    | Handler_read_last          | 0       |
    | Handler_read_next          | 1834589 |
    | Handler_read_prev          | 0       |
    | Handler_read_rnd           | 0       |
    | Handler_read_rnd_next      | 0       |
    +----------------------------+---------+
And the corresponding SLOW QUERY LOG record:
    # Query_time: 5.772286 Lock_time: 0.000330 Rows_sent: 232276 Rows_examined: 1834589 Rows_affected: 0
    # Bytes_sent: 4215071 Tmp_tables: 0 Tmp_disk_tables: 0 Tmp_table_sizes: 0
    # InnoDB_trx_id: 0
    # QC_Hit: No Full_scan: Yes Full_join: No Tmp_table: No Tmp_table_on_disk: No
    # Filesort: No Filesort_on_disk: No Merge_passes: 0
    # InnoDB_IO_r_ops: 0 InnoDB_IO_r_bytes: 0 InnoDB_IO_r_wait: 0.000000
    # InnoDB_rec_lock_wait: 0.000000 InnoDB_queue_wait: 0.000000
    # InnoDB_pages_distinct: 11470
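Note that although the SQL scans more data pages after the uid index is added, it actually runs more efficiently, because the temporary table and the extra sort have been eliminated. The Handler_read% results show the same thing: the query clearly does more sequential I/O and less random I/O, so it ends up faster even though it has to scan more data pages.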
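The comparison can be made explicit with a little arithmetic on the figures from the two SLOW QUERY LOG records and Handler_read% tables above. The dictionary layout and variable names below are just a convenient way to hold those numbers for this sketch.

```python
# Figures copied from the two runs shown above.
before = {
    "Query_time": 6.381254,
    "Rows_examined": 2299141,
    "Handler_read_rnd": 232276,
    "Handler_read_rnd_next": 232277,
    "InnoDB_pages_distinct": 9311,
}
after = {
    "Query_time": 5.772286,
    "Rows_examined": 1834589,
    "Handler_read_rnd": 0,
    "Handler_read_rnd_next": 0,
    "InnoDB_pages_distinct": 11470,
}


def relative_change(old, new):
    """Signed fractional change from old to new."""
    return (new - old) / old


time_saved = -relative_change(before["Query_time"], after["Query_time"])
more_pages = relative_change(
    before["InnoDB_pages_distinct"], after["InnoDB_pages_distinct"]
)
fewer_rows = before["Rows_examined"] - after["Rows_examined"]

print(f"query time saved:    {time_saved:.1%}")
print(f"extra distinct pages: {more_pages:.1%}")
print(f"fewer rows examined:  {fewer_rows}")
```

On these numbers the second run takes about 9.5% less time while touching about 23% more distinct pages. The 464,552 fewer rows examined equals Handler_read_rnd plus Handler_read_rnd_next minus one from the first run, which is consistent with that extra work being the reads back from the temporary table and the sort.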
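Postscript
Is there still room to optimize this SQL? Obviously yes: redesign the table so that the date and hour columns are merged into one. Then there is no need to laboriously piece the condition together, and the index can be used as well.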
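As a rough illustration of that redesign, the sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL; the optimizers differ, so it only shows the general principle. The table names `t1_split` and `t1_merged`, the merged column `stat_time`, and the sample data are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1_split  (uid INTEGER, d TEXT, h TEXT, power REAL);
    CREATE INDEX idx_split ON t1_split (d, h);
    CREATE TABLE t1_merged (uid INTEGER, stat_time TEXT, power REAL);
    CREATE INDEX idx_merged ON t1_merged (stat_time);
""")

# Hypothetical data: two uids, one row per hour for ten days.
rows = [(uid, f"2017-04-{day:02d}", f"{hour:02d}", 1.0)
        for day in range(1, 11) for hour in range(24) for uid in (1, 2)]
conn.executemany("INSERT INTO t1_split VALUES (?, ?, ?, ?)", rows)
conn.executemany("INSERT INTO t1_merged VALUES (?, ?, ?)",
                 [(uid, f"{d} {h}:00:00", p) for uid, d, h, p in rows])

# Split columns: the condition has to be assembled from d and h at query time.
split_sql = """
    SELECT uid, SUM(power) FROM t1_split
    WHERE d || ' ' || h >= '2017-04-03 13' AND d || ' ' || h < '2017-04-05 13'
    GROUP BY uid ORDER BY uid"""

# Merged column: a plain range on the indexed column.
merged_sql = """
    SELECT uid, SUM(power) FROM t1_merged
    WHERE stat_time >= '2017-04-03 13:00:00' AND stat_time < '2017-04-05 13:00:00'
    GROUP BY uid ORDER BY uid"""


def plan(sql):
    """Join the detail column of EXPLAIN QUERY PLAN into one string."""
    return " | ".join(r[3] for r in conn.execute("EXPLAIN QUERY PLAN " + sql))


split_result = conn.execute(split_sql).fetchall()
merged_result = conn.execute(merged_sql).fetchall()
print(split_result == merged_result)
print("split :", plan(split_sql))
print("merged:", plan(merged_sql))
```

Both queries return the same totals, but SQLite reports a table scan for the concatenated condition and an index search for the merged column. On MySQL the equivalent check would be to run EXPLAIN against the redesigned table.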