Optimization Case Study | Rewriting SQL with CASE WHEN

Date: 2020-01-13


The scenario to optimize
The following entry turned up in the SLOW QUERY LOG:

...
# Query_time: 59.503827  Lock_time: 0.000198  Rows_sent: 641227  Rows_examined: 13442472  Rows_affected: 0
...
select uid,sum(power) powerup from t1 where 
date>='2017-03-31' and 
UNIX_TIMESTAMP(STR_TO_DATE(concat(date,' ',hour),'%Y-%m-%d %H'))>=1490965200 and 
UNIX_TIMESTAMP(STR_TO_DATE(concat(date,' ',hour),'%Y-%m-%d %H'))<1492174801  and 
aType in (1,6,9) group by uid;

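Honestly, when I saw this SQL I could hardly keep from cursing. What kind of brain-dead design is this?

Someone actually split the date and the hour of a timestamp into two separate columns, date and hour, and then glued them back together into a new condition at query time. Words fail me.

Complaining aside, the work still has to be done. We are DBAs, after all, and SQL optimization is what we do best, right?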

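The road to SQL optimization
1. SQL optimization approach
At the risk of repeating myself, here is the approach to SQL optimization once more.

To optimize a SQL statement, you generally start with the execution plan: check whether indexes are used as much as possible, watch the estimated number of rows to be scanned, and see whether a temporary table is created (Using temporary) or a sort is required (Using filesort). Then find a way to eliminate these.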
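The checklist above can be sketched as a short session. This is only an illustration, not the exact commands used in this case: the simplified WHERE clause is an assumption made to keep the example short, and FLUSH STATUS needs the RELOAD privilege.

```sql
-- Step 1: read the plan. Look at `key` (is an index used?),
-- `rows` (estimated rows to scan) and `Extra`
-- (Using temporary / Using filesort).
EXPLAIN
SELECT uid, SUM(power) AS powerup
FROM t1
WHERE date >= '2017-03-31'      -- simplified predicate, for illustration only
  AND aType IN (1, 6, 9)
GROUP BY uid;

-- Step 2: measure what the statement really does.
-- Reset the session counters, run the query, then read the handler counters.
FLUSH STATUS;

SELECT uid, SUM(power) AS powerup
FROM t1
WHERE date >= '2017-03-31'
  AND aType IN (1, 6, 9)
GROUP BY uid;

SHOW SESSION STATUS LIKE 'Handler_read%';
```

Comparing the Handler_read% counters before and after a change shows whether the work actually shrank, rather than relying on the estimates in the plan alone.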

2. Locating the SQL performance bottleneck
Naturally, before optimizing anything, look at the table DDL and the execution plan:

CREATE TABLE `t1` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `date` date NOT NULL DEFAULT '0000-00-00',
  `hour` char(2) NOT NULL DEFAULT '00',
  `kid` int(4) NOT NULL DEFAULT '0',
  `uid` int(11) NOT NULL DEFAULT '0',
  `aType` tinyint(2) NOT NULL DEFAULT '0',
  `src` tinyint(2) NOT NULL DEFAULT '1',
  `aid` int(11) NOT NULL DEFAULT '1',
  `acount` int(11) NOT NULL DEFAULT '1',
  `power` decimal(20,2) DEFAULT '0.00',
  PRIMARY KEY (`id`,`date`),
  UNIQUE KEY `did` (`date`,`hour`,`kid`,`uid`,`aType`,`src`,`aid`)
) ENGINE=InnoDB AUTO_INCREMENT=50486620 DEFAULT CHARSET=utf8mb4
/*!50500 PARTITION BY RANGE  COLUMNS(`date`)
(PARTITION p20170316 VALUES LESS THAN ('2017-03-17') ENGINE = InnoDB,
 PARTITION p20170317 VALUES LESS THAN ('2017-03-18') ENGINE = InnoDB
...

yejr@imysql.com[myDB]> EXPLAIN select uid,sum(power) powerup from t1 where 
date>='2017-03-31' and 
UNIX_TIMESTAMP(STR_TO_DATE(concat(date,' ',hour),'%Y-%m-%d %H'))>=1490965200 and 
UNIX_TIMESTAMP(STR_TO_DATE(concat(date,' ',hour),'%Y-%m-%d %H'))<1492174801  and 
aType in (1,6,9) group by uid\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
   partitions: p20170324,p20170325,....all partition
         type: ALL
possible_keys: did
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 25005577
     filtered: 15.00
        Extra: Using where; Using temporary; Using filesort

Clearly this SQL is very inefficient: a full table scan, no index used, a temporary table, and an extra sort. Everything that could go wrong did.

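3. Thinking through the optimization
This SQL totals the power column for the rows that match the conditions. The date column is indexed (it is the leading column of the unique key did), but the WHERE clause wraps date in a function, and as part of a combined condition over the date and hour columns at that, so the index cannot be used for those predicates.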
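To picture what an index-friendly form of such a time window could look like, here is a minimal sketch that compares the bare columns with constants. It rests on two assumptions that the source does not confirm: that the session time_zone is UTC, so that 1490965200 corresponds to 2017-03-31 13:00:00 and 1492174801 to 2017-04-14 13:00:01, and that hour is always stored as a zero-padded two-digit string, so that string comparison orders it correctly.

```sql
-- Sketch only: the same window expressed on the bare columns.
-- Assumes session time_zone = UTC and zero-padded `hour` values ('00'..'23').
SELECT uid, SUM(power) AS powerup
FROM t1
WHERE date >= '2017-03-31'
  AND date <= '2017-04-14'
  AND (date > '2017-03-31' OR hour >= '13')   -- first day: from 13:00 onward
  AND (date < '2017-04-14' OR hour <= '13')   -- last day: up to and including 13:00
  AND aType IN (1, 6, 9)
GROUP BY uid;
```

Because date is compared directly with constants, the optimizer is free to use the range on date for index access and partition pruning. In a different time zone the hour boundaries shift accordingly.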

Fortunately, a sharp colleague had a flash of inspiration (in fact she has always been good at SQL optimization): the SQL can be reworked with CASE WHEN, like this:

select uid,sum(powerup+powerup1) from
(
   select uid,
          case when concat(date,' ',hour) >='2017-03-24 13:00' then power else '0' end as powerup,
          case when concat(date,' ',hour) < '2017-03-25 13:00' then power else '0' end as powerup1
   from t1
   where date>='2017-03-24' 
   and   date <'2017-03-25'
   and  aType in (1,6,9)
) a  group by uid;

Clever, isn't it? The condition that could not use an index has simply been recast with CASE WHEN. Here is the execution plan of the new SQL:

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
   partitions: p20170324
         type: range
possible_keys: did
          key: idx2_date_addRedType
      key_len: 4
          ref: NULL
         rows: 876375
     filtered: 30.00
        Extra: Using index condition; Using temporary; Using filesort

And the execution cost of this SQL:

+----------------------------+---------+
| Variable_name              | Value   |
+----------------------------+---------+
| Handler_read_first         | 1       |
| Handler_read_key           | 1834590 |
| Handler_read_last          | 0       |
| Handler_read_next          | 1834589 |
| Handler_read_prev          | 0       |
| Handler_read_rnd           | 232276  |
| Handler_read_rnd_next      | 232277  |
+----------------------------+---------+

Along with what the SLOW QUERY LOG recorded:

# Query_time: 6.381254  Lock_time: 0.000166  Rows_sent: 232276  Rows_examined: 2299141  Rows_affected: 0
# Bytes_sent: 4237347  Tmp_tables: 1  Tmp_disk_tables: 0  Tmp_table_sizes: 4187168
# InnoDB_trx_id: 0
# QC_Hit: No  Full_scan: No  Full_join: No  Tmp_table: Yes  Tmp_table_on_disk: No
# Filesort: Yes  Filesort_on_disk: No  Merge_passes: 0
#   InnoDB_IO_r_ops: 0  InnoDB_IO_r_bytes: 0  InnoDB_IO_r_wait: 0.000000
#   InnoDB_rec_lock_wait: 0.000000  InnoDB_queue_wait: 0.000000
#   InnoDB_pages_distinct: 9311

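This is still not ideal. The full table scan is gone, but there is still a temporary table and an extra sort. Let's find a way to eliminate them and compare again.

You may have noticed one change: the new SLOW QUERY LOG entry carries a lot more information. That is because this extended logging is only supported by the Percona branch. It is a really nice feature that can even record detailed profiling information. Strongly recommended.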
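For reference, a rough sketch of how the extended slow log is typically switched on. It assumes a Percona Server build that provides the log_slow_verbosity variable; the accepted values differ between versions, and the one-second threshold is an arbitrary choice for illustration, so check the documentation for your release.

```sql
-- Sketch only, assuming Percona Server (log_slow_verbosity is not in stock MySQL).
SET GLOBAL slow_query_log = ON;

-- Arbitrary threshold for illustration; a global change applies to new connections.
SET GLOBAL long_query_time = 1;

-- Ask for the extended fields (query plan flags, InnoDB statistics, and so on).
SET GLOBAL log_slow_verbosity = 'full';
```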

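Let's create an index on the uid column and see what the cost looks like once the temporary table and the sort are gone, and whether the overhead is lower.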

yejr@imysql.com[myDB]> ALTER TABLE t1 ADD INDEX idx_uid(uid);
yejr@imysql.com[myDB]> EXPLAIN select uid,sum(powerup+powerup1) from
(
   select uid,
          case when concat(date,' ',hour) >='2017-03-24 13:00' then power else '0' end as powerup,
          case when concat(date,' ',hour) < '2017-03-25 13:00' then power else '0' end as powerup1
   from t1
   where date>='2017-03-24' 
   and   date <'2017-03-25'
   and  aType in (1,6,9)
) a  group by uid\G

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: if_date_hour_army_count
   partitions: p20170331,p20170401...
         type: index
possible_keys: did,idx_uid
          key: idx_uid
      key_len: 4
          ref: NULL
         rows: 12701520
     filtered: 15.00
        Extra: Using where

The execution cost of the SQL after adding the index:

+----------------------------+---------+
| Variable_name              | Value   |
+----------------------------+---------+
| Handler_read_first         | 1       |
| Handler_read_key           | 1       |
| Handler_read_last          | 0       |
| Handler_read_next          | 1834589 |
| Handler_read_prev          | 0       |
| Handler_read_rnd           | 0       |
| Handler_read_rnd_next      | 0       |
+----------------------------+---------+

Along with what the SLOW QUERY LOG recorded:

# Query_time: 5.772286  Lock_time: 0.000330  Rows_sent: 232276  Rows_examined: 1834589  Rows_affected: 0
# Bytes_sent: 4215071  Tmp_tables: 0  Tmp_disk_tables: 0  Tmp_table_sizes: 0
# InnoDB_trx_id: 0
# QC_Hit: No  Full_scan: Yes  Full_join: No  Tmp_table: No  Tmp_table_on_disk: No
# Filesort: No  Filesort_on_disk: No  Merge_passes: 0
#   InnoDB_IO_r_ops: 0  InnoDB_IO_r_bytes: 0  InnoDB_IO_r_wait: 0.000000
#   InnoDB_rec_lock_wait: 0.000000  InnoDB_queue_wait: 0.000000
#   InnoDB_pages_distinct: 11470

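Note that although the SQL scans more data pages after the uid index was added (InnoDB_pages_distinct rose from 9311 to 11470), it actually runs faster (Query_time dropped from 6.38 s to 5.77 s), because the temporary table and the extra sort are gone. The Handler_read% figures show this too: there is clearly more sequential I/O and less random I/O, so even with more data pages to scan, the query is faster overall.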

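Postscript
Is there still room to optimize this SQL? Clearly there is: redesign the table and merge the date and hour columns into one, so that the condition no longer has to be pieced together and the index can be used.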
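One possible shape of that redesign is sketched below. The column name stat_time and the index name idx_stat_time are hypothetical, chosen only for this illustration, and the boundaries again assume a UTC session time zone. The date column is kept because the table is partitioned on it.

```sql
-- Sketch only: `stat_time` and `idx_stat_time` are hypothetical names.
ALTER TABLE t1
  ADD COLUMN stat_time DATETIME NULL,
  ADD INDEX idx_stat_time (stat_time);

-- Backfill from the existing columns. On a table of this size this
-- would normally be run in batches rather than as one statement.
UPDATE t1
SET stat_time = STR_TO_DATE(CONCAT(date, ' ', hour), '%Y-%m-%d %H');

-- The window becomes a plain range on one column.
-- The date range is kept so that partition pruning still applies.
SELECT uid, SUM(power) AS powerup
FROM t1
WHERE date >= '2017-03-31'
  AND date <= '2017-04-14'
  AND stat_time >= '2017-03-31 13:00:00'
  AND stat_time <= '2017-04-14 13:00:00'   -- hour granularity, so same as < 13:00:01
  AND aType IN (1, 6, 9)
GROUP BY uid;
```

New rows would also need stat_time populated by the application at insert time; how that is done depends on the write path, which the source does not describe.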