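Revisiting MySQL Pagination Query Optimization

1. Introduction

MySQL pagination is a frequent question in Java interviews, so it is worth trying out in practice; after all, hands-on experience is the best teacher. Many people struggle to find a large dataset to test against, but MySQL actually provides an official test database.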

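2. Test data

There are two ways to get test data. If you just need some data to test with, the official dataset is recommended. If the official dataset does not meet your needs, generate your own.

1) Import the official test database

Download the official database files, or get them from GitHub.

The test database contains 6 tables.

First change into the employees_db directory and run the import command: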

mysql -uroot -proot -t < employees.sql

In some environments this fails with an error:

ERROR 1193 (HY000) at line 38: Unknown system variable 'storage_engine'

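Connecting to MySQL and checking the default engine shows that the local environment is not at fault. The error comes from the script itself: it sets the storage_engine system variable, which was removed in MySQL 5.7 in favor of default_storage_engine.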

mysql> show variables like '%engine%';
+----------------------------------+--------+
| Variable_name                    | Value  |
+----------------------------------+--------+
| default_storage_engine           | InnoDB |
| default_tmp_storage_engine       | InnoDB |
| disabled_storage_engines         |        |
| internal_tmp_disk_storage_engine | InnoDB |
+----------------------------------+--------+
4 rows in set (0.01 sec)

Modify the employees.sql script so that it sets default_storage_engine instead:

set default_storage_engine = InnoDB;
-- set storage_engine = MyISAM;
-- set storage_engine = Falcon;
-- set storage_engine = PBXT;
-- set storage_engine = Maria;

select CONCAT('storage engine: ', @@default_storage_engine) as INFO;

Run it again and the import succeeds:

➜  employees_db mysql -uroot -proot -t < employees.sql
mysql: [Warning] Using a password on the command line interface can be insecure.
+-----------------------------+
| INFO                        |
+-----------------------------+
| CREATING DATABASE STRUCTURE |
+-----------------------------+
+------------------------+
| INFO                   |
+------------------------+
| storage engine: InnoDB |
+------------------------+
+---------------------+
| INFO                |
+---------------------+
| LOADING departments |
+---------------------+
+-------------------+
| INFO              |
+-------------------+
| LOADING employees |
+-------------------+
+------------------+
| INFO             |
+------------------+
| LOADING dept_emp |
+------------------+
+----------------------+
| INFO                 |
+----------------------+
| LOADING dept_manager |
+----------------------+
+----------------+
| INFO           |
+----------------+
| LOADING titles |
+----------------+
+------------------+
| INFO             |
+------------------+
| LOADING salaries |
+------------------+

Verify the result (apply the same script change as above):

➜  employees_db mysql -uroot -proot -t < test_employees_sha.sql
mysql: [Warning] Using a password on the command line interface can be insecure.
+----------------------+
| INFO                 |
+----------------------+
| TESTING INSTALLATION |
+----------------------+
+--------------+------------------+------------------------------------------+
| table_name   | expected_records | expected_crc                             |
+--------------+------------------+------------------------------------------+
| departments  |                9 | 4b315afa0e35ca6649df897b958345bcb3d2b764 |
| dept_emp     |           331603 | d95ab9fe07df0865f592574b3b33b9c741d9fd1b |
| dept_manager |               24 | 9687a7d6f93ca8847388a42a6d8d93982a841c6c |
| employees    |           300024 | 4d4aa689914d8fd41db7e45c2168e7dcb9697359 |
| salaries     |          2844047 | b5a1785c27d75e33a4173aaa22ccf41ebd7d4a9f |
| titles       |           443308 | d12d5f746b88f07e69b9e36675b6067abb01b60e |
+--------------+------------------+------------------------------------------+

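We can see that the dept_emp table has about 330,000 rows.

2) Generate data with a stored procedure

Alternatively, we can bulk-insert generated data with a stored procedure.

First create a table: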

drop table if exists `user`;
create table `user`(
  `id` int unsigned auto_increment,
  `username` varchar(64) not null default '',
  `score` int(11) not null default 0,
  primary key(`id`)
)ENGINE = InnoDB;

Create the stored procedure:

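DROP PROCEDURE IF EXISTS batchInsert;
-- switch the statement delimiter so the procedure body can contain ';'
delimiter $$
create procedure batchInsert() -- create the stored procedure
begin   -- start of the procedure body
    declare num int; -- loop counter
    set num=1; -- initial value
    while num<=3000000 do -- loop condition
        insert into user(`username`,`score`) values(concat('user-', num),num); -- insert one row
        set num=num+1; -- increment the counter
    end while; -- end of the loop
end$$
-- restore ';' as the statement delimiter
delimiter ;

CALL batchInsert(); -- run the stored procedure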

Inserting 3 million rows took about 1046 s. I had originally planned to import 10 million rows, but that would have taken too long.
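Much of that time is likely due to autocommit: with the default setting, each of the 3 million INSERT statements is committed as its own transaction. A minimal sketch of one way to reduce this, assuming the batchInsert procedure defined above and a session with nothing else pending; it was not timed for this article, so treat any speed-up as something to measure rather than a given:

```sql
-- Assumption: batchInsert() is the procedure created above.
SET autocommit = 0;     -- stop committing after every single INSERT
CALL batchInsert();     -- all inserts now belong to one open transaction
COMMIT;                 -- commit once at the end
SET autocommit = 1;     -- restore the default behaviour
```

A single 3-million-row transaction builds up a large undo log, so committing every few thousand rows inside the loop is another option worth trying.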

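3. Reproducing and optimizing common MySQL pagination problems

We test against the user table created above, which holds 3 million rows.

1) Preliminary checks

First look at the table structure and the indexes that currently exist: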

mysql> desc user;
+----------+------------------+------+-----+---------+----------------+
| Field    | Type             | Null | Key | Default | Extra          |
+----------+------------------+------+-----+---------+----------------+
| id       | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
| username | varchar(30)      | NO   |     |         |                |
| score    | int(11)          | NO   |     | 0       |                |
+----------+------------------+------+-----+---------+----------------+
3 rows in set (0.00 sec)

mysql> show index from user;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| user  |          0 | PRIMARY  |            1 | id          | A         |     2991886 |     NULL | NULL   |      | BTREE      |         |               |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
1 row in set (0.00 sec)

The only index is the primary key on id.


Next, check whether the query cache is enabled, so that it does not skew the timings:

mysql> show variables like '%query_cache%';
+------------------------------+---------+
| Variable_name                | Value   |
+------------------------------+---------+
| have_query_cache             | YES     |
| query_cache_limit            | 1048576 |
| query_cache_min_res_unit     | 4096    |
| query_cache_size             | 1048576 |
| query_cache_type             | OFF     |
| query_cache_wlock_invalidate | OFF     |
+------------------------------+---------+
6 rows in set (0.00 sec)

mysql> show profiles;
Empty set, 1 warning (0.00 sec)

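have_query_cache = YES together with query_cache_type = OFF means the query cache is supported but not enabled. show profiles returns an empty set, which means profiling is turned off.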


Enable profiling:

mysql> SET profiling = 1;
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> show profiles;
+----------+------------+-------------------+
| Query_ID | Duration   | Query             |
+----------+------------+-------------------+
|        1 | 0.00012300 | SET profiling = 1 |
+----------+------------+-------------------+
1 row in set, 1 warning (0.00 sec)

2) Pagination without an index

The most common way to paginate is order by + limit m,n. Let's measure how it performs:

select * from user order by score limit 0,10; -- 10 rows in set (0.65 sec)
select * from user order by score limit 10000,10; -- 10 rows in set (0.83 sec)
select * from user order by score limit 100000,10; -- 10 rows in set (1.03 sec)
select * from user order by score limit 1000000,10; -- 10 rows in set (1.14 sec)

Confirm whether an index is used:

mysql> explain select * from user order by score limit 1000000,10;
+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+----------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows    | filtered | Extra          |
+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+----------------+
|  1 | SIMPLE      | user  | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 2991995 |   100.00 | Using filesort |
+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+----------------+
1 row in set, 1 warning (0.00 sec)

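No index is used: MySQL performs a full table scan (type = ALL) followed by a filesort, and paging at an offset of 1 million rows takes about 1.14 s.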
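One thing to try here, which the article itself does not test, is an index on the sort column. The sketch below uses a hypothetical index name, idx_score, and only re-checks the plan. The optimizer may still prefer a full scan plus filesort for select * at a large offset, so the EXPLAIN output has to be read rather than assumed:

```sql
-- idx_score is a hypothetical index name chosen for this sketch.
ALTER TABLE `user` ADD INDEX idx_score (`score`);

-- Re-check the plan and compare the key and Extra columns with the output above.
EXPLAIN SELECT * FROM `user` ORDER BY score LIMIT 1000000, 10;

-- Selecting only id and score lets the secondary index cover the query,
-- because InnoDB secondary indexes also store the primary key.
EXPLAIN SELECT id, score FROM `user` ORDER BY score LIMIT 1000000, 10;
```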

3) Pagination with an index

select * from user order by id limit 10000,10; -- 10 rows in set (0.01 sec)
select * from user order by id limit 1000000,10; -- 10 rows in set (0.18 sec)
select * from user order by id limit 2000000,10; -- 10 rows in set (0.35 sec)

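These queries use the primary key index, so they are considerably faster. Even so, query time grows noticeably as the offset increases.

Confirm whether the index is used: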

mysql> explain select * from user order by id limit 2000000,10;
+----+-------------+-------+------------+-------+---------------+---------+---------+------+---------+----------+-------+
| id | select_type | table | partitions | type  | possible_keys | key     | key_len | ref  | rows    | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+---------+----------+-------+
|  1 | SIMPLE      | user  | NULL       | index | NULL          | PRIMARY | 4       | NULL | 2000010 |   100.00 | NULL  |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+---------+----------+-------+
1 row in set, 1 warning (0.00 sec)

EXPLAIN reports an index scan (type = index) over the primary key, with an estimated 2,000,010 rows examined.
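The rows column in EXPLAIN is only an estimate. As a rough cross-check, the session's Handler_read counters show how many index entries the server actually stepped through. This is a sketch rather than output captured for this article, and FLUSH STATUS requires the RELOAD privilege:

```sql
FLUSH STATUS;   -- reset this session's status counters
SELECT * FROM `user` ORDER BY id LIMIT 2000000, 10;
-- Handler_read_first and Handler_read_next count the index entries visited;
-- for this query they would be expected to add up to roughly offset + 10.
SHOW SESSION STATUS LIKE 'Handler_read%';
```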

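4) Optimization

MySQL's built-in query profiler shows how long each step of the statement takes. Almost all of the time is spent in the Sending data stage. Despite its name, this stage covers reading and processing rows as well as sending them: the server reads 2,000,010 rows, discards the first 2,000,000, and returns only the last 10 to the client. Since the client needs just 10 rows, can we avoid the work spent on the rest?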

mysql> show profiles;
+----------+------------+---------------------------------------------------------+
| Query_ID | Duration   | Query                                                   |
+----------+------------+---------------------------------------------------------+
|        1 | 0.00012300 | SET profiling = 1                                       |
|        2 | 0.00009200 | SET profiling = 1                                       |
|        3 | 0.35689500 | select * from user order by id limit 2000000,10         |
|        4 | 0.00023900 | explain select * from user order by id limit 2000000,10 |
+----------+------------+---------------------------------------------------------+
4 rows in set, 1 warning (0.00 sec)

mysql> show profile for query 3;
+----------------------+----------+
| Status               | Duration |
+----------------------+----------+
| starting             | 0.000071 |
| checking permissions | 0.000007 |
| Opening tables       | 0.000012 |
| init                 | 0.000017 |
| System lock          | 0.000008 |
| optimizing           | 0.000005 |
| statistics           | 0.000024 |
| preparing            | 0.000016 |
| Sorting result       | 0.000004 |
| executing            | 0.000003 |
| Sending data         | 0.356653 |
| end                  | 0.000013 |
| query end            | 0.000005 |
| closing tables       | 0.000008 |
| freeing items        | 0.000019 |
| cleaning up          | 0.000030 |
+----------------------+----------+
16 rows in set, 1 warning (0.00 sec)

An optimization commonly suggested online: subquery + covering index

mysql> select * from user where id > (select id from user order by id limit 2000000, 1) limit 10;
+---------+--------------+---------+
| id      | username     | score   |
+---------+--------------+---------+
| 2000002 | user-2000002 | 2000002 |
| 2000003 | user-2000003 | 2000003 |
| 2000004 | user-2000004 | 2000004 |
| 2000005 | user-2000005 | 2000005 |
| 2000006 | user-2000006 | 2000006 |
| 2000007 | user-2000007 | 2000007 |
| 2000008 | user-2000008 | 2000008 |
| 2000009 | user-2000009 | 2000009 |
| 2000010 | user-2000010 | 2000010 |
| 2000011 | user-2000011 | 2000011 |
+---------+--------------+---------+
10 rows in set (0.29 sec)

mysql> explain select * from user where id > (select id from user order by id limit 2000000, 1) limit 10;
+----+-------------+-------+------------+-------+---------------+---------+---------+------+---------+----------+-------------+
| id | select_type | table | partitions | type  | possible_keys | key     | key_len | ref  | rows    | filtered | Extra       |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+---------+----------+-------------+
|  1 | PRIMARY     | user  | NULL       | range | PRIMARY       | PRIMARY | 4       | NULL | 1495997 |   100.00 | Using where |
|  2 | SUBQUERY    | user  | NULL       | index | NULL          | PRIMARY | 4       | NULL | 2000001 |   100.00 | Using index |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+---------+----------+-------------+
2 rows in set, 1 warning (0.30 sec)

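However, this barely improves performance (0.29 s versus 0.35 s). So where is the problem? The execution plan shows the indexes are used as expected, but the rows column shows that a large number of rows are still examined. (Note also that with id > the result starts at id 2000002; id >= would be needed to return the same page as limit 2000000,10.) Let's look at the subquery on its own: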

mysql> select id from user order by id limit 2000000, 1;
+---------+
| id      |
+---------+
| 2000001 |
+---------+
1 row in set (0.29 sec)

mysql> explain select id from user order by id limit 2000000, 1;
+----+-------------+-------+------------+-------+---------------+---------+---------+------+---------+----------+-------------+
| id | select_type | table | partitions | type  | possible_keys | key     | key_len | ref  | rows    | filtered | Extra       |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+---------+----------+-------------+
|  1 | SIMPLE      | user  | NULL       | index | NULL          | PRIMARY | 4       | NULL | 2000001 |   100.00 | Using index |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+---------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

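Even with a covering index, the subquery alone still takes about 0.3 s, which accounts for nearly all of the total. I think this is simply the normal cost of the index I/O. I could not find official benchmark figures, or the cost of a single MySQL I/O, to compare against.

In theory, one page can hold about 1,000 int primary keys and the root stays in memory, so the second level of the B+Tree covers roughly 1 million keys. Paging at an offset of 2 million should therefore need 2 I/Os to find the data. If 2 I/Os take 0.3 s, one should take about 0.15 s. Let's query an offset of about 990,000 to see whether this matches the hypothesis.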

mysql> select id from user order by id limit 990000,1;
+--------+
| id     |
+--------+
| 990001 |
+--------+
1 row in set (0.15 sec)

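So my bold guess is that this is simply normal overhead.

4. Closing

I set out to check whether the pagination optimizations found online actually hold up, but the results differed from what I expected. If you see it differently, I would be glad to hear from you. My WeChat QR code is available on my official account.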
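The measured times (0.15 s at an offset of 990,000 and 0.29 s at 2,000,000) are also consistent with a cost that grows with the offset, since limit m,n has to step over the m skipped index entries. Two rewrites that are often proposed for this are sketched below. Neither was timed for this article. The literal 2000000 in the first query stands for the last id of the previous page, which is an assumption about what the client remembers:

```sql
-- Keyset ("seek") pagination: start from the last id already shown
-- instead of counting rows from the beginning of the index.
-- Assumption: 2000000 is the last id of the previous page.
SELECT * FROM `user` WHERE id > 2000000 ORDER BY id LIMIT 10;

-- Deferred join: page over the narrow primary key first,
-- then fetch the full rows for only those 10 ids.
SELECT u.*
FROM `user` AS u
JOIN (SELECT id FROM `user` ORDER BY id LIMIT 2000000, 10) AS t
  ON u.id = t.id
ORDER BY u.id;
```

The keyset form only supports moving to the next or previous page, not jumping to an arbitrary page number. The deferred join keeps limit m,n semantics but still walks the skipped index entries, so it helps mainly when the full rows are much wider than the key.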
