MySQL pagination queries come up frequently in Java interviews, so it is worth working through them hands-on; practice is the best teacher.
Many people are held back in testing by the lack of a large dataset, but there is in fact an official sample database.
<!-- more-->
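Test data can be imported in two ways here. If you only need some data to test against, the official dataset is recommended. If the official dataset does not meet your needs, we generate our own.
The official sample database contains 6 tables.
First, change into the employees_db directory and run the import command: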
    mysql -uroot -proot -t < employees.sql
Some environments may report an error:
    ERROR 1193 (HY000) at line 38: Unknown system variable 'storage_engine'
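Connecting to MySQL and checking the default engine shows that the local environment is fine. The server exposes default_storage_engine, whereas the script sets storage_engine, which this server does not recognize: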
    mysql> show variables like '%engine%';
    +----------------------------------+--------+
    | Variable_name                    | Value  |
    +----------------------------------+--------+
    | default_storage_engine           | InnoDB |
    | default_tmp_storage_engine       | InnoDB |
    | disabled_storage_engines         |        |
    | internal_tmp_disk_storage_engine | InnoDB |
    +----------------------------------+--------+
    4 rows in set (0.01 sec)
Modify the employees.sql script as follows:
    set default_storage_engine = InnoDB;
    -- set storage_engine = MyISAM;
    -- set storage_engine = Falcon;
    -- set storage_engine = PBXT;
    -- set storage_engine = Maria;
    select CONCAT('storage engine: ', @@default_storage_engine) as INFO;
Running the import again succeeds:
    ➜ employees_db mysql -uroot -proot -t < employees.sql
    mysql: [Warning] Using a password on the command line interface can be insecure.
    +-----------------------------+
    | INFO                        |
    +-----------------------------+
    | CREATING DATABASE STRUCTURE |
    +-----------------------------+
    +------------------------+
    | INFO                   |
    +------------------------+
    | storage engine: InnoDB |
    +------------------------+
    +---------------------+
    | INFO                |
    +---------------------+
    | LOADING departments |
    +---------------------+
    +-------------------+
    | INFO              |
    +-------------------+
    | LOADING employees |
    +-------------------+
    +------------------+
    | INFO             |
    +------------------+
    | LOADING dept_emp |
    +------------------+
    +----------------------+
    | INFO                 |
    +----------------------+
    | LOADING dept_manager |
    +----------------------+
    +----------------+
    | INFO           |
    +----------------+
    | LOADING titles |
    +----------------+
    +------------------+
    | INFO             |
    +------------------+
    | LOADING salaries |
    +------------------+
Verify the result (the test script needs the same storage engine change as above):
    ➜ employees_db mysql -uroot -proot -t < test_employees_sha.sql
    mysql: [Warning] Using a password on the command line interface can be insecure.
    +----------------------+
    | INFO                 |
    +----------------------+
    | TESTING INSTALLATION |
    +----------------------+
    +--------------+------------------+------------------------------------------+
    | table_name   | expected_records | expected_crc                             |
    +--------------+------------------+------------------------------------------+
    | departments  |                9 | 4b315afa0e35ca6649df897b958345bcb3d2b764 |
    | dept_emp     |           331603 | d95ab9fe07df0865f592574b3b33b9c741d9fd1b |
    | dept_manager |               24 | 9687a7d6f93ca8847388a42a6d8d93982a841c6c |
    | employees    |           300024 | 4d4aa689914d8fd41db7e45c2168e7dcb9697359 |
    | salaries     |          2844047 | b5a1785c27d75e33a4173aaa22ccf41ebd7d4a9f |
    | titles       |           443308 | d12d5f746b88f07e69b9e36675b6067abb01b60e |
    +--------------+------------------+------------------------------------------+
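We can see that dept_emp holds about 330,000 rows (employees holds about 300,000, and the largest table, salaries, about 2.84 million).
To generate our own data instead, we can bulk-insert it with a stored procedure.
First, create a table: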
    drop table if exists `user`;
    create table `user`(
        `id` int unsigned auto_increment,
        `username` varchar(64) not null default '',
        `score` int(11) not null default 0,
        primary key(`id`)
    )ENGINE = InnoDB;
Create the stored procedure:
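    DROP PROCEDURE IF EXISTS batchInsert;
    -- switch the statement delimiter so the procedure body can contain ;
    delimiter $$
    create procedure batchInsert()          -- create the stored procedure
    begin                                   -- start of the procedure body
        declare num int;                    -- declare the loop variable
        set num = 1;                        -- initial value
        while num <= 3000000 do             -- loop condition
            insert into user(`username`, `score`)
            values (concat('user-', num), num);  -- insert one row
            set num = num + 1;              -- increment the loop variable
        end while;                          -- end of the loop
    end$$                                   -- end of the procedure body
    -- restore ; as the statement delimiter
    delimiter ;
    CALL batchInsert;                       -- run the stored procedure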
Inserting the 3 million test rows took about 1046 s. The original plan was to load 10 million rows, but that would have taken too long.
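One likely reason the load is this slow is that, with autocommit enabled, every insert is committed as its own transaction. Below is a minimal sketch of the same loop with batched commits. The procedure name batchInsertTx, the total parameter, and the 10000-row batch size are illustrative assumptions rather than part of the test above, and the actual speedup depends on the server configuration.

```sql
DROP PROCEDURE IF EXISTS batchInsertTx;
delimiter $$
create procedure batchInsertTx(in total int)
begin
    declare num int default 1;
    start transaction;
    while num <= total do
        insert into user(`username`, `score`)
        values (concat('user-', num), num);
        -- commit every 10000 rows (assumed batch size) to keep each transaction bounded
        if num % 10000 = 0 then
            commit;
            start transaction;
        end if;
        set num = num + 1;
    end while;
    -- commit whatever remains in the last partial batch
    commit;
end$$
delimiter ;

CALL batchInsertTx(3000000);
```

Multi-row insert statements or LOAD DATA are other commonly used options for bulk loading. None of these were timed as part of this test.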
We run the tests against the existing user table, which holds 3 million rows.
First, look at the table structure and the indexes that currently exist:
    mysql> desc user;
    +----------+------------------+------+-----+---------+----------------+
    | Field    | Type             | Null | Key | Default | Extra          |
    +----------+------------------+------+-----+---------+----------------+
    | id       | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
    | username | varchar(30)      | NO   |     |         |                |
    | score    | int(11)          | NO   |     | 0       |                |
    +----------+------------------+------+-----+---------+----------------+
    3 rows in set (0.00 sec)

    mysql> show index from user;
    +-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
    | Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
    +-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
    | user  |          0 | PRIMARY  |            1 | id          | A         |     2991886 |     NULL | NULL   |      | BTREE      |         |               |
    +-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
    1 row in set (0.00 sec)
Only the primary key index on id exists.
Next, check whether the query cache is enabled, so that it does not distort the measurements:
    mysql> show variables like '%query_cache%';
    +------------------------------+---------+
    | Variable_name                | Value   |
    +------------------------------+---------+
    | have_query_cache             | YES     |
    | query_cache_limit            | 1048576 |
    | query_cache_min_res_unit     | 4096    |
    | query_cache_size             | 1048576 |
    | query_cache_type             | OFF     |
    | query_cache_wlock_invalidate | OFF     |
    +------------------------------+---------+
    6 rows in set (0.00 sec)

    mysql> show profiles;
    Empty set, 1 warning (0.00 sec)
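have_query_cache = YES together with query_cache_type = OFF means the query cache is supported but not enabled. show profiles returns an empty set, which means profiling is turned off.
Enable profiling: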
    mysql> SET profiling = 1;
    Query OK, 0 rows affected, 1 warning (0.00 sec)

    mysql> show profiles;
    +----------+------------+-------------------+
    | Query_ID | Duration   | Query             |
    +----------+------------+-------------------+
    |        1 | 0.00012300 | SET profiling = 1 |
    +----------+------------+-------------------+
    1 row in set, 1 warning (0.00 sec)
The most common way to paginate is order by combined with limit m,n. Let's test how it performs:
    select * from user order by score limit 0,10;        -- 10 rows in set (0.65 sec)
    select * from user order by score limit 10000,10;    -- 10 rows in set (0.83 sec)
    select * from user order by score limit 100000,10;   -- 10 rows in set (1.03 sec)
    select * from user order by score limit 1000000,10;  -- 10 rows in set (1.14 sec)
Let's check whether an index is used:
    mysql> explain select * from user order by score limit 1000000,10;
    +----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+----------------+
    | id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows    | filtered | Extra          |
    +----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+----------------+
    |  1 | SIMPLE      | user  | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 2991995 |   100.00 | Using filesort |
    +----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+----------------+
    1 row in set, 1 warning (0.00 sec)
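Indeed, no index is used. The query does a full table scan (type = ALL, about 3 million rows) followed by a filesort, and paging at an offset of 1 million takes about 1.14 s.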
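Because score has no index, every page request has to sort the whole table. One option worth trying is a secondary index on score combined with a deferred join, which pages through the narrow index first and fetches full rows only for the 10 matching ids. The following is a sketch under assumptions: the index name idx_score is hypothetical, the optimizer may or may not choose it, and none of this was measured in the tests here.

```sql
-- Hypothetical index name. An InnoDB secondary index also stores the primary key,
-- so (score, id) can be read from the index without touching the full rows.
alter table user add index idx_score (score);

-- Page through the index first, then join back to fetch the 10 full rows.
select u.*
from user u
join (
    select id
    from user
    order by score
    limit 1000000, 10
) t on u.id = t.id
order by u.score;

-- Check which access path the optimizer actually picks for the inner query.
explain select id from user order by score limit 1000000, 10;

-- Drop the index again to reproduce the remaining measurements unchanged.
alter table user drop index idx_score;
```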
    select * from user order by id limit 10000,10;    -- 10 rows in set (0.01 sec)
    select * from user order by id limit 1000000,10;  -- 10 rows in set (0.18 sec)
    select * from user order by id limit 2000000,10;  -- 10 rows in set (0.35 sec)
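This query uses the primary key index, so it is considerably faster.
Even so, the query slows down noticeably as the offset grows.
Let's confirm that the index is actually used: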
    mysql> explain select * from user order by id limit 2000000,10;
    +----+-------------+-------+------------+-------+---------------+---------+---------+------+---------+----------+-------+
    | id | select_type | table | partitions | type  | possible_keys | key     | key_len | ref  | rows    | filtered | Extra |
    +----+-------------+-------+------------+-------+---------------+---------+---------+------+---------+----------+-------+
    |  1 | SIMPLE      | user  | NULL       | index | NULL          | PRIMARY | 4       | NULL | 2000010 |   100.00 | NULL  |
    +----+-------------+-------+------------+-------+---------------+---------+---------+------+---------+----------+-------+
    1 row in set, 1 warning (0.00 sec)
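The plan is an index scan (type = index) that examines an estimated 2,000,010 rows.
MySQL's built-in query profiling tool shows how long each stage of the statement takes. Almost all of the time is spent in the Sending data stage. Despite its name, this stage covers reading and processing rows as well as sending them to the client: the server steps through 2,000,010 rows and discards the first 2,000,000, while the client receives only 10. Since the client needs just 10 rows, can we avoid reading the rows that are thrown away?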
    mysql> show profiles;
    +----------+------------+---------------------------------------------------------+
    | Query_ID | Duration   | Query                                                   |
    +----------+------------+---------------------------------------------------------+
    |        1 | 0.00012300 | SET profiling = 1                                       |
    |        2 | 0.00009200 | SET profiling = 1                                       |
    |        3 | 0.35689500 | select * from user order by id limit 2000000,10         |
    |        4 | 0.00023900 | explain select * from user order by id limit 2000000,10 |
    +----------+------------+---------------------------------------------------------+
    4 rows in set, 1 warning (0.00 sec)

    mysql> show profile for query 3;
    +----------------------+----------+
    | Status               | Duration |
    +----------------------+----------+
    | starting             | 0.000071 |
    | checking permissions | 0.000007 |
    | Opening tables       | 0.000012 |
    | init                 | 0.000017 |
    | System lock          | 0.000008 |
    | optimizing           | 0.000005 |
    | statistics           | 0.000024 |
    | preparing            | 0.000016 |
    | Sorting result       | 0.000004 |
    | executing            | 0.000003 |
    | Sending data         | 0.356653 |
    | end                  | 0.000013 |
    | query end            | 0.000005 |
    | closing tables       | 0.000008 |
    | freeing items        | 0.000019 |
    | cleaning up          | 0.000030 |
    +----------------------+----------+
    16 rows in set, 1 warning (0.00 sec)
An optimization commonly suggested online: subquery + covering index
    mysql> select * from user where id > (select id from user order by id limit 2000000, 1) limit 10;
    +---------+--------------+---------+
    | id      | username     | score   |
    +---------+--------------+---------+
    | 2000002 | user-2000002 | 2000002 |
    | 2000003 | user-2000003 | 2000003 |
    | 2000004 | user-2000004 | 2000004 |
    | 2000005 | user-2000005 | 2000005 |
    | 2000006 | user-2000006 | 2000006 |
    | 2000007 | user-2000007 | 2000007 |
    | 2000008 | user-2000008 | 2000008 |
    | 2000009 | user-2000009 | 2000009 |
    | 2000010 | user-2000010 | 2000010 |
    | 2000011 | user-2000011 | 2000011 |
    +---------+--------------+---------+
    10 rows in set (0.29 sec)

    mysql> explain select * from user where id > (select id from user order by id limit 2000000, 1) limit 10;
    +----+-------------+-------+------------+-------+---------------+---------+---------+------+---------+----------+-------------+
    | id | select_type | table | partitions | type  | possible_keys | key     | key_len | ref  | rows    | filtered | Extra       |
    +----+-------------+-------+------------+-------+---------------+---------+---------+------+---------+----------+-------------+
    |  1 | PRIMARY     | user  | NULL       | range | PRIMARY       | PRIMARY | 4       | NULL | 1495997 |   100.00 | Using where |
    |  2 | SUBQUERY    | user  | NULL       | index | NULL          | PRIMARY | 4       | NULL | 2000001 |   100.00 | Using index |
    +----+-------------+-------+------------+-------+---------------+---------+---------+------+---------+----------+-------------+
    2 rows in set, 1 warning (0.30 sec)
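However, this barely improves performance (0.29 s versus 0.35 s). Note also that id > skips id 2000001 itself, so the page is shifted by one row compared with limit 2000000,10; id >= would return the same page. So where is the problem? The execution plan shows that the indexes are used as expected, but the rows column shows that a large number of rows are still examined. Let's look at the subquery on its own: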
    mysql> select id from user order by id limit 2000000, 1;
    +---------+
    | id      |
    +---------+
    | 2000001 |
    +---------+
    1 row in set (0.29 sec)

    mysql> explain select id from user order by id limit 2000000, 1;
    +----+-------------+-------+------------+-------+---------------+---------+---------+------+---------+----------+-------------+
    | id | select_type | table | partitions | type  | possible_keys | key     | key_len | ref  | rows    | filtered | Extra       |
    +----+-------------+-------+------------+-------+---------------+---------+---------+------+---------+----------+-------------+
    |  1 | SIMPLE      | user  | NULL       | index | NULL          | PRIMARY | 4       | NULL | 2000001 |   100.00 | Using index |
    +----+-------------+-------+------------+-------+---------------+---------+---------+------+---------+----------+-------------+
    1 row in set, 1 warning (0.00 sec)
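Even with a covering index, the subquery on its own still takes about 0.3 s, which accounts for almost the entire query time. I believe this is simply the normal cost of the index I/O. I could not find official benchmark figures, or the cost of a single MySQL I/O, to compare against.
In theory, one page can hold about 1,000 int primary keys and the root stays resident in memory, so the second level of the B+Tree holds roughly 1 million keys. With the test paging at an offset of 2 million, it should in theory take 2 I/Os to locate the data. If 2 I/Os take 0.3 s, one should take about 0.15 s. Let's query a page near 990,000 to see whether that matches the hypothesis.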
    mysql> select id from user order by id limit 990000,1;
    +--------+
    | id     |
    +--------+
    | 990001 |
    +--------+
    1 row in set (0.15 sec)
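So I venture the guess that this is normal overhead. The timings also grow roughly in proportion to the offset (0.15 s near 1 million, 0.29 s at 2 million), which is what one would expect if the server has to step through every index entry up to the offset.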
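If the offset itself is what costs the time, an alternative often used for "next page" navigation is keyset (seek) pagination: remember the last id of the previous page and filter on it, so the server can seek into the primary key instead of stepping past the skipped rows. The sketch below assumes the application kept the last id it saw (2000000 here, an illustrative value) and only needs sequential paging rather than jumps to arbitrary page numbers. It was not benchmarked as part of this test.

```sql
-- Last id of the previous page, kept by the application (illustrative value).
set @last_id = 2000000;

-- Seek directly to the start of the next page through the primary key.
select *
from user
where id > @last_id
order by id
limit 10;

-- Inspect the plan and compare the rows estimate with the offset-based query.
explain select * from user where id > 2000000 order by id limit 10;
```

With the contiguous ids generated by the stored procedure, this should return the same page as limit 2000000,10. If ids have gaps, the two queries are no longer interchangeable, because the offset counts rows while the filter compares key values.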
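I originally set out to verify whether the pagination optimization found online is reliable, but the results differ from what I expected. Readers who see it differently are welcome to share their views. My WeChat QR code is available on the official account.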