MySQL 5.6.3提供了對SQL語句的跟蹤功能,經過trace文件能夠進一步瞭解優化器是如何選擇某個執行計劃的,和Oracle的10053事件相似。使用時須要先打開設置,而後執行一次SQL,最後查看INFORMATION_SCHEMA.OPTIMIZER_TRACE表的內容。須要注意的是,該表爲臨時表,只能在當前會話進行查詢,每次查詢返回的都是最近一次執行的SQL語句。mysql
設置時相關的參數:sql
mysql> show variables like '%trace%';json
+------------------------------+----------------------------------------------------------------------------+app
| Variable_name | Value |ide
+------------------------------+----------------------------------------------------------------------------+函數
| optimizer_trace | enabled=off,one_line=off |優化
| optimizer_trace_features | greedy_search=on,range_optimizer=on,dynamic_range=on,repeated_subselect=on |ui
| optimizer_trace_limit | 1 |this
| optimizer_trace_max_mem_size | 16384 |.net
| optimizer_trace_offset | -1 |
+------------------------------+----------------------------------------------------------------------------+
5 rows in set (0.02 sec)
如下是打開設置的命令:
SET optimizer_trace='enabled=on'; #打開設置
SET OPTIMIZER_TRACE_MAX_MEM_SIZE=1000000; #最大內存根據實際狀況而定, 能夠不設置
SET END_MARKERS_IN_JSON=ON; #增長JSON格式註釋,默認爲OFF
SET optimizer_trace_limit = 1;
MySQL索引選擇不正確並詳細解析OPTIMIZER_TRACE格式
http://blog.csdn.net/melody_mr/article/details/48950601
一 表結構以下:
CREATE TABLE t_audit_operate_log (
Fid bigint(16) AUTO_INCREMENT,
Fcreate_time int(10) unsigned NOT NULL DEFAULT '0',
Fuser varchar(50) DEFAULT '',
Fip bigint(16) DEFAULT NULL,
Foperate_object_id bigint(20) DEFAULT '0',
PRIMARY KEY (Fid),
KEY indx_ctime (Fcreate_time),
KEY indx_user (Fuser),
KEY indx_objid (Foperate_object_id),
KEY indx_ip (Fip)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
執行查詢:
MySQL> explain select count(*) from t_audit_operate_log where Fuser='XX@XX.com' and Fcreate_time>=1407081600 and Fcreate_time<=1407427199\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: t_audit_operate_log
type: ref
possible_keys: indx_ctime,indx_user
key: indx_user
key_len: 153
ref: const
rows: 2007326
Extra: Using where
發現,使用了一個不合適的索引, 不是很理想,因而改爲指定索引:
mysql> explain select count(*) from t_audit_operate_log use index(indx_ctime) where Fuser='CY6016@cyou-inc.com' and Fcreate_time>=1407081600 and Fcreate_time<=1407427199\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: t_audit_operate_log
type: range
possible_keys: indx_ctime
key: indx_ctime
key_len: 5
ref: NULL
rows: 670092
Extra: Using where
實際執行耗時,後者比前者快了接近10
問題: 很奇怪,優化器爲什麼不選擇使用 indx_ctime 索引,而選擇了明顯會掃描更多行的 indx_user 索引。
分析2個索引的數據量以下: 兩個條件的惟一性對比:
select count(*) from t_audit_operate_log where Fuser='XX@XX.com';
+----------+
| count(*) |
+----------+
| 1238382 |
+----------+
select count(*) from t_audit_operate_log where Fcreate_time>=1407254400 and Fcreate_time<=1407427199;
+----------+
| count(*) |
+----------+
| 198920 |
+----------+
顯然,使用索引indx_ctime好於indx_user,但MySQL卻選擇了indx_user. 爲何?
因而,使用 OPTIMIZER_TRACE進一步探索.
二 OPTIMIZER_TRACE的過程說明
以本處事例簡要說明OPTIMIZER_TRACE的過程.
查看OPTIMIZER_TRACE方法:
1.set optimizer_trace='enabled=on'; --- 開啓trace
2.set optimizer_trace_max_mem_size=1000000; --- 設置trace大小
3.set end_markers_in_json=on; --- 增長trace中註釋
4.select * from information_schema.optimizer_trace\G;
[plain] view plain copy
三 其餘一個類似問題
單表掃描,使用ref和range從索引獲取數據一例
http://blog.163.com/li_hx/blog/static/183991413201461853637715/
四 問題的解決方式
遇到單表上有多個索引的時候,在MySQL5.6.20版本以前的版本,須要人工強制使用索引,以達到最好的效果.
注:原創地址 http://blog.csdn.net/xj626852095/article/details/52767963
我最近遇到線上一個select語句,explain選擇的索引是同樣的,這個索引是兩個字段
好比select * from t1 where a='xxx' and b>='123123',索引是a_b(a,b)
默認狀況explain顯示的索引訪問方式是ref,而force index a_b則使用了range,range訪問效果實際更好
--貼查詢執行計劃所有內容
| 1 | SIMPLE | subscribe_f8 | ref | PRIMARY,uid | uid | 8 | const | 13494670 | Using where; Using index
force index 以後
| 1 | SIMPLE | subscribe_f8 | range | uid | uid | 12 | NULL | 13494674 | Using where; Using index |
--2者計劃差異不大
就是type從ref變成range了. force 以前key_length是8,force以後是12 . 其實應該是12纔是合理的
--版本支持expalin format=JSON命令嗎?支持則試試,有更詳細的代價計算值
--show create table 看看?
發來詳細的執行計劃,見 執行計劃結果一 。
執行計劃結果一
select uid_from,create_time from subscribe_f8 where uid=12345678 and create_time > '2013-09-08 09:54:07.0' order by create_time asc limit 5000 | { "steps": [ { "join_preparation": { "select#": 1, "steps": [ { "expanded_query": "/* select#1 */ select `subscribe_f8`.`uid_from` AS `uid_from`,`subscribe_f8`.`create_time` AS `create_time` from `subscribe_f8` where ((`subscribe_f8`.`uid` = 12345678) and (`subscribe_f8`.`create_time` > '2013-09-08 09:54:07.0')) order by `subscribe_f8`.`create_time` limit 5000" } ] } }, { ...... { "considered_execution_plans": [ { "plan_prefix": [ ], "table": "`subscribe_f8`", "best_access_path": { "considered_access_paths": [ { "access_type": "ref", "index": "PRIMARY", "rows": 1.36e7, "cost": 3.01e6, "chosen": true }, { "access_type": "ref", "index": "uid", "rows": 1.36e7, "cost": 2.77e6, "chosen": true }, { "access_type": "range", "rows": 1.02e7, "cost": 5.46e6, "chosen": false } ] }, "cost_for_plan": 2.77e6, "rows_for_plan": 1.36e7, "chosen": true } ] }, ... }
分析: 這個問題,執行計劃指示使用ref效果更好,但實際執行時,指定使用range方式sql執行效率更高一些。 並且,一般狀況下,ref的效率比range的效率要高,因此MySQL優先使用ref方式(這是一條啓發式規則)。 但到底是否使用ref或range,MySQL還須要經過代價估算進行比較再作決定。 代價估算是一個求近似值的過程,由於計算基於的一些值是估算得來的,並不十分精準,這就形成了計算偏差。 可是,若是索引的選擇率較低(如低於10%),則使用ref的效果好於range的效果的機率大。反過來講,若是索引的選擇率較高,則ref未必range的效果好,可是因計算偏差,使得執行計劃獲得了ref好於range的錯誤結論。 進一步講,若是索引的選擇率很高(如遠高於10%,這是大概值,不精確),甚至數據存放是順序連續的,有可能的是,儘管索引存在,但索引掃描的效果還差與全表掃描。 其餘說明:儘管這個事例中的SQL使用了LIMIT子句,但其對ref和range方式的計算和比較,不構成影響。
進一步瞭解狀況:
--這個查詢,能獲得多少行元組? 佔全表的全部元組的百分比是多少? 去掉limit後,符合那個時間段的記錄數佔那個uid的88%,佔全表記錄數的的40%
進一步分析: 從更詳細的查詢執行計劃看,查詢執行計劃結果一,顯示了ref的cost是'2.77e6', 而range的cost是’5.46e6‘,這說明優化器理所固然地認爲ref比range好。 但是,鑑於實際上索引選擇率過高,使得使用索引已經沒有意義(但優化器不知道這一信息),因此實際上使用’force index (uid) ‘會獲得更好的執行效果。 這就是這個想象的答案。
深刻代碼分析: 在best_access_path()函數中,比較了各類路徑的代價。因此是使用ref仍是range甚至full table scan,在這個函數中有計算和比較。 摘錄代碼中部分註釋以下,能代表一些含義。 /* Don't test table scan if it can't be better. Prefer key lookup if we would use the same key for scanning.
Don't do a table scan on InnoDB tables, if we can read the used parts of the row from any of the used index. This is because table scans uses index and we would not win anything by using a table scan. The only exception is INDEX_MERGE quick select. We can not say for sure that INDEX_MERGE quick select is always faster than ref access. So it's necessary to check if ref access is more expensive.
We do not consider index/table scan or range access if:
1a) The best 'ref' access produces fewer records than a table scan (or index scan, or range acces), and 1b) The best 'ref' executed for all partial row combinations, is cheaper than a single scan. The rationale for comparing
COST(ref_per_partial_row) * E(#partial_rows) vs COST(single_scan)
is that if join buffering is used for the scan, then scan will not be performed E(#partial_rows) times, but E(#partial_rows)/E(#partial_rows_fit_in_buffer). At this point in best_access_path() we don't know this ratio, but it is somewhere between 1 and E(#partial_rows). To avoid overestimating the total cost of scanning, the heuristic used here has to assume that the ratio is 1. A more fine-grained cost comparison will be done later in this function. (2) This doesn't hold: the best way to perform table scan is to to perform 'range' access using index IDX, and the best way to perform 'ref' access is to use the same index IDX, with the same or more key parts. (note: it is not clear how this rule is/should be extended to index_merge quick selects) (3) See above note about InnoDB. (4) NOT ("FORCE INDEX(...)" is used for table and there is 'ref' access path, but there is no quick select) If the condition in the above brackets holds, then the only possible "table scan" access method is ALL/index (there is no quick select). Since we have a 'ref' access path, and FORCE INDEX instructs us to choose it over ALL/index, there is no need to consider a full table scan. */