[MySQL] How do you trace an SQL statement (optimizer trace)?

Date: 2021-04-12


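MySQL 5.6.3 introduced tracing of SQL statements. The trace shows how the optimizer arrived at a particular execution plan, much like Oracle's 10053 event. To use it, enable the setting, execute the SQL statement once, and then read the INFORMATION_SCHEMA.OPTIMIZER_TRACE table. Note that this table is a temporary, per-session table: it can only be queried from the current session, and with the default settings (optimizer_trace_offset=-1, optimizer_trace_limit=1) each query returns the trace of the most recently executed SQL statement.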

The relevant variables:

mysql> show variables like '%trace%';
+------------------------------+----------------------------------------------------------------------------+
| Variable_name                | Value                                                                      |
+------------------------------+----------------------------------------------------------------------------+
| optimizer_trace              | enabled=off,one_line=off                                                   |
| optimizer_trace_features     | greedy_search=on,range_optimizer=on,dynamic_range=on,repeated_subselect=on |
| optimizer_trace_limit        | 1                                                                          |
| optimizer_trace_max_mem_size | 16384                                                                      |
| optimizer_trace_offset       | -1                                                                         |
+------------------------------+----------------------------------------------------------------------------+
5 rows in set (0.02 sec)
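The optimizer_trace_offset and optimizer_trace_limit variables decide which traces the session remembers. As a minimal sketch, assuming you want to keep the last two statements rather than only the most recent one (the two SELECT statements are placeholders, not part of the original article):

```sql
-- Keep the two most recent traces instead of only the last one.
-- Changing either variable empties the list of stored traces.
SET optimizer_trace_offset = -2, optimizer_trace_limit = 2;
SET optimizer_trace = 'enabled=on';

SELECT 1;   -- placeholder statement #1
SELECT 2;   -- placeholder statement #2

-- One row per remembered statement; QUERY holds the statement text.
SELECT QUERY FROM INFORMATION_SCHEMA.OPTIMIZER_TRACE;

SET optimizer_trace = 'enabled=off';
```

A negative offset counts from the most recent statement, so -2 with a limit of 2 keeps the two most recent traces.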

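The commands to enable tracing are:

SET optimizer_trace='enabled=on'; # enable tracing
SET OPTIMIZER_TRACE_MAX_MEM_SIZE=1000000; # maximum trace memory; size it to your workload, optional
SET END_MARKERS_IN_JSON=ON; # add end-marker comments to the JSON output; default is OFF
SET optimizer_trace_limit = 1;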
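Put together, a complete tracing session might look like the sketch below. It assumes the mysql command-line client (the `\G` terminator is a client feature) and borrows the query that this article analyzes further down; swap in your own statement.

```sql
-- 1. Enable tracing for this session only.
SET optimizer_trace = 'enabled=on';
SET optimizer_trace_max_mem_size = 1000000;
SET end_markers_in_json = ON;

-- 2. Run the statement to analyze. EXPLAIN is traced as well,
--    which avoids actually executing a slow query.
EXPLAIN SELECT COUNT(*) FROM t_audit_operate_log
 WHERE Fuser = 'XX@XX.com'
   AND Fcreate_time >= 1407081600
   AND Fcreate_time <= 1407427199;

-- 3. Read the trace from the same session.
SELECT TRACE FROM INFORMATION_SCHEMA.OPTIMIZER_TRACE\G

-- 4. Switch tracing off again to avoid the overhead.
SET optimizer_trace = 'enabled=off';
```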

MySQL picks the wrong index: a detailed walk-through of the OPTIMIZER_TRACE format

http://blog.csdn.net/melody_mr/article/details/48950601

1. The table structure is as follows:

CREATE TABLE t_audit_operate_log (
Fid bigint(16) AUTO_INCREMENT,
Fcreate_time int(10) unsigned NOT NULL DEFAULT '0',
Fuser varchar(50) DEFAULT '',
Fip bigint(16) DEFAULT NULL,
Foperate_object_id bigint(20) DEFAULT '0',
PRIMARY KEY (Fid),
KEY indx_ctime (Fcreate_time),
KEY indx_user (Fuser),
KEY indx_objid (Foperate_object_id),
KEY indx_ip (Fip)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Run the query:

mysql> explain select count(*) from t_audit_operate_log where Fuser='XX@XX.com' and Fcreate_time>=1407081600 and Fcreate_time<=1407427199\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t_audit_operate_log
         type: ref
possible_keys: indx_ctime,indx_user
          key: indx_user
      key_len: 153
          ref: const
         rows: 2007326
        Extra: Using where

The optimizer used an unsuitable index, which is far from ideal, so I named the index explicitly:

mysql> explain select count(*) from t_audit_operate_log use index(indx_ctime) where Fuser='XX@XX.com' and Fcreate_time>=1407081600 and Fcreate_time<=1407427199\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t_audit_operate_log
         type: range
possible_keys: indx_ctime
          key: indx_ctime
      key_len: 5
          ref: NULL
         rows: 670092
        Extra: Using where

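In actual execution time, the latter was faster than the former by nearly 10 (presumably 10x; the source omits the unit).

Question: strangely, why does the optimizer not choose the indx_ctime index, and instead choose indx_user, which clearly scans more rows?

Compare how many rows each index has to read, that is, how selective each of the two conditions is: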

select count(*) from t_audit_operate_log where Fuser='XX@XX.com';
+----------+
| count(*) |
+----------+
|  1238382 |
+----------+

select count(*) from t_audit_operate_log where Fcreate_time>=1407254400 and Fcreate_time<=1407427199;
+----------+
| count(*) |
+----------+
|   198920 |
+----------+
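The two counts can also be collected in one pass. This is only a sketch: it relies on MySQL evaluating a comparison to 1 or 0 so that SUM() counts matches, the column aliases are made up for illustration, and the time range is the one from the EXPLAIN statements above. It scans the whole table, so on a table of this size it is best run against a replica.

```sql
-- Count the rows matched by each condition in a single table scan.
SELECT COUNT(*)                                            AS total_rows,
       SUM(Fuser = 'XX@XX.com')                            AS rows_by_user,
       SUM(Fcreate_time BETWEEN 1407081600 AND 1407427199) AS rows_by_ctime
  FROM t_audit_operate_log;
```

The smaller of rows_by_user and rows_by_ctime points at the more selective condition, and therefore at the index one would expect the optimizer to prefer.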

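Clearly indx_ctime is the better index here, yet MySQL chose indx_user. Why?

To find out, dig further with OPTIMIZER_TRACE.

2. Walking through OPTIMIZER_TRACE

This example is used to explain briefly what OPTIMIZER_TRACE records.

How to view the OPTIMIZER_TRACE output: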

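1. set optimizer_trace='enabled=on'; --- enable tracing
2. set optimizer_trace_max_mem_size=1000000; --- set the maximum trace size
3. set end_markers_in_json=on; --- add end-marker comments to the trace
4. Run the SQL statement to be analyzed (or its EXPLAIN).
5. select * from information_schema.optimizer_trace\G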
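A trace that exceeds optimizer_trace_max_mem_size is silently cut short, so it is worth checking for truncation before reading it. A small sketch, using the other columns of the same INFORMATION_SCHEMA table (the new size below is an arbitrary example value):

```sql
-- missing_bytes > 0 means the trace was truncated.
SELECT QUERY,
       MISSING_BYTES_BEYOND_MAX_MEM_SIZE AS missing_bytes,
       INSUFFICIENT_PRIVILEGES
  FROM INFORMATION_SCHEMA.OPTIMIZER_TRACE;

-- If it was truncated, raise the limit and rerun the traced statement.
SET optimizer_trace_max_mem_size = 4000000;
```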


{
  "steps": [
    {
      "join_preparation": {  --- preparation before optimization
        "select#": 1,
        "steps": [
          {
            "expanded_query": "/* select#1 */ select count(0) AS `count(*)` from `t_audit_operate_log` where ((`t_audit_operate_log`.`Fuser` = 'XX@XX.com') and (`t_audit_operate_log`.`Fcreate_time` >= 1407081600) and (`t_audit_operate_log`.`Fcreate_time` <= 1407427199))"
          }
        ] /* steps */
      } /* join_preparation */
    },
    {
      "join_optimization": {  --- the main optimization phase, covering both logical and physical optimization
        "select#": 1,
        "steps": [  --- main optimization phase: logical optimization starts here
          {
            "condition_processing": {  --- logical optimization: condition simplification
              "condition": "WHERE",
              "original_condition": "((`t_audit_operate_log`.`Fuser` = 'XX@XX.com') and (`t_audit_operate_log`.`Fcreate_time` >= 1407081600) and (`t_audit_operate_log`.`Fcreate_time` <= 1407427199))",
              "steps": [
                {
                  "transformation": "equality_propagation",  --- condition simplification: equality propagation
                  "resulting_condition": "((`t_audit_operate_log`.`Fuser` = 'XX@XX.com') and (`t_audit_operate_log`.`Fcreate_time` >= 1407081600) and (`t_audit_operate_log`.`Fcreate_time` <= 1407427199))"
                },
                {
                  "transformation": "constant_propagation",  --- condition simplification: constant propagation
                  "resulting_condition": "((`t_audit_operate_log`.`Fuser` = 'XX@XX.com') and (`t_audit_operate_log`.`Fcreate_time` >= 1407081600) and (`t_audit_operate_log`.`Fcreate_time` <= 1407427199))"
                },
                {
                  "transformation": "trivial_condition_removal",  --- condition simplification: removal of trivial conditions
                  "resulting_condition": "((`t_audit_operate_log`.`Fuser` = 'XX@XX.com') and (`t_audit_operate_log`.`Fcreate_time` >= 1407081600) and (`t_audit_operate_log`.`Fcreate_time` <= 1407427199))"
                }
              ] /* steps */
            } /* condition_processing */
          },  --- end of condition simplification
          {
            "table_dependencies": [  --- logical optimization: find the dependencies between tables; not an optimization that can be used directly
              {
                "table": "`t_audit_operate_log`",
                "row_may_be_null": false,
                "map_bit": 0,
                "depends_on_map_bits": [
                ] /* depends_on_map_bits */
              }
            ] /* table_dependencies */
          },
          {
            "ref_optimizer_key_uses": [  --- logical optimization: find the candidate indexes
              {
                "table": "`t_audit_operate_log`",
                "field": "Fuser",
                "equals": "'XX@XX.com'",
                "null_rejecting": false
              }
            ] /* ref_optimizer_key_uses */
          },
          {
            "rows_estimation": [  --- logical optimization: estimate the row count of each table, and the cost of a full table scan and of an index scan on every index
              {
                "table": "`t_audit_operate_log`",
                "range_analysis": {
                  "table_scan": {  --- cost of a full scan of this table
                    "rows": 8150516,
                    "cost": 1.73e6
                  } /* table_scan */,
                  "potential_range_indices": [  --- the candidate indexes; later versions rename this key to potential_range_indexes
                    {
                      "index": "PRIMARY",  --- this entry says the primary key index is not usable
                      "usable": false,
                      "cause": "not_applicable"
                    },
                    {
                      "index": "indx_ctime",  --- index indx_ctime
                      "usable": true,
                      "key_parts": [
                        "Fcreate_time",
                        "Fid"
                      ] /* key_parts */
                    },
                    {
                      "index": "indx_user",  --- index indx_user
                      "usable": true,
                      "key_parts": [
                        "Fuser",
                        "Fid"
                      ] /* key_parts */
                    },
                    {
                      "index": "indx_objid",  --- index indx_objid, not usable
                      "usable": false,
                      "cause": "not_applicable"
                    },
                    {
                      "index": "indx_ip",  --- index indx_ip, not usable
                      "usable": false,
                      "cause": "not_applicable"
                    }
                  ] /* potential_range_indices */,
                  "setup_range_conditions": [  --- if any conditions can be pushed down, range access is considered with them
                  ] /* setup_range_conditions */,
                  "group_index_range": {  --- with GROUP BY or DISTINCT, check whether an index can optimize the operation; MIN/MAX is considered too
                    "chosen": false,
                    "cause": "not_group_by_or_distinct"
                  } /* group_index_range */,
                  "analyzing_range_alternatives": {  --- start costing a range scan on each index (an equality comparison is a special case of a range scan)
                    "range_scan_alternatives": [
                      {
                        "index": "indx_ctime",  --- [A]
                        "ranges": [
                          "1407081600 <= Fcreate_time <= 1407427199"
                        ] /* ranges */,
                        "index_dives_for_eq_ranges": true,
                        "rowid_ordered": false,
                        "using_mrr": true,
                        "index_only": false,
                        "rows": 688362,
                        "cost": 564553,  --- this index has the lowest cost
                        "chosen": true  --- lowest cost, so it is chosen (cheaper than the table_scan above and than the other index)
                      },
                      {
                        "index": "indx_user",
                        "ranges": [
                          "XX@XX.com <= Fuser <= XX@XX.com"
                        ] /* ranges */,
                        "index_dives_for_eq_ranges": true,
                        "rowid_ordered": true,
                        "using_mrr": true,
                        "index_only": false,
                        "rows": 1945894,
                        "cost": 1.18e6,
                        "chosen": false,
                        "cause": "cost"
                      }
                    ] /* range_scan_alternatives */,
                    "analyzing_roworder_intersect": {
                      "usable": false,
                      "cause": "too_few_roworder_scans"
                    } /* analyzing_roworder_intersect */
                  } /* analyzing_range_alternatives */,  --- end of the per-index range scan costing
                  "chosen_range_access_summary": {  --- summary of the best range access found in this phase
                    "range_access_plan": {
                      "type": "range_scan",
                      "index": "indx_ctime",
                      "rows": 688362,
                      "ranges": [
                        "1407081600 <= Fcreate_time <= 1407427199"
                      ] /* ranges */
                    } /* range_access_plan */,
                    "rows_for_plan": 688362,
                    "cost_for_plan": 564553,
                    "chosen": true  -- cost and rows here are both much lower than for indx_user --- identical to [A]; this is just the summary
                  } /* chosen_range_access_summary */
                } /* range_analysis */
              }
            ] /* rows_estimation */  --- end of row estimation
          },
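          {
            "considered_execution_plans": [  --- physical optimization: start of the join (multi-table) costing
              {
                "plan_prefix": [
                ] /* plan_prefix */,
                "table": "`t_audit_operate_log`",
                "best_access_path": {
                  "considered_access_paths": [
                    {
                      "access_type": "ref",  --- physical optimization: cost of a ref lookup on the indx_user index
                      "index": "indx_user",
                      "rows": 1.95e6,
                      "cost": 683515,
                      "chosen": true
                    },  --- the original author expected one such entry per usable index here and suspected a bug; note, however, that ref access is only considered for the keys listed in ref_optimizer_key_uses (here only Fuser), which is consistent with a single ref entry
                    {
                      "access_type": "range",  --- physical optimization: presumably this is indx_ctime (a guess based on comparing with 5.7 trace output; no instance was available for debugging)
                      "rows": 516272,
                      "cost": 702225,  --- higher than the 683515 of the ref access, so it is not chosen
                      "chosen": false  -- the cost is much higher than the 564553 seen above, although rows changed little --- this access path is not chosen
                    }
                  ] /* considered_access_paths */
                } /* best_access_path */,
                "cost_for_plan": 683515,  --- physical optimization: summary of the result of the best_access_path phase
                "rows_for_plan": 1.95e6,
                "chosen": true  -- surprisingly, the cost is much lower than the figure seen above (1.18e6 for the indx_user range scan), although rows hardly changed --- summary of the best_access_path result
              }
            ] /* considered_execution_plans */
          },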
          {
            "attaching_conditions_to_tables": {  --- logical optimization: attach each condition to the table it belongs to, where possible
            } /* attaching_conditions_to_tables */
          },
          {
            "refine_plan": [
              {
                "table": "`t_audit_operate_log`",  --- logical optimization: index conditions are pushed down ("pushed_index_condition"); the remaining conditions are attached to the table as filters ("table_condition_attached")
              }
            ] /* refine_plan */
          }
        ] /* steps */
      } /* join_optimization */  --- end of logical and physical optimization
    },
    {
      "join_explain": {} /* join_explain */
    }
  ] /* steps */
}
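Reading the whole trace by eye is tedious when only the final cost comparison matters. On MySQL 5.7.8 or later (newer than the 5.6 server discussed here) the JSON functions can pull out just the considered_access_paths section. This is a sketch under two assumptions: end markers are switched off, because the /* ... */ markers make the trace invalid JSON, and the trace was not truncated.

```sql
-- JSON functions need a valid JSON document, so disable end markers.
SET end_markers_in_json = OFF;
SET optimizer_trace = 'enabled=on';

EXPLAIN SELECT COUNT(*) FROM t_audit_operate_log
 WHERE Fuser = 'XX@XX.com'
   AND Fcreate_time >= 1407081600
   AND Fcreate_time <= 1407427199;

-- '$**.key' matches the key at any depth in the document.
SELECT JSON_EXTRACT(TRACE, '$**.considered_access_paths') AS access_paths
  FROM INFORMATION_SCHEMA.OPTIMIZER_TRACE;

SET optimizer_trace = 'enabled=off';
```

The result lists each access path with its rows, cost and chosen flag, which is the part of the trace where ref on indx_user (683515) beat range (702225) above.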

3. Another similar problem

Single-table scan: an example of fetching data from an index with ref and with range

http://blog.163.com/li_hx/blog/static/183991413201461853637715/

4. How to solve the problem

When a single table has several indexes, versions before MySQL 5.6.20 need the index to be forced by hand to get the best result.
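MySQL offers two index hints for this, and they are not equivalent. The sketch below applies both to the query from section 1; comparing the two EXPLAIN outputs shows whether the hint changes the plan on your server.

```sql
-- USE INDEX only narrows the candidate indexes to the listed ones;
-- the optimizer may still fall back to a full table scan.
EXPLAIN SELECT COUNT(*) FROM t_audit_operate_log USE INDEX (indx_ctime)
 WHERE Fuser = 'XX@XX.com'
   AND Fcreate_time >= 1407081600
   AND Fcreate_time <= 1407427199;

-- FORCE INDEX additionally treats a table scan as very expensive,
-- so a scan is used only if the listed index cannot find the rows.
EXPLAIN SELECT COUNT(*) FROM t_audit_operate_log FORCE INDEX (indx_ctime)
 WHERE Fuser = 'XX@XX.com'
   AND Fcreate_time >= 1407081600
   AND Fcreate_time <= 1407427199;
```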

Note: original post at http://blog.csdn.net/xj626852095/article/details/52767963

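I recently ran into a SELECT statement in production where EXPLAIN picks the same index either way; the index covers two columns.
For example: select * from t1 where a='xxx' and b>='123123', with the index a_b(a,b).
By default EXPLAIN shows the access type as ref, while force index a_b switches it to range, and range actually performs better.
-- Please paste the complete execution plan.
| 1 | SIMPLE | subscribe_f8 | ref | PRIMARY,uid | uid | 8 | const | 13494670 | Using where; Using index |
After force index:
| 1 | SIMPLE | subscribe_f8 | range | uid | uid | 12 | NULL | 13494674 | Using where; Using index |
-- The two plans differ very little.
The only change is that type goes from ref to range. key_len is 8 before forcing and 12 after; 12 is really the value that makes sense.
-- Does your version support EXPLAIN FORMAT=JSON? If so, try it; it gives more detailed cost figures.
-- Can you show the SHOW CREATE TABLE output?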
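The EXPLAIN FORMAT=JSON suggestion could be tried as sketched below, using the query quoted in Execution Plan Result 1. FORMAT=JSON exists from MySQL 5.6.5, but the cost_info fields appear only from 5.7 on, so on a 5.6 server the output carries less cost detail than the optimizer trace.

```sql
-- Default plan (ref in the case discussed here).
EXPLAIN FORMAT=JSON
SELECT uid_from, create_time FROM subscribe_f8
 WHERE uid = 12345678
   AND create_time > '2013-09-08 09:54:07.0'
 ORDER BY create_time ASC LIMIT 5000;

-- Same query with the index forced (range in the case discussed here).
EXPLAIN FORMAT=JSON
SELECT uid_from, create_time FROM subscribe_f8 FORCE INDEX (uid)
 WHERE uid = 12345678
   AND create_time > '2013-09-08 09:54:07.0'
 ORDER BY create_time ASC LIMIT 5000;
```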

The detailed execution plan was sent over; see Execution Plan Result 1.

Execution Plan Result 1

select uid_from,create_time from subscribe_f8 where uid=12345678 and create_time > '2013-09-08 09:54:07.0' order by create_time asc limit 5000 |
{
  "steps": [
    {
      "join_preparation": {
        "select#": 1,
        "steps": [
          {
            "expanded_query": "/* select#1 */ select `subscribe_f8`.`uid_from` AS `uid_from`,`subscribe_f8`.`create_time` AS `create_time` from `subscribe_f8` where ((`subscribe_f8`.`uid` = 12345678) and (`subscribe_f8`.`create_time` > '2013-09-08 09:54:07.0')) order by `subscribe_f8`.`create_time` limit 5000"
          }
        ]
      }
    },
    {
      ......
      {
        "considered_execution_plans": [
          {
            "plan_prefix": [
            ],
            "table": "`subscribe_f8`",
            "best_access_path": {
              "considered_access_paths": [
                {
                  "access_type": "ref",
                  "index": "PRIMARY",
                  "rows": 1.36e7,
                  "cost": 3.01e6,
                  "chosen": true
                },
                {
                  "access_type": "ref",
                  "index": "uid",
                  "rows": 1.36e7,
                  "cost": 2.77e6,
                  "chosen": true
                },
                {
                  "access_type": "range",
                  "rows": 1.02e7,
                  "cost": 5.46e6,
                  "chosen": false
                }
              ]
            },
            "cost_for_plan": 2.77e6,
            "rows_for_plan": 1.36e7,
            "chosen": true
          }
        ]
      },
      ...
}

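Analysis: in this case the execution plan says ref is the better choice, but in actual execution the SQL runs more efficiently when range is forced.
In general ref is more efficient than range, so MySQL prefers ref (this is a heuristic rule).
Whether ref or range is finally used, however, is still decided by comparing estimated costs.
Cost estimation is an approximation: some of its inputs are themselves estimates and not very precise, which introduces error.
If the index selects a small fraction of the rows (say below 10%), ref is likely to perform better than range. Conversely, if the index selects a large fraction of the rows, ref is not necessarily better than range, yet estimation error can lead the plan to the wrong conclusion that it is.
Going further, if that fraction is very high (well above 10%; this is a rough figure, not an exact one), and especially if the data is stored sequentially and contiguously, an index scan may well perform worse than a full table scan even though the index exists.
One more note: although the SQL in this example uses a LIMIT clause, LIMIT does not affect how the ref and range costs are computed and compared.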

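Finding out more:

-- How many rows does this query return? What percentage of all rows in the table is that?
Without the limit, the rows in that time range make up 88% of the rows for that uid and 40% of all rows in the table.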
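The two percentages quoted above could be measured with a query along these lines. It is a sketch only: the aliases are made up for illustration, the columns are taken from the query in Execution Plan Result 1, and it scans the whole table, which is costly on a table with tens of millions of rows.

```sql
-- Share of the matching rows within the uid and within the whole table.
SELECT COUNT(*)            AS total_rows,
       SUM(uid = 12345678) AS uid_rows,
       SUM(uid = 12345678 AND create_time > '2013-09-08 09:54:07.0') AS matched_rows,
       100 * SUM(uid = 12345678 AND create_time > '2013-09-08 09:54:07.0')
           / SUM(uid = 12345678) AS pct_of_uid,
       100 * SUM(uid = 12345678 AND create_time > '2013-09-08 09:54:07.0')
           / COUNT(*)            AS pct_of_table
  FROM subscribe_f8;
```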

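Further analysis: Execution Plan Result 1 shows a cost of '2.77e6' for ref and '5.46e6' for range, so the optimizer naturally concludes that ref is better than range.
In reality, however, the index selects such a large fraction of the rows that the estimate behind ref no longer holds (the optimizer has no way of knowing this), so 'force index (uid)' gives better execution in practice.
That is the explanation for this behavior.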

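Digging into the code: the best_access_path() function compares the costs of the various access paths, so the choice between ref, range, and even a full table scan is computed and compared inside this function.
The following excerpt from the comments in that code conveys some of the intent.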
 /*
    Don't test table scan if it can't be better.
    Prefer key lookup if we would use the same key for scanning.

    Don't do a table scan on InnoDB tables, if we can read the used
    parts of the row from any of the used index.
    This is because table scans uses index and we would not win
    anything by using a table scan. The only exception is INDEX_MERGE
    quick select. We can not say for sure that INDEX_MERGE quick select
    is always faster than ref access. So it's necessary to check if
    ref access is more expensive.

    We do not consider index/table scan or range access if:

    1a) The best 'ref' access produces fewer records than a table scan
        (or index scan, or range access), and
    1b) The best 'ref' executed for all partial row combinations, is
        cheaper than a single scan. The rationale for comparing

        COST(ref_per_partial_row) * E(#partial_rows)
           vs
        COST(single_scan)

        is that if join buffering is used for the scan, then scan will
        not be performed E(#partial_rows) times, but
        E(#partial_rows)/E(#partial_rows_fit_in_buffer). At this point
        in best_access_path() we don't know this ratio, but it is
        somewhere between 1 and E(#partial_rows). To avoid
        overestimating the total cost of scanning, the heuristic used
        here has to assume that the ratio is 1. A more fine-grained
        cost comparison will be done later in this function.
    (2) This doesn't hold: the best way to perform table scan is to perform
        'range' access using index IDX, and the best way to perform 'ref'
        access is to use the same index IDX, with the same or more key parts.
        (note: it is not clear how this rule is/should be extended to
        index_merge quick selects)
    (3) See above note about InnoDB.
    (4) NOT ("FORCE INDEX(...)" is used for table and there is 'ref' access
             path, but there is no quick select)
        If the condition in the above brackets holds, then the only possible
        "table scan" access method is ALL/index (there is no quick select).
        Since we have a 'ref' access path, and FORCE INDEX instructs us to
        choose it over ALL/index, there is no need to consider a full table
        scan.
  */