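A Study Summary of MySQL Statistics

Statistics concepts

MySQL statistics are information about tables and indexes that the database gathers by sampling and counting, for example the number of rows in a table, the number of pages in the clustered index, and the cardinality of columns. When MySQL generates an execution plan, it uses index statistics to estimate costs and choose the plan with the lowest cost. MySQL supports only a limited set of index statistics, and the way they are collected differs by storage engine. The official MySQL documentation says very little about the concept itself, but readers who have worked with other database systems should have no trouble with it. Unlike some other databases, MySQL offers no command for deleting statistics by hand. Versions before MySQL 8.0 have no histograms.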

 

Statistics parameters

The InnoDB storage engine has 7 statistics parameters (8 in some versions), as shown below:

MySQL 5.6.41 has 8 parameters:

 

mysql> show variables like 'innodb_stats%';
+--------------------------------------+-------------+
| Variable_name                        | Value       |
+--------------------------------------+-------------+
| innodb_stats_auto_recalc             | ON          |
| innodb_stats_include_delete_marked   | OFF         |
| innodb_stats_method                  | nulls_equal |
| innodb_stats_on_metadata             | OFF         |
| innodb_stats_persistent              | ON          |
| innodb_stats_persistent_sample_pages | 20          |
| innodb_stats_sample_pages            | 8           |
| innodb_stats_transient_sample_pages  | 8           |
+--------------------------------------+-------------+
8 rows in set (0.00 sec)

 

MySQL 8.0.18 has 7 parameters:

 

mysql> show variables like 'innodb_stats%';
+--------------------------------------+-------------+
| Variable_name                        | Value       |
+--------------------------------------+-------------+
| innodb_stats_auto_recalc             | ON          |
| innodb_stats_include_delete_marked   | OFF         |
| innodb_stats_method                  | nulls_equal |
| innodb_stats_on_metadata             | OFF         |
| innodb_stats_persistent              | ON          |
| innodb_stats_persistent_sample_pages | 20          |
| innodb_stats_transient_sample_pages  | 8           |
+--------------------------------------+-------------+

 

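The table below gives a rough summary of what these parameters do.

Parameter                             Meaning
innodb_stats_auto_recalc              Whether statistics are recalculated automatically; recalculation is triggered when more than 10% of the data has been modified
innodb_stats_include_delete_marked    Whether delete-marked records are included when statistics are recalculated
innodb_stats_method                   How NULL values are treated when statistics are collected
innodb_stats_on_metadata              Whether metadata operations trigger a statistics update
innodb_stats_persistent               Whether statistics are persisted to disk
innodb_stats_sample_pages             Deprecated; replaced by innodb_stats_transient_sample_pages
innodb_stats_persistent_sample_pages  Number of pages sampled for persistent statistics
innodb_stats_transient_sample_pages   Number of pages sampled for transient (non-persistent) statistics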

 

 

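The innodb_stats_auto_recalc parameter

innodb_stats_auto_recalc controls whether statistics are recalculated automatically. Recalculation is triggered when more than 10% of the rows in a table have been modified. (Note that recalculation happens asynchronously in the background, so it may be delayed rather than triggered immediately; see the official description further down.) If innodb_stats_auto_recalc is turned off, you need to run ANALYZE TABLE to keep the statistics accurate. Regardless of the global setting, even with innodb_stats_auto_recalc=OFF, adding a new index to a table causes the statistics of all its indexes to be recalculated and written to the innodb_index_stats table.

The following test verifies that, with the system variable innodb_stats_auto_recalc=OFF, creating an index triggers a recalculation of the statistics for all indexes on the table.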

 

mysql> set global innodb_stats_auto_recalc=off;
Query OK, 0 rows affected (0.00 sec)
 
mysql> show variables like 'innodb_stats_auto_recalc%';
+--------------------------+-------+
| Variable_name            | Value |
+--------------------------+-------+
| innodb_stats_auto_recalc | OFF   |
+--------------------------+-------+
1 row in set (0.00 sec)
 
mysql> select * from mysql.innodb_index_stats 
    -> where database_name='MyDB' and table_name = 'test';
+---------------+------------+-----------------+---------------------+--------------+------------+-------------+-----------------------------------+
| database_name | table_name | index_name      | last_update         | stat_name    | stat_value | sample_size | stat_description                  |
+---------------+------------+-----------------+---------------------+--------------+------------+-------------+-----------------------------------+
| MyDB          | test       | GEN_CLUST_INDEX | 2019-10-28 14:54:48 | n_diff_pfx01 |          2 |           1 | DB_ROW_ID                         |
| MyDB          | test       | GEN_CLUST_INDEX | 2019-10-28 14:54:48 | n_leaf_pages |          1 |        NULL | Number of leaf pages in the index |
| MyDB          | test       | GEN_CLUST_INDEX | 2019-10-28 14:54:48 | size         |          1 |        NULL | Number of pages in the index      |
+---------------+------------+-----------------+---------------------+--------------+------------+-------------+-----------------------------------+
3 rows in set (0.00 sec)
 
mysql> create index ix_test_name on test(name);
mysql> select * from mysql.innodb_index_stats 
    -> where database_name='MyDB' and table_name = 'test';
+---------------+------------+-----------------+---------------------+--------------+------------+-------------+-----------------------------------+
| database_name | table_name | index_name      | last_update         | stat_name    | stat_value | sample_size | stat_description                  |
+---------------+------------+-----------------+---------------------+--------------+------------+-------------+-----------------------------------+
| MyDB          | test       | GEN_CLUST_INDEX | 2019-10-28 22:02:07 | n_diff_pfx01 |          2 |           1 | DB_ROW_ID                         |
| MyDB          | test       | GEN_CLUST_INDEX | 2019-10-28 22:02:07 | n_leaf_pages |          1 |        NULL | Number of leaf pages in the index |
| MyDB          | test       | GEN_CLUST_INDEX | 2019-10-28 22:02:07 | size         |          1 |        NULL | Number of pages in the index      |
| MyDB          | test       | ix_test_name    | 2019-10-28 22:02:07 | n_diff_pfx01 |          1 |           1 | name                              |
| MyDB          | test       | ix_test_name    | 2019-10-28 22:02:07 | n_diff_pfx02 |          2 |           1 | name,DB_ROW_ID                    |
| MyDB          | test       | ix_test_name    | 2019-10-28 22:02:07 | n_leaf_pages |          1 |        NULL | Number of leaf pages in the index |
| MyDB          | test       | ix_test_name    | 2019-10-28 22:02:07 | size         |          1 |        NULL | Number of pages in the index      |
+---------------+------------+-----------------+---------------------+--------------+------------+-------------+-----------------------------------+
7 rows in set (0.00 sec)

 

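Below is another test of mine: with the global variable innodb_stats_auto_recalc=ON, I set the table option STATS_AUTO_RECALC=0 and then created a new index. The test confirms that the statistics of all indexes are recalculated in this case as well.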

 

mysql> select * from mysql.innodb_index_stats 
    -> where database_name='MyDB' and table_name = 'test';
+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
| database_name | table_name | index_name | last_update         | stat_name    | stat_value | sample_size | stat_description                  |
+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
| MyDB          | test       | PRIMARY    | 2019-10-30 15:49:00 | n_diff_pfx01 |          0 |           1 | id                                |
| MyDB          | test       | PRIMARY    | 2019-10-30 15:49:00 | n_leaf_pages |          1 |        NULL | Number of leaf pages in the index |
| MyDB          | test       | PRIMARY    | 2019-10-30 15:49:00 | size         |          1 |        NULL | Number of pages in the index      |
+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
3 rows in set (0.01 sec)
 
mysql> ALTER TABLE test STATS_AUTO_RECALC=0;
Query OK, 0 rows affected (0.27 sec)
Records: 0  Duplicates: 0  Warnings: 0
 
mysql> select * from mysql.innodb_index_stats 
    -> where database_name='MyDB' and table_name = 'test';
+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
| database_name | table_name | index_name | last_update         | stat_name    | stat_value | sample_size | stat_description                  |
+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
| MyDB          | test       | PRIMARY    | 2019-10-30 15:49:00 | n_diff_pfx01 |          0 |           1 | id                                |
| MyDB          | test       | PRIMARY    | 2019-10-30 15:49:00 | n_leaf_pages |          1 |        NULL | Number of leaf pages in the index |
| MyDB          | test       | PRIMARY    | 2019-10-30 15:49:00 | size         |          1 |        NULL | Number of pages in the index      |
+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
3 rows in set (0.00 sec)
 
mysql> CREATE INDEX ix_test_name ON test(name);
Query OK, 0 rows affected (1.41 sec)
Records: 0  Duplicates: 0  Warnings: 0
 
mysql> select * from mysql.innodb_index_stats 
    -> where database_name='MyDB' and table_name = 'test';
+---------------+------------+--------------+---------------------+--------------+------------+-------------+-----------------------------------+
| database_name | table_name | index_name   | last_update         | stat_name    | stat_value | sample_size | stat_description                  |
+---------------+------------+--------------+---------------------+--------------+------------+-------------+-----------------------------------+
| MyDB          | test       | PRIMARY      | 2019-10-30 15:54:22 | n_diff_pfx01 |          0 |           1 | id                                |
| MyDB          | test       | PRIMARY      | 2019-10-30 15:54:22 | n_leaf_pages |          1 |        NULL | Number of leaf pages in the index |
| MyDB          | test       | PRIMARY      | 2019-10-30 15:54:22 | size         |          1 |        NULL | Number of pages in the index      |
| MyDB          | test       | ix_test_name | 2019-10-30 15:54:22 | n_diff_pfx01 |        999 |          17 | name                              |
| MyDB          | test       | ix_test_name | 2019-10-30 15:54:22 | n_diff_pfx02 |        999 |          17 | name,id                           |
| MyDB          | test       | ix_test_name | 2019-10-30 15:54:22 | n_leaf_pages |         17 |        NULL | Number of leaf pages in the index |
| MyDB          | test       | ix_test_name | 2019-10-30 15:54:22 | size         |         18 |        NULL | Number of pages in the index      |
+---------------+------------+--------------+---------------------+--------------+------------+-------------+-----------------------------------+
7 rows in set (0.00 sec)
 
mysql> 

 

 

The official documentation describes the recalculation delay as follows:

 

Because of the asynchronous nature of automatic statistics recalculation, which occurs in the background, statistics may not be recalculated instantly after running a DML operation that affects more than 10% of a table, even when innodb_stats_auto_recalc is enabled. Statistics recalculation can be delayed by few seconds in some cases. If up-to-date statistics are required immediately, run ANALYZE TABLE to initiate a synchronous (foreground) recalculation of statistics
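
As a minimal sketch of the synchronous route the quote mentions, the statements below force a foreground recalculation and then read back the timestamp. They assume the MyDB.test table used in the earlier tests and that persistent statistics are enabled for it.

```sql
-- Recalculate statistics synchronously instead of waiting for the
-- background thread.
ANALYZE TABLE MyDB.test;

-- last_update should now reflect the time of the ANALYZE TABLE run.
SELECT table_name, n_rows, last_update
FROM mysql.innodb_table_stats
WHERE database_name = 'MyDB' AND table_name = 'test';
```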

 

 

The innodb_stats_include_delete_marked parameter

Controls whether delete-marked records are included when statistics are recalculated.

innodb_stats_include_delete_marked can be enabled to ensure that delete-marked records are included when calculating persistent optimizer statistics.

 

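There is a recommendation about innodb_stats_include_delete_marked circulating online, quoted below. For lack of experience I cannot judge whether it is sound; personally I would keep the default (OFF) unless a specific scenario really calls for it.

·         It is recommended to enable innodb_stats_include_delete_marked, so that statistics also cover rows deleted by uncommitted transactions.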

 

 

By default, InnoDB reads uncommitted data when calculating statistics. In the case of an uncommitted transaction that deletes rows from a table, delete-marked records are excluded when calculating row estimates and index statistics, which can lead to non-optimal execution plans for other transactions that are operating on the table concurrently using a transaction isolation level other than READ UNCOMMITTED. To avoid this scenario, innodb_stats_include_delete_marked can be enabled to ensure that delete-marked records are included when calculating persistent optimizer statistics.

When innodb_stats_include_delete_marked is enabled, ANALYZE TABLE considers delete-marked records when recalculating statistics.

innodb_stats_include_delete_marked is a global setting that affects all InnoDB tables, and it is only applicable to persistent optimizer statistics.

innodb_stats_include_delete_marked was introduced in MySQL 5.6.34.
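
If you do want to try the setting, a sketch along the following lines should work. It assumes the MyDB.test table from the earlier tests and an account with privileges to set global system variables; the final statement restores the default.

```sql
-- Include delete-marked records in persistent statistics (global setting).
SET GLOBAL innodb_stats_include_delete_marked = ON;

-- The setting is applied when statistics are recalculated.
ANALYZE TABLE MyDB.test;

SELECT index_name, stat_name, stat_value, sample_size
FROM mysql.innodb_index_stats
WHERE database_name = 'MyDB' AND table_name = 'test';

-- Restore the default.
SET GLOBAL innodb_stats_include_delete_marked = OFF;
```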

 

 

 

   

 

 

 

The innodb_stats_method parameter

 

Specifies how InnoDB index statistics collection code should treat NULLs. Possible values are NULLS_EQUAL (default), NULLS_UNEQUAL and NULLS_IGNORED

 

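·         With nulls_equal, all NULL values are treated as identical (that is, they all form a single value group).

·         With nulls_unequal, NULL values are not treated as identical. Instead, each NULL value forms its own value group of size 1.

·         With nulls_ignored, NULL values are ignored.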
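
One way to observe the effect is to analyze the same data under two settings and compare the reported cardinality. The sketch below is only illustrative: null_stats_demo and ix_c are hypothetical names, and the table is created with STATS_PERSISTENT=0 on the assumption that this keeps the demo on the non-persistent statistics path. No particular cardinality values are claimed.

```sql
-- Hypothetical demo table: two non-NULL values and three NULLs in column c.
CREATE TABLE null_stats_demo (
  id INT NOT NULL AUTO_INCREMENT,
  c  INT NULL,
  PRIMARY KEY (id),
  KEY ix_c (c)
) ENGINE=InnoDB STATS_PERSISTENT=0;

INSERT INTO null_stats_demo (c) VALUES (1), (2), (NULL), (NULL), (NULL);

-- All NULLs counted as one value group.
SET GLOBAL innodb_stats_method = 'nulls_equal';
ANALYZE TABLE null_stats_demo;
SHOW INDEX FROM null_stats_demo WHERE Key_name = 'ix_c';

-- Each NULL counted as its own value group.
SET GLOBAL innodb_stats_method = 'nulls_unequal';
ANALYZE TABLE null_stats_demo;
SHOW INDEX FROM null_stats_demo WHERE Key_name = 'ix_c';

-- Restore the default and clean up.
SET GLOBAL innodb_stats_method = 'nulls_equal';
DROP TABLE null_stats_demo;
```

Compare the Cardinality column of the two SHOW INDEX results to see how the setting changes the estimate.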

 

 

 

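For more details, see "InnoDB and MyISAM Index Statistics Collection" in the official documentation. There is also a system variable, myisam_stats_method, that controls how NULL values are treated when statistics are collected for MyISAM tables.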

 

 

mysql> show variables like 'myisam_stat%';
+---------------------+---------------+
| Variable_name       | Value         |
+---------------------+---------------+
| myisam_stats_method | nulls_unequal |
+---------------------+---------------+
1 row in set (0.00 sec)

 

 

 

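The innodb_stats_on_metadata parameter

The innodb_stats_on_metadata parameter was enabled by default (default value ON) before MySQL 5.6.6. With it enabled, every query against tables in the information_schema metadata database (for example information_schema.TABLES, information_schema.TABLE_CONSTRAINTS, ...) and every statement such as SHOW TABLE STATUS or SHOW INDEX makes InnoDB sample some index pages of each table in the other databases, update information_schema.STATISTICS, and only then return the query result. When tables are large and numerous, this takes a long time and pulls a lot of rarely accessed data into the InnoDB buffer pool, polluting it. Turning the parameter off speeds up access to the schema tables and also improves the stability of execution plans for queries on InnoDB tables. For these reasons the parameter defaults to OFF starting with MySQL 5.6.6.

Note that this option only takes effect when optimizer statistics are configured to be non-persistent. When the parameter is enabled, InnoDB updates the non-persistent statistics.

The official documentation describes it as follows: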

 

innodb_stats_on_metadata

Property              Value
Command-Line Format   --innodb-stats-on-metadata[={OFF|ON}]
System Variable       innodb_stats_on_metadata
Scope                 Global
Dynamic               Yes
Type                  Boolean
Default Value         OFF

 

This option only applies when optimizer statistics are configured to be non-persistent. Optimizer statistics are not persisted to disk when innodb_stats_persistent is disabled or when individual tables are created or altered with STATS_PERSISTENT=0. For more information, see Section 14.8.11.2, "Configuring Non-Persistent Optimizer Statistics Parameters".

When innodb_stats_on_metadata is enabled, InnoDB updates non-persistent statistics when metadata statements such as SHOW TABLE STATUS or when accessing the INFORMATION_SCHEMA.TABLES or INFORMATION_SCHEMA.STATISTICS tables. (These updates are similar to what happens for ANALYZE TABLE.) When disabled, InnoDB does not update statistics during these operations. Leaving the setting disabled can improve access speed for schemas that have a large number of tables or indexes. It can also improve the stability of execution plans for queries that involve InnoDB tables.

To change the setting, issue the statement SET GLOBAL innodb_stats_on_metadata=mode, where mode is either ON or OFF (or 1 or 0). Changing the setting requires privileges sufficient to set global system variables (see Section 5.1.8.1, "System Variable Privileges") and immediately affects the operation of all connections.

 

 

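The innodb_stats_persistent parameter

This parameter controls whether statistics are persisted. When it is enabled, statistics are saved in the innodb_table_stats and innodb_index_stats tables of the mysql database. Starting with MySQL 5.6.6, persistent statistics are the default, that is, INNODB_STATS_PERSISTENT=ON. (From the official documentation: "Persistent optimizer statistics were introduced in MySQL 5.6.2 and were made the default in MySQL 5.6.6".) With this parameter set, statistics no longer need to be collected in real time. Real-time collection can hurt performance under high concurrency and can cause execution plans to vary.

In addition, table options (the STATS_PERSISTENT, STATS_AUTO_RECALC and STATS_SAMPLE_PAGES clauses) can override the values set by the system variables. These options can be specified in CREATE TABLE or ALTER TABLE statements. Options specified on a table override the global variables, in other words they take precedence. For example: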

 

 
mysql> ALTER TABLE test STATS_PERSISTENT=1;
Query OK, 0 rows affected (0.15 sec)
Records: 0  Duplicates: 0  Warnings: 0
 
mysql> ALTER TABLE test STATS_AUTO_RECALC=0;
Query OK, 0 rows affected (0.27 sec)
Records: 0  Duplicates: 0  Warnings: 0
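
To confirm which table-level overrides are in place, one option is to read the table's create options. This is a small sketch assuming the MyDB.test table from the example above.

```sql
-- Table-level statistics options appear in CREATE_OPTIONS,
-- for example stats_persistent=1 stats_auto_recalc=0.
SELECT TABLE_NAME, CREATE_OPTIONS
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'MyDB' AND TABLE_NAME = 'test';

-- The same options are also visible in the table definition.
SHOW CREATE TABLE MyDB.test;
```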

 

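Persistent statistics are stored in mysql.innodb_index_stats and mysql.innodb_table_stats. The two tables are defined as follows:

innodb_table_stats

Column name               Description
database_name             Database name
table_name                Table name, partition name, or subpartition name
last_update               Timestamp of the last statistics update
n_rows                    Number of rows in the table
clustered_index_size      Number of pages in the clustered index
sum_of_other_index_sizes  Number of pages in the non-clustered indexes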

 

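innodb_index_stats

Column name       Description
database_name     Database name
table_name        Table name, partition name, or subpartition name
index_name        Index name
last_update       Timestamp of the last update
stat_name         Name of the statistic
stat_value        Value of the statistic (for n_diff_pfxNN, the number of distinct values)
sample_size       Number of pages sampled
stat_description  Description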
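
Because the size columns are expressed in pages, multiplying them by the page size gives approximate sizes in bytes. The queries below sketch this for the MyDB.test table used in this article; the column aliases are illustrative.

```sql
-- Approximate on-disk size of the clustered and secondary indexes, in bytes.
SELECT table_name,
       n_rows,
       clustered_index_size     * @@innodb_page_size AS clustered_index_bytes,
       sum_of_other_index_sizes * @@innodb_page_size AS secondary_index_bytes
FROM mysql.innodb_table_stats
WHERE database_name = 'MyDB' AND table_name = 'test';

-- Per-index size, taken from the 'size' statistic of each index.
SELECT index_name,
       stat_value * @@innodb_page_size AS index_bytes
FROM mysql.innodb_index_stats
WHERE database_name = 'MyDB' AND table_name = 'test'
  AND stat_name = 'size';
```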

 

 

 

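Non-persistent optimizer statistics are stored in memory and are lost when the server shuts down. They are also updated periodically by certain operations and under certain conditions. So what exactly does "in memory" mean here?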

 

Optimizer statistics are not persisted to disk when innodb_stats_persistent=OFF or when individual tables are created or altered with STATS_PERSISTENT=0. Instead, statistics are stored in memory, and are lost when the server is shut down. Statistics are also updated periodically by certain operations and under certain conditions.

 

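In the author's understanding, this refers to in-memory tables (MEMORY tables); a brief note follows later in this article.

The innodb_stats_persistent_sample_pages parameter

When innodb_stats_persistent is ON, this parameter sets the number of index pages sampled each time ANALYZE TABLE updates the cardinality values. The default is 20 pages. Too small a value makes the statistics inaccurate; too large a value makes the analysis slow.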
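
The trade-off can be explored by raising the global value and re-running ANALYZE TABLE. The sketch below assumes the MyDB.test table and picks 64 purely as an example value. A table with its own STATS_SAMPLE_PAGES option uses that option rather than the global setting.

```sql
-- Check the current global sample size.
SELECT @@GLOBAL.innodb_stats_persistent_sample_pages;

-- Sample more pages per index; more accurate, but slower to analyze.
SET GLOBAL innodb_stats_persistent_sample_pages = 64;
ANALYZE TABLE MyDB.test;

-- sample_size shows how many pages were actually sampled per prefix.
SELECT index_name, stat_name, stat_value, sample_size
FROM mysql.innodb_index_stats
WHERE database_name = 'MyDB' AND table_name = 'test'
  AND stat_name LIKE 'n_diff_pfx%';

-- Restore the default.
SET GLOBAL innodb_stats_persistent_sample_pages = 20;
```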

 

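When creating a table, we can specify per table the number of sampled pages, whether statistics are persisted to disk, and whether they are recalculated automatically, as shown below: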

 

CREATE TABLE `test` (
`id` int(8) NOT NULL auto_increment,
`data` varchar(255),
`date` datetime,
PRIMARY KEY  (`id`),
INDEX `DATE_IX` (`date`)
) ENGINE=InnoDB,
  STATS_PERSISTENT=1,
  STATS_AUTO_RECALC=1,
  STATS_SAMPLE_PAGES=25;

 

 

The innodb_stats_sample_pages parameter

Deprecated. Replaced by innodb_stats_transient_sample_pages.

The innodb_stats_transient_sample_pages parameter

innodb_stats_transient_sample_pages controls the number of pages sampled. The default is 8. It can be set at runtime.

innodb_stats_transient_sample_pages affects sampling when innodb_stats_persistent=0. Points to note:

 

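1. If the value is too small, the estimates become inaccurate.

2. If the value is too large, disk reads increase.

3. Different statistics can produce very different execution plans.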
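
A sketch of how these settings fit together, assuming the MyDB.test table: switch the table to non-persistent statistics, change the transient sample size, and re-analyze. The value 16 is only an example, and the last two statements restore the previous state.

```sql
-- Use non-persistent statistics for this table only.
ALTER TABLE MyDB.test STATS_PERSISTENT = 0;

-- Sample more pages for transient statistics (dynamic, global).
SET GLOBAL innodb_stats_transient_sample_pages = 16;

ANALYZE TABLE MyDB.test;
SHOW INDEX FROM MyDB.test;

-- Restore the defaults.
SET GLOBAL innodb_stats_transient_sample_pages = 8;
ALTER TABLE MyDB.test STATS_PERSISTENT = DEFAULT;
```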

 

 

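There is one more parameter, information_schema_stats_expiry, which works as follows:

·         In 8.0, the information in the STATISTICS and TABLES tables under INFORMATION_SCHEMA is cached to improve query performance. The information_schema_stats_expiry parameter sets the expiry time of the cached data; the default is 86400 seconds. A query against these two tables first looks in the cache. If nothing is cached, or the cached data has expired, the query fetches the latest data from the storage engine. To get the latest data, set information_schema_stats_expiry to 0 or run ANALYZE TABLE.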
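
For example, a session that needs fresh values can bypass the cache for itself only. This is a sketch for MySQL 8.0, assuming the MyDB.test table.

```sql
-- Bypass the cached statistics for this session only.
SET SESSION information_schema_stats_expiry = 0;

-- Values are now fetched from the storage engine on every query.
SELECT TABLE_NAME, TABLE_ROWS, DATA_LENGTH, INDEX_LENGTH
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'MyDB' AND TABLE_NAME = 'test';

-- Return to the global setting.
SET SESSION information_schema_stats_expiry = DEFAULT;
```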

 

 

 

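Viewing statistics

Statistics are either persistent (PERSISTENT) or non-persistent (TRANSIENT). Where is each kind stored?

·         Persistent statistics

        Stored in mysql.innodb_index_stats and mysql.innodb_table_stats.

·         Non-persistent statistics

           Before MySQL 8.0, they are exposed through information_schema.INDEXES and information_schema.TABLES. And from MySQL 8.0 on? Through INFORMATION_SCHEMA.TABLES, INFORMATION_SCHEMA.STATISTICS and INNODB_INDEXES.

       The official documentation says non-persistent statistics are kept in memory; in the author's understanding, this means in-memory (MEMORY) tables.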

 

 

 

 

The following statements show the persistent statistics. To read the data in mysql.innodb_index_stats, you need to understand what stat_name and stat_value mean:

 

 

select * from mysql.innodb_index_stats 
where table_name = 'test';
 
 
select * from mysql.innodb_index_stats 
where database_name='MyDB' and table_name = 'test';

 

 

 

 

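When stat_name=size: stat_value is the number of pages in the index (Number of pages in the index).

When stat_name=n_leaf_pages: stat_value is the number of leaf pages (Number of leaf pages in the index).

When stat_name=n_diff_pfxNN: stat_value is the number of distinct values in the index columns, as explained below:

  * In n_diff_pfxNN, NN stands for a number (for example 01, 02, and so on). When stat_name is n_diff_pfxNN, stat_value shows the number of distinct values over the leading columns of the index, counted from the first column in the index definition. For example, when NN is 01, stat_value is the number of distinct values in the first index column; when NN is 02, stat_value is the number of distinct combinations of the first and second index columns, and so on. In addition, when stat_name = n_diff_pfxNN, the stat_description column shows a comma-separated list of the columns covered by the statistic.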
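
To make the prefix semantics concrete, the sampled estimates can be compared with exact counts. The sketch below assumes the second version of MyDB.test from this article, with primary key id and secondary index ix_test_name on name; the aliases exact_pfx01 and exact_pfx02 are illustrative. COUNT(DISTINCT ...) skips rows containing NULL, so the comparison is only meaningful for NOT NULL data.

```sql
-- Sampled estimates stored by InnoDB for each index prefix.
SELECT stat_name, stat_value, sample_size, stat_description
FROM mysql.innodb_index_stats
WHERE database_name = 'MyDB' AND table_name = 'test'
  AND index_name = 'ix_test_name'
  AND stat_name LIKE 'n_diff_pfx%';

-- Exact counts for the same prefixes: (name) and (name, id).
SELECT COUNT(DISTINCT name)     AS exact_pfx01,
       COUNT(DISTINCT name, id) AS exact_pfx02
FROM MyDB.test;
```

On a small table the two results should agree closely; on a large table the stored values are estimates based on the sampled pages.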

 

 

 

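MySQL histograms

MySQL 8.0 introduced histograms. Histogram data is exposed through the information_schema.column_statistics system table; each row holds the histogram of one column, stored in JSON format. A new parameter, histogram_generation_max_mem_size, configures the amount of memory available for building a histogram.

In general, a histogram is an accurate representation of the distribution of numerical data. In an RDBMS, a histogram is an approximation of the data distribution within a specific column.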

 

 

mysql> show variables like 'histogram_generation_max_mem_size';
+-----------------------------------+----------+
| Variable_name                     | Value    |
+-----------------------------------+----------+
| histogram_generation_max_mem_size | 20000000 |
+-----------------------------------+----------+
1 row in set (0.01 sec)
 
mysql> 
 
mysql> desc information_schema.column_statistics;
+-------------+-------------+------+-----+---------+-------+
| Field       | Type        | Null | Key | Default | Extra |
+-------------+-------------+------+-----+---------+-------+
| SCHEMA_NAME | varchar(64) | NO   |     | NULL    |       |
| TABLE_NAME  | varchar(64) | NO   |     | NULL    |       |
| COLUMN_NAME | varchar(64) | NO   |     | NULL    |       |
| HISTOGRAM   | json        | NO   |     | NULL    |       |
+-------------+-------------+------+-----+---------+-------+
4 rows in set (0.00 sec)
 
mysql> 

 

 

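MySQL has two kinds of histograms: singleton histograms and equi-height histograms. In a singleton histogram, each bucket stores one value and its cumulative frequency. In an equi-height histogram, each bucket stores the lower and upper bounds, the cumulative frequency, and the number of distinct values. MySQL chooses the histogram type automatically: when the number of distinct values does not exceed the number of buckets it builds a singleton histogram, otherwise an equi-height one. You can therefore sometimes influence the type by choosing a suitable number of buckets.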
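
The chosen type can be read back from the histogram JSON. The following sketch assumes the MyDB.test table with a name column that has more than 4 distinct values, so that a 4-bucket request should yield an equi-height histogram.

```sql
-- Request fewer buckets than there are distinct values.
ANALYZE TABLE MyDB.test UPDATE HISTOGRAM ON name WITH 4 BUCKETS;

-- Inspect which histogram type MySQL chose and how many buckets it built.
SELECT COLUMN_NAME,
       HISTOGRAM->>'$."histogram-type"' AS histogram_type,
       JSON_LENGTH(HISTOGRAM->'$."buckets"') AS bucket_count
FROM information_schema.COLUMN_STATISTICS
WHERE SCHEMA_NAME = 'MyDB' AND TABLE_NAME = 'test' AND COLUMN_NAME = 'name';
```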

 

 

 

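Creating and dropping histograms

Is histogram data generated automatically? MySQL histograms are somewhat special: they are not generated when an index is created. You have to run a command such as ANALYZE TABLE [table] UPDATE HISTOGRAM ... manually to build the histogram for each column. By default this information is replicated to replicas.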
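
If a histogram should stay local to one server, the statement can be kept out of the binary log. A minimal sketch, assuming the MyDB.test table:

```sql
-- NO_WRITE_TO_BINLOG (or its alias LOCAL) suppresses binary logging,
-- so the statement is not replicated.
ANALYZE NO_WRITE_TO_BINLOG TABLE MyDB.test
  UPDATE HISTOGRAM ON name WITH 16 BUCKETS;

-- Equivalent form using the LOCAL alias.
ANALYZE LOCAL TABLE MyDB.test
  UPDATE HISTOGRAM ON name WITH 16 BUCKETS;
```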

 

 

 

ANALYZE TABLE tbl_name UPDATE HISTOGRAM ON col_name [, col_name] WITH N BUCKETS;

ANALYZE TABLE tbl_name DROP HISTOGRAM ON col_name [, col_name];

 

ANALYZE TABLE test UPDATE HISTOGRAM ON create_date,name WITH 16 BUCKETS;

 

 

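Note: the BUCKETS value is optional. Its valid range is 1 to 1024, and the default is 100 when it is not specified.

We test this as follows: first drop all existing histogram data, then generate a histogram with the SQL below.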

 

 

ANALYZE TABLE test UPDATE HISTOGRAM ON name;
 
SELECT SCHEMA_NAME
      ,TABLE_NAME
      ,COLUMN_NAME
      ,HISTOGRAM->>'$."data-type"' AS 'DATA-TYPE'
      ,HISTOGRAM->>'$."sampling-rate"'  AS SAMPLING_RATE
      ,HISTOGRAM->>'$."last-updated"' AS LAST_UPDATED
      ,HISTOGRAM->>'$."number-of-buckets-specified"' AS NUM_BUCKETS_SPECIFIED
      ,JSON_LENGTH(HISTOGRAM->>'$."buckets"') AS 'BUCKET-COUNT'
FROM INFORMATION_SCHEMA.COLUMN_STATISTICS
WHERE  TABLE_NAME = 'test';

 

 


 

 

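In fact, the number of buckets actually created is not always 100. For example, after deleting rows so that only 49 records remained and then creating the histogram, the number of buckets was 49, so this value also depends on the amount of data in the table. With a larger data volume, the default is 100.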

 

 


 

 

In addition, as the test below shows, a BUCKETS value above 1024 fails with ERROR 1690 (22003): Number of buckets value is out of range in 'ANALYZE TABLE'.

 

 

mysql> ANALYZE TABLE test UPDATE HISTOGRAM ON name WITH 1024 BUCKETS;
+-----------+-----------+----------+-------------------------------------------------+
| Table     | Op        | Msg_type | Msg_text                                        |
+-----------+-----------+----------+-------------------------------------------------+
| MyDB.test | histogram | status   | Histogram statistics created for column 'name'. |
+-----------+-----------+----------+-------------------------------------------------+
1 row in set (0.13 sec)
 
mysql> ANALYZE TABLE test UPDATE HISTOGRAM ON name WITH 1025 BUCKETS;
ERROR 1690 (22003): Number of buckets value is out of range in 'ANALYZE TABLE'
mysql> 

 

 


 

 

 

 

Dropping histograms

-- Drop the histogram statistics on a column

ANALYZE TABLE test DROP HISTOGRAM ON create_date;

 

 

mysql> ANALYZE TABLE test DROP HISTOGRAM ON name;
+-----------+-----------+----------+-------------------------------------------------+
| Table     | Op        | Msg_type | Msg_text                                        |
+-----------+-----------+----------+-------------------------------------------------+
| MyDB.test | histogram | status   | Histogram statistics removed for column 'name'. |
+-----------+-----------+----------+-------------------------------------------------+
1 row in set (0.10 sec)

 

 

Viewing histogram information

    As we know, histogram data is stored in JSON format, and raw JSON is not very readable. A few SQL statements help with that.

 

 

SELECT SCHEMA_NAME, TABLE_NAME, COLUMN_NAME, JSON_PRETTY(HISTOGRAM) 
FROM information_schema.column_statistics 
WHERE TABLE_NAME='test'\G
 
 
SELECT SCHEMA_NAME
     ,TABLE_NAME
     ,COLUMN_NAME
     ,HISTOGRAM->>'$."data-type"' AS 'DATA-TYPE'
     ,HISTOGRAM->>'$."sampling-rate"'  AS SAMPLING_RATE
     ,HISTOGRAM->>'$."last-updated"' AS LAST_UPDATED
     ,HISTOGRAM->>'$."histogram-type"' AS HISTOGRAM_TYPE
     ,HISTOGRAM->>'$."number-of-buckets-specified"' AS NUM_BUCKETS_SPECIFIED
     ,JSON_LENGTH(HISTOGRAM->>'$."buckets"') AS 'BUCKET-COUNT'
FROM INFORMATION_SCHEMA.COLUMN_STATISTICS
WHERE  TABLE_NAME = 'test';
 
 
SELECT FROM_BASE64(SUBSTRING_INDEX(v, ':', -1)) value, concat(round(c*100,1),'%') cumulfreq, 
       CONCAT(round((c - LAG(c, 1, 0) over()) * 100,1), '%') freq  
FROM information_schema.column_statistics, JSON_TABLE(histogram->'$.buckets', 
     '$[*]' COLUMNS(v VARCHAR(60) PATH '$[0]', c double PATH '$[1]')) hist  
WHERE schema_name  = 'MyDB' and table_name = 'test' and column_name = 'name';
 
 
 
SELECT v value, concat(round(c*100,1),'%') cumulfreq, 
       CONCAT(round((c - LAG(c, 1, 0) over()) * 100,1), '%') freq  
FROM information_schema.column_statistics, JSON_TABLE(histogram->'$.buckets', 
     '$[*]' COLUMNS(v VARCHAR(60) PATH '$[0]', c double PATH '$[1]')) hist  
WHERE schema_name  = 'MyDB' and table_name = 'test' and column_name = 'name';

 

 

 

 

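Updating statistics

Non-persistent statistics are also updated automatically. According to the official documentation, this happens in the following cases: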

 

Non-persistent optimizer statistics are updated when:
 
Running ANALYZE TABLE.
 
Running SHOW TABLE STATUS, SHOW INDEX, or querying the INFORMATION_SCHEMA.TABLES or INFORMATION_SCHEMA.STATISTICS tables with the innodb_stats_on_metadata option enabled.
The default setting for innodb_stats_on_metadata is OFF. Enabling innodb_stats_on_metadata may reduce access speed for schemas that have a large number of tables or indexes, and reduce stability of execution plans for queries that involve InnoDB tables. innodb_stats_on_metadata is configured globally using a SET statement.
SET GLOBAL innodb_stats_on_metadata=ON
Note
innodb_stats_on_metadata only applies when optimizer statistics are configured to be non-persistent (when innodb_stats_persistent is disabled).
 
Starting a mysql client with the --auto-rehash option enabled, which is the default. The auto-rehash option causes all InnoDB tables to be opened, and the open table operations cause statistics to be recalculated.
To improve the start up time of the mysql client and to updating statistics, you can turn off auto-rehash using the --disable-auto-rehash option. The auto-rehash feature enables automatic name completion of database, table, and column names for interactive users.
 
A table is first opened.
 
InnoDB detects that 1 / 16 of table has been modified since the last time statistics were updated.

 

 

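 In short:

1 Running ANALYZE TABLE

2 With innodb_stats_on_metadata=ON, running SHOW TABLE STATUS or SHOW INDEX, or querying TABLES or STATISTICS under INFORMATION_SCHEMA

3 Logging in with the mysql client while --auto-rehash is enabled

4 A table being opened for the first time

5 1/16 of the table's data having been modified since the statistics were last updated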

 

 

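Automatic updates of persistent statistics were covered above. The other option is to update the statistics manually.

1. Update the statistics manually. Note that a read lock is taken while the statement runs: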

 

ANALYZE TABLE TABLE_NAME;

 

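2. If the statistics are still inaccurate after the update, consider increasing the number of sampled pages. There are two ways to do this:

1) Change the global variable INNODB_STATS_PERSISTENT_SAMPLE_PAGES, which defaults to 20;

2) Specify the sample size for an individual table: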

ALTER TABLE TABLE_NAME STATS_SAMPLE_PAGES=100;

 

Testing shows that the maximum value of STATS_SAMPLE_PAGES here is 65535; anything larger produces an error.

 

mysql> ALTER TABLE test STATS_SAMPLE_PAGES=65535;
Query OK, 0 rows affected (0.12 sec)
Records: 0  Duplicates: 0  Warnings: 0
 
mysql> ALTER TABLE test STATS_SAMPLE_PAGES=65536;
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '65536' at line 1
mysql>

 

 

 

References:

 

https://dev.mysql.com/doc/refman/8.0/en/innodb-persistent-stats.html

https://dev.mysql.com/doc/refman/8.0/en/index-statistics.html

https://dev.mysql.com/doc/refman/8.0/en/innodb-performance-optimizer-statistics.html

https://www.percona.com/blog/2019/10/29/column-histograms-on-percona-server-and-mysql-8-0/  (key reference)

http://chinaunix.net/uid-31396856-id-5787793.html

https://mysqlserverteam.com/histogram-statistics-in-mysql/

https://mp.weixin.qq.com/s/698g5lm9CWqbU0B_p0nLMw?
