MySQL count: How Much Do You Know?

Date: 2020-04-05

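Counting the rows in a table is a common requirement, but depending on the table design and on how the query is written, counting performance can differ considerably. Below we test this with a few simple experiments (when you run the tests yourself, watch out for caching, otherwise it will skew the results).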

1. Preparation

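To support the tests that follow, first prepare a few test tables and some data. For the measurements to be meaningful, the tables should be fairly large, so that query times are not too small to compare. We can reuse the consecutive-number generation trick from earlier posts, as follows: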

/* Create the table of consecutive numbers */
CREATE TABLE nums(id INT primary key);

/* Stored procedure that generates consecutive numbers (optimized version) */
DELIMITER $$
CREATE  PROCEDURE `sp_createNum`(cnt INT )
BEGIN
    DECLARE i INT  DEFAULT 1;
    TRUNCATE TABLE nums;
    INSERT INTO nums SELECT i;
    WHILE i < cnt DO
      BEGIN
        INSERT INTO nums SELECT id + i FROM nums WHERE id + i<=cnt;
        SET i = i*2;
      END;
    END WHILE;
END$$

DELIMITER ;

Generate the data. This time we generate 10 million rows.

/* Call the stored procedure */
mysql> call sp_createNum(10000000);
Query OK, 1611392 rows affected (32.07 sec)

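Inserting the rows one at a time in a loop takes far longer; you can test this yourself. For background, see the earlier article on a consecutive-integer generation method that is 16,800 times faster.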
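Before building the test tables it can be worth checking that the doubling procedure really produced a gap-free sequence. The query below is a minimal sketch of such a check, assuming the nums table was filled by sp_createNum(10000000) as shown above. If the ids run from 1 to N without gaps, the row count and the maximum id are both N.

```sql
/* Sanity check for the generated sequence: with ids 1..N and no gaps,
   total_rows and max_id are equal and min_id is 1. */
SELECT COUNT(*) AS total_rows,
       MIN(id)  AS min_id,
       MAX(id)  AS max_id
FROM nums;
```

Because id is the primary key, duplicates are impossible, so equal values for total_rows and max_id together with min_id = 1 are enough to rule out gaps.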

1.1 Creating the InnoDB tables

Create three InnoDB tables, as follows:

The nums_1 table has only a string primary-key column.

/* Create table nums_1, whose only column is a string primary key */
mysql> create table  nums_1 (p1 varchar(32) primary key ) engine=innodb;
Query OK, 0 rows affected (0.01 sec)

/* Load the data, converting id to a string with the md5 function */
mysql> insert into  nums_1 select md5(id) from nums;
Query OK, 10000000 rows affected (1 min 12.63 sec)
Records: 10000000  Duplicates: 0  Warnings: 0

The nums_2 table has five columns: the string primary key p1, an integer id, a NOT NULL c1, a nullable c2, and a nullable c3.

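c1 and c2 hold exactly the same content and differ only in the column constraint (c1 is NOT NULL, c2 is nullable). c3 differs from c1 and c2 in that values starting with aa in c1 are NULL in c3; everything else is the same.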

/* Create table nums_2 */
mysql> create table nums_2(p1 varchar(32) primary key ,id int ,c1 varchar(10) not null, c2 varchar(10),c3 varchar(10)) engine=innodb;
Query OK, 0 rows affected (1.03 sec)

/* Load the data */
mysql> insert into  nums_2(id,p1,c1,c2,c3) select id,md5(id),left(md5(id),10),left(md5(id),10),if(left(md5(id),10) like 'aa%',null,left(md5(id),10)) from nums;
Query OK, 10000000 rows affected (5 min 6.68 sec)
Records: 10000000  Duplicates: 0  Warnings: 0

The nums_3 table has exactly the same content as nums_2. The difference is the primary key, which in nums_3 is the integer id column.

/* Create table nums_3 */
mysql> create table nums_3(p1 varchar(32) ,id int primary key  ,c1 varchar(10) not null, c2 varchar(10),c3 varchar(10)) engine=innodb;
Query OK, 0 rows affected (0.01 sec)

/* The content is identical, so load it directly from nums_2 */
mysql> insert into nums_3 select  * from nums_2;
Query OK, 10000000 rows affected (3 min 18.81 sec)
Records: 10000000  Duplicates: 0  Warnings: 0

1.2 Creating the MyISAM table

Create one more table that uses MyISAM. Its structure and content are the same as nums_2; only the engine differs.

/* Create table nums_4 with the MyISAM engine */
mysql> create table nums_4(p1 varchar(32) not null  primary key ,id int  ,c1 varchar(10) not null, c2 varchar(10),c3 varchar(10)) engine=MyISAM;
Query OK, 0 rows affected (0.00 sec)

/* Load the data directly from nums_2 */
mysql> insert into nums_4 select  * from nums_2;
Query OK, 10000000 rows affected (3 min 16.78 sec)
Records: 10000000  Duplicates: 0  Warnings: 0

2. Ways to Count the Rows in a Table

There are the following ways to find the number of rows in a table:

For an approximate row count, query the statistics; section 2.1 describes the methods.

For an exact row count, use count(primary-key column), count(*), or count(1) [the 1 can be replaced by any non-NULL constant].

2.1 Approximate counts

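If you only need to know roughly how much data a table holds, and especially for a very large table where you only care about the order of magnitude (millions, tens of millions, or hundreds of millions of rows), you can query the statistics directly. There are several ways to do so: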

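Query the index information, where Cardinality is the approximate row count (look at the value on the PRIMARY row; for a multi-column composite primary key, look at the Cardinality of the last column).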

mysql> show index from nums_2;
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table  | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| nums_2 |          0 | PRIMARY  |            1 | p1          | A         |     9936693 |     NULL | NULL   |      | BTREE      |         |               |
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
1 row in set (0.00 sec)

Check the table status, where Rows is the approximate row count.

mysql> show table status like  'nums_2';
+--------+--------+---------+------------+---------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-----------------+----------+----------------+---------+
| Name   | Engine | Version | Row_format | Rows    | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Create_time         | Update_time | Check_time | Collation       | Checksum | Create_options | Comment |
+--------+--------+---------+------------+---------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-----------------+----------+----------------+---------+
| nums_2 | InnoDB |      10 | Dynamic    | 9936693 |            111 |  1105182720 |               0 |   2250178560 |   4194304 |           NULL | 2020-04-04 19:31:34 | NULL        | NULL       | utf8_general_ci |     NULL |                |         |
+--------+--------+---------+------------+---------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-----------------+----------+----------------+---------+
1 row in set (0.00 sec)

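Query the STATISTICS or TABLES table in information_schema directly. The content is similar to the index information and table status above, and TABLE_ROWS holds the approximate row count.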

mysql> select   * from  information_schema.tables where table_schema='testdb' and table_name like  'nums_2';
+---------------+--------------+------------+------------+--------+---------+------------+------------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-----------------+----------+----------------+---------------+
| TABLE_CATALOG | TABLE_SCHEMA | TABLE_NAME | TABLE_TYPE | ENGINE | VERSION | ROW_FORMAT | TABLE_ROWS | AVG_ROW_LENGTH | DATA_LENGTH | MAX_DATA_LENGTH | INDEX_LENGTH | DATA_FREE | AUTO_INCREMENT | CREATE_TIME         | UPDATE_TIME | CHECK_TIME | TABLE_COLLATION | CHECKSUM | CREATE_OPTIONS | TABLE_COMMENT |
+---------------+--------------+------------+------------+--------+---------+------------+------------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-----------------+----------+----------------+---------------+
| def           | testdb       | nums_2     | BASE TABLE | InnoDB |      10 | Dynamic    |    9936693 |            111 |  1105182720 |               0 |   2250178560 |   4194304 |           NULL | 2020-04-04 19:31:34 | NULL        | NULL       | utf8_general_ci |     NULL |                |               |
+---------------+--------------+------------+------------+--------+---------+------------+------------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-----------------+----------+----------------+---------------+
1 row in set (0.00 sec)

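Notes:

For InnoDB tables, all three methods above return the approximate row count, and the results are identical because they all come from the same statistics.
For MyISAM tables the result is exact (this applies to the row count only, not to the other fields).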

mysql> select   * from  information_schema.tables where table_schema='testdb' and table_name like  'nums_4';
+---------------+--------------+------------+------------+--------+---------+------------+------------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+---------------------+-----------------+----------+----------------+---------------+
| TABLE_CATALOG | TABLE_SCHEMA | TABLE_NAME | TABLE_TYPE | ENGINE | VERSION | ROW_FORMAT | TABLE_ROWS | AVG_ROW_LENGTH | DATA_LENGTH | MAX_DATA_LENGTH | INDEX_LENGTH | DATA_FREE | AUTO_INCREMENT | CREATE_TIME         | UPDATE_TIME         | CHECK_TIME          | TABLE_COLLATION | CHECKSUM | CREATE_OPTIONS | TABLE_COMMENT |
+---------------+--------------+------------+------------+--------+---------+------------+------------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+---------------------+-----------------+----------+----------------+---------------+
| def           | testdb       | nums_4     | BASE TABLE | MyISAM |      10 | Dynamic    |   10000000 |             75 |   759686336 | 281474976710655 |    854995968 |         0 |           NULL | 2020-04-04 19:20:23 | 2020-04-04 19:21:45 | 2020-04-04 19:23:45 | utf8_general_ci |     NULL |                |               |
+---------------+--------------+------------+------------+--------+---------+------------+------------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+---------------------+-----------------+----------+----------------+---------------+
1 row in set (0.00 sec)
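When only the estimate matters, selecting the relevant columns is easier to read than the full-width output above. The sketch below assumes the same testdb schema used in the examples. ANALYZE TABLE is a standard MySQL statement that recalculates the statistics; for InnoDB the refreshed figure is still an estimate based on sampling, so it may continue to differ from the exact count.

```sql
/* Recalculate the statistics for the InnoDB table. */
ANALYZE TABLE nums_2;

/* Read only the engine and the row estimate for both test tables. */
SELECT table_name, engine, table_rows
FROM information_schema.tables
WHERE table_schema = 'testdb'
  AND table_name IN ('nums_2', 'nums_4');
```

With the data shown above, the nums_4 (MyISAM) row should report exactly 10000000, while the nums_2 (InnoDB) row reports an estimate in the same range.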

2.2 Exact counts

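The results for InnoDB tables in 2.1 are statistical estimates rather than exact values, and most real work needs an exact count. The following methods return exact values and work for tables of every storage engine.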

count(primary key)

mysql> select count(p1) from nums_2;
+-----------+
| count(p1) |
+-----------+
|  10000000 |
+-----------+
1 row in set (1.60 sec)

count(1)

The 1 can be any non-NULL constant, for example count(2) or count('a').

mysql> select count(1) from nums_2;
+----------+
| count(1) |
+----------+
| 10000000 |
+----------+
1 row in set (1.45 sec)

count(*)

mysql> select count(*) from nums_2;
+----------+
| count(*) |
+----------+
| 10000000 |
+----------+
1 row in set (1.52 sec)
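The "any constant" rule has one exception worth seeing: count(expr) counts only the rows where expr is not NULL, so a NULL constant counts nothing. A minimal sketch against the nums_2 table from the preparation step (the column aliases are only for illustration):

```sql
/* COUNT(expr) counts rows where expr IS NOT NULL.
   A non-NULL constant is never NULL, so it counts every row;
   the NULL constant is always NULL, so it counts none. */
SELECT COUNT(1)    AS count_one,
       COUNT('a')  AS count_str,
       COUNT(NULL) AS count_null
FROM nums_2;
```

Here count_one and count_str both equal the number of rows in the table, and count_null is 0.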

3. Performance Comparison of count

This section compares the performance of count(primary key), count(1), count(*), count(NOT NULL column), and count(nullable column).

3.1 MyISAM tables

3.1.1 Counting the whole table

For an exact count of a whole MyISAM table, count(primary key), count(1), and count(*) are equally efficient: each returns the exact result directly, in close to 0 s.

mysql> select count(p1) from nums_4;
+-----------+
| count(p1) |
+-----------+
|  10000000 |
+-----------+
1 row in set (0.00 sec)

mysql> select count(1) from nums_4;
+----------+
| count(1) |
+----------+
| 10000000 |
+----------+
1 row in set (0.00 sec)

mysql> select count(*) from nums_4;
+----------+
| count(*) |
+----------+
| 10000000 |
+----------+
1 row in set (0.00 sec)

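The execution plans are identical as well. They show that the count is not computed by scanning the primary key or any other index.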

mysql> explain select count(*) from nums_4;
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra                        |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+
|  1 | SIMPLE      | NULL  | NULL       | NULL | NULL          | NULL | NULL    | NULL | NULL |     NULL | Select tables optimized away |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+
1 row in set, 1 warning (0.00 sec)

mysql> explain select count(p1) from nums_4;
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra                        |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+
|  1 | SIMPLE      | NULL  | NULL       | NULL | NULL          | NULL | NULL    | NULL | NULL |     NULL | Select tables optimized away |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+
1 row in set, 1 warning (0.00 sec)

mysql> explain select count(1) from nums_4;
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra                        |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+
|  1 | SIMPLE      | NULL  | NULL       | NULL | NULL          | NULL | NULL    | NULL | NULL |     NULL | Select tables optimized away |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+
1 row in set, 1 warning (0.00 sec)

Summary:

When counting a whole MyISAM table, the efficiency is count(primary key) = count(1) = count(*).

3.1.2 Counting a subset of rows

When counting a subset of rows, the result cannot be read directly from the stored row count, so the timings are roughly as follows:

mysql> select count(p1) from nums_4 where  p1 like 'aa%';
+-----------+
| count(p1) |
+-----------+
|     39208 |
+-----------+
1 row in set (0.14 sec)

mysql> select count(1) from nums_4 where  p1 like 'aa%';
+----------+
| count(1) |
+----------+
|    39208 |
+----------+
1 row in set (0.13 sec)

mysql> select count(*) from nums_4 where p1 like 'aa%';
+----------+
| count(*) |
+----------+
|    39208 |
+----------+
1 row in set (0.13 sec)

The execution plans are in fact all the same:

mysql> explain select count(1) from nums_4 where  p1 like 'aa%';
+----+-------------+--------+------------+-------+---------------+---------+---------+------+-------+----------+--------------------------+
| id | select_type | table  | partitions | type  | possible_keys | key     | key_len | ref  | rows  | filtered | Extra                    |
+----+-------------+--------+------------+-------+---------------+---------+---------+------+-------+----------+--------------------------+
|  1 | SIMPLE      | nums_4 | NULL       | range | PRIMARY       | PRIMARY | 98      | NULL | 42603 |   100.00 | Using where; Using index |
+----+-------------+--------+------------+-------+---------------+---------+---------+------+-------+----------+--------------------------+
1 row in set, 1 warning (0.00 sec)

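Summary: when counting a subset of rows, a MyISAM table cannot return the count directly and also has to scan the data; the different forms perform about the same.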

3.2 InnoDB tables

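Because InnoDB has to support MVCC, it cannot persist a single row count for the whole table, so every query has to traverse the data and count. The different forms of the query do differ in efficiency, however, and the following sections compare them along several dimensions.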
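The MVCC point is easier to see with two connections open at once. The transcript below is a sketch, assuming the default REPEATABLE READ isolation level and autocommit enabled in session B; the value 'mvcc-demo-row' is a hypothetical marker row that is removed at the end. At the same moment the two sessions are entitled to different answers, which is why one stored counter cannot serve both.

```sql
/* Session A: open a transaction with a consistent snapshot. */
START TRANSACTION WITH CONSISTENT SNAPSHOT;
SELECT COUNT(*) FROM nums_1;      -- call the result N

/* Session B (a separate connection): insert one row and commit. */
INSERT INTO nums_1 VALUES ('mvcc-demo-row');
SELECT COUNT(*) FROM nums_1;      -- N + 1

/* Session A: still reads from its snapshot. */
SELECT COUNT(*) FROM nums_1;      -- still N
COMMIT;
SELECT COUNT(*) FROM nums_1;      -- now N + 1

/* Clean up the marker row. */
DELETE FROM nums_1 WHERE p1 = 'mvcc-demo-row';
```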

3.2.1 Comparing the different forms

Compare the query efficiency of count(primary key), count(1), and count(*):

mysql> select count(p1) from nums_2  ;
+-----------+
| count(p1) |
+-----------+
|  10000000 |
+-----------+
1 row in set (1.68 sec)

mysql> select count(1) from nums_2  ;
+----------+
| count(1) |
+----------+
| 10000000 |
+----------+
1 row in set (1.37 sec)

mysql> select count(*) from nums_2  ;
+----------+
| count(*) |
+----------+
| 10000000 |
+----------+
1 row in set (1.38 sec)

This simple comparison gives the performance ranking count(primary key) < count(1) ≈ count(*); that is, count(primary key) is the slowest.

Yet the execution plans are all the same, as follows:

mysql> explain select count(p1) from nums_2;
+----+-------------+--------+------------+-------+---------------+---------+---------+------+---------+----------+-------------+
| id | select_type | table  | partitions | type  | possible_keys | key     | key_len | ref  | rows    | filtered | Extra       |
+----+-------------+--------+------------+-------+---------------+---------+---------+------+---------+----------+-------------+
|  1 | SIMPLE      | nums_2 | NULL       | index | NULL          | PRIMARY | 98      | NULL | 9936693 |   100.00 | Using index |
+----+-------------+--------+------------+-------+---------------+---------+---------+------+---------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

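The plans are the same but the efficiency differs, because the counting is done differently:

count(primary key): InnoDB traverses the whole table through the chosen index, reads the primary-key value of every row, and returns it to the server layer. The server layer receives the value, checks that it is not NULL (a check that could in principle be optimized away), and adds one per row.
count(1): also traverses the whole table, but since the value for every row is 1 (non-NULL), rows can be counted directly without a NULL check.
count(*): InnoDB optimizes this form; it behaves like count(1) and simply accumulates the count row by row.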

3.2.2 Comparing different primary-key types

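nums_2 and nums_3 have the same content; the difference is that the primary key of nums_3 is the integer id column. Now compare query performance when the primary-key column differs: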

mysql> select /* SQL_NO_CACHE */count(1) from nums_2;
+----------+
| count(1) |
+----------+
| 10000000 |
+----------+
1 row in set (2.02 sec)

mysql> select /* SQL_NO_CACHE */count(1) from nums_3;
+----------+
| count(1) |
+----------+
| 10000000 |
+----------+
1 row in set (1.69 sec)

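The test shows that tables with identical content but different primary keys perform differently, and that the count is faster when the primary-key (index) column type is smaller.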
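One detail in the queries above: SQL_NO_CACHE is written inside /* ... */, so the server treats it as an ordinary comment and it has no effect on the query cache. In MySQL 5.7 and earlier, the modifier is written as a keyword directly after SELECT, as in the sketch below. In MySQL 8.0 the query cache was removed, so the keyword is deprecated there and does nothing.

```sql
/* Keyword form of the modifier (MySQL 5.7 and earlier):
   the result is neither served from nor stored in the query cache. */
SELECT SQL_NO_CACHE COUNT(1) FROM nums_2;
SELECT SQL_NO_CACHE COUNT(1) FROM nums_3;
```

Note that this only bypasses the query cache. Pages already loaded into the InnoDB buffer pool still make repeated runs faster, so run each statement several times and compare like with like.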

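Note: if an index is added on the id column of nums_2, the query will use the id index. The reason is that the primary-key index (the clustered index) is keyed on a varchar(32) column while id is an int, so the two indexes differ in size and scanning the integer index costs less I/O.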
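The note can be tried out roughly as follows. This is a sketch only: the index name idx_id is a hypothetical choice, building the index on 10 million rows takes a while, and the optimizer's final choice can vary with version and statistics.

```sql
/* Add a secondary index on the integer column (idx_id is a made-up name). */
ALTER TABLE nums_2 ADD INDEX idx_id (id);

/* Check which index the count now uses: look at the key column. */
EXPLAIN SELECT COUNT(*) FROM nums_2;

/* Optional: remove the index again to restore the original test setup. */
-- ALTER TABLE nums_2 DROP INDEX idx_id;
```

If the behavior described in the note holds, the key column of the EXPLAIN output changes from PRIMARY to idx_id.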

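For this reason, an auto-increment id is the recommended primary key in MySQL (the advantages go beyond counting; I will cover them when there is a chance).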

3.2.3 Comparing tables of different sizes

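Of the tables from the preparation step, nums_1 has a single column while nums_3 has more columns, which makes it the larger table. (As created above, nums_1 is keyed on the string column p1 and nums_3 on the integer id, so the primary-key type differs too.) The query efficiency compares as follows: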

mysql> select /* SQL_NO_CACHE */count(1) from nums_1;
+----------+
| count(1) |
+----------+
| 10000000 |
+----------+
1 row in set (1.61 sec)

mysql> select /* SQL_NO_CACHE */count(1) from nums_3;
+----------+
| count(1) |
+----------+
| 10000000 |
+----------+
1 row in set (1.67 sec)

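The query times are for reference only and depend on the machine.

This shows that table size also affects query efficiency: the smaller the table, the faster the count.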

3.2.4 count(ordinary column)

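In nums_3, the c2 column allows NULL but contains no NULL values, while the c3 column allows NULL and does contain NULL values. Now count c2 and c3 of nums_3 separately and look at the results (add indexes first to improve query performance).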
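The statements that add the indexes are not shown in the original. One way to do it, as a sketch with hypothetical index names idx_c2 and idx_c3, is a single ALTER TABLE so that both indexes are added in one statement:

```sql
/* Add one secondary index per counted column.
   The index names are illustrative, not taken from the original test. */
ALTER TABLE nums_3
  ADD INDEX idx_c2 (c2),
  ADD INDEX idx_c3 (c3);
```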

mysql> select  count(c2) from  nums_3 ;
+-----------+
| count(c2) |
+-----------+
|  10000000 |
+-----------+
1 row in set (1.69 sec)

mysql> select  count(c3) from  nums_3 ;
+-----------+
| count(c3) |
+-----------+
|   9960792 |
+-----------+
1 row in set (1.73 sec)
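The gap between the two results is the number of NULL values in c3: 10000000 - 9960792 = 39208, the same figure as the count of p1 values starting with aa in section 3.1.2, which is what the data-loading rule for c3 predicts. A short sketch that shows the relationship in one query (the aliases are illustrative):

```sql
/* COUNT(col) skips NULLs, so COUNT(*) - COUNT(col) is the NULL count.
   SUM(c3 IS NULL) computes the same number directly as a cross-check. */
SELECT COUNT(*)             AS total_rows,
       COUNT(c3)            AS non_null_c3,
       COUNT(*) - COUNT(c3) AS null_c3,
       SUM(c3 IS NULL)      AS null_c3_check
FROM nums_3;
```

This is the practical difference to keep in mind: count(nullable column) answers "how many rows have a value here", not "how many rows are there".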