INSERT SELECT (generating a million records)
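Before querying a million-row data set, we first generate a million records ourselves to work with. The method used is INSERT SELECT.
INSERT is normally used to insert a row with specified column values into a table. However, INSERT has another form that inserts the result of a SELECT statement into a table. This is the so-called INSERT SELECT: as the name suggests, it is made up of an INSERT statement and a SELECT statement.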
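A minimal sketch of the INSERT SELECT form, assuming an in-memory SQLite database as a stand-in for MySQL so it runs without a server (the statement shape is the same). The tables `src` and `dst` and their rows are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for MySQL (assumption, so the sketch runs without a server).
# Table names src/dst and the sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src (id INTEGER PRIMARY KEY, code INTEGER, level TEXT)")
conn.execute("CREATE TABLE dst (id INTEGER PRIMARY KEY, code INTEGER, level TEXT)")
conn.executemany(
    "INSERT INTO src (code, level) VALUES (?, ?)",
    [(1006, "critical"), (2003, "critical"), (2019, "critical")],
)

# INSERT ... SELECT: every row returned by the SELECT is inserted into dst.
# id is left out of both column lists, so dst generates its own primary keys.
copied = conn.execute(
    "INSERT INTO dst (code, level) SELECT code, level FROM src"
).rowcount

print(copied)                                                  # 3
print(conn.execute("SELECT COUNT(*) FROM dst").fetchone()[0])  # 3
```

Leaving `id` out of the column lists mirrors the article's statement, where warning_repaired1 assigns fresh AUTO_INCREMENT ids instead of copying the source ids.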
Suppose there is a warning_repaired table with 2447 records:
```
mysql> select count(*) from warning_repaired;
+----------+
| count(*) |
+----------+
| 2447 |
+----------+
1 row in set (0.00 sec)
```
We use this warning_repaired table to build a table with millions of rows.
First, create a new table, warning_repaired1:
```
mysql> CREATE TABLE `warning_repaired1` (
    -> `id` int(11) NOT NULL AUTO_INCREMENT,
    -> `device_moid` varchar(36) NOT NULL,
    -> `device_name` varchar(128) DEFAULT NULL,
    -> `device_type` varchar(36) DEFAULT NULL,
    -> `device_ip` varchar(128) DEFAULT NULL,
    -> `warning_type` enum('0','1','2') NOT NULL,
    -> `domain_moid` varchar(36) NOT NULL,
    -> `domain_name` varchar(128) DEFAULT NULL,
    -> `code` smallint(6) NOT NULL,
    -> `level` varchar(16) NOT NULL,
    -> `description` varchar(128) DEFAULT NULL,
    -> `start_time` datetime NOT NULL,
    -> `resolve_time` datetime NOT NULL,
    -> PRIMARY KEY (`id`),
    -> UNIQUE KEY `id` (`id`)
    -> ) ENGINE=InnoDB AUTO_INCREMENT=4895 DEFAULT CHARSET=utf8;
Query OK, 0 rows affected (0.39 sec)

mysql> select count(*) from warning_repaired1;
+----------+
| count(*) |
+----------+
| 0 |
+----------+
1 row in set (0.00 sec)

mysql> select count(*) from warning_repaired;
+----------+
| count(*) |
+----------+
| 2447 |
+----------+
1 row in set (0.00 sec)
```
Next, use an INSERT SELECT statement to copy the records from warning_repaired into warning_repaired1:
```
mysql> insert into warning_repaired1(device_moid, device_name, device_type, device_ip, warning_type, domain_moid, domain_name, code, level, description, start_time, resolve_time) select device_moid, device_name, device_type, device_ip, warning_type, domain_moid, domain_name, code, level, description, start_time, resolve_time from warning_repaired;
Query OK, 2447 rows affected (1.07 sec)
Records: 2447  Duplicates: 0  Warnings: 0

mysql> select count(*) from warning_repaired;
+----------+
| count(*) |
+----------+
| 2447 |
+----------+
1 row in set (0.00 sec)

mysql> select count(*) from warning_repaired1;
+----------+
| count(*) |
+----------+
| 2447 |
+----------+
1 row in set (0.00 sec)
```
Once that succeeds, change the source table of the INSERT SELECT to warning_repaired1 as well, so the table copies its own rows into itself:
```sql
insert into warning_repaired1(device_moid, device_name, device_type, device_ip, warning_type, domain_moid, domain_name, code, level, description, start_time, resolve_time)
select device_moid, device_name, device_type, device_ip, warning_type, domain_moid, domain_name, code, level, description, start_time, resolve_time
from warning_repaired1;
```
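Run this statement several times. Each run doubles the row count, so the table grows exponentially and quickly reaches a million records.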
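The doubling can be checked with a small sketch. It assumes SQLite as a stand-in for MySQL and a hypothetical table `t` with a `payload` column; `rounds_needed` is plain arithmetic on the article's numbers, while `grow_by_doubling` actually repeats the self-insert.

```python
import sqlite3

def rounds_needed(seed_rows: int, target: int) -> int:
    """Self-inserts needed before seed_rows * 2**k reaches target,
    since each INSERT ... SELECT from a table into itself doubles it."""
    if seed_rows <= 0:
        raise ValueError("an empty table never grows by doubling")
    rounds, rows = 0, seed_rows
    while rows < target:
        rows *= 2
        rounds += 1
    return rounds

def grow_by_doubling(conn: sqlite3.Connection, target: int) -> int:
    """Repeat the self-insert on hypothetical table t until it holds at least
    target rows; returns how many rounds were run."""
    rounds = 0
    while (n := conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]) < target:
        if n == 0:
            raise ValueError("table t is empty")
        conn.execute("INSERT INTO t (payload) SELECT payload FROM t")
        rounds += 1
    return rounds

# Hypothetical table t with three seed rows (SQLite standing in for MySQL).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO t (payload) VALUES (?)", [("a",), ("b",), ("c",)])

print(grow_by_doubling(conn, 20))                            # 3  (3 -> 6 -> 12 -> 24)
print(conn.execute("SELECT COUNT(*) FROM t").fetchone()[0])  # 24
print(rounds_needed(2447, 1_000_000))                        # 9
```

With the article's 2447 rows, eight self-inserts give 2447 × 2^8 = 626,432 rows and nine give 1,252,864, so nine runs are enough to pass one million.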
The most common, most basic MySQL pagination: LIMIT
```
mysql> select count(*) from warning_repaired;
+----------+
| count(*) |
+----------+
| 2447 |
+----------+
1 row in set (0.00 sec)

mysql> select count(*) from warning_repaired5;
+----------+
| count(*) |
+----------+
| 5990256 |
+----------+
1 row in set (10.11 sec)

mysql> select code,level,description from warning_repaired5 limit 1000,2;
+------+----------+----------------+
| code | level | description |
+------+----------+----------------+
| 1006 | critical | 註冊GK失敗 |
| 1006 | critical | 註冊GK失敗 |
+------+----------+----------------+
2 rows in set (0.00 sec)

mysql> select code,level,description from warning_repaired5 limit 10000,2;
+------+----------+----------------+
| code | level | description |
+------+----------+----------------+
| 1006 | critical | 註冊GK失敗 |
| 1006 | critical | 註冊GK失敗 |
+------+----------+----------------+
2 rows in set (0.05 sec)

mysql> select code,level,description from warning_repaired5 limit 100000,2;
+------+----------+------------------------------------------------------+
| code | level | description |
+------+----------+------------------------------------------------------+
| 2003 | critical | 服務器內存5分鐘內平均使用率超過閾值 |
| 2019 | critical | 網卡的吞吐量超閾值 |
+------+----------+------------------------------------------------------+
2 rows in set (0.26 sec)

mysql> select code,level,description from warning_repaired5 limit 1000000,2;
+------+----------+----------------+
| code | level | description |
+------+----------+----------------+
| 1006 | critical | 註冊GK失敗 |
| 1006 | critical | 註冊GK失敗 |
+------+----------+----------------+
2 rows in set (1.56 sec)

mysql> select code,level,description from warning_repaired5 limit 5000000,2;
+------+----------+----------------+
| code | level | description |
+------+----------+----------------+
| 1006 | critical | 註冊GK失敗 |
| 1006 | critical | 註冊GK失敗 |
+------+----------+----------------+
2 rows in set (7.15 sec)
```
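With offsets up to 1,000,000 the query time stays modest (1.56 s at offset 1,000,000 in the run above). So for small and medium data volumes this kind of SQL is good enough; the only thing to watch is making sure an index is used. But as the data grows, the number of pages keeps increasing, and queries such as LIMIT 5000000,2 start to appear. LIMIT 5000000,2 means: scan 5,000,002 rows that match the condition, throw away the first 5,000,000, and return the last 2. That is exactly the problem: in a high-concurrency application, every such query scans more than 5 million rows, and performance is bound to suffer badly.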
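The "scan, discard, return" behaviour described above can be modelled in a few lines. This is a simplified illustration of the cost (it counts rows read when the server has no way to jump directly to the offset), not MySQL's actual execution code; the function name `limit_offset` is hypothetical.

```python
def limit_offset(rows, offset, count):
    """Model LIMIT offset,count as a sequential read: every row up to
    offset + count is read, the first `offset` are discarded.
    Returns (page, rows_read)."""
    if count <= 0:
        return [], 0
    page = []
    rows_read = 0
    for row in rows:
        rows_read += 1
        if rows_read <= offset:
            continue  # row was read, then thrown away
        page.append(row)
        if len(page) == count:
            break
    return page, rows_read

# LIMIT 5000000,2 over a 6,000,000-row sequence.
page, read = limit_offset(range(6_000_000), 5_000_000, 2)
print(page, read)  # [5000000, 5000001] 5000002
```

The cost grows with the offset, not with the page size: fetching 2 rows at offset 5,000,000 still reads 5,000,002 rows.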
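This approach has several drawbacks: a large OFFSET inflates the number of rows the server has to produce before discarding them; even a small proportion of such inefficient page requests can be enough to create a disk I/O bottleneck; and many rows have to be scanned.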
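Simple workarounds: don't display the total record count, since few users care about that number; don't let users reach very high page numbers, and redirect them instead; avoid count(*) by not showing a total and letting users page with "Next page", or cache the total; or keep a separate running total that is incremented on insert and decremented on delete.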
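The last workaround, a separately maintained total, can be sketched with triggers. This assumes SQLite as a stand-in (MySQL also supports `AFTER INSERT ... FOR EACH ROW` triggers, but its syntax differs slightly); the tables `warnings` and `row_counts` are hypothetical.

```python
import sqlite3

# SQLite stands in for MySQL; warnings and row_counts are hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE warnings (id INTEGER PRIMARY KEY, code INTEGER);

-- One row per tracked table, holding its current row count.
CREATE TABLE row_counts (tbl TEXT PRIMARY KEY, n INTEGER NOT NULL);
INSERT INTO row_counts VALUES ('warnings', 0);

-- Increment on every inserted row, decrement on every deleted row.
CREATE TRIGGER warnings_ins AFTER INSERT ON warnings FOR EACH ROW
BEGIN
    UPDATE row_counts SET n = n + 1 WHERE tbl = 'warnings';
END;

CREATE TRIGGER warnings_del AFTER DELETE ON warnings FOR EACH ROW
BEGIN
    UPDATE row_counts SET n = n - 1 WHERE tbl = 'warnings';
END;
""")

def total(conn: sqlite3.Connection) -> int:
    # Reads a single row instead of running COUNT(*) over the whole table.
    return conn.execute("SELECT n FROM row_counts WHERE tbl = 'warnings'").fetchone()[0]

conn.executemany("INSERT INTO warnings (code) VALUES (?)", [(1006,), (2003,), (2019,)])
conn.execute("DELETE FROM warnings WHERE code = 2003")
print(total(conn))  # 2
```

One trade-off to keep in mind: every write now also updates the same counter row, so under heavy concurrent inserts that row can become a point of contention.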
```
mysql> select code,level,description from warning_repaired5 limit 5000000,2;
+------+----------+----------------+
| code | level | description |
+------+----------+----------------+
| 1006 | critical | 註冊GK失敗 |
| 1006 | critical | 註冊GK失敗 |
+------+----------+----------------+
2 rows in set (2.98 sec)

mysql> select code,level,description from warning_repaired5 order by id desc limit 5000000,2;
+------+----------+----------------+
| code | level | description |
+------+----------+----------------+
| 1006 | critical | 註冊GK失敗 |
| 1006 | critical | 註冊GK失敗 |
+------+----------+----------------+
2 rows in set (8.04 sec)
```
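As shown above, adding ORDER BY id DESC made the same query slower again (8.04 s versus 2.98 s).
The second approach is splitting the data across tables based on a computed hash value; it is not covered here.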
Third approach: offset lookup with a subquery
```
mysql> select code,level,description from warning_repaired5 order by id desc limit 5000000,20;
+------+----------+----------------+
| code | level | description |
+------+----------+----------------+
| 1006 | critical | 註冊GK失敗 |
……
| 1006 | critical | 註冊GK失敗 |
+------+----------+----------------+
20 rows in set (4.77 sec)

mysql> select code,level,description from warning_repaired5 where id <=( select id from warning_repaired5 order by id desc limit 5000000,1) order by id desc limit 20;
+------+----------+----------------+
| code | level | description |
+------+----------+----------------+
| 1006 | critical | 註冊GK失敗 |
| 1006 | critical | 註冊GK失敗 |
……
| 1006 | critical | 註冊GK失敗 |
+------+----------+----------------+
20 rows in set (4.26 sec)
```
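The subquery version takes somewhat less time than the plain LIMIT query (4.26 s versus 4.77 s). Presumably this is because the subquery only needs the id column while skipping the first 5,000,000 rows, and the outer query then reads just 20 rows starting from that id, instead of carrying code, level and description for every skipped row.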
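A small sketch to check that the subquery rewrite returns the same page as the plain `ORDER BY id DESC LIMIT offset, count` query. It assumes SQLite as a stand-in for MySQL (SQLite accepts the same `LIMIT offset, count` comma form) and a hypothetical table `w`; the equivalence relies on `id` being unique, as the primary key is here.

```python
import sqlite3

# SQLite stands in for MySQL; table w and its contents are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE w (id INTEGER PRIMARY KEY, code INTEGER)")
conn.executemany("INSERT INTO w (code) VALUES (?)", [(i % 7,) for i in range(1000)])

OFFSET, PAGE = 500, 20

# Plain approach: skip OFFSET rows in id order, return the next PAGE rows.
plain = conn.execute(
    "SELECT id, code FROM w ORDER BY id DESC LIMIT ?, ?", (OFFSET, PAGE)
).fetchall()

# Subquery approach: the inner query walks only ids to find the boundary id,
# then the outer query fetches PAGE rows with a range condition on the key.
via_subquery = conn.execute(
    """SELECT id, code FROM w
       WHERE id <= (SELECT id FROM w ORDER BY id DESC LIMIT ?, 1)
       ORDER BY id DESC LIMIT ?""",
    (OFFSET, PAGE),
).fetchall()

print(plain == via_subquery)        # True
print(plain[0][0], plain[-1][0])    # 500 481
```

If `id` were not unique, several rows could share the boundary id and the two queries could return different pages, so the rewrite should be applied to a unique, ordered column.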
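Overall, for tables with millions of rows, optimizing with the third method above can improve performance (in the test above it saved about half a second at offset 5,000,000), and complex queries should be simplified as far as possible to reduce the amount of computation. Also make as much use of in-memory caching as you can, and where resources allow, consider large-scale solutions such as splitting tables, sharding databases, or disk arrays.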
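As one concrete form of in-memory caching, here is a sketch that caches the expensive total count (one of the workarounds listed earlier) for a fixed time. The class `CachedCount`, the 30-second TTL and the injectable fake clock are illustrative assumptions, and SQLite again stands in for MySQL.

```python
import sqlite3
import time

class CachedCount:
    """Keep a COUNT(*) result in memory and re-run the query only after ttl
    seconds. `clock` is injectable so expiry can be shown without sleeping."""

    def __init__(self, conn, table, ttl=60.0, clock=time.monotonic):
        # table is interpolated into SQL, so it must be a trusted constant, never user input.
        self.conn = conn
        self.sql = f"SELECT COUNT(*) FROM {table}"
        self.ttl = ttl
        self.clock = clock
        self._value = None
        self._expires = 0.0

    def get(self):
        now = self.clock()
        if self._value is None or now >= self._expires:
            self._value = self.conn.execute(self.sql).fetchone()[0]
            self._expires = now + self.ttl
        return self._value

# Demo: hypothetical table w with 3 rows and a fake clock.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE w (id INTEGER PRIMARY KEY)")
for _ in range(3):
    conn.execute("INSERT INTO w DEFAULT VALUES")

fake_now = [0.0]
counter = CachedCount(conn, "w", ttl=30.0, clock=lambda: fake_now[0])

print(counter.get())  # 3  (query runs)
conn.execute("INSERT INTO w DEFAULT VALUES")
print(counter.get())  # 3  (served from cache; the new row is not counted yet)
fake_now[0] = 31.0
print(counter.get())  # 4  (ttl expired, query runs again)
```

The trade-off is staleness: for up to `ttl` seconds the displayed total can lag behind the table, which is usually acceptable for a page-count display.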