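I remember this as an interview question I once got at Alibaba: with a table holding hundreds of millions of rows, how do you improve query performance?
Leaving aside options such as exporting the data to ES or HBase, and solving it with MySQL alone, this is where partitioned tables come in.
What is a partitioned table? It is similar to a partition in Hive, but less powerful and subject to quite a few restrictions. MySQL has supported partitioning since version 5.1: after creating a partitioned table, SHOW CREATE TABLE prints the partitioning clause inside a /*!50100 ... */ version comment, which means it only takes effect on 5.1.0 and later.
Partitioned tables are rarely used in our business code, so here is a short record of how to use them.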
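Create a partitioned table:
Insert data spread across the time ranges, two rows for each range.
Now run the query through explain: it already uses the partitions, so it no longer scans the whole table: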
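The listings for these steps are not shown above, so the following is only a minimal sketch of what they might look like. The table name `by_year_demo`, its columns, the partition boundaries and the sample dates are all assumptions chosen for illustration; they are not the author's actual schema.

```sql
-- Hypothetical demo table; names and boundaries are illustrative assumptions.
CREATE TABLE by_year_demo (
    id      INT  NOT NULL,
    created DATE NOT NULL
)
PARTITION BY RANGE (YEAR(created)) (
    PARTITION P1 VALUES LESS THAN (2016),
    PARTITION P2 VALUES LESS THAN (2017),
    PARTITION P3 VALUES LESS THAN (2018),
    PARTITION P4 VALUES LESS THAN MAXVALUE
);

-- Two rows per year; every row dated 2018 or later lands in P4.
INSERT INTO by_year_demo (id, created) VALUES
    (1, '2015-03-01'), (2, '2015-09-01'),
    (3, '2016-03-01'), (4, '2016-09-01'),
    (5, '2017-03-01'), (6, '2017-09-01'),
    (7, '2018-03-01'), (8, '2018-09-01');

-- The "partitions" column of the plan should list only some of the four
-- partitions rather than all of them.
-- MySQL 5.7 shows that column with plain EXPLAIN, and MySQL 8.0 no longer
-- accepts the PARTITIONS keyword.
EXPLAIN PARTITIONS
SELECT * FROM by_year_demo
WHERE created BETWEEN '2016-01-01' AND '2016-12-31';
```

The filter is written directly on the partitioning column, since pruning depends on the optimizer being able to map the condition onto partition boundaries.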
Check how many rows each partition holds (for InnoDB, TABLE_ROWS is an estimate):
mysql> SELECT PARTITION_NAME,TABLE_ROWS FROM INFORMATION_SCHEMA.PARTITIONS WHERE TABLE_NAME = 'by_year' and TABLE_SCHEMA='db_name';
+----------------+------------+
| PARTITION_NAME | TABLE_ROWS |
+----------------+------------+
| P1             |          2 |
| P2             |          2 |
| P3             |          2 |
| P4             |          4 |
+----------------+------------+
4 rows in set (0.00 sec)
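Recently I happened to have a task table that needed partitioning. It receives several hundred thousand tasks a day and almost every query filters on the time column, so I wanted to convert it into a table partitioned by day.
Converting the table in place works, but with a lot of data it takes a long time: my table of a little over ten million rows took more than ten minutes, so avoid doing it during peak hours. In production I first deleted historical data that was no longer needed and then ran the conversion: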
ALTER TABLE tasks_table PARTITION BY RANGE (TO_DAYS(start_time))
(
PARTITION p20180401 VALUES LESS THAN (TO_DAYS('2018-04-01')),
PARTITION p20180402 VALUES LESS THAN (TO_DAYS('2018-04-02')),
PARTITION p20180403 VALUES LESS THAN (TO_DAYS('2018-04-03')),
PARTITION p20180404 VALUES LESS THAN (TO_DAYS('2018-04-04')),
PARTITION p20180405 VALUES LESS THAN (TO_DAYS('2018-04-05')),
PARTITION p20180406 VALUES LESS THAN (TO_DAYS('2018-04-06')),
PARTITION p20180407 VALUES LESS THAN (TO_DAYS('2018-04-07')),
PARTITION p20180408 VALUES LESS THAN (TO_DAYS('2018-04-08')),
PARTITION p20180409 VALUES LESS THAN (TO_DAYS('2018-04-09')),
PARTITION p20180410 VALUES LESS THAN (TO_DAYS('2018-04-10'))
);
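This fails with an error:
ERROR 1503 (HY000): A PRIMARY KEY must include all columns in the table's partitioning function
The error occurs because the partitioning column is not part of the primary key. Rebuild the primary key so that it includes start_time: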
alter table tasks_table drop primary key,add primary key(task_id,start_time);
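Before or after changing the key, it can help to confirm which columns the primary key and any unique keys cover, because MySQL applies the same requirement to every unique key on a partitioned table. The query below is a small sketch that reads the standard `INFORMATION_SCHEMA.STATISTICS` table; it assumes the table lives in the `test_db` schema used elsewhere in this post.

```sql
-- List the columns of every unique index (including PRIMARY) on the table.
-- Each of these indexes must contain start_time for the partitioning to be accepted.
SELECT INDEX_NAME, SEQ_IN_INDEX, COLUMN_NAME
FROM INFORMATION_SCHEMA.STATISTICS
WHERE TABLE_SCHEMA = 'test_db'
  AND TABLE_NAME   = 'tasks_table'
  AND NON_UNIQUE   = 0
ORDER BY INDEX_NAME, SEQ_IN_INDEX;
```

Keep in mind that with a primary key of (task_id, start_time), task_id alone is no longer enforced as unique by the database.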
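Run the partitioning statement again. When it finishes, check the partitions. Because each partition holds the rows below its boundary, p20180401 contains all the history before 2018-04-01: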
MariaDB [test_db]> SELECT PARTITION_NAME,TABLE_ROWS FROM INFORMATION_SCHEMA.PARTITIONS WHERE TABLE_NAME = 'tasks_table' and TABLE_SCHEMA='test_db';
+----------------+------------+
| PARTITION_NAME | TABLE_ROWS |
+----------------+------------+
| p20180401      |   16746594 |
| p20180402      |       2808 |
| p20180403      |       2808 |
| p20180404      |       2808 |
| p20180405      |       5001 |
| p20180406      |          0 |
| p20180407      |          0 |
| p20180408      |          0 |
| p20180409      |          0 |
| p20180410      |          0 |
+----------------+------------+
10 rows in set (0.00 sec)
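The next step is to keep partitioning by day. Ideally new partitions would be created automatically, but MySQL does not support that, so it needs a workaround:
1. Generate a batch of ADD PARTITION statements with a script. This is a crude approach, and after a while it is easy to forget to add new partitions.
2. Use a MySQL stored procedure together with an event: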
DELIMITER $$
USE `test_db`$$
DROP PROCEDURE IF EXISTS `create_Partition_tasks`$$
CREATE DEFINER=`root`@`%` PROCEDURE `create_Partition_tasks`()
BEGIN
/* Roll back on error. This has little effect here: ALTER TABLE commits implicitly and cannot be rolled back. */
DECLARE EXIT HANDLER FOR SQLEXCEPTION ROLLBACK;
START TRANSACTION;
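/* Look up the table's last partition in the system tables to get its date. Partitions are named after their date when created, which makes later maintenance easier. */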
SELECT REPLACE(partition_name,'p','') INTO @P12_Name FROM INFORMATION_SCHEMA.PARTITIONS
WHERE table_name='tasks_table' and TABLE_SCHEMA='test_db' ORDER BY partition_ordinal_position DESC LIMIT 1;
SET @Max_date= DATE(DATE_ADD(@P12_Name+0, INTERVAL 1 DAY))+0;
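/* Alter the table: add one partition after the last one, extending the range by one day. */
SET @s1=CONCAT('ALTER TABLE tasks_table ADD PARTITION (PARTITION p',@Max_date,' VALUES LESS THAN (TO_DAYS(''',DATE(@Max_date),''')))');
/* Print the generated ADD PARTITION statement for inspection. */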
SELECT @s1;
PREPARE stmt2 FROM @s1;
EXECUTE stmt2;
DEALLOCATE PREPARE stmt2;
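/* Fetch the name of the oldest partition and drop it.
Note: dropping a partition also deletes the data in it, so be careful. */
/*select partition_name into @P0_Name from INFORMATION_SCHEMA.PARTITIONS
where table_name='tasks_table' and TABLE_SCHEMA='test_db' order by partition_ordinal_position limit 1;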
SET @s=concat('ALTER TABLE tasks_table DROP PARTITION ',@P0_Name);
PREPARE stmt1 FROM @s;
EXECUTE stmt1;
DEALLOCATE PREPARE stmt1; */
/* Commit */
COMMIT ;
END$$
DELIMITER ;
Create the event:
DELIMITER ||
CREATE EVENT Partition_tasks_event
ON SCHEDULE
EVERY 1 day STARTS '2018-04-10 12:25:59'
DO
BEGIN
CALL create_Partition_tasks;
END ||
DELIMITER ;
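One thing to watch out for: the database's event_scheduler must be turned on.
Check whether it is on with:
show variables like 'event_scheduler%';
If it is off, turn it on with the command below (this setting does not survive a server restart unless it is also set in the configuration file):
set global event_scheduler = ON;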
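To check that the event is actually registered and running, the standard `INFORMATION_SCHEMA.EVENTS` table can be queried. This is a small sketch that assumes the event was created in the `test_db` schema.

```sql
-- Confirm the scheduler state (ON, OFF, or DISABLED).
SHOW VARIABLES LIKE 'event_scheduler';

-- Show the event's schedule and when it last ran.
-- LAST_EXECUTED stays NULL until the event has fired for the first time.
SELECT EVENT_NAME, STATUS, STARTS, INTERVAL_VALUE, INTERVAL_FIELD, LAST_EXECUTED
FROM INFORMATION_SCHEMA.EVENTS
WHERE EVENT_SCHEMA = 'test_db'
  AND EVENT_NAME   = 'Partition_tasks_event';
```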
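That's it. Queries that filter on start_time now only need to scan the matching partitions instead of the whole table, which should speed them up considerably.
Let's verify: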
MariaDB [test_db]> explain partitions select * from tasks_table where start_time<'2018-04-05' and start_time>'2018-04-03';
+------+-------------+-------------+---------------------+------+---------------+------+---------+------+------+-------------+
| id   | select_type | table       | partitions          | type | possible_keys | key  | key_len | ref  | rows | Extra       |
+------+-------------+-------------+---------------------+------+---------------+------+---------+------+------+-------------+
|    1 | SIMPLE      | tasks_table | p20180404,p20180405 | ALL  | NULL          | NULL | NULL    | NULL | 7809 | Using where |
+------+-------------+-------------+---------------------+------+---------------+------+---------+------+------+-------------+
1 row in set (0.00 sec)
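For comparison, option 1 above (generating the ADD PARTITION statements ahead of time) could be sketched in SQL itself. This is only an illustration under stated assumptions: it needs recursive CTE support (MySQL 8.0 or MariaDB 10.2.2 and later), and the date range is a made-up example that continues from the last partition shown earlier.

```sql
-- Generate one ADD PARTITION statement per day for a hypothetical date range.
-- The output is text only; each generated statement still has to be run separately.
WITH RECURSIVE days (d) AS (
    SELECT DATE('2018-04-11')
    UNION ALL
    SELECT d + INTERVAL 1 DAY FROM days WHERE d < DATE('2018-04-17')
)
SELECT CONCAT(
    'ALTER TABLE tasks_table ADD PARTITION (PARTITION p',
    DATE_FORMAT(d, '%Y%m%d'),
    ' VALUES LESS THAN (TO_DAYS(''', d, ''')));'
) AS ddl
FROM days;
```

Range partitions can only be added at the upper end, so the generated statements need to be executed in date order.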