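Adding a unique index to a table with a large amount of duplicate data

I ran into the scenario in the title: I needed to add a unique index (idx_col1_u) to an InnoDB table (tableA) in MySQL. However, the table already contained a large amount of duplicate data. For each key (col1), some values were duplicated across 2 rows and others across N rows.

Cleaning up the data by hand, or with hand-written SQL, would clearly be very time-consuming.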
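Before changing anything, it can help to see how widespread the duplication is. The query below is a minimal sketch that uses the tableA and col1 names from this post; the row_count alias is only illustrative.

```sql
-- List every col1 value that appears more than once,
-- together with the number of rows sharing it (worst offenders first).
SELECT col1, COUNT(*) AS row_count
FROM tableA
GROUP BY col1
HAVING COUNT(*) > 1
ORDER BY row_count DESC;
```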

 

1. ALTER IGNORE TABLE comes to the rescue

As I recalled, MySQL has its own ALTER IGNORE ... ADD UNIQUE INDEX syntax.

The syntax is as follows:

ALTER [ONLINE | OFFLINE] [IGNORE] TABLE tbl_name

 

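It behaves like INSERT IGNORE: rows that conflict on the unique key are silently discarded instead of raising an error. For adding a unique index, this amounts to creating an empty table, adding the unique index to it, and then inserting the old data into the new table with INSERT IGNORE, dropping any row that conflicts.

The documentation's note on ALTER IGNORE (see http://dev.mysql.com/doc/refman/5.1/en/alter-table.html):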
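The copy-and-ignore behavior described above can also be written out by hand. This is a rough sketch rather than what the server runs internally. The helper names tableA_new and tableA_old are hypothetical, and the sketch assumes nothing writes to tableA while the copy is running.

```sql
-- 1. Create an empty table with the same structure as tableA.
CREATE TABLE tableA_new LIKE tableA;

-- 2. Add the unique index while the new table is still empty.
ALTER TABLE tableA_new ADD UNIQUE INDEX idx_col1_u (col1);

-- 3. Copy the old rows; rows that collide on col1 are skipped with a warning.
--    Without an ORDER BY, which duplicate survives is not guaranteed.
INSERT IGNORE INTO tableA_new SELECT * FROM tableA;

-- 4. Swap the tables, keeping the original around as tableA_old.
RENAME TABLE tableA TO tableA_old, tableA_new TO tableA;
```

Because this uses only INSERT IGNORE, it does not depend on how a given storage engine handles the IGNORE keyword in ALTER TABLE.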

IGNORE is a MySQL extension to standard SQL. It controls how ALTER TABLE works if there are duplicates on unique keys in the new table or if warnings occur when strict mode is enabled. If IGNORE is not specified, the copy is aborted and rolled back if duplicate-key errors occur. If IGNORE is specified, only the first row is used of rows with duplicates on a unique key. The other conflicting rows are deleted. Incorrect values are truncated to the closest matching acceptable value.

 

2.  #1062 - Duplicate entry 

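However, after running alter ignore table tableA add unique index idx_col1_u (col1), I still got the following error:

#1062 - Duplicate entry '111' for key 'col1'.

Wasn't it supposed to discard duplicate rows automatically? My worldview was overturned. After some digging, it turned out that InnoDB does not honor ALTER IGNORE when adding a unique index.

It turns out that how ALTER IGNORE is carried out depends entirely on the storage engine's internal implementation rather than being enforced by the server, as described here: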

For ALTER TABLE with the IGNORE keyword, IGNORE is now part of the
information provided to the storage engine. It is up to the storage
engine whether to use this when choosing between the in-place or copy
algorithm for altering the table. For InnoDB index operations, IGNORE 
is not used if the index is unique, so the copy algorithm is used

See: http://bugs.mysql.com/bug.php?id=40344

 

3. Solution

Of course, there is still a tricky way around this problem, and it is fairly blunt. It goes as follows:

-- Temporarily convert to MyISAM so that IGNORE takes effect
ALTER TABLE tableA ENGINE MyISAM;
ALTER IGNORE TABLE tableA ADD UNIQUE INDEX idx_col1_u (col1);
-- Convert back to InnoDB
ALTER TABLE tableA ENGINE InnoDB;

 

Updated on 2013-09-26:

@jyzhou pointed out that there is no need to convert to MyISAM. You can use set old_alter_table = 1; directly instead. The steps are as follows:

set old_alter_table = 1;

ALTER IGNORE TABLE tableA ADD UNIQUE INDEX idx_col1_u (col1);

How it works: http://dev.mysql.com/doc/refman/5.1/en/server-system-variables.html#sysvar_old_alter_table
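Whichever route is taken, it is worth checking the result afterwards. The statements below are one possible check, assuming the same tableA and col1 names; the total_rows and distinct_keys aliases are illustrative.

```sql
-- If col1 contains no NULLs, the two counts should now be equal.
SELECT COUNT(*) AS total_rows, COUNT(DISTINCT col1) AS distinct_keys
FROM tableA;

-- idx_col1_u should be listed with Non_unique = 0.
SHOW INDEX FROM tableA;

-- If old_alter_table was switched on for this session, switch it back off.
SET old_alter_table = 0;
```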
