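Sometimes a table contains a lot of duplicate data, and for some reason (creating a unique index, for example) we need to remove the duplicates and keep exactly one row of each. The usual reaction is to write a single SQL statement that deletes the duplicates while keeping one row, but that tends to be inefficient. Below is a more general and efficient approach. The idea: for each set of duplicates (whole rows, or rows that repeat in a particular column, depending on the case), select one row and store it in a temporary table; then delete every row in the original table that has duplicates; finally insert all rows from the temporary table back into the original table. Take the following table as an example: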
mysql> select * from a1;
+------+------+------+------+
| id | id_c | cd | b |
+------+------+------+------+
| 1 | v1 | bc | NULL |
| 1 | v2 | b3 | NULL |
| 1 | v3 | b4 | NULL |
| 1 | v4 | bb | NULL |
| 1 | v4 | bb | NULL |
| 1 | v4 | bb | NULL |
| 1 | v4 | bb | NULL |
| 1 | v1 | bc | NULL |
+------+------+------+------+
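To follow along, the sample table can be recreated with something like the script below. The column types are assumptions, since the output above shows only the data and not the table definition; any types that can hold these values would do.

```sql
-- Column types are assumed; the source shows only the rows, not the DDL.
CREATE TABLE a1 (
    id   INT,
    id_c VARCHAR(10),
    cd   VARCHAR(10),
    b    VARCHAR(10)
);

-- The eight rows from the listing above, in the same order.
INSERT INTO a1 (id, id_c, cd, b) VALUES
    (1, 'v1', 'bc', NULL),
    (1, 'v2', 'b3', NULL),
    (1, 'v3', 'b4', NULL),
    (1, 'v4', 'bb', NULL),
    (1, 'v4', 'bb', NULL),
    (1, 'v4', 'bb', NULL),
    (1, 'v4', 'bb', NULL),
    (1, 'v1', 'bc', NULL);
```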
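Clearly, depending on the requirement, "duplicate" can mean different things: the whole row is duplicated, id is duplicated, id_c is duplicated, cd is duplicated, or b is duplicated.
Here we walk through the case where id_c and cd are duplicated. The statements below group by cd alone, which is enough for this data because each cd value maps to exactly one id_c value.
1. First look at which records are duplicated; there are two of them. (On this table, selecting * with group by cd relies on the ONLY_FULL_GROUP_BY SQL mode being disabled.)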
mysql> select * from a1 group by cd having count(*) > 1;
+------+------+------+------+
| id | id_c | cd | b |
+------+------+------+------+
| 1 | v4 | bb | NULL |
| 1 | v1 | bc | NULL |
+------+------+------+------+
2 rows in set (0.00 sec)
2. Create the temporary table
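On servers where ONLY_FULL_GROUP_BY is enabled (the default since MySQL 5.7.5), the query above is rejected because the non-grouped columns are not aggregated. A minimal alternative, offered as a sketch rather than the author's statement, selects only the grouped column together with its count:

```sql
-- List each duplicated cd value and how many rows carry it.
SELECT cd, COUNT(*) AS cnt
FROM a1
GROUP BY cd
HAVING COUNT(*) > 1;
```

For the sample table this lists bb with a count of 4 and bc with a count of 2.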
mysql> create table a1_tmp like a1;
Query OK, 0 rows affected (0.09 sec)
3. Store one row from each set of duplicates in the temporary table
mysql> insert into a1_tmp select * from a1 group by cd having count(*) > 1;
Query OK, 2 rows affected (0.00 sec)
4. Delete the rows in the source table that have duplicates
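If ONLY_FULL_GROUP_BY is enabled, one possible way to write the same insert is with ANY_VALUE(), which is available from MySQL 5.7.5. This is a sketch under the assumption that rows sharing a cd value are identical in every other column, as they are in the sample data; ANY_VALUE() picks an arbitrary value per column, so it is not suitable when those rows differ.

```sql
-- Keep one representative row per duplicated cd value.
-- Assumes rows with the same cd agree on id, id_c and b.
INSERT INTO a1_tmp (id, id_c, cd, b)
SELECT ANY_VALUE(id), ANY_VALUE(id_c), cd, ANY_VALUE(b)
FROM a1
GROUP BY cd
HAVING COUNT(*) > 1;
```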
#mysql> delete from a1 where exists (select cd from a1_tmp where a1.cd=a1_tmp.cd);
#Query OK, 6 rows affected (0.00 sec)
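The statement above is better written as delete from a1 where cd in (select cd from a1_tmp);
That avoids emptying the table by accident: an exists subquery whose correlation condition is missing or mistyped is true for every row whenever a1_tmp is not empty.
PS: Before deleting, confirm that the rows you are about to delete match the data in the temporary table; a wrong delete has serious consequences. For example: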
mysql> select * from a1 where exists (select cd from a1_tmp where a1.cd=a1_tmp.cd);
+------+------+------+------+
| id | id_c | cd | b |
+------+------+------+------+
| 1 | v1 | bc | NULL |
| 1 | v4 | bb | NULL |
| 1 | v4 | bb | NULL |
| 1 | v4 | bb | NULL |
| 1 | v4 | bb | NULL |
| 1 | v1 | bc | NULL |
+------+------+------+------+
6 rows in set (0.00 sec)
The query above can also be written as:
select * from a1 where cd in (select cd from a1_tmp);
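Besides inspecting the rows by eye, the check can be reduced to comparing two counts. The sketch below is one way to do that; the column aliases to_delete and duplicated_rows are illustrative names, not part of the original.

```sql
-- Number of rows the delete in step 4 would remove.
SELECT COUNT(*) AS to_delete
FROM a1
WHERE cd IN (SELECT cd FROM a1_tmp);

-- Number of rows in a1 that belong to a duplicated cd value.
SELECT SUM(cnt) AS duplicated_rows
FROM (
    SELECT COUNT(*) AS cnt
    FROM a1
    GROUP BY cd
    HAVING COUNT(*) > 1
) AS d;
```

If the two numbers differ, the temporary table does not match the duplicates in a1 and the delete should not be run. For the sample table both queries return 6.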
5. Load the temporary table's data back into the source table
mysql> insert into a1 select * from a1_tmp;
Query OK, 2 rows affected (0.04 sec)
Records: 2 Duplicates: 0 Warnings: 0
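Since the source table is missing its duplicated rows between the delete and the re-insert, it may be worth wrapping steps 4 and 5 in one transaction. The sketch below assumes a1 uses a transactional engine such as InnoDB, and that a1_tmp was created and filled beforehand, because CREATE TABLE causes an implicit commit in MySQL.

```sql
START TRANSACTION;

-- Step 4: remove every row whose cd value is duplicated.
DELETE FROM a1
WHERE cd IN (SELECT cd FROM a1_tmp);

-- Step 5: put back one row per duplicated cd value.
INSERT INTO a1
SELECT * FROM a1_tmp;

-- Inspect the result before deciding; run ROLLBACK instead of COMMIT
-- to undo both statements if anything looks wrong.
SELECT * FROM a1;

COMMIT;
```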
6. Confirm the data is correct and free of duplicates
mysql> select * from a1;
+------+------+------+------+
| id | id_c | cd | b |
+------+------+------+------+
| 1 | v2 | b3 | NULL |
| 1 | v3 | b4 | NULL |
| 1 | v1 | bc | NULL |
| 1 | v4 | bb | NULL |
+------+------+------+------+
4 rows in set (0.00 sec)
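As a possible follow-up, the duplicate check can be rerun and the unique index that motivated the cleanup can then be created. This is a sketch: the index name uk_a1_cd is hypothetical, and indexing cd alone is an assumption based on the column used for grouping above.

```sql
-- Should return an empty set once the duplicates are gone.
SELECT cd, COUNT(*) AS cnt
FROM a1
GROUP BY cd
HAVING COUNT(*) > 1;

-- uk_a1_cd is a hypothetical index name; this fails if duplicates remain.
ALTER TABLE a1 ADD UNIQUE INDEX uk_a1_cd (cd);

-- The temporary table is no longer needed.
DROP TABLE a1_tmp;
```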