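I often need to deduplicate data in a database, and sometimes I also need to fill in a column from an external table. This document records how to do both: deduplicating rows and filling in data from an external table.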
date:2016/8/17
author:wangxl
Goal: deduplicate the user_info1 table and fill in its age column.
user_info1:
+----+----------+------+------+
| id | name     | sex  | age  |
+----+----------+------+------+
|  1 | xiaolong |    1 | NULL |
|  2 | xiaoyun  |    1 | NULL |
|  3 | xiaoqin  |    2 | NULL |
|  4 | xiaolong |    1 | NULL |
|  5 | xiaodong |    1 | NULL |
|  6 | xiaokai  |    1 | NULL |
|  7 | xiaohong |    2 | NULL |
|  8 | xiaolong |    1 | NULL |
|  9 | xiaohong |    2 | NULL |
| 10 | xiaofen  |    2 | NULL |
+----+----------+------+------+

user_info2:
+----------+------+
| name     | age  |
+----------+------+
| xiaolong |   26 |
| xiaoyun  |   28 |
| xiaoqin  |   27 |
| xiaodong |   27 |
| xiaokai  |   27 |
| xiaohong |   24 |
| xiaofen  |   22 |
+----------+------+
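To follow along, the two sample tables can be recreated with a script along these lines. This is only a sketch: the original shows query output, not the table definitions, so the column types (INT, VARCHAR(20), TINYINT) and the primary key on id are assumptions.

```sql
-- Assumed schema: types and the primary key are guesses based on the output above.
CREATE TABLE user_info1 (
    id   INT PRIMARY KEY,
    name VARCHAR(20) NOT NULL,
    sex  TINYINT,
    age  INT
);

CREATE TABLE user_info2 (
    name VARCHAR(20) NOT NULL,
    age  INT
);

-- age is left out here, so it starts as NULL in every row.
INSERT INTO user_info1 (id, name, sex) VALUES
    (1, 'xiaolong', 1), (2, 'xiaoyun', 1),  (3, 'xiaoqin', 2),
    (4, 'xiaolong', 1), (5, 'xiaodong', 1), (6, 'xiaokai', 1),
    (7, 'xiaohong', 2), (8, 'xiaolong', 1), (9, 'xiaohong', 2),
    (10, 'xiaofen', 2);

INSERT INTO user_info2 (name, age) VALUES
    ('xiaolong', 26), ('xiaoyun', 28), ('xiaoqin', 27), ('xiaodong', 27),
    ('xiaokai', 27), ('xiaohong', 24), ('xiaofen', 22);
```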
(1) Find the rows whose name appears more than once:
    select * from user_info1 where name in (select name from user_info1 group by name having count(name) > 1);

(2) Find the rows to delete. Duplicates are identified by a single column (name), and only the row with the smallest id in each group is kept:
    select * from user_info1 where name in (select name from user_info1 group by name having count(name) > 1) and id not in (select min(id) from user_info1 group by name having count(name) > 1);

(3) Delete the redundant duplicate rows:
    delete from user_info1 where name in (select name from user_info1 group by name having count(name) > 1) and id not in (select min(id) from user_info1 group by name having count(name) > 1);

This fails with:
    ERROR 1093 (HY000): You can't specify target table 'user_info1' for update in FROM clause

MySQL does not allow a statement to delete from a table while a subquery in the same statement selects directly from that table.
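A different approach: within each group of identical names, find the rows that do not have the smallest id and delete them, as follows: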
(4) Find the smallest id in each group:
    select min(id) from user_info1 group by name;

(5) Find the rows that do not have the smallest id in their group:
    select * from user_info1 where id not in (select min(id) from user_info1 group by name);

(6) Delete the rows that do not have the smallest id in their group:
    delete from user_info1 where id not in (select min(id) from user_info1 group by name);

    ERROR 1093 (HY000): You can't specify target table 'user_info1' for update in FROM clause

Corrected version, which wraps the subquery in a derived table (aliased a) so that the delete no longer reads user_info1 directly:
    delete from user_info1 where id not in (select minid from (select min(id) as minid from user_info1 group by name) a);

Result:
+----+----------+------+------+
| id | name     | sex  | age  |
+----+----------+------+------+
|  1 | xiaolong |    1 | NULL |
|  2 | xiaoyun  |    1 | NULL |
|  3 | xiaoqin  |    2 | NULL |
|  5 | xiaodong |    1 | NULL |
|  6 | xiaokai  |    1 | NULL |
|  7 | xiaohong |    2 | NULL |
| 10 | xiaofen  |    2 | NULL |
+----+----------+------+------+
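The same deduplication can probably also be written as a self-join using MySQL's multi-table DELETE syntax, which avoids the subquery on the target table altogether. This is an alternative sketch, not the statement used above; the aliases t1 and t2 are arbitrary names.

```sql
-- Delete every row t1 for which another row t2 has the same name
-- and a smaller id, so only the smallest id in each group survives.
DELETE t1
FROM user_info1 AS t1
JOIN user_info1 AS t2
  ON t1.name = t2.name
 AND t1.id > t2.id;
```

On the sample data this removes ids 4, 8 and 9. It is worth running the equivalent select first to check which rows would be deleted.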
How do you deduplicate a table that has no primary key?
(7) Create a new table test.

(8) Copy one row per name into it:
    insert into test select distinct(name),sex,age from user_info1 group by name;

I have not yet come up with a single-statement solution.
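One possible way to flesh out the copy-to-a-new-table idea, assuming the duplicates are exact copies of whole rows, is to copy the distinct rows into a new table and then swap the table names. The table names user_nopk, user_nopk_dedup and user_nopk_old are hypothetical, as is the assumption that user_nopk has exactly the columns (name, sex, age) in that order. Note that DISTINCT applies to the whole selected row, not only to name, and that this still takes several statements.

```sql
-- user_nopk is a hypothetical table with columns (name, sex, age) and no primary key.
-- Create an empty table with the same definition.
CREATE TABLE user_nopk_dedup LIKE user_nopk;

-- Copy each distinct (name, sex, age) combination exactly once.
INSERT INTO user_nopk_dedup
SELECT DISTINCT name, sex, age FROM user_nopk;

-- Swap the tables in one statement; the original is kept as user_nopk_old.
RENAME TABLE user_nopk TO user_nopk_old,
             user_nopk_dedup TO user_nopk;
```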
Fill in age from the external table user_info2:
    update user_info1 t set age=(select age from user_info2 where name=t.name);

The result is as follows:
+----+----------+------+------+
| id | name     | sex  | age  |
+----+----------+------+------+
|  1 | xiaolong |    1 |   26 |
|  2 | xiaoyun  |    1 |   28 |
|  3 | xiaoqin  |    2 |   27 |
|  5 | xiaodong |    1 |   27 |
|  6 | xiaokai  |    1 |   27 |
|  7 | xiaohong |    2 |   24 |
| 10 | xiaofen  |    2 |   22 |
+----+----------+------+------+
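The fill step can also be sketched with MySQL's multi-table UPDATE, joining the two tables on name. This is an alternative form, not the statement used above; the aliases t and s are arbitrary.

```sql
-- Copy age from user_info2 into every user_info1 row with a matching name.
UPDATE user_info1 AS t
JOIN user_info2 AS s
  ON s.name = t.name
SET t.age = s.age;
```

The two forms differ when a name is missing from user_info2: the correlated subquery sets age to NULL for that row, while the inner join leaves the row untouched. On the sample data every name has a match, so both should give the same result.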