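MySQL Data Deduplication and Filling Columns from an External Table

Data in a database often needs to be deduplicated, and sometimes a column must also be filled in from an external table. This document records how to do both.
date: 2016/8/17
author: wangxl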

1 Requirements

Deduplicate the user_info1 table and fill in its age column.

2 Table data

user_info1:
+----+----------+------+------+
| id | name     | sex  | age  |
+----+----------+------+------+
|  1 | xiaolong | 1    | NULL |
|  2 | xiaoyun  | 1    | NULL |
|  3 | xiaoqin  | 2    | NULL |
|  4 | xiaolong | 1    | NULL |
|  5 | xiaodong | 1    | NULL |
|  6 | xiaokai  | 1    | NULL |
|  7 | xiaohong | 2    | NULL |
|  8 | xiaolong | 1    | NULL |
|  9 | xiaohong | 2    | NULL |
| 10 | xiaofen  | 2    | NULL |
+----+----------+------+------+

user_info2:
+----------+------+
| name     | age  |
+----------+------+
| xiaolong |   26 |
| xiaoyun  |   28 |
| xiaoqin  |   27 |
| xiaodong |   27 |
| xiaokai  |   27 |
| xiaohong |   24 |
| xiaofen  |   22 |
+----------+------+
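
To reproduce the walkthrough, the two tables can be recreated roughly as follows. The original only shows query output, so the column types here are assumptions (sex is printed left-aligned, which suggests a string type; age is printed right-aligned, which suggests a numeric type).

```sql
-- Assumed schema: column types are guesses based on the output above.
create table user_info1 (
    id   int not null auto_increment primary key,
    name varchar(32) not null,
    sex  varchar(4),
    age  int
);

insert into user_info1 (id, name, sex) values
    (1, 'xiaolong', '1'), (2, 'xiaoyun', '1'),  (3, 'xiaoqin', '2'),
    (4, 'xiaolong', '1'), (5, 'xiaodong', '1'), (6, 'xiaokai', '1'),
    (7, 'xiaohong', '2'), (8, 'xiaolong', '1'), (9, 'xiaohong', '2'),
    (10, 'xiaofen', '2');

create table user_info2 (
    name varchar(32) not null,
    age  int
);

insert into user_info2 (name, age) values
    ('xiaolong', 26), ('xiaoyun', 28), ('xiaoqin', 27), ('xiaodong', 27),
    ('xiaokai', 27),  ('xiaohong', 24), ('xiaofen', 22);
```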

3 Walkthrough

3.1 Deduplication

(1) Find the rows whose name is duplicated
    select * from user_info1 where name in (select name from user_info1 group by name having count(name) > 1);
(2) Find the rows to delete. Duplicates are identified by a single column (name), and only the row with the smallest id in each group is kept
    select * from user_info1 where name in (select name from user_info1 group by name having count(name) > 1) and id not in (select min(id) from user_info1 group by name having count(name) > 1);
(3) Delete the redundant duplicate rows
    delete from user_info1 where name in (select name from user_info1 group by name having count(name) > 1) and id not in (select min(id) from user_info1 group by name having count(name) > 1);

    Error: ERROR 1093 (HY000): You can't specify target table 'user_info1' for update in FROM clause
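
Before retrying the delete, it can help to summarize the duplicate groups in one query: which names repeat, how many times, and which id would survive. The aliases cnt and keep_id are illustrative names, not part of the original.

```sql
-- One row per duplicated name, with the group size and the id that will be kept.
select name, count(*) as cnt, min(id) as keep_id
from user_info1
group by name
having count(*) > 1
order by name;
-- With the sample data this lists xiaohong (cnt 2, keep_id 7)
-- and xiaolong (cnt 3, keep_id 1).
```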

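A different approach: find every id that is not the minimum of its group and delete it, as follows:

(4) Find the minimum id of each group
    select min(id) from user_info1 group by name;
(5) Find the rows whose id is not the minimum of their group
    select * from user_info1 where id not in (select min(id) from user_info1 group by name);
(6) Delete the rows whose id is not the minimum of their group
    delete from user_info1 where id not in (select min(id) from user_info1 group by name);
    ERROR 1093 (HY000): You can't specify target table 'user_info1' for update in FROM clause
    Fix: wrap the subquery in a derived table, so that the delete no longer reads user_info1 directly in its subquery:
    delete from user_info1 where id not in (select minid from (select min(id) as minid from user_info1 group by name) a);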
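
As an alternative sketch (not the method used in this document), MySQL's multi-table DELETE syntax can express the same rule with a self-join, which avoids the subquery and therefore error 1093. The aliases t1 and t2 are arbitrary.

```sql
-- Delete every row for which another row with the same name and a smaller id exists.
-- Only the row with the smallest id in each name group survives.
delete t1
from user_info1 t1
join user_info1 t2
  on t1.name = t2.name
 and t1.id > t2.id;
```

With the sample data this removes ids 4, 8 and 9, the same rows as the derived-table version. Running either delete inside a transaction (on a transactional engine such as InnoDB) makes it possible to inspect the result before committing.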

Result:
+----+----------+------+------+
| id | name     | sex  | age  |
+----+----------+------+------+
| 1  | xiaolong | 1    | NULL |
| 2  | xiaoyun  | 1    | NULL |
| 3  | xiaoqin  | 2    | NULL |
| 5  | xiaodong | 1    | NULL |
| 6  | xiaokai  | 1    | NULL |
| 7  | xiaohong | 2    | NULL |
| 10 | xiaofen  | 2    | NULL |
+----+----------+------+------+
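
A quick check that the deduplication worked might look like this; it reuses the grouping query from step (1) and is only a suggested sanity check.

```sql
-- Should return an empty set once every name is unique.
select name, count(*) as cnt
from user_info1
group by name
having count(*) > 1;

-- With the sample data, 7 rows remain.
select count(*) from user_info1;
```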

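How can duplicates be removed when the table has no primary key?

(7) Create a table test with the columns name, sex and age
(8) Copy the distinct rows into it
    insert into test select distinct name, sex, age from user_info1;

I have not yet found a single-statement solution.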
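
Steps (7) and (8) can be sketched end to end as below. The table user_info_nopk is hypothetical: it stands for a table with the columns (name, sex, age) and no primary key, in which duplicates are whole-row duplicates. The final swap with rename table is an added suggestion, not part of the original steps.

```sql
-- user_info_nopk is a hypothetical source table without a primary key.
-- Create an empty copy with the same structure.
create table test like user_info_nopk;

-- Copy each distinct (name, sex, age) combination exactly once.
insert into test
select distinct name, sex, age
from user_info_nopk;

-- Swap the deduplicated copy into place and keep the original as a backup.
rename table user_info_nopk to user_info_nopk_old,
             test to user_info_nopk;

-- After verifying the result, the backup can be dropped:
-- drop table user_info_nopk_old;
```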

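3.2 Filling from the external table

Fill the age column of user_info1 from user_info2, matching rows by name:
update user_info1 t set age=(select age from user_info2 where name=t.name);
Result: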
+----+----------+------+------+
| id | name     | sex  | age  |
+----+----------+------+------+
| 1  | xiaolong | 1    |   26 |
| 2  | xiaoyun  | 1    |   28 |
| 3  | xiaoqin  | 2    |   27 |
| 5  | xiaodong | 1    |   27 |
| 6  | xiaokai  | 1    |   27 |
| 7  | xiaohong | 2    |   24 |
| 10 | xiaofen  | 2    |   22 |
+----+----------+------+------+
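
The same fill can also be written with MySQL's multiple-table UPDATE syntax. This is an alternative sketch, not the statement used above; the alias s is arbitrary.

```sql
-- Update only the rows of user_info1 that have a matching name in user_info2.
update user_info1 t
join user_info2 s on s.name = t.name
set t.age = s.age;
```

The two forms differ on edge cases. The correlated-subquery version sets age to NULL for any name missing from user_info2 and fails with ERROR 1242 (Subquery returns more than 1 row) if user_info2 contains a name more than once, whereas the join version leaves unmatched rows untouched. With the sample data both produce the table shown above.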