Merging and Deduplicating MySQL Tables

Scenario:

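Scraped data is written to its own table, which has the same structure as an existing main table. The two tables need to be merged and the result deduplicated.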

Solution (by example):

  • First, create two tables, pep and pep2, where pep is the main table

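    -- Create the main table, then give pep2 the same definition.
    CREATE TABLE IF NOT EXISTS `pep`(
    `id` INT UNSIGNED AUTO_INCREMENT,
    `no` VARCHAR(100) NOT NULL,
    PRIMARY KEY ( `id` )
    )ENGINE=InnoDB DEFAULT CHARSET=utf8;
    CREATE TABLE IF NOT EXISTS `pep2` LIKE `pep`;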
  • Then insert two rows into pep, and one row into pep2 that duplicates a row already in pep

    insert into pep(no) values('abc');
    insert into pep(no) values('caa');
    
    insert into pep2(no) values('abc');
  • Insert the rows of pep2 into pep

    insert into pep (no) select no from pep2;
  • Deduplicate with GROUP BY and write the result to a new intermediate table, tmp

    create table tmp select id,no from pep group by no;

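    Note: in the table created this way, the id column is no longer an auto-increment primary key.

    The statement may also fail with this error: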
     ```
     Syntax error or access violation: 1055 Expression #1 of SELECT
     list is not in GROUP BY clause and contains nonaggregated
     column 'XXX.Y.ZZZZ' which is not functionally dependent on
     columns in GROUP BY clause; this is incompatible with
     sql_mode=only_full_group_by
     ```
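     Fix: run the following two commands, which set sql_mode without ONLY_FULL_GROUP_BY. The global setting applies only to connections opened afterwards, and the session setting applies to the current connection. NO_AUTO_CREATE_USER was removed in MySQL 8.0, so drop it from the list on that version. Alternatively, leave sql_mode unchanged and select MIN(id) instead of id, which satisfies ONLY_FULL_GROUP_BY.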
     ```
     mysql> set global sql_mode='STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION';
     
     mysql> set session sql_mode='STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION';
     ```
  • Drop the pep table and rename tmp to pep

    drop table pep;
    alter table tmp rename to pep;
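  • Checking the structure with desc pep and the data with select * from pep shows that the definition of the id column has changed; it needs to be restored to the original type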

    alter table pep add primary key (id);
    -- The original column was INT UNSIGNED, so restore UNSIGNED as well.
    alter table pep modify id int unsigned not null auto_increment;
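
The steps above can also be sketched in a way that needs neither the sql_mode change nor the final column repair. This is only a sketch under stated assumptions, not the method used above: it assumes the pep table already contains the merged rows, and the table names pep_new and pep_old are hypothetical. CREATE TABLE ... LIKE copies the column definitions and indexes, MIN(id) keeps the query valid under ONLY_FULL_GROUP_BY, and RENAME TABLE swaps both names in one statement.

```sql
-- Hypothetical names: pep_new (deduplicated copy), pep_old (the original, kept until verified).
-- Copy pep's structure, including the AUTO_INCREMENT primary key.
CREATE TABLE pep_new LIKE pep;

-- Keep one row per `no`. MIN(id) is an aggregate, so this runs under ONLY_FULL_GROUP_BY.
INSERT INTO pep_new (id, no)
SELECT MIN(id), no
FROM pep
GROUP BY no;

-- Swap the two tables in a single statement.
RENAME TABLE pep TO pep_old, pep_new TO pep;

-- Drop the old table once the new one has been checked.
DROP TABLE pep_old;
```

Because the original table is only renamed, not dropped, until the last statement, the old data stays available if the deduplicated copy turns out to be wrong.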

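Deduplication can also be done with a join. A faster option is to add a column (for example, the MD5 of several columns concatenated together) and create a unique index on it. Later inserts of duplicate rows are then rejected with a duplicate-key error, or skipped silently when the rows are written with INSERT IGNORE.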
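
A rough sketch of both ideas, assuming the pep and pep2 tables from the example. The column name no_md5 and the index name uk_no_md5 are hypothetical. With a single column, a unique index directly on no would be enough; the hash column is mainly useful when uniqueness spans several columns.

```sql
-- 1) Deduplicate in place with a self-join: delete every row that has a
--    twin with the same `no` and a smaller id, so the lowest id survives.
DELETE p1
FROM pep AS p1
JOIN pep AS p2
  ON p1.no = p2.no AND p1.id > p2.id;

-- 2) Add a hash column (hypothetical name no_md5) and enforce uniqueness on it.
--    For several columns, something like MD5(CONCAT_WS('|', col_a, col_b)) could be used.
ALTER TABLE pep ADD COLUMN no_md5 CHAR(32) NOT NULL DEFAULT '';
UPDATE pep SET no_md5 = MD5(no);
ALTER TABLE pep ADD UNIQUE INDEX uk_no_md5 (no_md5);

-- A plain INSERT of a duplicate now fails with a duplicate-key error.
-- INSERT IGNORE skips the duplicate rows and inserts the rest.
INSERT IGNORE INTO pep (no, no_md5)
SELECT no, MD5(no) FROM pep2;
```

The unique index has to be added after the existing duplicates are removed, otherwise the ALTER TABLE itself fails on the duplicate values.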
