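The database contains duplicate rows. They need to be cleaned up so that only one row of each duplicate set remains.
Optimization: on a table with a million rows, finding and deleting the duplicates went from 5423 seconds to about 2 seconds.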
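From material I found by searching:
4. Delete the redundant duplicate records in a table (duplicates defined over multiple fields), keeping only the record with the smallest rowid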
delete from vitae a
where (a.peopleId, a.seq) in (
        select peopleId, seq from vitae
        group by peopleId, seq having count(*) > 1)
  and rowid not in (
        select min(rowid) from vitae
        group by peopleId, seq having count(*) > 1)
Based on that material, I wrote the first version of the SQL statement:
delete from lcfyjttz
where (fdate, ffjdm, flcdm, ffytype, fgsbz) in (
        select fdate, ffjdm, flcdm, ffytype, fgsbz from lcfyjttz
        group by fdate, ffjdm, flcdm, ffytype, fgsbz having count(1) > 1)
  and rowid not in (
        select min(rowid) as rid from lcfyjttz
        group by fdate, ffjdm, flcdm, ffytype, fgsbz having count(1) > 1)
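To see what this first version does to the data, here is a minimal sketch that runs the same statement on a tiny table. It assumes SQLite (through Python's standard `sqlite3` module) as a stand-in for Oracle, since SQLite also exposes a `rowid`; the sample rows are made up for illustration, and nothing here says anything about Oracle's performance.

```python
import sqlite3

KEYS = "fdate, ffjdm, flcdm, ffytype, fgsbz"

# Hypothetical sample rows; SQLite assigns rowid 1..6 in insertion order.
ROWS = [
    ("2020-01-01", "A", "L1", "T1", "0"),  # rowid 1, kept (smallest in group)
    ("2020-01-01", "A", "L1", "T1", "0"),  # rowid 2, duplicate of 1
    ("2020-01-01", "A", "L1", "T1", "0"),  # rowid 3, duplicate of 1
    ("2020-01-01", "A", "L1", "T1", "1"),  # rowid 4, differs only in fgsbz
    ("2020-01-02", "B", "L2", "T2", "0"),  # rowid 5, kept (smallest in group)
    ("2020-01-02", "B", "L2", "T2", "0"),  # rowid 6, duplicate of 5
]


def make_table():
    """Create an in-memory table shaped like lcfyjttz and load the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(f"CREATE TABLE lcfyjttz ({KEYS})")
    conn.executemany("INSERT INTO lcfyjttz VALUES (?, ?, ?, ?, ?)", ROWS)
    return conn


def delete_duplicates_in(conn):
    """First version: IN on the duplicated keys, NOT IN on the rowids to keep."""
    conn.execute(f"""
        DELETE FROM lcfyjttz
        WHERE ({KEYS}) IN (
                SELECT {KEYS} FROM lcfyjttz
                GROUP BY {KEYS} HAVING COUNT(1) > 1)
          AND rowid NOT IN (
                SELECT MIN(rowid) FROM lcfyjttz
                GROUP BY {KEYS} HAVING COUNT(1) > 1)
    """)
    rows = conn.execute("SELECT rowid FROM lcfyjttz ORDER BY rowid")
    return [r[0] for r in rows]


remaining_in = delete_duplicates_in(make_table())
print(remaining_in)  # [1, 4, 5]
```

Rows 2, 3 and 6 are removed; row 4 survives because it differs from rows 1 to 3 in fgsbz, so it is not a duplicate.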
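With a million rows in the table, this statement took 5423 seconds to run.
That result is frightening. As someone who aims for responses within 3 seconds, I simply could not put up with it, so I started trying to optimize the statement. After a stretch of self-punishing trial and error, I finally got there.
The optimized SQL: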
DELETE FROM lcfyjttz c
WHERE EXISTS (
    SELECT a.ROWID
    FROM lcfyjttz a,
         (SELECT fdate, ffjdm, flcdm, ffytype, fgsbz, MIN(ROWID) rid
          FROM lcfyjttz
          GROUP BY fdate, ffjdm, flcdm, ffytype, fgsbz
          HAVING COUNT(1) > 1) b
    WHERE a.fdate = b.fdate
      AND a.ffjdm = b.ffjdm
      AND a.flcdm = b.flcdm
      AND a.ffytype = b.ffytype
      AND a.fgsbz = b.fgsbz  -- fgsbz is part of the GROUP BY key, so it must match too
      AND a.ROWID != b.rid
      AND c.ROWID = a.ROWID
)
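The EXISTS idea can be tried on a small table as well. The sketch below is a simplified variant, not the exact statement above: it again assumes SQLite via Python's `sqlite3` in place of Oracle, drops the extra self-join, and correlates the derived table of duplicate groups directly with the row being deleted. It compares all five grouping columns, fgsbz included, so rows that differ only in fgsbz are not treated as duplicates. The sample rows are hypothetical.

```python
import sqlite3

KEYS = "fdate, ffjdm, flcdm, ffytype, fgsbz"

# Hypothetical sample rows; SQLite assigns rowid 1..6 in insertion order.
ROWS = [
    ("2020-01-01", "A", "L1", "T1", "0"),  # rowid 1, kept
    ("2020-01-01", "A", "L1", "T1", "0"),  # rowid 2, duplicate of 1
    ("2020-01-01", "A", "L1", "T1", "0"),  # rowid 3, duplicate of 1
    ("2020-01-01", "A", "L1", "T1", "1"),  # rowid 4, differs only in fgsbz
    ("2020-01-02", "B", "L2", "T2", "0"),  # rowid 5, kept
    ("2020-01-02", "B", "L2", "T2", "0"),  # rowid 6, duplicate of 5
]


def make_table():
    """Create an in-memory table shaped like lcfyjttz and load the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(f"CREATE TABLE lcfyjttz ({KEYS})")
    conn.executemany("INSERT INTO lcfyjttz VALUES (?, ?, ?, ?, ?)", ROWS)
    return conn


def delete_duplicates_exists(conn):
    """EXISTS variant: delete a row when its group has a smaller rowid to keep."""
    conn.execute(f"""
        DELETE FROM lcfyjttz
        WHERE EXISTS (
            SELECT 1
            FROM (SELECT {KEYS}, MIN(rowid) AS rid
                  FROM lcfyjttz
                  GROUP BY {KEYS}
                  HAVING COUNT(1) > 1) b
            WHERE b.fdate   = lcfyjttz.fdate
              AND b.ffjdm   = lcfyjttz.ffjdm
              AND b.flcdm   = lcfyjttz.flcdm
              AND b.ffytype = lcfyjttz.ffytype
              AND b.fgsbz   = lcfyjttz.fgsbz
              AND b.rid    != lcfyjttz.rowid
        )
    """)
    rows = conn.execute("SELECT rowid FROM lcfyjttz ORDER BY rowid")
    return [r[0] for r in rows]


remaining_exists = delete_duplicates_exists(make_table())
print(remaining_exists)  # [1, 4, 5]
```

The surviving rows are the same as with the IN / NOT IN version, which is the point: only the shape of the query changes, not the set of rows it keeps.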
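This version ran in about 2 seconds.
I still learned a lot along the way, for example how to use the IN and EXISTS keywords and the WITH ... AS syntax, which I tried but did not end up using.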
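For reference, a WITH ... AS formulation of the same cleanup could look roughly like the sketch below. This is not a statement from my attempts, only an illustration: it assumes SQLite via Python's `sqlite3`, where a WITH clause may precede DELETE, and the CTE name `keepers` is made up. In Oracle the WITH clause would probably have to sit inside the subquery instead, and that placement is untested here.

```python
import sqlite3

KEYS = "fdate, ffjdm, flcdm, ffytype, fgsbz"

# Hypothetical sample rows; SQLite assigns rowid 1..6 in insertion order.
ROWS = [
    ("2020-01-01", "A", "L1", "T1", "0"),  # rowid 1, kept
    ("2020-01-01", "A", "L1", "T1", "0"),  # rowid 2, duplicate of 1
    ("2020-01-01", "A", "L1", "T1", "0"),  # rowid 3, duplicate of 1
    ("2020-01-01", "A", "L1", "T1", "1"),  # rowid 4, differs only in fgsbz
    ("2020-01-02", "B", "L2", "T2", "0"),  # rowid 5, kept
    ("2020-01-02", "B", "L2", "T2", "0"),  # rowid 6, duplicate of 5
]


def make_table():
    """Create an in-memory table shaped like lcfyjttz and load the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(f"CREATE TABLE lcfyjttz ({KEYS})")
    conn.executemany("INSERT INTO lcfyjttz VALUES (?, ?, ?, ?, ?)", ROWS)
    return conn


def delete_duplicates_with(conn):
    """CTE variant: name the rowids to keep, then delete every other row."""
    conn.execute(f"""
        WITH keepers AS (
            -- smallest rowid of every key group, duplicated or not
            SELECT MIN(rowid) AS rid
            FROM lcfyjttz
            GROUP BY {KEYS}
        )
        DELETE FROM lcfyjttz
        WHERE rowid NOT IN (SELECT rid FROM keepers)
    """)
    rows = conn.execute("SELECT rowid FROM lcfyjttz ORDER BY rowid")
    return [r[0] for r in rows]


remaining_with = delete_duplicates_with(make_table())
print(remaining_with)  # [1, 4, 5]
```

Because `keepers` holds one rowid per key group, no HAVING clause is needed: a row that is alone in its group is its own keeper and is left untouched.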