在Oracle中如何利用Rowid查找和刪除表中的重複記錄

平時工做中可能會遇到當試圖對庫表中的某一列或幾列建立惟一索引時,系統提示 ORA-01452 :不能建立惟一索引,發現重複記錄。

下面總結一下幾種查找和刪除重複記錄的方法(以表CZ爲例):
表CZ的結構以下:
SQL> desc cz
 Name                                      Null?    Type
 ----------------------------------------- -------- ------------------
sql

 C1                                                 NUMBER(10)
 C10                                                NUMBER(5)
 C20                                                VARCHAR2(3)

刪除重複記錄的方法原理:
(1).在
Oracle中,每一條記錄都有一個rowid,rowid在整個數據庫中是惟一的,rowid肯定了每條記錄是在Oracle中的哪個數據文件、塊、行上。

(2).在重複的記錄中,可能全部列的內容都相同,但rowid不會相同,因此只要肯定出重複記錄中那些具備最大rowid的就能夠了,其他所有刪除。

重複記錄判斷的標準是:
C1,C10和C20這三列的值都相同纔算是重複記錄。

經查看錶CZ總共有16條記錄:
SQL>set pagesize 100
SQL>select * from cz;
數據庫

         C1        C10 C20
---------- ---------- ---
         1          2   dsf
         1          2   dsf
         1          2   dsf
         1          2   dsf
         2          3   che
         1          2   dsf
         1          2   dsf
         1          2   dsf
         1          2   dsf
         2          3   che
         2          3   che
         2          3   che
         2          3   che
         3          4   dff
         3          4   dff
         3          4   dff
         4          5   err
         5          3   dar
         6          1   wee
         7          2   zxc
spa

20 rows selected.

1.查找重複記錄的幾種方法:
(1).SQL>select * from cz group by c1,c10,c20 having count(*) >1;
        C1        C10 C20
---------- ---------- ---
         1          2   dsf
         2          3   che
         3          4   dff

(2).SQL>select distinct * from cz;
索引

        C1        C10 C20
---------- ---------- ---
         1          2   dsf
         2          3   che
         3          4   dff

(3).SQL>select * from cz a where rowid=(select max(rowid) from cz where c1=a.c1 and c10=a.c10 and c20=a.c20);
        C1        C10 C20
---------- ---------- ---
         1          2   dsf
         2          3   che
         3          4   dff

2.刪除重複記錄的幾種方法:
(1).適用於有大量重複記錄的狀況(在C1,C10和C20列上建有索引的時候,用如下語句效率會很高):
SQL>delete cz where (c1,c10,c20) in (select c1,c10,c20 from cz group by c1,c10,c20 having count(*)>1) and rowid not in
(select min(rowid) from cz group by c1,c10,c20 having count(*)>1);


SQL>delete cz where rowid not in(select min(rowid) from cz group by c1,c10,c20);

(2).適用於有少許重複記錄的狀況(注意,對於有大量重複記錄的狀況,用如下語句效率會很低):
SQL>delete from cz a where a.rowid!=(select max(rowid) from cz b where a.c1=b.c1 and a.c10=b.c10 and a.c20=b.c20);

SQL>delete from cz a where a.rowid<(select max(rowid) from cz b where a.c1=b.c1 and a.c10=b.c10 and a.c20=b.c20);
get

SQL>delete from cz a where rowid <(select max(rowid) from cz  where c1=a.c1 and c10=a.c10 and c20=a.c20);

(3).適用於有少許重複記錄的狀況(臨時表法):
SQL>create table test as select distinct * from cz; (建一個臨時表test用來存放重複的記錄)
io

SQL>truncate table cz;  (清空cz表的數據,但保留cz表的結構) table

SQL>insert into cz select * from test;  (再將臨時表test裏的內容反插回來)

(4).適用於有大量重複記錄的狀況(Exception into 子句法):
採用alter table 命令中的 Exception into 子句也能夠肯定出庫表中重複的記錄。這種方法稍微麻煩一些,爲了使用「excepeion into 」子句,必須首先建立 EXCEPTIONS 表。建立該表的 
SQL 腳本文件爲 utlexcpt.sql 。對於win2000系統和 UNIX 系統, Oracle 存放該文件的位置稍有不一樣,在win2000系統下,該腳本文件存放在$ORACLE_HOME\Ora90\rdbms\admin 目錄下;而對於 UNIX 系統,該腳本文件存放在$ORACLE_HOME/rdbms/admin 目錄下。

具體步驟以下:
SQL>@?/rdbms/admin/utlexcpt.sql
test

Table created.

SQL>desc exceptions
 Name                                      Null?    Type
 ----------------------------------------- -------- --------------
效率

 ROW_ID                                             ROWID
 OWNER                                              VARCHAR2(30)
 TABLE_NAME                                 VARCHAR2(30)
 CONSTRAINT                                   VARCHAR2(30)

SQL>alter table cz add constraint  cz_unique unique(c1,c10,c20) exceptions into exceptions;  
                                                      *
ERROR at line 1:
ORA-02299: cannot validate (TEST.CZ_UNIQUE) - duplicate keys found

SQL>create table dups as select * from cz where rowid in (select row_id from exceptions);

Table created.

SQL>select * from dups;
原理

        C1        C10 C20
---------- ---------- ---
         1          2   dsf
         1          2   dsf
         1          2   dsf
         1          2   dsf
         2          3   che
         1          2   dsf
         1          2   dsf
         1          2   dsf
         1          2   dsf
         2          3   che
         2          3   che
         2          3   che
         2          3   che
         3          4   dff
         3          4   dff
         3          4   dff

16 rows selected.

SQL>select row_id from exceptions;

ROW_ID
------------------
AAAHD/AAIAAAADSAAA
AAAHD/AAIAAAADSAAB
AAAHD/AAIAAAADSAAC
AAAHD/AAIAAAADSAAF
AAAHD/AAIAAAADSAAH
AAAHD/AAIAAAADSAAI
AAAHD/AAIAAAADSAAG
AAAHD/AAIAAAADSAAD
AAAHD/AAIAAAADSAAE
AAAHD/AAIAAAADSAAJ
AAAHD/AAIAAAADSAAK
AAAHD/AAIAAAADSAAL
AAAHD/AAIAAAADSAAM
AAAHD/AAIAAAADSAAN
AAAHD/AAIAAAADSAAO
AAAHD/AAIAAAADSAAP

16 rows selected.

SQL>delete from cz where rowid in ( select row_id from exceptions);

16 rows deleted.

SQL>insert into cz select distinct * from dups;

3 rows created.

SQL>select *from cz;
 
        C1        C10 C20
---------- ---------- ---
         1          2   dsf
         2          3   che
         3          4   dff
         4          5   err
         5          3   dar
         6          1   wee
         7          2   zxc

7 rows selected. 從結果裏能夠看到重複記錄已經刪除。

相關文章
相關標籤/搜索