A Brief Look at EXISTS and IN in MySQL

Date: 2019-11-13


Reposted from http://sunxiaqw.blog.163.com/blog/static/990654382013430105130443/

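EXISTS loops over the outer table and checks its rows one at a time. For each row it evaluates the EXISTS subquery. If the subquery returns any rows (no matter how many), the condition is true and the current row is returned. If the subquery returns no rows, the current row is discarded. In other words, the EXISTS condition behaves like a boolean: true when the subquery produces a result set, false when it does not.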

For example:

select * from user where exists (select 1);

The rows of user are taken one by one. Because the subquery select 1 always returns a row, every row of user ends up in the result, so this query is equivalent to select * from user;

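Another example:

select * from user where exists (select * from user where userId = 0);

Here, as the loop goes over user, it checks the subquery (select * from user where userId = 0). Since userId is never 0, the subquery always returns an empty set and the condition is always false, so every row of user is discarded.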
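The two examples above can be checked with a minimal sketch that uses Python's built-in sqlite3 module instead of MySQL. This assumes EXISTS semantics are the same in both, which holds for standard SQL like this, though nothing here says anything about MySQL performance. The `user` rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL; the rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (userId INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO user VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])

# The subquery `SELECT 1` always returns a row, so every user row is kept.
always_true = conn.execute(
    "SELECT * FROM user WHERE EXISTS (SELECT 1)"
).fetchall()

# No userId is 0, so the subquery is always empty and every row is discarded.
always_false = conn.execute(
    "SELECT * FROM user WHERE EXISTS (SELECT * FROM user WHERE userId = 0)"
).fetchall()

print(always_true)   # same rows as SELECT * FROM user
print(always_false)  # []
```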

NOT EXISTS is the opposite. When the subquery returns rows, the current row is discarded. Otherwise the current row is added to the result.

In short, if table A has n rows, an EXISTS query takes those n rows one by one and evaluates the EXISTS condition n times.
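The loop described above can be written as a toy Python model. The function `exists_filter`, the `orders` table and its data are hypothetical. The model only shows two things: the subquery is evaluated once per outer row (n times in total), and NOT EXISTS simply flips the test. It is not a claim about how MySQL executes the query internally.

```python
# Toy model of EXISTS / NOT EXISTS: take each of the n outer rows in turn
# and evaluate the subquery condition for it. Illustrates semantics only.

def exists_filter(outer_rows, subquery_returns_rows, negate=False):
    """Keep an outer row when the subquery returns at least one row
    (or, with negate=True, when it returns none, i.e. NOT EXISTS)."""
    result = []
    evaluations = 0
    for row in outer_rows:                       # loop over the outer table
        evaluations += 1                         # one subquery check per row
        has_rows = subquery_returns_rows(row)    # the EXISTS condition: True/False
        if has_rows != negate:                   # flip the test for NOT EXISTS
            result.append(row)
    return result, evaluations


users = [{"userId": 1}, {"userId": 2}, {"userId": 3}]
orders = [{"userId": 1}, {"userId": 3}]  # hypothetical inner table

def has_order(u):
    # Models: EXISTS (SELECT 1 FROM orders WHERE orders.userId = user.userId)
    return any(o["userId"] == u["userId"] for o in orders)

kept, n_checks = exists_filter(users, has_order)
dropped, _ = exists_filter(users, has_order, negate=True)
print([u["userId"] for u in kept], n_checks)   # [1, 3] 3
print([u["userId"] for u in dropped])          # [2]
```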

An IN query is equivalent to a chain of OR conditions, which is easy to understand. For example, the query

select * from user where userId in (1, 2, 3);

is equivalent to

select * from user where userId = 1 or userId = 2 or userId = 3;

NOT IN is the opposite. For example,

select * from user where userId not in (1, 2, 3);

is equivalent to

select * from user where userId != 1 and userId != 2 and userId != 3;
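Both equivalences above can be checked in a short sketch. SQLite (through Python's sqlite3 module) stands in for MySQL here, and the six `user` rows are hypothetical.

```python
import sqlite3

# SQLite in-memory database as a stand-in for MySQL; the data is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (userId INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO user VALUES (?)", [(i,) for i in range(1, 7)])

def ids(sql):
    """Run a query and return the sorted list of userId values."""
    return sorted(r[0] for r in conn.execute(sql))

in_ids = ids("SELECT userId FROM user WHERE userId IN (1, 2, 3)")
or_ids = ids("SELECT userId FROM user WHERE userId = 1 OR userId = 2 OR userId = 3")
not_in_ids = ids("SELECT userId FROM user WHERE userId NOT IN (1, 2, 3)")
and_ids = ids("SELECT userId FROM user WHERE userId != 1 AND userId != 2 AND userId != 3")

print(in_ids, or_ids)        # [1, 2, 3] [1, 2, 3]
print(not_in_ids, and_ids)   # [4, 5, 6] [4, 5, 6]
```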

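In short, an IN query first fetches every record returned by the subquery. Call that result set B, with m records. It then splits B into its m values and performs m lookups, one per value.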
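The two steps above can be expressed as a toy Python model. The function `in_filter`, the dict standing in for an index on the outer column, and the data are all hypothetical. This sketches the article's description, not MySQL's real execution plan.

```python
from collections import defaultdict

# Toy model of IN as described above (not MySQL's actual plan):
# 1) run the subquery once and collect its result set B (m distinct values);
# 2) do one lookup per value in an index on the outer table's column.

def in_filter(outer_rows, column, subquery):
    # Stand-in for an index on the outer column: value -> rows with that value.
    index = defaultdict(list)
    for row in outer_rows:
        index[row[column]].append(row)

    # Step 1: materialize the subquery result. Duplicates collapse, so each
    # outer row is returned at most once, as IN requires.
    b_values = set(subquery())
    result = []
    for b in b_values:                  # step 2: m lookups, one per value in B
        result.extend(index.get(b, []))
    return result, len(b_values)


A = [{"id": i} for i in range(1, 6)]
B_ids = [2, 4, 4, 9]                    # hypothetical result of "SELECT id FROM B"

matched, m = in_filter(A, "id", lambda: B_ids)
print(sorted(r["id"] for r in matched), m)   # [2, 4] 3
```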

Note that when the left-hand side of IN is a single column, the subquery must return exactly one column. For example,

select * from user where userId in (select id from B);

and not

select * from user where userId in (select id, age from B);

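EXISTS has no such restriction. To compare several columns with IN, MySQL accepts a row constructor on the left-hand side with a matching number of columns, such as (col1, col2) in (select id, age from B).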
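The restriction can be seen in a small sketch. It uses SQLite through Python's sqlite3 module as a stand-in for MySQL, which reports the same mistake as "Operand should contain 1 column(s)". Table B, its `age` column and all rows are hypothetical.

```python
import sqlite3

# A single-column IN needs a one-column subquery; EXISTS does not care what
# the subquery selects. SQLite stands in for MySQL; tables are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (userId INTEGER)")
conn.execute("CREATE TABLE B (id INTEGER, age INTEGER)")
conn.executemany("INSERT INTO user VALUES (?)", [(1,), (2,)])
conn.executemany("INSERT INTO B VALUES (?, ?)", [(1, 30)])

ok = conn.execute("SELECT * FROM user WHERE userId IN (SELECT id FROM B)").fetchall()

try:
    conn.execute("SELECT * FROM user WHERE userId IN (SELECT id, age FROM B)")
    two_column_error = None
except sqlite3.OperationalError as exc:   # the two-column subquery is rejected
    two_column_error = str(exc)

# EXISTS accepts any select list; only whether a row comes back matters.
exists_rows = conn.execute(
    "SELECT * FROM user WHERE EXISTS (SELECT id, age FROM B WHERE B.id = user.userId)"
).fetchall()

print(ok, exists_rows)      # [(1,)] [(1,)]
print(two_column_error)
```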

Now let's consider the performance of EXISTS and IN.

Consider the following SQL statements:

1: select * from A where exists (select * from B where B.id = A.id);

2: select * from A where A.id in (select id from B);
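Before comparing their speed, it is worth confirming that queries 1 and 2 return the same rows. The check below uses SQLite through Python's sqlite3 module as a stand-in for MySQL. Tables A and B and their data are hypothetical, and B deliberately contains a duplicate id.

```python
import sqlite3

# Check that query 1 (EXISTS) and query 2 (IN) return the same rows of A.
# SQLite stands in for MySQL; tables A and B and their data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE B (id INTEGER)")
conn.executemany("INSERT INTO A VALUES (?, ?)", [(1, "x"), (2, "y"), (3, "z"), (4, "w")])
conn.executemany("INSERT INTO B VALUES (?)", [(2,), (3,), (3,), (7,)])  # 3 appears twice

q1 = "SELECT * FROM A WHERE EXISTS (SELECT * FROM B WHERE B.id = A.id)"
q2 = "SELECT * FROM A WHERE A.id IN (SELECT id FROM B)"

rows1 = sorted(conn.execute(q1).fetchall())
rows2 = sorted(conn.execute(q2).fetchall())
print(rows1)            # [(2, 'y'), (3, 'z')]
print(rows1 == rows2)   # True
```

Neither query returns row 3 of A twice, even though B contains id 3 twice.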

Query 1 can be rewritten as the following pseudocode to make it easier to understand:

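# $A: rows of table A; $B_id_index: hypothetical array standing in for the index on B.id
$result = [];
for ($i = 0; $i < count($A); $i++) {
    $a = $A[$i];                          # fetch the records of A one by one
    if (isset($B_id_index[$a['id']])) {   # the subquery returns a row (B.id = $a['id'])
        $result[] = $a;
    }
}
return $result;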

That is roughly the idea. As you can see, query 1 mainly relies on the index on B. Whatever indexes A has make little difference to its efficiency.
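Here is the same loop as a runnable Python sketch. A set stands in for the index on B.id, and the tables are hypothetical. It shows that each of A's n rows costs one probe into B's index, which is why B's index is the one that matters for query 1. The real optimizer may choose a different plan.

```python
# Query 1 as a loop, assuming an index on B.id (modelled as a Python set):
# each of A's rows costs one index probe into B. A sketch of the idea only.

def exists_with_index(a_rows, b_rows):
    b_id_index = {b["id"] for b in b_rows}   # stand-in for an index on B.id
    result, probes = [], 0
    for a in a_rows:                         # fetch A's rows one by one
        probes += 1
        if a["id"] in b_id_index:            # EXISTS (... WHERE B.id = A.id)
            result.append(a)
    return result, probes

A = [{"id": i} for i in range(1, 6)]
B = [{"id": 2}, {"id": 4}, {"id": 4}, {"id": 9}]  # hypothetical data; B may hold duplicates
rows, probes = exists_with_index(A, B)
print([r["id"] for r in rows], probes)   # [2, 4] 5
```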

Assuming the ids in B are 1, 2 and 3, query 2 can be rewritten as

select * from A where A.id = 1 or A.id = 2 or A.id = 3;

This one is easy to follow. It mainly uses the index on A, and B's indexes have little effect on the query.

Next, let's look at NOT EXISTS and NOT IN:

1. select * from A where not exists (select * from B where B.id = A.id);

2. select * from A where A.id not in (select id from B);

Query 1 works as before and uses the index on B.

Query 2, on the other hand, can be rewritten as

select * from A where A.id != 1 and A.id != 2 and A.id != 3;

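NOT IN is therefore a range query, and a range query built from != conditions cannot use any index. In effect, for every row of A, the whole of B has to be scanned to check whether that row exists there.

So NOT EXISTS is more efficient than NOT IN.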
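The two NOT queries above are compared only for speed, but they are also not always interchangeable. While B.id contains no NULLs they return the same rows. Once B.id contains a NULL, `A.id NOT IN (...)` can never be true, so NOT IN returns nothing. The sketch below uses SQLite through Python's sqlite3 module as a stand-in for MySQL, which follows the same standard SQL NULL rules here. The tables and data are hypothetical.

```python
import sqlite3

# NOT EXISTS and NOT IN agree while B.id has no NULLs, but diverge once it does.
# SQLite stands in for MySQL; the tables and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A (id INTEGER)")
conn.execute("CREATE TABLE B (id INTEGER)")
conn.executemany("INSERT INTO A VALUES (?)", [(1,), (2,), (3,), (4,)])
conn.executemany("INSERT INTO B VALUES (?)", [(2,), (3,)])

not_exists = "SELECT id FROM A WHERE NOT EXISTS (SELECT * FROM B WHERE B.id = A.id)"
not_in = "SELECT id FROM A WHERE A.id NOT IN (SELECT id FROM B)"

def ids(sql):
    """Run a query and return the sorted list of A.id values."""
    return sorted(r[0] for r in conn.execute(sql))

before = (ids(not_exists), ids(not_in))   # ([1, 4], [1, 4])

conn.execute("INSERT INTO B VALUES (NULL)")
after = (ids(not_exists), ids(not_in))    # ([1, 4], [])
print(before, after)
```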

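A commonly cited explanation is that MySQL's IN hash-joins the outer and inner tables, while EXISTS loops over the outer table and queries the inner table on each iteration. People have long believed that EXISTS is always more efficient than IN, but that is not accurate. It depends on the situation.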

If the two tables are about the same size, IN and EXISTS perform about the same.

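If one table is small and the other is large, use EXISTS when the subquery table is the large one and IN when the subquery table is the small one.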

For example, with table A (small) and table B (large):

1:

select * from A where cc in (select cc from B)  -- less efficient: uses the index on A.cc

select * from A where exists(select cc from B where cc=A.cc)  -- more efficient: uses the index on B.cc

Conversely:

2:

select * from B where cc in (select cc from A)  -- more efficient: uses the index on B.cc

select * from B where exists(select cc from A where cc=B.cc)  -- less efficient: uses the index on A.cc
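The size argument in the two cases above can be made concrete with a deliberately rough cost model that counts rows touched. The table sizes and the number of distinct cc values are hypothetical, and the model ignores caching, join buffers and everything else a real optimizer considers. It only illustrates why the smaller table should drive the loop.

```python
# Rough cost model (rows touched only; not MySQL's optimizer):
# IN:     read every row of the subquery table once, then probe the outer
#         table's index once per distinct value.
# EXISTS: read every row of the outer table, probing the subquery table's
#         index once per outer row.

def in_cost(inner_rows, inner_distinct):
    return inner_rows + inner_distinct

def exists_cost(outer_rows):
    return outer_rows + outer_rows

SMALL, BIG = 100, 1_000_000        # hypothetical sizes of tables A and B
BIG_DISTINCT = 500_000             # hypothetical number of distinct cc values in B

# Case 1: outer table A is small, subquery table B is big.
case1 = {"in": in_cost(BIG, BIG_DISTINCT), "exists": exists_cost(SMALL)}
# Case 2: outer table B is big, subquery table A is small
# (hypothetically, all of A's cc values are distinct).
case2 = {"in": in_cost(SMALL, SMALL), "exists": exists_cost(BIG)}

print(case1)   # {'in': 1500000, 'exists': 200}
print(case2)   # {'in': 200, 'exists': 2000000}
```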

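As for NOT IN and NOT EXISTS: a query using NOT IN does a full table scan of both the inner and the outer table without using any index, while the NOT EXISTS subquery can still use the index on its table. So whichever table is larger, NOT EXISTS is faster than NOT IN.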

IN versus =

select name from student where name in ('zhang','wang','li','zhao');

and

select name from student where name='zhang' or name='li' or name='wang' or name='zhao';

return the same result.

An example from actual use:

echo "SELECT t2.name, t1.phone , t1.pay_amount FROM ( SELECT a.nid, a.phone, a.pay_amount FROM outflow_order a WHERE a.add_time <= UNIX_TIMESTAMP ('2018-04-04 18:00:00')*1000 AND a.status=1 AND a.grade ='B' AND a.merchant_no ='app' AND EXISTS (SELECT 1 FROM finance.borrow WHERE nid=a.nid AND STATUS IN (10,11)) AND a.pay_status=0 ) t1 LEFT JOIN finance.borrow t2 ON t2.nid=t1.nid" | mysql -h 192.168.1.100 -u root -p > /home/zhangsan/zdfk20180404.xls