Nested Loop Join - JavaShuo

We all know how to use SQL joins to relate tables, but this post is about the algorithms that implement a join. There are three join algorithms: Nested Loop Join, Hash Join, and Sort Merge Join.

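The official MySQL documentation states that MySQL supports only one join algorithm, Nested Loop Join (this held through MySQL 8.0.17; MySQL 8.0.18 added hash join):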

MySQL resolves all joins using a nested-loop join method. This means that MySQL reads a row from the first table, and then finds a matching row in the second table, the third table, and so on. (MySQL manual: explain-output)
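The quoted description extends naturally from two tables to any number. Below is a minimal sketch of that idea, assuming in-memory tables of dicts and one join predicate per added table; the `users`, `orders`, and `items` tables and their columns are hypothetical examples, not anything from MySQL itself.

```python
from typing import Callable, Dict, Iterator, List, Tuple

Row = Dict[str, object]
Prefix = Tuple[Row, ...]

def nested_loop_join(tables: List[List[Row]],
                     conditions: List[Callable[[Prefix, Row], bool]]) -> Iterator[Prefix]:
    """conditions[i] checks a row of tables[i + 1] against the rows already
    chosen from tables[0..i]."""
    def extend(prefix: Prefix, depth: int) -> Iterator[Prefix]:
        if depth == len(tables):          # a matching row was found in every table
            yield prefix
            return
        for row in tables[depth]:         # scan the next table for this partial result
            if depth == 0 or conditions[depth - 1](prefix, row):
                yield from extend(prefix + (row,), depth + 1)
    yield from extend((), 0)

# Hypothetical example data
users = [{"id": 1, "name": "ann"}, {"id": 2, "name": "bob"}]
orders = [{"id": 10, "user_id": 1}, {"id": 11, "user_id": 2}, {"id": 12, "user_id": 1}]
items = [{"order_id": 10, "sku": "pen"}, {"order_id": 12, "sku": "ink"},
         {"order_id": 12, "sku": "pad"}]
conditions = [
    lambda p, o: o["user_id"] == p[0]["id"],    # orders.user_id = users.id
    lambda p, i: i["order_id"] == p[1]["id"],   # items.order_id = orders.id
]
result = [(u["name"], o["id"], i["sku"])
          for u, o, i in nested_loop_join([users, orders, items], conditions)]
print(result)  # [('ann', 10, 'pen'), ('ann', 12, 'ink'), ('ann', 12, 'pad')]
```

Each level of recursion is one more nested loop, so the order of `tables` plays the role of the join order chosen by the optimizer.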

So this post covers only Nested Loop Join.

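NLJ uses two nested loops: the first table drives the outer loop and the second table drives the inner loop. Each row of the outer loop is compared against the rows of the inner loop, and pairs that satisfy the join condition are output. NLJ has three variants: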

1. Simple Nested Loop Join (SNLJ)

// pseudocode
    for (r in R) {
        for (s in S) {
            if (r and s satisfy the join condition) {
                output <r, s>;
            }
        }
    }

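SNLJ scans both tables in full with two nested loops and outputs every pair of rows that satisfies the condition. This is effectively a Cartesian product of the two tables with R * S comparisons: a brute-force algorithm that is slow on large tables.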
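A minimal Python sketch of SNLJ, assuming both tables fit in memory as lists of tuples and the join condition is passed in as a function. It also counts comparisons so the R * S cost is visible.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")
U = TypeVar("U")

def simple_nested_loop_join(outer: Sequence[T], inner: Sequence[U],
                            cond: Callable[[T, U], bool]) -> Tuple[List[Tuple[T, U]], int]:
    """Return (matching pairs, number of comparisons)."""
    result: List[Tuple[T, U]] = []
    comparisons = 0
    for r in outer:              # one pass over R
        for s in inner:          # a full pass over S for every row of R
            comparisons += 1
            if cond(r, s):
                result.append((r, s))
    return result, comparisons

R = [(1, "a"), (2, "b"), (3, "c")]
S = [(2, "x"), (3, "y"), (3, "z"), (4, "w")]
pairs, n = simple_nested_loop_join(R, S, lambda r, s: r[0] == s[0])
print(pairs)
print(n)  # 12, i.e. len(R) * len(S)
```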

2. Index Nested Loop Join (INLJ)

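// pseudocode
    for (r in R) {
        // probe S's index on the join column instead of scanning S
        for (si in SIndex.lookup(r)) {
            s = row of S referenced by si;   // extra table read if SIndex is a secondary index
            output <r, s>;
        }
    }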

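INLJ improves on SNLJ: the join condition determines a usable index, and the inner loop searches that index instead of the table data itself, which makes the inner loop much cheaper. INLJ has a drawback, though: if the index is a non-clustered (secondary) index and the query needs columns that are not in the index, each match requires a lookup back into the table to read the row, which adds a random I/O.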
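A minimal sketch of INLJ, assuming a sorted list of `(key, rowid)` pairs as a stand-in for a B+ tree secondary index (a real B+ tree probe costs about IndexHeight page reads, while `bisect` here costs about log2(S) comparisons). The tables and the helper names `build_secondary_index` and `index_nested_loop_join` are hypothetical. The `table_lookups` counter models the back-to-table reads described above.

```python
from bisect import bisect_left, bisect_right
from typing import Dict, List, Tuple

Row = Dict[str, int]

def build_secondary_index(table: List[Row], col: str) -> List[Tuple[int, int]]:
    """Sorted (key, rowid) pairs, standing in for a B+ tree index on `col`."""
    return sorted((row[col], rowid) for rowid, row in enumerate(table))

def index_nested_loop_join(outer: List[Row], outer_col: str,
                           inner: List[Row], index: List[Tuple[int, int]]):
    keys = [key for key, _ in index]
    result = []
    table_lookups = 0
    for r in outer:                          # one pass over the outer table
        key = r[outer_col]
        lo = bisect_left(keys, key)          # probe the index instead of scanning S
        hi = bisect_right(keys, key)
        for _, rowid in index[lo:hi]:
            s = inner[rowid]                 # back-to-table read for the full row
            table_lookups += 1
            result.append((r, s))
    return result, table_lookups

users = [{"id": 1}, {"id": 2}, {"id": 3}]
orders = [{"id": 10, "user_id": 2}, {"id": 11, "user_id": 1}, {"id": 12, "user_id": 2}]
idx = build_secondary_index(orders, "user_id")
pairs, lookups = index_nested_loop_join(users, "id", orders, idx)
print([(u["id"], o["id"]) for u, o in pairs])  # [(1, 11), (2, 10), (2, 12)]
print(lookups)                                 # 3, one per matching row
```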

3. Block Nested Loop Join (BNLJ)

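In general, the MySQL optimizer prefers INLJ when an index is available. When no index can be used, or when it judges that a full scan would be faster than using the index, it still avoids the overly brute-force SNLJ and uses BNLJ instead. BNLJ adds a join buffer to SNLJ: rows from the outer loop are read into the buffer in batches, and each row read from the inner table is compared against all buffered rows at once. The inner table is therefore scanned once per buffer fill instead of once per outer row.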

// pseudocode
    for (rb in blocks of R that fit in the join buffer) {
        for (s in S) {
            for (r in rb) {
                if (r and s satisfy the join condition) {
                    output <r, s>;
                }
            }
        }
    }
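A minimal Python sketch of BNLJ, assuming the join buffer's capacity is expressed as a number of outer rows (`buffer_rows`, a hypothetical simplification; MySQL sizes the buffer in bytes via join_buffer_size). It counts inner-table scans and comparisons to show that buffering reduces scans but not comparisons.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")
U = TypeVar("U")

def block_nested_loop_join(outer: Sequence[T], inner: Sequence[U],
                           cond: Callable[[T, U], bool], buffer_rows: int):
    """Return (matching pairs, inner-table scans, comparisons)."""
    result: List[Tuple[T, U]] = []
    inner_scans = 0
    comparisons = 0
    for start in range(0, len(outer), buffer_rows):
        join_buffer = outer[start:start + buffer_rows]  # fill the buffer with outer rows
        inner_scans += 1
        for s in inner:                    # one full scan of S per buffer fill
            for r in join_buffer:          # s is checked against every buffered row
                comparisons += 1
                if cond(r, s):
                    result.append((r, s))
    return result, inner_scans, comparisons

R = list(range(10))
S = [3, 7, 7, 12]
pairs, scans, comps = block_nested_loop_join(R, S, lambda r, s: r == s, buffer_rows=4)
print(pairs)  # [(3, 3), (7, 7), (7, 7)]
print(scans)  # 3, i.e. ceil(10 / 4) buffer fills
print(comps)  # 40, still len(R) * len(S)
```

Because matches are emitted per inner row within each buffer fill, the output order can differ from SNLJ's.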

In MySQL, the size of the join buffer is controlled by the join_buffer_size parameter.

We only store the used columns in the join buffer, not the whole rows. (MySQL manual: join-buffer-size)

According to the MySQL manual, the join buffer holds only the columns the join actually uses.
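Storing only the used columns matters because it decides how many outer rows fit per buffer fill, and each fill costs one scan of the inner table. A back-of-the-envelope sketch, assuming a hypothetical 256 KiB buffer, a hypothetical 500-byte full row versus 20 bytes of used columns, and ignoring any per-row overhead MySQL adds in the buffer:

```python
def buffer_fills(outer_rows: int, bytes_per_row: int, join_buffer_size: int) -> int:
    """Number of join-buffer fills, i.e. scans of the inner table."""
    rows_per_fill = join_buffer_size // bytes_per_row          # outer rows per fill
    return (outer_rows + rows_per_fill - 1) // rows_per_fill   # ceiling division

JOIN_BUFFER_SIZE = 256 * 1024   # hypothetical setting, in bytes
R = 1_000_000
print(buffer_fills(R, 500, JOIN_BUFFER_SIZE))  # 1909 fills when whole rows are buffered
print(buffer_fills(R, 20, JOIN_BUFFER_SIZE))   # 77 fills when only used columns are buffered
```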

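Algorithm comparison (the outer table has R rows, the inner table has S rows):

| Metric | Simple Nested Loop Join | Index Nested Loop Join | Block Nested Loop Join |
|---|---|---|---|
| Outer table scans | 1 | 1 | 1 |
| Inner table scans | R | 0 | ceil(R * used_column_size / join_buffer_size) |
| Rows read | R + R * S | R + RS_Matches | R + S * ceil(R * used_column_size / join_buffer_size) |
| Join comparisons | R * S | R * IndexHeight | R * S |
| Back-to-table reads | 0 | RS_Matches | 0 |

Here RS_Matches is the number of matching rows found through the index, IndexHeight is the height of the index B+ tree, and used_column_size is the size of the used columns each outer row places in the join buffer.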
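To make the cost formulas concrete, the sketch below evaluates them for one set of inputs. It assumes BNLJ's inner-scan count equals the number of join-buffer fills, ceil(R * used_column_size / join_buffer_size); the function name `nlj_costs` and all input values are hypothetical.

```python
def nlj_costs(R: int, S: int, rs_matches: int, index_height: int,
              used_column_size: int, join_buffer_size: int) -> dict:
    """Cost metrics for the three NLJ variants under the table's model."""
    # number of join-buffer fills = ceil(R * used_column_size / join_buffer_size)
    fills = (R * used_column_size + join_buffer_size - 1) // join_buffer_size
    return {
        "SNLJ": {"inner_scans": R, "rows_read": R + R * S,
                 "comparisons": R * S, "table_lookups": 0},
        "INLJ": {"inner_scans": 0, "rows_read": R + rs_matches,
                 "comparisons": R * index_height, "table_lookups": rs_matches},
        "BNLJ": {"inner_scans": fills, "rows_read": R + S * fills,
                 "comparisons": R * S, "table_lookups": 0},
    }

costs = nlj_costs(R=1_000, S=10_000, rs_matches=1_000, index_height=3,
                  used_column_size=100, join_buffer_size=32_768)
for name, metrics in costs.items():
    print(name, metrics)
```

With these inputs, BNLJ reads about 41,000 rows against roughly 10 million for SNLJ, yet both still perform 10 million comparisons; only INLJ reduces the comparison count.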

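MySQL 5.6 optimized INLJ's back-to-table reads by adding the Batched Key Access (BKA) join and Multi-Range Read (MRR) features. During the join, the rowids of the needed rows are buffered and the rows are then fetched in batches, turning many scattered I/O operations into fewer, batched ones and improving efficiency.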
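A toy model of why batching helps, under loudly simplified assumptions: rows live in fixed-size pages (`rows_per_page` is hypothetical), there is no buffer pool, and a page read is counted whenever the next rowid is on a different page from the previous one. MRR sorts the collected keys by rowid (primary key) before reading, which the sketch imitates with `sorted`; this is not how InnoDB actually accounts for I/O.

```python
from typing import List

def page_reads(rowids: List[int], rows_per_page: int) -> int:
    """Count page reads when rowids are fetched in the given order."""
    reads = 0
    current_page = None
    for rowid in rowids:
        page = rowid // rows_per_page
        if page != current_page:       # switching pages costs a (random) read
            reads += 1
            current_page = page
    return reads

def fetch_batched(rowids: List[int], rows_per_page: int) -> int:
    """MRR-style: collect the rowids first, sort them, then fetch in order."""
    return page_reads(sorted(rowids), rows_per_page)

rowids = [57, 3, 41, 8, 60, 12, 45, 1]    # hypothetical rowids, in index order
print(page_reads(rowids, 16))     # 8 reads, one per row
print(fetch_batched(rowids, 16))  # 3 reads after sorting
```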