Semi-join materialization is a special kind of subquery materialization used for semi-join subqueries. It comes in two flavors:
1. Materialization/lookup
2. Materialization/scan
Consider a query that finds the countries in Europe that have big cities:
select * from Country where Country.code IN (select City.Country from City where City.Population > 7*1000*1000) and Country.continent='Europe'
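The subquery is uncorrelated, which means the inner query can be run independently of the outer one. The idea of semi-join materialization is to fill a temporary table with the possible values of City.Country for big cities, and then join it with the countries in Europe.
The join can be done in two directions:
1. From the materialized table to the Country table
2. From the Country table to the materialized table
The first direction involves a full table scan (a full scan of the materialized table), so it is called "Materialization-scan".
In the second direction, the cheapest way to find a match in the materialized table is a lookup on its primary key. This is called "Materialization-lookup".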
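The two join directions can be sketched in a few lines of Python. This is only an illustration of the idea, not how the server implements it: the rows below are toy data rather than the real `world` database, and the names `materialize`, `materialization_scan` and `materialization_lookup` are made up for this example.

```python
# Toy rows for illustration only (not the real `world` database).
countries = {            # primary key: code -> (name, continent)
    "RUS": ("Russia", "Europe"),
    "GBR": ("United Kingdom", "Europe"),
    "ISL": ("Iceland", "Europe"),
    "IND": ("India", "Asia"),
}
cities = [               # (name, country code, population)
    ("Moscow", "RUS", 8_389_200),
    ("London", "GBR", 7_285_000),
    ("Mumbai", "IND", 10_500_000),
    ("Delhi", "IND", 7_206_704),
    ("Reykjavik", "ISL", 109_184),
]


def materialize(min_population):
    """Run the inner query once. The set stands in for the temporary table
    with a unique key, so duplicate country codes collapse into one entry."""
    return {country for _, country, pop in cities if pop > min_population}


def materialization_scan(min_population, continent):
    """Direction 1: scan the materialized table, look up Country by primary key."""
    result = []
    for code in materialize(min_population):   # full scan of the temp table
        row = countries.get(code)               # primary key lookup in Country
        if row is not None and row[1] == continent:
            result.append(code)
    return sorted(result)


def materialization_lookup(min_population, continent):
    """Direction 2: scan Country, probe the materialized table by its key."""
    temp = materialize(min_population)
    return sorted(
        code
        for code, (_, cont) in countries.items()   # full scan of Country
        if cont == continent and code in temp       # key lookup in the temp table
    )


print(materialization_scan(7_000_000, "Europe"))
print(materialization_lookup(7_000_000, "Europe"))
```

Both functions return the same rows; they differ only in which table is scanned and which one is probed by key, which is exactly what the optimizer decides between.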
Materialization-scan
If we look for cities with a population above 7 million, the optimizer uses Materialization-scan, and the EXPLAIN output shows it:
MariaDB [world]> explain select * from Country where Country.code IN (select City.Country from City where City.Population > 7*1000*1000);
+----+--------------+-------------+--------+--------------------+------------+---------+--------------------+------+-----------------------+
| id | select_type  | table       | type   | possible_keys      | key        | key_len | ref                | rows | Extra                 |
+----+--------------+-------------+--------+--------------------+------------+---------+--------------------+------+-----------------------+
|  1 | PRIMARY      | <subquery2> | ALL    | distinct_key       | NULL       | NULL    | NULL               |   15 |                       |
|  1 | PRIMARY      | Country     | eq_ref | PRIMARY            | PRIMARY    | 3       | world.City.Country |    1 |                       |
|  2 | MATERIALIZED | City        | range  | Population,Country | Population | 4       | NULL               |   15 | Using index condition |
+----+--------------+-------------+--------+--------------------+------------+---------+--------------------+------+-----------------------+
3 rows in set (0.01 sec)
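From this you can see that:
1. There are still two SELECTs (id=1 and id=2)
2. The second SELECT (id=2) has select_type=MATERIALIZED. This means it is executed and its result is stored in a temporary table with a unique key over all of its columns. The unique key prevents the table from holding duplicate records
3. The first SELECT received a table named <subquery2>, which is the table materialized from the second SELECT (id=2)
The optimizer chose a full scan of the materialized table, so this is an example of the Materialization-scan strategy.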
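As for the execution cost, we read 15 rows from table City, write 15 rows to the materialized table, read them back (the optimizer assumes there are no duplicates), and then do 15 eq_ref accesses to table Country. In total, that is 45 reads and 15 writes.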
By comparison, running EXPLAIN in MySQL gives the following:
MySQL [world]> explain select * from Country where Country.code IN (select City.Country from City where City.Population > 7*1000*1000);
+----+--------------------+---------+-------+--------------------+------------+---------+------+------+------------------------------------+
| id | select_type        | table   | type  | possible_keys      | key        | key_len | ref  | rows | Extra                              |
+----+--------------------+---------+-------+--------------------+------------+---------+------+------+------------------------------------+
|  1 | PRIMARY            | Country | ALL   | NULL               | NULL       | NULL    | NULL |  239 | Using where                        |
|  2 | DEPENDENT SUBQUERY | City    | range | Population,Country | Population | 4       | NULL |   15 | Using index condition; Using where |
+----+--------------------+---------+-------+--------------------+------------+---------+------+------+------------------------------------+
This plan does (239 + 239*15) = 3824 table reads.
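The arithmetic behind the two estimates for the 7 million query can be reproduced with a short Python sketch. The function names are hypothetical and the model is a deliberate simplification that just counts rows the way the text does; it is not the optimizer's actual cost formula.

```python
def materialization_scan_cost(subquery_rows):
    """Row counts for Materialization-scan, following the text above."""
    reads = subquery_rows     # read the matching rows from City
    writes = subquery_rows    # write them into the materialized table
    reads += subquery_rows    # full scan of the materialized table
    reads += subquery_rows    # one eq_ref access to Country per materialized row
    return reads, writes


def dependent_subquery_cost(outer_rows, rows_per_invocation):
    """Row reads when the subquery is re-run for every row of the outer table."""
    return outer_rows + outer_rows * rows_per_invocation


print(materialization_scan_cost(15))     # (45, 15)
print(dependent_subquery_cost(239, 15))  # 3824
```

The gap comes from how often the subquery runs: once with materialization, versus once per Country row in the DEPENDENT SUBQUERY plan.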
Materialization-Lookup
Let's modify the query slightly and look for countries that have cities with a population above 1 million (instead of 7 million):
MariaDB [world]> explain select * from Country where Country.code IN (select City.Country from City where City.Population > 1*1000*1000) ;
+----+--------------+-------------+--------+--------------------+--------------+---------+------+------+-----------------------+
| id | select_type  | table       | type   | possible_keys      | key          | key_len | ref  | rows | Extra                 |
+----+--------------+-------------+--------+--------------------+--------------+---------+------+------+-----------------------+
|  1 | PRIMARY      | Country     | ALL    | PRIMARY            | NULL         | NULL    | NULL |  239 |                       |
|  1 | PRIMARY      | <subquery2> | eq_ref | distinct_key       | distinct_key | 3       | func |    1 |                       |
|  2 | MATERIALIZED | City        | range  | Population,Country | Population   | 4       | NULL |  238 | Using index condition |
+----+--------------+-------------+--------+--------------------+--------------+---------+------+------+-----------------------+
3 rows in set (0.00 sec)
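The EXPLAIN output is similar to the one for Materialization-scan, except that:
1. The <subquery2> table is accessed with eq_ref
2. The access uses the index distinct_key
This means the optimizer plans to do index lookups into the materialized table. In other words, it will use the Materialization-lookup strategy.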
In MySQL (or with optimizer_switch='semijoin=off,materialization=off'), the plan looks like this:
MySQL [world]> explain select * from Country where Country.code IN (select City.Country from City where City.Population > 1*1000*1000) ;
+----+--------------------+---------+----------------+--------------------+---------+---------+------+------+-------------+
| id | select_type        | table   | type           | possible_keys      | key     | key_len | ref  | rows | Extra       |
+----+--------------------+---------+----------------+--------------------+---------+---------+------+------+-------------+
|  1 | PRIMARY            | Country | ALL            | NULL               | NULL    | NULL    | NULL |  239 | Using where |
|  2 | DEPENDENT SUBQUERY | City    | index_subquery | Population,Country | Country | 3       | func |   18 | Using where |
+----+--------------------+---------+----------------+--------------------+---------+---------+------+------+-------------+
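Both plans do a full scan of the Country table. For the second step, MariaDB fills the materialized table (238 rows read from table City and written to the temporary table) and then does a unique key lookup for each record in table Country, which works out to 238 unique key lookups. In total, the second step costs (239+238) = 477 reads and 238 temporary table writes.
MySQL's plan for the second step is to read 18 rows through an index on City.Country for each record it receives from table Country. That works out to (18*239) = 4302 reads. Had there been fewer subquery invocations, this plan would have been better than the one with materialization. Incidentally, MariaDB can also use such a plan (see the FirstMatch strategy), but it did not choose it here.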
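The second-step numbers can be checked the same way. The sketch below is a simplified row-counting model with hypothetical function names. It reads the 477 as 238 rows read from City plus one key lookup per Country row; that breakdown is an interpretation of the figures above, not something the optimizer reports.

```python
def materialization_lookup_second_step(outer_rows, subquery_rows):
    """Second-step row counts for Materialization-lookup (assumed breakdown)."""
    reads = subquery_rows    # fill: read the matching rows from City
    writes = subquery_rows   # fill: write them into the temporary table
    reads += outer_rows      # probe: one unique key lookup per Country row
    return reads, writes


def index_subquery_second_step(outer_rows, rows_per_invocation):
    """Second-step reads when the subquery runs once per Country row."""
    return outer_rows * rows_per_invocation


print(materialization_lookup_second_step(239, 238))  # (477, 238)
print(index_subquery_second_step(239, 18))           # 4302
```

Under this model the per-row plan only wins when the number of subquery invocations is small, since its cost grows with every outer row while the cost of filling the temporary table is paid once.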
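Subqueries with GROUP BY
MariaDB can use the semi-join materialization strategy when the subquery has grouping (the other semi-join strategies are not applicable in this case).
This allows efficient execution of queries that search for the best/last element in a certain group.
For example, let's find the cities that have the biggest population on their continent: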
explain select * from City where City.Population in (select max(City.Population) from City, Country where City.Country=Country.Code group by Continent)
+------+--------------+-------------+------+---------------+------------+---------+----------------------------------+------+-----------------+
| id   | select_type  | table       | type | possible_keys | key        | key_len | ref                              | rows | Extra           |
+------+--------------+-------------+------+---------------+------------+---------+----------------------------------+------+-----------------+
|    1 | PRIMARY      | <subquery2> | ALL  | distinct_key  | NULL       | NULL    | NULL                             |  239 |                 |
|    1 | PRIMARY      | City        | ref  | Population    | Population | 4       | <subquery2>.max(City.Population) |    1 |                 |
|    2 | MATERIALIZED | Country     | ALL  | PRIMARY       | NULL       | NULL    | NULL                             |  239 | Using temporary |
|    2 | MATERIALIZED | City        | ref  | Country       | Country    | 3       | world.Country.Code               |   18 |                 |
+------+--------------+-------------+------+---------------+------------+---------+----------------------------------+------+-----------------+
4 rows in set (0.00 sec)
The cities are:
+------+-------------------+---------+------------+
| ID   | Name              | Country | Population |
+------+-------------------+---------+------------+
| 1024 | Mumbai (Bombay)   | IND     |   10500000 |
| 3580 | Moscow            | RUS     |    8389200 |
| 2454 | Macao             | MAC     |     437500 |
|  608 | Cairo             | EGY     |    6789479 |
| 2515 | Ciudad de México  | MEX     |    8591309 |
|  206 | São Paulo         | BRA     |    9968485 |
|  130 | Sydney            | AUS     |    3276207 |
+------+-------------------+---------+------------+
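The shape of that plan (materialize the grouped subquery, scan the result, then look up City through its Population index) can be imitated in Python. This is a rough sketch on a handful of toy rows, with a hypothetical `continent_of` mapping standing in for the Country table; it mirrors the plan's steps and is not the server's implementation.

```python
from collections import defaultdict

# Toy rows for illustration only; `continent_of` stands in for the Country table.
continent_of = {"IND": "Asia", "JPN": "Asia", "RUS": "Europe",
                "GBR": "Europe", "EGY": "Africa"}
cities = [  # (id, name, country code, population)
    (1024, "Mumbai (Bombay)", "IND", 10_500_000),
    (1532, "Tokyo", "JPN", 7_980_230),
    (3580, "Moscow", "RUS", 8_389_200),
    (456, "London", "GBR", 7_285_000),
    (608, "Cairo", "EGY", 6_789_479),
]


def biggest_cities():
    # Step 1 (MATERIALIZED): join City with Country, group by continent,
    # and keep max(Population) for each group.
    max_by_continent = defaultdict(int)
    for _, _, country, pop in cities:
        continent = continent_of[country]
        max_by_continent[continent] = max(max_by_continent[continent], pop)
    temp_table = set(max_by_continent.values())  # unique key drops duplicates

    # Stand-in for the index on City.Population.
    by_population = defaultdict(list)
    for row in cities:
        by_population[row[3]].append(row)

    # Step 2: scan the materialized table and do a ref lookup into City.
    result = []
    for pop in temp_table:
        result.extend(by_population.get(pop, []))
    return sorted(result)  # sorted by id for a stable order


for row in biggest_cities():
    print(row)
```

Note that, like the SQL, the sketch matches on population value alone, so any city whose population happens to equal some continent's maximum is returned as well.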
Semi-join materialization
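1. Can be used for uncorrelated IN subqueries. The subquery may have grouping and/or aggregate functions
2. In EXPLAIN output, it shows up as select_type=MATERIALIZED for the subquery, and a line with table=<subqueryN> in the parent select
3. It is enabled when both materialization=on and semijoin=on are set in the optimizer_switch variable
4. The materialization=on|off flag is shared with non-semijoin materialization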
https://mariadb.com/kb/en/library/semi-join-materialization-strategy/