Join Reorder Optimization - Paper Summaries

Date: 2019-11-05

Tags: join reorder, optimization, paper summary. Category: SQL


Query Simplification: Graceful Degradation for Join-Order Optimization

The related work section of this paper is a useful reference; its survey is fairly comprehensive.

Query graphs fall into the following types:

The Graph Simplification algorithm:

Heuristic

Optimization of Large Join Queries: Combining Heuristic and Combinatorial Techniques

The main idea of this paper is to combine the combinatorial and heuristic frameworks.

"Combinatorial" here refers to combinatorial optimization.

A combinatorial optimization problem searches a state space for an optimal state, where the cost of each state is given by a cost function.

Combinatorial optimization algorithms fall into two main kinds:

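Iterative algorithms: here this mainly means repetition, i.e. retrying again and again from random starting points to find a better solution.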

Simulated annealing.
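To make the two combinatorial methods concrete, here is a minimal Python sketch under stated assumptions: a state is a left-deep join order, the cost function is C_out (the sum of intermediate result sizes, cross products allowed), a move swaps two relations, and the cardinalities and selectivities in CARD and SEL are made up. iterative_improvement restarts a local descent from random states and keeps the best local minimum; simulated_annealing also accepts uphill moves with probability exp(-delta / temp). The parameter values (restarts, temp, cooling, steps) are illustrative, not taken from the paper.

```python
import math
import random
from itertools import combinations

# Hypothetical chain query A-B-C-D: cardinalities and predicate selectivities
CARD = {"A": 1000, "B": 10, "C": 5000, "D": 100}
SEL = {("A", "B"): 0.01, ("B", "C"): 0.001, ("C", "D"): 0.05}


def join_size(rels):
    """Estimated result size: product of cardinalities times selectivities of inner predicates."""
    size = 1.0
    for r in rels:
        size *= CARD[r]
    for (a, b), s in SEL.items():
        if a in rels and b in rels:
            size *= s
    return size


def cost(order):
    """C_out of a left-deep order (one state): sum of all intermediate result sizes."""
    return sum(join_size(order[:k]) for k in range(2, len(order) + 1))


def swapped(order, i, j):
    """Neighbor state: the same order with positions i and j exchanged."""
    nxt = list(order)
    nxt[i], nxt[j] = nxt[j], nxt[i]
    return nxt


def local_descent(order, rng):
    """Accept improving swaps (tried in random order) until no swap lowers the cost."""
    improved = True
    while improved:
        improved = False
        moves = list(combinations(range(len(order)), 2))
        rng.shuffle(moves)
        for i, j in moves:
            cand = swapped(order, i, j)
            if cost(cand) < cost(order):
                order, improved = cand, True
                break
    return order


def iterative_improvement(restarts=10, seed=0):
    """II: local descent from several random initial states; keep the best local minimum."""
    rng = random.Random(seed)
    best = None
    for _ in range(restarts):
        start = list(CARD)
        rng.shuffle(start)
        state = local_descent(start, rng)
        if best is None or cost(state) < cost(best):
            best = state
    return best


def simulated_annealing(temp=1e6, cooling=0.9, steps=20, seed=0):
    """SA: random swaps; an uphill move is accepted with probability exp(-delta / temp)."""
    rng = random.Random(seed)
    state = list(CARD)
    rng.shuffle(state)
    best = state
    while temp > 1e-3:
        for _ in range(steps):
            cand = swapped(state, *rng.sample(range(len(state)), 2))
            delta = cost(cand) - cost(state)
            if delta <= 0 or rng.random() < math.exp(-delta / temp):
                state = cand
                if cost(state) < cost(best):
                    best = state
        temp *= cooling  # geometric cooling schedule
    return best
```

The swap move is only one possible neighborhood; the move set used in the paper may differ.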

Heuristic algorithms:

First, the Augmentation Heuristic.

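It starts with only the first relation and then adds relations one at a time; which relation is added next is decided by the chooseNext function.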

The paper lists several criteria that can serve as chooseNext, and reports that criterion 3 gave the best results in its experiments.
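A minimal sketch of the augmentation heuristic with chooseNext as a pluggable function. Because the paper's list of criteria is not reproduced here, the two criteria below (smallest intermediate result, smallest base cardinality, both preferring relations connected by a predicate) are illustrative assumptions and are not claimed to be the paper's criteria or its winning criterion 3. CARD and SEL are hypothetical.

```python
# Hypothetical chain query A-B-C-D
CARD = {"A": 1000, "B": 10, "C": 5000, "D": 100}
SEL = {("A", "B"): 0.01, ("B", "C"): 0.001, ("C", "D"): 0.05}


def join_size(rels):
    """Estimated result size: product of cardinalities times selectivities of inner predicates."""
    size = 1.0
    for r in rels:
        size *= CARD[r]
    for (a, b), s in SEL.items():
        if a in rels and b in rels:
            size *= s
    return size


def linked(r, order):
    """True if a predicate connects relation r to a relation already in the order."""
    return any((a == r and b in order) or (b == r and a in order) for a, b in SEL)


def min_result_size(order, remaining):
    # Illustrative chooseNext: avoid cross products, then pick the smallest intermediate result
    return min(sorted(remaining), key=lambda r: (not linked(r, order), join_size(order + [r])))


def min_cardinality(order, remaining):
    # Illustrative chooseNext: avoid cross products, then pick the smallest base relation
    return min(sorted(remaining), key=lambda r: (not linked(r, order), CARD[r]))


def augmentation(first, choose_next):
    """Start with one relation and append the one picked by choose_next until all are used."""
    order, remaining = [first], set(CARD) - {first}
    while remaining:
        nxt = choose_next(order, remaining)
        order.append(nxt)
        remaining.remove(nxt)
    return order
```

On this data, augmentation("B", min_result_size) yields ["B", "C", "D", "A"], while augmentation("B", min_cardinality) yields ["B", "A", "C", "D"], so the choice of criterion alone changes the plan.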

KBZ Heuristic

The algorithm has three parts:

R: given a rooted tree, produce the optimal join ordering.

T: given a join tree, try every node as the root, use R to find the optimal ordering of each resulting rooted tree, and keep the cheapest.

G: given a join graph, which may be cyclic, find a spanning tree and call T on it.
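A sketch of the three parts in Python, under the C_out cost model and for cross-product-free left-deep orders. R follows the rank-based IKKBZ formulation for a rooted tree: each non-root relation v gets T_v = s_v * n_v (selectivity of the edge to its parent times its cardinality), a sequence S has rank (T(S) - 1) / C(S), the children's chains are merged by ascending rank, and a parent whose rank exceeds that of its successor is glued to it into a compound module (normalization). T tries every root and keeps the cheapest result. For G, using Prim's minimum spanning tree with selectivity as the edge weight is an assumption; on a cyclic graph the dropped predicates are ignored by the cost, so the result is only a heuristic there.

```python
import heapq


def spanning_tree(card, sel):
    """G (assumed variant): Prim's minimum spanning tree, selectivity as edge weight."""
    adj = {r: {} for r in card}
    for (a, b), s in sel.items():
        adj[a][b] = adj[b][a] = s
    start = min(card)
    tree, seen = {r: {} for r in card}, {start}
    heap = [(s, start, u) for u, s in adj[start].items()]
    heapq.heapify(heap)
    while heap:
        s, v, u = heapq.heappop(heap)
        if u in seen:
            continue
        seen.add(u)
        tree[v][u] = tree[u][v] = s
        for w, s2 in adj[u].items():
            if w not in seen:
                heapq.heappush(heap, (s2, u, w))
    return tree


def rank(m):
    # A module m is (relations, T, C); rank = (T - 1) / C
    return (m[1] - 1.0) / m[2]


def combine(a, b):
    # ASI recurrence: T(AB) = T(A) T(B), C(AB) = C(A) + T(A) C(B)
    return (a[0] + b[0], a[1] * b[1], a[2] + a[1] * b[2])


def merge(xs, ys):
    """Merge two rank-sorted module lists, keeping the order inside each list."""
    out = []
    while xs and ys:
        out.append(xs.pop(0) if rank(xs[0]) <= rank(ys[0]) else ys.pop(0))
    return out + xs + ys


def subtree_chain(v, parent, tree, card):
    """Rank-sorted, normalized module list for the subtree below `parent` rooted at v."""
    t = tree[v][parent] * card[v]  # T_v = s_v * n_v; under C_out, C(v) = T_v
    head, rest = ([v], t, t), []
    for u in tree[v]:
        if u != parent:
            rest = merge(rest, subtree_chain(u, v, tree, card))
    while rest and rank(head) > rank(rest[0]):
        head = combine(head, rest.pop(0))  # normalize: v must precede its subtree
    return [head] + rest


def kbz_rooted(root, tree, card):
    """R: optimal cross-product-free left-deep order for the tree rooted at `root`."""
    chain = []
    for u in tree[root]:
        chain = merge(chain, subtree_chain(u, root, tree, card))
    order, t, c = [root], 1.0, 0.0
    for rels, mt, mc in chain:
        order += rels
        c, t = c + t * mc, t * mt
    return order, card[root] * c  # C_out = n_root * C(sequence)


def kbz(card, sel):
    tree = spanning_tree(card, sel)  # G
    return min((kbz_rooted(r, tree, card) for r in card), key=lambda p: p[1])  # T
```

For an acyclic join graph this R is exact for C_out, so comparing kbz against brute-force enumeration of connected orders on a small tree is a reasonable sanity check.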

Local Improvement

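Divide and conquer: when there are many tables, exhaustive enumeration is very expensive, but splitting the problem into small clusters makes it much simpler.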

Likewise, this does not guarantee the optimal solution either; clusters may overlap.
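A minimal sketch of local improvement, assuming a cluster is a window of k consecutive relations in a left-deep order: each window is re-optimized by exhaustive permutation while the rest of the order stays fixed, consecutive windows overlap by k - step positions, and passes repeat until no window improves. The C_out cost, k, step, CARD and SEL are illustrative assumptions.

```python
from itertools import permutations

# Hypothetical chain query A-B-C-D
CARD = {"A": 1000, "B": 10, "C": 5000, "D": 100}
SEL = {("A", "B"): 0.01, ("B", "C"): 0.001, ("C", "D"): 0.05}


def join_size(rels):
    """Estimated result size: product of cardinalities times selectivities of inner predicates."""
    size = 1.0
    for r in rels:
        size *= CARD[r]
    for (a, b), s in SEL.items():
        if a in rels and b in rels:
            size *= s
    return size


def cost(order):
    """C_out of a left-deep order: sum of all intermediate result sizes."""
    return sum(join_size(order[:k]) for k in range(2, len(order) + 1))


def local_improvement(order, k=3, step=2):
    """Exhaustively re-optimize overlapping windows of k relations until none improves."""
    order = list(order)
    last = max(len(order) - k, 0)
    starts = list(range(0, last + 1, step))
    if starts[-1] != last:
        starts.append(last)  # make sure the tail of the order is also covered
    improved = True
    while improved:
        improved = False
        for i in starts:
            head, window, tail = order[:i], order[i:i + k], order[i + k:]
            best = min(permutations(window), key=lambda p: cost(head + list(p) + tail))
            cand = head + list(best) + tail
            if cost(cand) < cost(order):
                order, improved = cand, True
    return order
```

With k equal to the number of relations the single window is the whole order, so the procedure degenerates into exhaustive enumeration; a small k keeps each step cheap at the price of optimality.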

Finally, what happens if the two techniques are combined?

II (iterative improvement) and SA (simulated annealing) are the two basic combinatorial methods.

SAA and SAK apply the augmentation and KBZ heuristics, respectively, to SA, using them to produce a good initial state.

IAI and IKI use the heuristics to produce the initial state of each iteration.

IAL additionally applies local improvement.

AGI and KBI first generate a state with a heuristic, then optimize it with iterative improvement.
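A loose sketch of the "heuristic first, then iterative" combination in the spirit of AGI, under assumptions: the heuristic phase is the augmentation heuristic with a hypothetical chooseNext (smallest intermediate result, no cross products) run once per starting relation, and the iterative phase is a swap-based descent on each resulting order. How the paper schedules starts and iterations may differ; CARD, SEL and the C_out cost are illustrative.

```python
from itertools import combinations

# Hypothetical chain query A-B-C-D
CARD = {"A": 1000, "B": 10, "C": 5000, "D": 100}
SEL = {("A", "B"): 0.01, ("B", "C"): 0.001, ("C", "D"): 0.05}


def join_size(rels):
    """Estimated result size: product of cardinalities times selectivities of inner predicates."""
    size = 1.0
    for r in rels:
        size *= CARD[r]
    for (a, b), s in SEL.items():
        if a in rels and b in rels:
            size *= s
    return size


def cost(order):
    """C_out of a left-deep order: sum of all intermediate result sizes."""
    return sum(join_size(order[:k]) for k in range(2, len(order) + 1))


def linked(r, order):
    return any((a == r and b in order) or (b == r and a in order) for a, b in SEL)


def augmentation(first):
    """Heuristic phase with an assumed chooseNext: connected relation, smallest result."""
    order, remaining = [first], set(CARD) - {first}
    while remaining:
        nxt = min(sorted(remaining), key=lambda r: (not linked(r, order), join_size(order + [r])))
        order.append(nxt)
        remaining.remove(nxt)
    return order


def iterate(order):
    """Iterative phase: apply improving swaps until no swap lowers the cost."""
    improved = True
    while improved:
        improved = False
        for i, j in combinations(range(len(order)), 2):
            cand = list(order)
            cand[i], cand[j] = cand[j], cand[i]
            if cost(cand) < cost(order):
                order, improved = cand, True
    return order


def agi():
    # One heuristic initial state per starting relation, each refined by the descent
    return min((iterate(augmentation(r)) for r in CARD), key=cost)
```

Because the descent never increases the cost, the combined result is never worse than the best pure heuristic order.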

A New Heuristic for Optimizing Large Queries

The goal of query optimization is to avoid the worst plans rather than to find the best plan; under this assumption, heuristic algorithms can work fairly well.

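Current cost-based search built on combinatorial optimization techniques (such as iterative improvement or simulated annealing) has had some success, but existing methods do not exploit the semantic information inherent in queries.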

The basic idea, therefore, is to exploit semantic information on top of existing cost-based search, which leads to the GOO algorithm, Greedy Operator Ordering.

It is a greedy, bottom-up algorithm.

The key attribute of a node is its size; the key attribute of an edge is its selectivity.

GOO progressively merges nodes: at each step it merges along the edge that produces the smallest intermediate result.
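The merging rule can be sketched in a few lines of Python. Assumptions: each node carries a join tree, the set of relations it covers and an estimated size; the selectivity between two nodes is the product of the selectivities of the predicates connecting them; only connected pairs are considered. The result is a bushy tree written as nested tuples, and CARD and SEL are hypothetical.

```python
# Hypothetical chain query A-B-C-D
CARD = {"A": 1000, "B": 10, "C": 5000, "D": 100}
SEL = {("A", "B"): 0.01, ("B", "C"): 0.001, ("C", "D"): 0.05}


def cross_sel(x, y):
    """Product of selectivities of predicates between relation sets x and y."""
    s = 1.0
    for (a, b), v in SEL.items():
        if (a in x and b in y) or (a in y and b in x):
            s *= v
    return s


def linked(x, y):
    """True if at least one predicate connects relation sets x and y."""
    return any((a in x and b in y) or (a in y and b in x) for a, b in SEL)


def goo():
    """Greedy Operator Ordering: repeatedly join the connected pair with the smallest result."""
    # node = (join tree, relations covered, estimated size)
    nodes = [(r, frozenset([r]), float(CARD[r])) for r in CARD]
    while len(nodes) > 1:
        best = None
        for i in range(len(nodes)):
            for j in range(i + 1, len(nodes)):
                (_, ri, ni), (_, rj, nj) = nodes[i], nodes[j]
                if linked(ri, rj):
                    size = ni * nj * cross_sel(ri, rj)
                    if best is None or size < best[0]:
                        best = (size, i, j)
        if best is None:
            raise ValueError("join graph is disconnected")
        size, i, j = best
        (ti, ri, _), (tj, rj, _) = nodes[i], nodes[j]
        rest = [nd for k, nd in enumerate(nodes) if k not in (i, j)]
        nodes = rest + [((ti, tj), ri | rj, size)]
    return nodes[0][0], nodes[0][2]
```

On the sample data it first joins B and C (estimated 50 rows), then D, then A, returning ("A", ("D", ("B", "C"))) with an estimated final size of 2500.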

Clearly, GOO is not guaranteed to produce the optimal plan.

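The usual approach is to start from the heuristic result and adjust and optimize it further to find a better solution, for example by adding a set of rules and trying to apply them bottom-up to obtain a better plan.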

Polynomial Heuristics for Query Optimization

One line of work adapts randomized techniques and combinatorial heuristics to address this problem. These techniques consider the space of plans as points in a high-dimensional space, that can be "traversed" via transformations (e.g., join commutativity and associativity). Reference [13] surveys different such strategies, including iterative improvement, simulated annealing, and genetic algorithms. These techniques can be seen as heuristic variations of transformation-based exhaustive enumeration algorithms.

Another line of work implements heuristic variations of dynamic programming. These approaches include reference [14] (which performs dynamic programming for a subset of tables, picks the best k-table join, replaces it with a new "virtual" table, and repeats the procedure until all tables are part of the final plan), reference [15] (which simplifies an initial join graph by disallowing non-promising join edges and then exhaustively searches the resulting, simpler problem using [8]), and references [16], [17] (which greedily build join trees one table at a time).
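The first dynamic-programming variant in the quote ([14]) can be sketched as follows, under assumptions: plans are cross-product-free bushy trees, "best" means lowest C_out (accumulated intermediate result sizes), and each round runs exhaustive DP over connected groups of at most k current tables, keeps the cheapest k-table group, and replaces it with a virtual table. k, CARD and SEL are illustrative; this is not the reference's exact algorithm.

```python
from itertools import combinations

# Hypothetical chain query A-B-C-D
CARD = {"A": 1000, "B": 10, "C": 5000, "D": 100}
SEL = {("A", "B"): 0.01, ("B", "C"): 0.001, ("C", "D"): 0.05}


def cross_sel(x, y):
    """Product of selectivities of predicates between relation sets x and y."""
    s = 1.0
    for (a, b), v in SEL.items():
        if (a in x and b in y) or (a in y and b in x):
            s *= v
    return s


def linked(x, y):
    return any((a in x and b in y) or (a in y and b in x) for a, b in SEL)


def best_plans(nodes, k):
    """Exhaustive DP over connected groups of up to k current nodes (bushy, C_out)."""
    best = {frozenset([i]): nd for i, nd in enumerate(nodes)}  # group -> (tree, rels, size, cost)
    for n in range(2, k + 1):
        for group in combinations(range(len(nodes)), n):
            g = frozenset(group)
            for m in range(1, n):
                for left in combinations(group, m):
                    lhs = frozenset(left)
                    rhs = g - lhs
                    if lhs not in best or rhs not in best:
                        continue  # one side is not a connected group
                    tl, rl, sl, cl = best[lhs]
                    tr, rr, sr, cr = best[rhs]
                    if not linked(rl, rr):
                        continue
                    size = sl * sr * cross_sel(rl, rr)
                    total = cl + cr + size
                    if g not in best or total < best[g][3]:
                        best[g] = ((tl, tr), rl | rr, size, total)
    return best


def idp(k=2):
    """Plan the cheapest k-table join, replace it by a virtual table, and repeat."""
    nodes = [(r, frozenset([r]), float(CARD[r]), 0.0) for r in CARD]
    while len(nodes) > 1:
        k_eff = min(k, len(nodes))
        best = best_plans(nodes, k_eff)
        g = min((grp for grp in best if len(grp) == k_eff), key=lambda grp: best[grp][3])
        nodes = [nd for i, nd in enumerate(nodes) if i not in g] + [best[g]]
    return nodes[0][0], nodes[0][3]
```

idp(k=len(CARD)) reduces to plain exhaustive DP; a smaller k bounds the DP effort per round at the risk of a worse plan.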

The paper first presents a taxonomy, which is fairly novel.

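Heuristics are a basic optimization technique and divide into heuristic variants of transformation-based techniques and heuristic variants of dynamic programming.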

The heuristic DP algorithms are all graph-based: they either work iteratively, using cost and similar information to shrink the search space, or are greedy.

However, the paper states that all of them except the greedy approaches perform too poorly.

The paper therefore proposes a general greedy framework, ERM.

P contains all plans; the idea is to keep merging plans until only one plan is left, which is the essence of the greedy approach (compare GOO).

ERM has three phases.

The first phase finds all pairs of plans that can be merged. The key is the Valid function, whose definition depends on the requirements.

The accompanying example shows the difference between linear trees and bushy trees.

The second phase is Ranking, i.e. the Maximizes function, which decides which pair of plans is best to merge.

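Several functions can be chosen here; the paper proposes MinSize, which also takes the length of the tuples themselves into account and works somewhat better.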
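A sketch of the greedy framework in Python, assuming the three phases are: Valid decides which pairs may be merged, a ranking picks the best valid pair, and the winner is merged back into P. Valid here requires a connecting predicate; for linear trees one side must also be a base table and there is only one growing composite plan. MinSize is modeled as cardinality times tuple width of the merged result and compared with a cardinality-only ranking ("min_card"); both are simplified interpretations, scores are minimized (equivalent to maximizing their negation), and WIDTH, CARD and SEL are hypothetical.

```python
import math

# Hypothetical chain query A-B-C-D with made-up tuple widths in bytes
CARD = {"A": 1000, "B": 10, "C": 5000, "D": 100}
WIDTH = {"A": 100, "B": 8, "C": 20, "D": 500}
SEL = {("A", "B"): 0.01, ("B", "C"): 0.001, ("C", "D"): 0.05}


def crossing(x, y):
    """Selectivities of the predicates connecting relation sets x and y."""
    return [v for (a, b), v in SEL.items() if (a in x and b in y) or (a in y and b in x)]


def erm(linear=False, measure="min_size"):
    # plan = (join tree, relations, cardinality, tuple width)
    plans = [(r, frozenset([r]), float(CARD[r]), float(WIDTH[r])) for r in CARD]

    def valid(p, q):
        """Which pairs may be merged (no cross products; linear: keep one composite)."""
        if not crossing(p[1], q[1]):
            return False
        if not linear:
            return True
        one_base = len(p[1]) == 1 or len(q[1]) == 1
        has_composite = any(len(x[1]) > 1 for x in plans)
        touches_composite = len(p[1]) > 1 or len(q[1]) > 1
        return one_base and (touches_composite or not has_composite)

    def merged(p, q):
        card = p[2] * q[2] * math.prod(crossing(p[1], q[1]))
        return ((p[0], q[0]), p[1] | q[1], card, p[3] + q[3])

    def score(m):
        """Ranking, smaller is better: bytes (MinSize) or rows only (min_card)."""
        return m[2] * m[3] if measure == "min_size" else m[2]

    while len(plans) > 1:
        cands = [(score(merged(p, q)), i, j)
                 for i, p in enumerate(plans) for j, q in enumerate(plans)
                 if i < j and valid(p, q)]
        if not cands:
            raise ValueError("no valid pair to merge")
        _, i, j = min(cands)  # best-ranked pair
        m = merged(plans[i], plans[j])  # merge it back into P
        plans = [x for k, x in enumerate(plans) if k not in (i, j)] + [m]
    return plans[0][0]
```

With these made-up widths, MinSize returns ("D", ("A", ("B", "C"))) while the cardinality-only ranking returns ("A", ("D", ("B", "C"))): the wide D tuples make joining D early look expensive in bytes even though it yields fewer rows.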