MySQL DISTINCT: Source Code Reading Notes

Date: 2019-12-11

Based on the MySQL 8.0 community version.

Aggregator_distinct

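This section covers DISTINCT inside aggregate functions. Some examples:

```sql
SELECT COUNT(DISTINCT expr) ...
-- Apart from COUNT, MySQL's aggregate DISTINCT currently supports only numeric types
SELECT SUM(DISTINCT expr) ...
SELECT AVG(DISTINCT expr) ...
```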

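In earlier versions, DISTINCT was implemented by dedicated Items such as Item_sum_distinct. That approach means every aggregate function Item needs its own DISTINCT implementation, which is not very reasonable. MySQL 8.0 factors the non-distinct and distinct logic out into Aggregator_simple and Aggregator_distinct, which serve every aggregate function that inherits from Item_sum.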

```cpp
class Aggregator_simple : public Aggregator {
 public:
  Aggregator_simple(Item_sum *sum) : Aggregator(sum) {}
  Aggregator_type Aggrtype() override { return Aggregator::SIMPLE_AGGREGATOR; }

  bool setup(THD *thd) override { return item_sum->setup(thd); }
  void clear() override { item_sum->clear(); }
  bool add() override { return item_sum->add(); }
  void endup() override{};
  my_decimal *arg_val_decimal(my_decimal *value) override;
  double arg_val_real() override;
  bool arg_is_null(bool use_null_value) override;
};
```

As the code shows, Aggregator_simple is essentially a call wrapper for non-distinct Item_sum processing, so it simply forwards to the Item_sum logic.

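In MySQL, processing an aggregate function (Item_sum) comes down to three steps: setup, add and endup. setup initializes state before processing, add processes each record, and endup finishes by computing the final aggregate result.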
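To make the setup/add/endup split concrete, here is a minimal sketch, assuming a heavily simplified model, of how interchangeable aggregator objects let one aggregate item serve both the plain and the DISTINCT form. SumItem, SimpleAggregator, DistinctAggregator and run are hypothetical names for illustration, not MySQL's classes, and std::set (typically a red-black tree) stands in for the dedup tree.

```cpp
#include <cassert>
#include <set>

// Hypothetical stand-in for Item_sum: SUM over integer arguments.
struct SumItem {
  long long sum = 0;
  long long current_arg = 0;  // argument value for the current row
  void clear() { sum = 0; }
  void add() { sum += current_arg; }
};

// Every aggregate goes through setup -> add (once per row) -> endup.
struct Aggregator {
  explicit Aggregator(SumItem *item) : item_sum(item) {}
  virtual ~Aggregator() = default;
  virtual void setup() = 0;
  virtual void add() = 0;
  virtual void endup() = 0;

 protected:
  SumItem *item_sum;
};

// Non-DISTINCT: a thin wrapper that forwards straight to the item.
struct SimpleAggregator : Aggregator {
  using Aggregator::Aggregator;
  void setup() override { item_sum->clear(); }
  void add() override { item_sum->add(); }
  void endup() override {}
};

// DISTINCT: collect values in an ordered set (duplicates are rejected),
// then feed each distinct value to the item exactly once in endup().
struct DistinctAggregator : Aggregator {
  using Aggregator::Aggregator;
  void setup() override {
    seen.clear();
    item_sum->clear();
  }
  void add() override { seen.insert(item_sum->current_arg); }
  void endup() override {
    for (long long v : seen) {
      item_sum->current_arg = v;
      item_sum->add();
    }
  }

 private:
  std::set<long long> seen;
};

// Runs one aggregation over `rows` with the chosen aggregator strategy.
long long run(Aggregator &agg, SumItem &item, const long long *rows, int n) {
  agg.setup();
  for (int i = 0; i < n; ++i) {
    item.current_arg = rows[i];
    agg.add();
  }
  agg.endup();
  return item.sum;
}
```

The item itself never needs to know whether DISTINCT was requested; only the aggregator object changes.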

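For DISTINCT there are two ways to deduplicate. The first maintains an index structure that stores the relevant fields as the key and deduplicates on it. The second persists the data into an on-disk temporary table, combines the single field or multiple fields into one new field, and builds a hash-type primary key index (a unique index) on it. MySQL mainly relies on the first approach: keys are stored in an index tree, and a duplicate key is rejected on insert. The second approach is used only for COUNT DISTINCT when a temporary table has been chosen for storage.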
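A minimal sketch of the second approach, assuming string-valued fields: the DISTINCT fields of a row are packed into one composite key, the key is indexed by its hash, and a row whose full key already exists is rejected. HashUniqueTable and its length-prefixed packing are hypothetical; the source does not describe MySQL's actual hash function or key layout.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical hash-based unique table for composite DISTINCT keys.
class HashUniqueTable {
 public:
  // Returns true if the row was inserted, false if it was a duplicate.
  bool insert(const std::vector<std::string> &fields) {
    std::string key = pack(fields);
    std::size_t h = std::hash<std::string>{}(key);
    auto range = index_.equal_range(h);
    for (auto it = range.first; it != range.second; ++it)
      if (rows_[it->second] == key) return false;  // same hash and same key
    index_.emplace(h, rows_.size());  // new key (possibly a hash collision)
    rows_.push_back(key);
    return true;
  }
  std::size_t size() const { return rows_.size(); }

 private:
  // Length-prefix each field so ("ab","c") and ("a","bc") pack differently.
  static std::string pack(const std::vector<std::string> &fields) {
    std::string out;
    for (const auto &f : fields) out += std::to_string(f.size()) + ':' + f;
    return out;
  }
  std::unordered_multimap<std::size_t, std::size_t> index_;  // hash -> row slot
  std::vector<std::string> rows_;                            // stored packed keys
};
```

Comparing the full key after a hash match is what keeps two different rows with colliding hashes from being treated as duplicates.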

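MySQL's dedup tree is implemented by the Unique class, whose internal dedup structure is a red-black tree. When the number of elements in the tree exceeds max_elements, a flush writes the in-memory tree to a file on disk. Uniqueness is guaranteed within a single file but not across files, so at the end the files are merged in a merge-sort pass, which yields sorted, deduplicated output.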
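The tree/flush/merge flow above can be sketched as follows, assuming int keys and in-memory vectors in place of disk files. UniqueSketch, unique_add and walk are illustrative names loosely modeled on the description, not the real class interface. For simplicity walk() always flushes and merges; a real implementation would likely walk the tree directly when nothing was spilled.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <set>
#include <utility>
#include <vector>

// Simplified model: a red-black tree (std::set) in memory, flushed as a sorted
// run when it grows past max_elements, then a k-way merge that drops
// duplicates across runs.
class UniqueSketch {
 public:
  explicit UniqueSketch(std::size_t max_elements) : max_elements_(max_elements) {}

  void unique_add(int key) {
    tree_.insert(key);  // duplicates within the current tree are rejected
    if (tree_.size() > max_elements_) flush();
  }

  // Merges all runs and returns the keys sorted and deduplicated.
  std::vector<int> walk() {
    if (!tree_.empty()) flush();
    using Head = std::pair<int, std::size_t>;  // (value, run index)
    std::priority_queue<Head, std::vector<Head>, std::greater<Head>> heap;
    std::vector<std::size_t> pos(runs_.size(), 0);
    for (std::size_t r = 0; r < runs_.size(); ++r)
      if (!runs_[r].empty()) heap.push({runs_[r][0], r});
    std::vector<int> out;
    while (!heap.empty()) {
      auto [value, r] = heap.top();
      heap.pop();
      if (out.empty() || out.back() != value) out.push_back(value);  // cross-run dedup
      if (++pos[r] < runs_[r].size()) heap.push({runs_[r][pos[r]], r});
    }
    return out;
  }

  std::size_t run_count() const { return runs_.size(); }

 private:
  void flush() {
    runs_.emplace_back(tree_.begin(), tree_.end());  // each run is sorted and unique
    tree_.clear();
  }
  std::size_t max_elements_;
  std::set<int> tree_;
  std::vector<std::vector<int>> runs_;  // stand-ins for the on-disk files
};
```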

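In the implementation, COUNT DISTINCT is singled out for separate handling, possibly because COUNT is so commonly used, and also because COUNT can be optimized to save storage and computation. Non-COUNT DISTINCT in aggregation mainly refers to queries like SELECT SUM(DISTINCT field).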

Aggregator_distinct setup for COUNT:

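A temporary table is built for persistent storage. Its storage engine is chosen in the setup_tmp_table_handler method below, with the following logic:
- If select_options explicitly forces TMP_TABLE_FORCE_MYISAM, the temporary table uses the MyISAM engine and is written to disk.
- use_tmp_disk_storage_engine decides whether to use an on-disk or an in-memory temporary table, mostly based on the force_disk_table parameter, opt_initialize (the heap engine is not ready during initialization), the incompatibility between BLOBs and MEMORY tables, and similar conditions.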
```
bool setup_tmp_table_handler(TABLE *table, ulonglong select_options,
                             bool force_disk_table, bool schema_table);

bool use_tmp_disk_storage_engine(
    TABLE *table, ulonglong select_options, bool force_disk_table,
    enum_internal_tmp_mem_storage_engine mem_engine);
```
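If the temporary table uses an in-memory engine (MEMORY/TEMPTABLE), the dedup tree is built and used directly, because the dedup tree is considered more efficient than the temporary table.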
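The decision for COUNT DISTINCT can be summarized as a small function. This is a hypothetical simplification: the boolean fields stand in for select_options, force_disk_table, opt_initialize and the BLOB check, while the real functions take TABLE and engine arguments as in the declarations above.

```cpp
#include <cassert>

// Hypothetical outcome of the storage decision for COUNT(DISTINCT).
enum class DistinctStorage { MyIsamDiskTable, OnDiskTempTable, InMemoryTree };

// Hypothetical inputs; plain booleans instead of MySQL's real structures.
struct TmpTableInputs {
  bool force_myisam = false;         // select_options has TMP_TABLE_FORCE_MYISAM
  bool force_disk_table = false;     // caller forces an on-disk table
  bool initializing = false;         // opt_initialize: heap engine not ready yet
  bool blob_with_mem_engine = false;  // BLOBs the memory engine cannot store
};

DistinctStorage choose_count_distinct_storage(const TmpTableInputs &in) {
  if (in.force_myisam) return DistinctStorage::MyIsamDiskTable;
  bool use_disk =
      in.force_disk_table || in.initializing || in.blob_with_mem_engine;
  if (use_disk) return DistinctStorage::OnDiskTempTable;
  // In-memory engine (MEMORY/TEMPTABLE): use the dedup tree directly.
  return DistinctStorage::InMemoryTree;
}
```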

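For non-COUNT aggregates, Aggregator_distinct setup directly builds both the temporary table and the dedup tree. In the end, deduplication and the Item_sum computation are still done by walking the dedup tree.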

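In both the COUNT and non-COUNT cases, the temporary table may be built and then never used to store data, yet the code still relies on the temporary TABLE object to obtain Field::is_null() information.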
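A minimal sketch of the non-COUNT path, assuming double-valued arguments: NULLs are filtered out before insertion (the role the text gives to Field::is_null()), and SUM(DISTINCT) and AVG(DISTINCT) are computed by a callback invoked once per key during the tree walk. std::optional stands in for a nullable field; walk_tree, sum_avg_distinct and DistinctResult are hypothetical names.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <set>
#include <vector>

struct DistinctResult {
  double sum = 0;
  long long count = 0;
  std::optional<double> avg() const {
    if (count == 0) return std::nullopt;  // AVG over no non-NULL values is NULL
    return sum / count;
  }
};

// In-order walk that invokes the callback once per distinct key.
void walk_tree(const std::set<double> &tree, const std::function<void(double)> &cb) {
  for (double key : tree) cb(key);
}

DistinctResult sum_avg_distinct(const std::vector<std::optional<double>> &column) {
  std::set<double> tree;
  for (const auto &v : column)
    if (v.has_value()) tree.insert(*v);  // skip NULLs, reject duplicates
  DistinctResult r;
  walk_tree(tree, [&r](double key) {
    r.sum += key;
    ++r.count;
  });
  return r;
}
```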

SELECT DISTINCT

Examples:

```sql
SELECT DISTINCT a FROM t1;
SELECT DISTINCT a, b, c FROM t1;
SELECT DISTINCT a.* FROM tbl_applicant a;
```

Related member of class JOIN:

```cpp
/**
  At construction time, set if SELECT DISTINCT. May be reset to false
  later, when we set up a temporary table operation that deduplicates for us.
*/
bool select_distinct;
```

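SELECT DISTINCT can be executed in two ways. Take the TPC-H lineitem table as an example, and suppose we have a non-unique secondary index on l_orderkey and no index on the l_returnflag column.

As shown in the figure above, if the DISTINCT columns are covered by an index, the index is used; otherwise a temporary table is created. This is consistent with the two deduplication approaches described for aggregate DISTINCT: a MySQL index corresponds to the first, index-based approach, except that it does not need to be maintained separately during processing.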
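The difference between the two strategies can be sketched in a few lines, assuming l_orderkey values arrive in index order and l_returnflag values arrive unordered. Both functions are hypothetical illustrations: with an ordered index scan, a duplicate is always adjacent to its predecessor, so only the previous key has to be remembered; without an index, every key seen so far must be kept in a structure with a unique key (a hash set here, standing in for the temporary table).

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Index scan: input is sorted by the index, so compare with the previous key only.
std::vector<long> distinct_via_index_scan(const std::vector<long> &index_ordered) {
  std::vector<long> out;
  for (long key : index_ordered)
    if (out.empty() || out.back() != key) out.push_back(key);
  return out;
}

// Temporary table: input is unordered, so remember every key seen so far.
std::vector<std::string> distinct_via_temp_table(const std::vector<std::string> &rows) {
  std::unordered_set<std::string> tmp_table;
  std::vector<std::string> out;
  for (const auto &r : rows)
    if (tmp_table.insert(r).second) out.push_back(r);  // insert fails on duplicates
  return out;
}
```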

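In the optimize phase, the fields referenced by SELECT DISTINCT are extracted first, and the optimizer then tries to determine whether they all belong to the same index: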

```cpp
static void add_loose_index_scan_and_skip_scan_keys(JOIN *join,
                                                    JOIN_TAB *join_tab) {
  ...
  else if (join->select_distinct) { /* Collect all query fields referenced in
                                       the SELECT clause. */
    List<Item> &select_items = join->fields_list;
    List_iterator<Item> select_items_it(select_items);
    Item *item;
    while ((item = select_items_it++))
      item->walk(&Item::collect_item_field_processor, Item::WALK_POSTFIX,
                 (uchar *)&indexed_fields);
    cause = "distinct";
  }
  ...

  Key_map possible_keys;
  possible_keys.set_all();

  /* Intersect the keys of all group fields. */
  while ((cur_item = indexed_fields_it++)) {
    if (cur_item->used_tables() != join_tab->table_ref->map()) {
      /*
        Doing GROUP BY or DISTINCT on a field in another table so no
        index in this table is usable
      */
      return;
    } else
      possible_keys.intersect(cur_item->field->part_of_key);
    // If possible_keys ends up empty (empty intersection), DISTINCT
    // cannot be optimized with an index.
  }
  ...
}
```
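The intersection above can be modeled with a bitmap per column, assuming at most 64 indexes: each selected column carries the set of indexes it belongs to (like Field::part_of_key), and an index is usable for DISTINCT only if it contains every selected column of this table. KeyMap (here a std::bitset), SelectedColumn and possible_distinct_keys are hypothetical stand-ins, not MySQL's types.

```cpp
#include <bitset>
#include <cassert>
#include <vector>

constexpr int kMaxKeys = 64;
using KeyMap = std::bitset<kMaxKeys>;

struct SelectedColumn {
  unsigned table_bit;  // table the column belongs to (like used_tables())
  KeyMap part_of_key;  // indexes that contain this column
};

// Returns candidate indexes; an empty map means no index-based DISTINCT.
KeyMap possible_distinct_keys(const std::vector<SelectedColumn> &cols,
                              unsigned this_table) {
  KeyMap possible;
  possible.set();  // start from "all keys"
  for (const auto &c : cols) {
    if (c.table_bit != this_table) return KeyMap();  // column from another table
    possible &= c.part_of_key;                       // keep only common indexes
  }
  return possible;
}
```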

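A single-table DISTINCT is converted into a GROUP BY. If the logic above did not find the GROUP BY fields to be covered by an index, need_tmp_before_win is set to true. bool JOIN::make_tmp_tables_info() uses this variable as the switch that decides whether to create the DISTINCT temporary table; the table itself is created in bool JOIN::create_intermediate_table.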

```cpp
// JOIN::test_skip_sort()
/*
  If we are going to use semi-join LooseScan, it will depend
  on the selected index scan to be used.  If index is not used
  for the GROUP BY, we risk that sorting is put on the LooseScan
  table.  In order to avoid this, force use of temporary table.
  TODO: Explain the quick_group part of the test below.
*/
if ((m_ordered_index_usage != ORDERED_INDEX_GROUP_BY) &&
    (tmp_table_param.quick_group ||
     (tab->emb_sj_nest &&
      tab->position()->sj_strategy == SJ_OPT_LOOSE_SCAN))) {
  need_tmp_before_win = true;
  simple_order = simple_group = false;  // Force tmp table without sort
}
```

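The core function for creating temporary tables is create_tmp_table in sql_tmp_table.cc; its DISTINCT-related logic is shown below. using_unique_constraint means an extra column holding the hash of the key must be added to implement DISTINCT, which happens when the DISTINCT fields are too large or too numerous to be indexed. In short: if the DISTINCT fields can serve as a key, a key is generated on them; otherwise using_unique_constraint=true and an extra hash_key field is generated.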

```cpp
// sql_tmp_table.cc: create_tmp_table
// ...
if (group) {
  if (!param->quick_group)
    group = 0;  // Can't use group key
  else
    for (ORDER *tmp = group; tmp; tmp = tmp->next) {
      /*
        marker == MARKER_BIT means two things:
        - store NULLs in the key, and
        - convert BIT fields to 64-bit long, needed because MEMORY tables
          can't index BIT fields.
      */
      (*tmp->item)->marker = Item::MARKER_BIT;
      const uint char_len = (*tmp->item)->max_length /
                            (*tmp->item)->collation.collation->mbmaxlen;
      if (char_len > CONVERT_IF_BIGGER_TO_BLOB)
        using_unique_constraint = true;
    }
  if (group) {
    if (param->group_length >= MAX_BLOB_WIDTH) using_unique_constraint = true;
    distinct = 0;  // Can't use distinct if group key is too large
  }
}

// ...
update_hidden:
  /*
    Calculate length of distinct key. The goal is to decide what to use -
    key or unique constraint. As blobs force unique constraint on their
    own due to their length, they aren't taken into account.
  */
  if (distinct && !using_unique_constraint && hidden_field_count <= 0 &&
      new_field) {
    if (new_field->flags & BLOB_FLAG)
      // BLOB columns cannot be used as an index key
      using_unique_constraint = true;
    else
      // Accumulate the total distinct_key_length, used later to check
      // whether the DISTINCT key is too long.
      distinct_key_length += new_field->pack_length();
  }

// ...
/*
  To enforce unique constraint we need to add a field to hold key's hash
  A1) already detected unique constraint
  A2) distinct key is too long
  A3) number of keyparts in distinct key is too big
*/
if (using_unique_constraint ||               // 1
    distinct_key_length > max_key_length ||  // 2
    (distinct &&                             // 3
     (fieldnr - param->hidden_field_count) > max_key_parts)) {
  using_unique_constraint = true;
}
```
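Stripped of the GROUP BY handling, the key-versus-hash decision reduces to roughly the following. This is a hypothetical sketch: ColumnInfo and needs_unique_constraint are illustrative names, and max_key_length / max_key_parts are passed in as parameters rather than taken from a real storage engine.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct ColumnInfo {
  bool is_blob;
  std::size_t pack_length;  // bytes the column would occupy in the key
};

// True: add a hash column with a unique constraint; false: index the columns.
bool needs_unique_constraint(const std::vector<ColumnInfo> &distinct_cols,
                             std::size_t max_key_length, std::size_t max_key_parts) {
  std::size_t distinct_key_length = 0;
  for (const auto &c : distinct_cols) {
    if (c.is_blob) return true;  // BLOB case of A1: cannot be an index key
    distinct_key_length += c.pack_length;
  }
  return distinct_key_length > max_key_length ||  // A2: key too long
         distinct_cols.size() > max_key_parts;    // A3: too many key parts
}
```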

GROUP BY execution

In the call chain make_group_fields -> alloc_group_fields -> get_end_select_func, get_end_select_func decides which end_send function is used. The end_send function is the next_select function called when the JOIN reaches its last table.

```cpp
/**
  @details
  Rows produced by a join sweep may end up in a temporary table or be sent
  to a client. Setup the function of the nested loop join algorithm which
  handles final fully constructed and matched records.
  @return
    end_select function to use. This function can't fail.
*/
Next_select_func JOIN::get_end_select_func() {
  DBUG_ENTER("get_end_select_func");
  /*
     Choose method for presenting result to user. Use end_send_group
     if the query requires grouping (has a GROUP BY clause and/or one or
     more aggregate functions). Use end_send if the query should not
     be grouped.
   */
  if (streaming_aggregation && !tmp_table_param.precomputed_group_by) {
    DBUG_PRINT("info", ("Using end_send_group"));
    DBUG_RETURN(end_send_group);
  }
  DBUG_PRINT("info", ("Using end_send"));
  DBUG_RETURN(end_send);
}
```

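MySQL executes a JOIN serially: on the evaluate_join_record path, each record recursively calls the next table's next_select (usually sub_select). For the last table, next_select is set to end_send; for a GROUP BY that uses an index instead of a temporary table, it is set to end_send_group.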

```cpp
if (end_of_records) {  // current table exhausted, or this read hit the buffer threshold
  enum_nested_loop_state nls =
      (*qep_tab->next_select)(join, qep_tab + 1, end_of_records);
  DBUG_RETURN(nls);
}
```
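The recursive next_select chain can be modeled as follows, assuming each table is a single int column and the join condition is checked on every partial row. Join, sub_select and the end_send lambda are hypothetical simplifications of the pipeline described above: next_select[i] handles rows produced by table i, and the entry for the last table plays the role of end_send.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

using Row = std::vector<int>;

struct Join {
  std::vector<std::vector<int>> tables;  // one int column per table
  std::function<bool(const Row &)> cond;  // evaluated on each partial row
  std::vector<Row> sent;                  // rows delivered by "end_send"
  std::vector<std::function<void(const Row &)>> next_select;

  // Wire the chain: table i feeds table i + 1; the last table feeds end_send.
  void setup() {
    next_select.clear();
    for (std::size_t i = 0; i < tables.size(); ++i) {
      if (i + 1 < tables.size())
        next_select.push_back([this, i](const Row &r) { sub_select(i + 1, r); });
      else
        next_select.push_back([this](const Row &r) { sent.push_back(r); });
    }
  }

  // Like sub_select: scan table i, extend the partial row, pass matches on.
  void sub_select(std::size_t i, const Row &partial) {
    for (int v : tables[i]) {
      Row r = partial;
      r.push_back(v);
      if (cond(r)) next_select[i](r);  // evaluate_join_record analogue
    }
  }

  void exec() {
    setup();
    if (!tables.empty()) sub_select(0, Row{});
  }
};
```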

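The most direct logic in end_send_group is the call to update_sum_func, which invokes add on the aggregate functions (if there are any). The overall idea: for a row in the same group, simply call add on the relevant aggregate functions; otherwise, set join->seen_first_record = true and update the values of join->group_fields to mark the start of a new group.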
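A minimal sketch of this streaming pattern, assuming rows already arrive sorted by the group key (for example through an index) and a single SUM aggregate. GroupRow and stream_group_sum are hypothetical names; current_key plays the role of join->group_fields.

```cpp
#include <cassert>
#include <optional>
#include <utility>
#include <vector>

struct GroupRow {
  int key;
  long long value;
};

std::vector<std::pair<int, long long>> stream_group_sum(const std::vector<GroupRow> &rows) {
  std::vector<std::pair<int, long long>> out;
  std::optional<int> current_key;  // analogue of join->group_fields
  long long sum = 0;               // aggregate state for the current group
  for (const auto &r : rows) {
    if (current_key && *current_key == r.key) {
      sum += r.value;  // same group: only call add()
      continue;
    }
    if (current_key) out.emplace_back(*current_key, sum);  // send finished group
    current_key = r.key;  // new group: remember key, restart aggregate
    sum = r.value;
  }
  if (current_key) out.emplace_back(*current_key, sum);  // end of records
  return out;
}
```

Only one group's state is held at a time, which is why this path needs sorted input and no temporary table.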

With a temporary table (whether or not there is a GROUP BY), rows go through sub_select_op -> end_write and are written into a temp table (see create_intermediate_table).

Some SELECT DISTINCT optimizations:

DISTINCT on a unique field

```cpp
bool JOIN::optimize_distinct_group_order() {
  ...
  if (select_distinct &&
      list_contains_unique_index(tab, find_field_in_item_list,
                                 (void *)&fields_list)) {
    select_distinct = 0;
    trace_opt.add("distinct_is_on_unique", true)
        .add("removed_distinct", true);
  }
  ...
}
```
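The idea behind this check can be sketched as follows: if every column of some unique index appears in the select list of a single-table query, each output row is already unique and DISTINCT can be dropped. This is a hypothetical simplification with columns identified by name; it assumes the index columns are NOT NULL, since a UNIQUE index may contain several NULLs.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

using Columns = std::vector<std::string>;

// True if some unique index is fully covered by the select list.
bool select_list_covers_unique_index(const std::vector<Columns> &unique_indexes,
                                     const Columns &select_list) {
  auto in_select = [&](const std::string &col) {
    return std::find(select_list.begin(), select_list.end(), col) != select_list.end();
  };
  return std::any_of(unique_indexes.begin(), unique_indexes.end(),
                     [&](const Columns &idx) {
                       return std::all_of(idx.begin(), idx.end(), in_select);
                     });
}
```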

Removing DISTINCT for const tables (1 row)

Optimize distinct when used on a subset of the tables.

E.g.: SELECT DISTINCT t1.a FROM t1,t2 WHERE t1.b=t2.b
In this case we can stop scanning t2 when we have found one t1.a

```cpp
void JOIN::optimize_distinct() {
  // Walk backwards from the last table and mark every trailing table that
  // is not referenced in the select list as not_used_in_distinct.
  for (int i = primary_tables - 1; i >= 0; --i) {
    QEP_TAB *last_tab = qep_tab + i;
    /*
      select_list_tables: the set of tables whose fields are referenced in
      the select list of this select level.
    */
    if (select_lex->select_list_tables & last_tab->table_ref->map()) break;
    last_tab->not_used_in_distinct = true;
  }

  /* Optimize "select distinct b from t1 order by key_part_1 limit #" */
  if (order && skip_sort_order) {
    /* Should already have been optimized away */
    DBUG_ASSERT(m_ordered_index_usage == ORDERED_INDEX_ORDER_BY);
    if (m_ordered_index_usage == ORDERED_INDEX_ORDER_BY) {
      order = NULL;
    }
  }
}
```
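The effect of not_used_in_distinct on the example query can be illustrated with a toy nested loop. This is a hypothetical sketch, not MySQL's executor: t2 contributes nothing to the select list, so once one matching t2 row is found for the current t1 row, the remaining t2 rows cannot produce a new t1.a value and the inner scan can stop. T1Row, ScanStats and distinct_join are illustrative names.

```cpp
#include <cassert>
#include <set>
#include <vector>

struct T1Row {
  int a;
  int b;
};

struct ScanStats {
  std::set<int> result;    // DISTINCT t1.a
  long t2_rows_read = 0;   // how much of t2 was scanned
};

ScanStats distinct_join(const std::vector<T1Row> &t1, const std::vector<int> &t2_b,
                        bool t2_not_used_in_distinct) {
  ScanStats s;
  for (const auto &r1 : t1) {
    for (int b2 : t2_b) {
      ++s.t2_rows_read;
      if (r1.b != b2) continue;              // WHERE t1.b = t2.b
      s.result.insert(r1.a);
      if (t2_not_used_in_distinct) break;    // stop scanning t2 for this t1 row
    }
  }
  return s;
}
```

Both variants return the same DISTINCT result; the flag only saves inner-table reads.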