A First Look at the InnoDB MVCC Source Implementation

Date: 2019-12-05



1. Background

Based on the MySQL InnoDB source code, this article takes a brief look at how InnoDB implements non-locking consistent reads.

2. Basic Concepts

2.1 Hidden Columns

Section 1.4 of the classic book High Performance MySQL discusses how MySQL implements MVCC. The original text says:

InnoDB implements MVCC by storing with each row two additional, hidden values that record when the row was created and when it was expired (or deleted). Rather than storing the actual times at which these events occurred, the row stores the system version number at the time each event occurred. This is a number that increments each time a transaction begins. Each transaction keeps its own record of the current system version, as of the time it began. Each query has to check each row’s version numbers against the transaction’s version.

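We know that in InnoDB the clustered index holds the complete data of each row. What High Performance MySQL describes here is that each row in InnoDB's clustered index records when the row was created and when it was deleted. That description is probably a simplification meant to make the idea easier to follow. In reality, each row in the clustered index contains two hidden columns: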

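DATA_TRX_ID (6 bytes): the id of the most recent transaction that inserted, updated, or deleted the record
DATA_ROLL_PTR (7 bytes): the roll pointer, which points to the undo log record used to rebuild the previous version of the row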

See the header file storage/innobase/include/data0type.h for details.
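To make the two hidden columns concrete, here is a minimal sketch that reads a 6-byte DATA_TRX_ID and splits a 7-byte DATA_ROLL_PTR into its parts. It assumes both values are stored big-endian (as InnoDB's mach_read_from_6 / mach_read_from_7 helpers read them) and assumes the MySQL 5.7 roll-pointer layout: a 1-bit insert flag, a 7-bit rollback segment id, a 4-byte undo page number and a 2-byte offset. The names read_be, RollPtr and decode_roll_ptr are hypothetical, not InnoDB's.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: read an n-byte big-endian unsigned integer, as
// InnoDB's mach_read_from_6 / mach_read_from_7 do for 6 and 7 bytes.
uint64_t read_be(const unsigned char* p, int n) {
    assert(n > 0 && n <= 8);
    uint64_t v = 0;
    for (int i = 0; i < n; ++i) {
        v = (v << 8) | p[i];
    }
    return v;
}

// Hypothetical decoded form of a 56-bit roll pointer (MySQL 5.7 layout assumed).
struct RollPtr {
    bool     is_insert;  // bit 55: the undo record belongs to an insert undo log
    unsigned rseg_id;    // bits 48..54: rollback segment id
    uint32_t page_no;    // bits 16..47: undo log page number
    uint16_t offset;     // bits 0..15: offset of the undo record within that page
};

RollPtr decode_roll_ptr(uint64_t roll_ptr) {
    RollPtr r;
    r.is_insert = ((roll_ptr >> 55) & 1) != 0;
    r.rseg_id   = static_cast<unsigned>((roll_ptr >> 48) & 0x7F);
    r.page_no   = static_cast<uint32_t>((roll_ptr >> 16) & 0xFFFFFFFFu);
    r.offset    = static_cast<uint16_t>(roll_ptr & 0xFFFF);
    return r;
}
```

The roll pointer is the link that lets a reader walk from the current row to older versions kept in the undo log.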

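Secondary index records do not contain these two hidden columns. Instead, each secondary index page stores a PAGE_MAX_TRX_ID, the largest id of any transaction that has modified data on that page.
See the header file storage/innobase/include/page0page.h for details.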
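Because secondary index records carry no per-row transaction id, PAGE_MAX_TRX_ID can only act as a coarse, page-wide upper bound. The sketch below models that idea. SecondaryIndexPage and on_modify are hypothetical names, and the sketch assumes (as InnoDB's page_update_max_trx_id behaves) that the value is only ever raised, never lowered.

```cpp
#include <cassert>
#include <cstdint>

typedef uint64_t trx_id_t;

// Hypothetical stand-in for a secondary index page header: only the
// PAGE_MAX_TRX_ID field is modelled.
struct SecondaryIndexPage {
    trx_id_t max_trx_id = 0;

    // Called whenever transaction trx_id modifies a record on this page.
    // The value only grows, so it bounds the ids of all transactions that
    // have touched the page; it says nothing about any single record.
    void on_modify(trx_id_t trx_id) {
        assert(trx_id > 0);
        if (trx_id > max_trx_id) {
            max_trx_id = trx_id;
        }
    }
};
```

A single recent change to any record on the page is enough to raise the bound for every record on it.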

2.2 Read View

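A Read View is a snapshot of the read-write transactions that are active at a given moment. It is used to decide whether a consistent read can see the changes other transactions have made to a table.
It is defined in the header file read0types.h. Here are some of its fields: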

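// Changes made by transactions with id >= m_low_limit_id are not visible to this read.
trx_id_t    m_low_limit_id;

// Changes made by transactions with id < m_up_limit_id are visible to this read.
trx_id_t    m_up_limit_id;

// Id of the transaction that created the view.
trx_id_t    m_creator_trx_id;

// Read-write transactions that were active when the view was created;
// ids_t can be thought of simply as a (sorted) vector.
ids_t       m_ids;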
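Note that the names are easy to misread: m_low_limit_id is the upper bound and m_up_limit_id the lower bound of the "uncertain" id range. A minimal sketch of the fields, with the invariants they are expected to satisfy, is shown here. ReadViewFields and well_formed are hypothetical, and ids_t is assumed to be a sorted std::vector.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

typedef uint64_t trx_id_t;
typedef std::vector<trx_id_t> ids_t;  // assumption: a plain sorted vector

// Hypothetical mirror of the ReadView fields listed above.
struct ReadViewFields {
    trx_id_t m_low_limit_id;    // ids >= this are not visible
    trx_id_t m_up_limit_id;     // ids <  this are visible
    trx_id_t m_creator_trx_id;  // transaction that owns the view
    ids_t    m_ids;             // rw transactions active at creation time

    // Invariants implied by how the fields are filled in:
    // up <= low, m_ids is sorted, and every active id lies in [up, low).
    bool well_formed() const {
        if (m_up_limit_id > m_low_limit_id) return false;
        if (!std::is_sorted(m_ids.begin(), m_ids.end())) return false;
        for (trx_id_t id : m_ids) {
            if (id < m_up_limit_id || id >= m_low_limit_id) return false;
        }
        return true;
    }
};
```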

InnoDB's transaction definition (see the header file trx0trx.h) contains a field that holds the transaction's Read View:

ReadView*   read_view;

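When InnoDB performs a consistent read, it checks whether the current transaction already has a Read View; if not, it obtains a new one. (InnoDB reuses Read View objects, so it explicitly allocates a new one with new only when no reusable Read View object is available.) Here is the implementation of trx_assign_read_view: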

ReadView*
trx_assign_read_view(
/*=================*/
    trx_t*      trx)    /*!< in/out: active transaction */
{
    ut_ad(trx->state == TRX_STATE_ACTIVE);

    if (srv_read_only_mode) {

        ut_ad(trx->read_view == NULL);
        return(NULL);

    } else if (!MVCC::is_view_active(trx->read_view)) {
        trx_sys->mvcc->view_open(trx->read_view, trx);
    }

    return(trx->read_view);
}
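The reuse mechanism can be pictured as a small object pool. The sketch below is a hypothetical illustration of that idea, not InnoDB's MVCC class: View, ViewPool and assign_read_view are invented names, and the only behaviour carried over from trx_assign_read_view is "open a view only if the transaction does not already have an active one".

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical, heavily simplified Read View object.
struct View {
    bool closed = true;
};

// Hypothetical pool: closed views are kept on a free list and handed out
// again; a new View is allocated only when the free list is empty.
class ViewPool {
public:
    View* open() {
        View* v;
        if (!free_.empty()) {
            v = free_.back();
            free_.pop_back();
        } else {
            all_.push_back(std::make_unique<View>());
            v = all_.back().get();
        }
        v->closed = false;
        return v;
    }

    // Mark a view closed and make it available for reuse.
    void close(View* v) {
        assert(!v->closed);
        v->closed = true;
        free_.push_back(v);
    }

    std::size_t allocated() const { return all_.size(); }

private:
    std::vector<std::unique_ptr<View>> all_;  // owns every View ever created
    std::vector<View*> free_;                 // closed views available for reuse
};

// Same shape as trx_assign_read_view: keep an active view, otherwise open one.
View* assign_read_view(View*& trx_view, ViewPool& pool) {
    if (trx_view == nullptr || trx_view->closed) {
        trx_view = pool.open();
    }
    return trx_view;
}
```

Repeated consistent reads in one transaction keep getting the same object while it is open, and after it is closed the same memory is recycled instead of allocating again.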

Next, let's look at how a Read View is initialized.

void
ReadView::prepare(trx_id_t id)
{
    ut_ad(mutex_own(&trx_sys->mutex));

    m_creator_trx_id = id;

    // trx_sys->max_trx_id is the smallest transaction id that has not been assigned yet.
    m_low_limit_no = m_low_limit_id = trx_sys->max_trx_id;

    // Copy the ids of the currently active read-write transactions into the view's m_ids.
    if (!trx_sys->rw_trx_ids.empty()) {
        copy_trx_ids(trx_sys->rw_trx_ids);
    } else {
        m_ids.clear();
    }

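    // trx_sys->serialisation_list is a list, ordered by trx->no, that transactions join when they commit.
    // If the list is non-empty and its first entry's trx->no is smaller, that value becomes m_low_limit_no,
    // which the purge thread uses to decide whether undo records can be purged.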
    if (UT_LIST_GET_LEN(trx_sys->serialisation_list) > 0) {
        const trx_t*    trx;

        trx = UT_LIST_GET_FIRST(trx_sys->serialisation_list);

        if (trx->no < m_low_limit_no) {
            m_low_limit_no = trx->no;
        }
    }
}

void
ReadView::complete()
{
    // m_up_limit_id is the smallest active transaction id, or m_low_limit_id if none was active.
    m_up_limit_id = !m_ids.empty() ? m_ids.front() : m_low_limit_id;

    ut_ad(m_up_limit_id <= m_low_limit_id);

    m_closed = false;
}
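Put together, prepare() and complete() turn a snapshot of trx_sys into the two bounds. The sketch below restates that logic over plain values; TrxSysSnapshot and SimpleReadView are hypothetical names, and m_low_limit_no with its purge bookkeeping is deliberately left out.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

typedef uint64_t trx_id_t;

// Hypothetical snapshot of the trx_sys state that prepare() reads.
struct TrxSysSnapshot {
    trx_id_t max_trx_id;               // smallest id not assigned yet
    std::vector<trx_id_t> rw_trx_ids;  // active read-write transactions, sorted
};

// Simplified ReadView: m_low_limit_no and purge bookkeeping are omitted.
struct SimpleReadView {
    trx_id_t m_low_limit_id = 0;
    trx_id_t m_up_limit_id = 0;
    trx_id_t m_creator_trx_id = 0;
    std::vector<trx_id_t> m_ids;

    // Like ReadView::prepare: record the creator, the upper bound, and a copy
    // of the active read-write transaction ids.
    void prepare(trx_id_t id, const TrxSysSnapshot& sys) {
        m_creator_trx_id = id;
        m_low_limit_id = sys.max_trx_id;
        m_ids = sys.rw_trx_ids;
    }

    // Like ReadView::complete: the lower bound is the smallest active id,
    // or m_low_limit_id when no read-write transaction was active.
    void complete() {
        m_up_limit_id = !m_ids.empty() ? m_ids.front() : m_low_limit_id;
        assert(m_up_limit_id <= m_low_limit_id);
    }
};
```

For example, if the next id to assign is 10 and transactions 5 and 7 are active, the view ends up with m_up_limit_id = 5 and m_low_limit_id = 10; with no active transactions both bounds are 10.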

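Under the Read Committed isolation level, the Read View is closed when each consistent-read statement finishes. Under Repeatable Read, the Read View stays open from the moment it is created until the transaction ends.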
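The only difference between the two levels is therefore how many snapshots a transaction takes. The hypothetical TrxLifetime model below counts them: three statements under RC produce three Read Views, while under RR the first one is reused for the whole transaction.

```cpp
#include <cassert>

enum class Isolation { READ_COMMITTED, REPEATABLE_READ };

// Hypothetical model of how long a transaction keeps its Read View.
// views_created counts how many snapshots the transaction has taken.
struct TrxLifetime {
    Isolation level;
    bool view_open = false;
    int views_created = 0;

    explicit TrxLifetime(Isolation l) : level(l) {}

    // A consistent read reuses the open view, or creates one if there is none.
    void consistent_read() {
        if (!view_open) {
            view_open = true;
            ++views_created;
        }
    }

    // RC closes the view when the statement ends; RR keeps it open.
    void end_statement() {
        if (level == Isolation::READ_COMMITTED) {
            view_open = false;
        }
    }

    // Under both levels the view is released when the transaction ends.
    void end_transaction() { view_open = false; }
};
```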

3. How a Read View Determines Visibility

With the Read View introduced, let's see how InnoDB decides whether a record is visible to the current transaction. The entry point is the row_search_mvcc function in storage/innobase/row/row0sel.cc.

3.1 Reads Through the Clustered Index

If the SQL query uses the clustered index, the lock_clust_rec_cons_read_sees function below decides whether record rec is visible to the current transaction.

bool
lock_clust_rec_cons_read_sees(
    const rec_t*    rec,    
    dict_index_t*   index,
    const ulint*    offsets,
    ReadView*   view)   
{
    ut_ad(dict_index_is_clust(index));
    ut_ad(page_rec_is_user_rec(rec));
    ut_ad(rec_offs_validate(rec, index, offsets));

    // In InnoDB read-only mode, or for a temporary table, the record is always visible.
    if (srv_read_only_mode || dict_table_is_temporary(index->table)) {
        ut_ad(view == 0 || dict_table_is_temporary(index->table));
        return(true);
    }

    // Read the transaction id stored in the row.
    trx_id_t    trx_id = row_get_rec_trx_id(rec, index, offsets);

    // Check whether that transaction's changes are visible to the view.
    return(view->changes_visible(trx_id, index->table->name));
}

Here is the source of ReadView::changes_visible:

bool changes_visible(
    trx_id_t        id,
    const table_name_t& name) const
    MY_ATTRIBUTE((warn_unused_result))
{
    ut_ad(id > 0);

    // Visible if the row's transaction id < m_up_limit_id or equals m_creator_trx_id.
    if (id < m_up_limit_id || id == m_creator_trx_id) {

        return(true);
    }

    check_trx_id_sanity(id, name);

    // Not visible if the row's transaction id >= m_low_limit_id.
    if (id >= m_low_limit_id) {

        return(false);

    } else if (m_ids.empty()) {

        return(true);
    }

    const ids_t::value_type*    p = m_ids.data();

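    // Binary search m_ids: if the id is present, that transaction was still active
    // when the view was created, so its change is not visible.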
    return(!std::binary_search(p, p + m_ids.size(), id));
}

The rules, in short:

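The record's transaction id equals m_creator_trx_id: the change was made by the current transaction, so it is always visible.
The record's transaction id < m_up_limit_id: the transaction that modified the record had already committed when the Read View was created, so the change is visible.
The record's transaction id >= m_low_limit_id: the transaction that modified the record had not yet started when the Read View was created (more precisely, it had not yet been assigned a transaction id), so the change is not visible.
Otherwise (m_up_limit_id <= id < m_low_limit_id): if the id is in m_ids, that transaction was still active when the view was created and the change is not visible; if it is not in m_ids, the transaction had already committed and the change is visible.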
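The rules can be checked on a small worked example. The function below is a hypothetical free-standing restatement of ReadView::changes_visible: the view's fields become parameters, check_trx_id_sanity is omitted, and ids is assumed sorted, as std::binary_search requires.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

typedef uint64_t trx_id_t;

// Hypothetical free-standing version of ReadView::changes_visible.
bool changes_visible(trx_id_t id, trx_id_t up_limit, trx_id_t low_limit,
                     trx_id_t creator, const std::vector<trx_id_t>& ids) {
    assert(id > 0);
    // Committed before the view was created, or our own change.
    if (id < up_limit || id == creator) return true;
    // Id assigned after the view was created.
    if (id >= low_limit) return false;
    // In between: invisible only if it was active at creation time.
    return !std::binary_search(ids.begin(), ids.end(), id);
}
```

With a view created when trx_sys->max_trx_id was 10 and transactions 5 and 7 were active (creator id 0), changes by ids 3, 6 and 9 are visible, while 5 and 7 (still active at creation time) and 10 and 12 (started later) are not.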

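If the record is not visible, the read calls row_sel_build_prev_vers_for_mysql -> row_vers_build_for_consistent_read, which uses the information in the rollback segment to rebuild successively older versions of the record until it finds one that is visible to the current transaction.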
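The shape of that loop can be sketched as a walk down a version chain. In InnoDB each older version is rebuilt by applying an undo record (trx_undo_prev_version_build); this sketch replaces that with a hypothetical in-memory linked list in which prev stands in for DATA_ROLL_PTR, and passes visibility in as a predicate. Returning nullptr here is an assumption of the sketch meaning "no version is visible to this view".

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>

typedef uint64_t trx_id_t;

// Hypothetical in-memory record version; prev plays the role of the roll pointer.
struct Version {
    trx_id_t trx_id;      // DATA_TRX_ID of this version
    std::string value;    // row contents of this version
    const Version* prev;  // older version, or nullptr if there is none
};

// Walk from the newest version towards older ones until one is visible.
const Version* build_for_consistent_read(
        const Version* v, const std::function<bool(trx_id_t)>& visible) {
    while (v != nullptr && !visible(v->trx_id)) {
        v = v->prev;
    }
    return v;
}
```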

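3.2 Reads Through a Secondary Index

When the query uses a secondary index, the check is done by lock_sec_rec_cons_read_sees: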

bool
lock_sec_rec_cons_read_sees(
    const rec_t*        rec,    
    const dict_index_t* index,
    const ReadView* view)
{
    ut_ad(page_rec_is_user_rec(rec));

    if (recv_recovery_is_on()) {
        return(false);
    } else if (dict_table_is_temporary(index->table)) {
        return(true);
    }
    // Read the PAGE_MAX_TRX_ID field from the index page.
    trx_id_t    max_trx_id = page_get_max_trx_id(page_align(rec));

    ut_ad(max_trx_id > 0);

    return(view->sees(max_trx_id));
}

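Below is the implementation of ReadView::sees. It simply checks whether PAGE_MAX_TRX_ID is smaller than m_up_limit_id, the smallest transaction id that was active when the Read View was initialized; in other words, whether the largest transaction id that modified any record on the page had already committed when the snapshot was taken. It is a blunt but cheap check.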

bool sees(trx_id_t id) const
{
    return(id < m_up_limit_id);
}
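Because the test works on the whole page, a false result is only a "maybe". The hypothetical sketch below makes that explicit with a two-valued outcome: SecVisibility and sec_rec_check are invented names, and the comparison is the same one ReadView::sees performs.

```cpp
#include <cassert>
#include <cstdint>

typedef uint64_t trx_id_t;

// Hypothetical outcome of the page-level check on a secondary index record.
enum class SecVisibility {
    VISIBLE,          // every change on the page predates the view
    CHECK_CLUSTERED   // undecided: must consult the clustered index
};

// Same comparison as ReadView::sees.
bool sees(trx_id_t page_max_trx_id, trx_id_t up_limit_id) {
    return page_max_trx_id < up_limit_id;
}

SecVisibility sec_rec_check(trx_id_t page_max_trx_id, trx_id_t up_limit_id) {
    assert(page_max_trx_id > 0);
    return sees(page_max_trx_id, up_limit_id) ? SecVisibility::VISIBLE
                                              : SecVisibility::CHECK_CLUSTERED;
}
```

With m_up_limit_id = 5, a page whose PAGE_MAX_TRX_ID is 4 passes, while a page last touched by transaction 100 sends every record on it to the clustered index, even records that transaction never modified.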

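So if lock_sec_rec_cons_read_sees returns true, the record is definitely visible; if it returns false, the record is not necessarily invisible, but the next step has to use the clustered index to fetch the visible version of the row.
Before doing that, InnoDB first uses Index Condition Pushdown (ICP) to check the search condition against the index entry. If the condition does not match, there is no need to visit the clustered index at all; if it does match, InnoDB calls row_sel_get_clust_rec_for_mysql to fetch the visible version from the clustered index.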
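The order of decisions for a secondary index entry can be sketched as follows. This is a hypothetical simplification: page_visible is the result of lock_sec_rec_cons_read_sees, icp_matches stands in for evaluating the pushed-down condition, and neither ICP's out-of-range outcome nor the ICP evaluation that also happens for visible records is modelled.

```cpp
#include <cassert>
#include <functional>

// Hypothetical next step after reading one secondary index entry.
enum class Action {
    USE_SECONDARY_RECORD,  // page-level check passed
    SKIP_ROW,              // ICP says the row cannot match
    FETCH_CLUSTERED        // look up the clustered index for a visible version
};

Action secondary_read_action(bool page_visible,
                             const std::function<bool()>& icp_matches) {
    if (page_visible) {
        return Action::USE_SECONDARY_RECORD;
    }
    // Cheap filter on the index columns before the clustered lookup.
    if (!icp_matches()) {
        return Action::SKIP_ROW;
    }
    // Corresponds to the row_sel_get_clust_rec_for_mysql path.
    return Action::FETCH_CLUSTERED;
}
```

The point of the ordering is cost: evaluating the condition on the index entry is much cheaper than a clustered index lookup plus a possible undo-based version rebuild.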

4. Summary

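Using the InnoDB source code, this article introduced the basic data structure and concepts of the Read View and showed how InnoDB uses a Read View to determine visibility. A Read View is essentially a snapshot of the active transactions. The RC and RR isolation levels use the same Read View structure to determine visibility; what differs is the Read View's lifetime, which depends on the isolation level. For changes that are not visible, InnoDB uses undo information to rebuild earlier versions of the data until it reaches a visible one.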