CMU Database Systems (15-445) Lab 2: B+ Tree Index Implementation (Part 2, with Lecture Notes)

Date: 2021-01-27


4. Implementing Index_Iterator

Here we need to implement the iterator's operations, such as begin, end, isEnd, and so on.

Below is the constructor of IndexIterator:

template <typename KeyType, typename ValueType, typename KeyComparator>
IndexIterator<KeyType, ValueType, KeyComparator>::
IndexIterator(BPlusTreeLeafPage<KeyType, ValueType, KeyComparator> *leaf,
              int index, BufferPoolManager *buff_pool_manager):
    leaf_(leaf), index_(index), buff_pool_manager_(buff_pool_manager) {}

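1. First, the implementation of the begin function

Use the key to find the leaf page that should contain it
Then take the index of that key within the leaf; that index is where begin points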

INDEX_TEMPLATE_ARGUMENTS
INDEXITERATOR_TYPE BPLUSTREE_TYPE::Begin(const KeyType &key) {
  auto leaf = reinterpret_cast<BPlusTreeLeafPage<KeyType, ValueType,KeyComparator> *>(FindLeafPage(key, false));
  int index = 0;
  if (leaf != nullptr) {
    index = leaf->KeyIndex(key, comparator_);
  }
  return IndexIterator<KeyType, ValueType, KeyComparator>(leaf, index, buffer_pool_manager_);
}
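
The post does not show KeyIndex, so here is a minimal sketch of what such a lookup might do, assuming it returns the index of the first entry whose key is not smaller than the search key. The `Entry` alias and the vector-based leaf are hypothetical stand-ins for the real leaf page, and plain `int` keys replace the templated key and comparator.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-in for a leaf page: sorted (key, value) pairs.
using Entry = std::pair<int, int>;

// Returns the index of the first entry whose key is >= `key`,
// or entries.size() if every stored key is smaller.
int KeyIndex(const std::vector<Entry> &entries, int key) {
  auto it = std::lower_bound(
      entries.begin(), entries.end(), key,
      [](const Entry &e, int k) { return e.first < k; });
  return static_cast<int>(it - entries.begin());
}
```

With this behavior, Begin(key) starts at the key itself when it exists, and at the next larger key when it does not.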

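2. Implementing the end function

Find the leftmost leaf
Then walk forward through the sibling pointers until nextPageId is INVALID_PAGE_ID (-1)
Note that != and == need to be overloaded so the result can be compared against other iterators

The end function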

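INDEX_TEMPLATE_ARGUMENTS
INDEXITERATOR_TYPE BPLUSTREE_TYPE::end() {
  KeyType key{};
  // leftMost = true: the key is ignored and the leftmost leaf is returned
  auto leaf = reinterpret_cast<BPlusTreeLeafPage<KeyType, ValueType, KeyComparator> *>(FindLeafPage(key, true));
  // walk the sibling chain until we reach the last leaf
  while (leaf->GetNextPageId() != INVALID_PAGE_ID) {
    page_id_t cur_page_id = leaf->GetPageId();
    page_id_t next_page_id = leaf->GetNextPageId();
    auto *next_page = buffer_pool_manager_->FetchPage(next_page_id);
    // unpin the leaf we are leaving so that its pin count does not leak
    buffer_pool_manager_->UnpinPage(cur_page_id, false);
    leaf = reinterpret_cast<BPlusTreeLeafPage<KeyType, ValueType, KeyComparator> *>(next_page->GetData());
  }
  // the last leaf stays pinned because the returned iterator still points at it
  return IndexIterator<KeyType, ValueType, KeyComparator>(leaf, leaf->GetSize(), buffer_pool_manager_);
}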

The == and != operators

bool operator==(const IndexIterator &itr) const {
  return this->index_==itr.index_&&this->leaf_==itr.leaf_;
}

bool operator!=(const IndexIterator &itr) const {
  return !this->operator==(itr);
}

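3. Overloading ++ and * (the dereference operator)

Overloading ++

Increment index_; when the current leaf is exhausted, move on to the leaf at nextPageId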

template <typename KeyType, typename ValueType, typename KeyComparator>
IndexIterator<KeyType, ValueType, KeyComparator> &IndexIterator<KeyType, ValueType, KeyComparator>::
operator++() {
  ++index_;
  if (index_ == leaf_->GetSize() && leaf_->GetNextPageId() != INVALID_PAGE_ID) {
    // the current leaf is exhausted: move on to its right sibling
    page_id_t next_page_id = leaf_->GetNextPageId();

    auto *page = buff_pool_manager_->FetchPage(next_page_id);
    if (page == nullptr) {
      throw Exception("all pages are pinned while IndexIterator(operator++)");
    }
    // first acquire next page, then release previous page
    page->RLatch();

    // FetchPage is called here only to reach the Page object that owns the latch,
    // so the old leaf is unpinned twice: once for this extra fetch and once for
    // the pin the iterator was already holding.
    buff_pool_manager_->FetchPage(leaf_->GetPageId())->RUnlatch();
    buff_pool_manager_->UnpinPage(leaf_->GetPageId(), false);
    buff_pool_manager_->UnpinPage(leaf_->GetPageId(), false);

    auto next_leaf = reinterpret_cast<BPlusTreeLeafPage<KeyType, ValueType, KeyComparator> *>(page->GetData());
    assert(next_leaf->IsLeafPage());
    index_ = 0;
    leaf_ = next_leaf;
  }
  return *this;
}

Overloading *

Simply return array[index]

template <typename KeyType, typename ValueType, typename KeyComparator>
const MappingType &IndexIterator<KeyType, ValueType, KeyComparator>::
operator*() {
  if (isEnd()) {
    throw Exception("IndexIterator: out of range");
  }
  return leaf_->GetItem(index_);
}
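
To see how ==, !=, ++ and * fit together in a scan, here is a simplified sketch that drops the buffer pool and latches entirely. `Leaf`, `LeafIterator` and `ScanKeys` are hypothetical names for illustration, keys are plain `int`, and every leaf is assumed to be non-empty.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical in-memory leaf: no buffer pool, no latches.
struct Leaf {
  std::vector<std::pair<int, int>> items;  // sorted (key, value) pairs, assumed non-empty
  Leaf *next = nullptr;                    // right sibling, or nullptr for the last leaf
};

class LeafIterator {
 public:
  LeafIterator(Leaf *leaf, int index) : leaf_(leaf), index_(index) {}

  bool operator==(const LeafIterator &o) const {
    return leaf_ == o.leaf_ && index_ == o.index_;
  }
  bool operator!=(const LeafIterator &o) const { return !(*this == o); }

  const std::pair<int, int> &operator*() const { return leaf_->items[index_]; }

  // Advance within the leaf; hop to the sibling once the leaf is exhausted.
  LeafIterator &operator++() {
    ++index_;
    if (index_ == static_cast<int>(leaf_->items.size()) && leaf_->next != nullptr) {
      leaf_ = leaf_->next;
      index_ = 0;
    }
    return *this;
  }

 private:
  Leaf *leaf_;
  int index_;
};

// Collects every key from the first leaf through the last one.
std::vector<int> ScanKeys(Leaf *first) {
  Leaf *last = first;
  while (last->next != nullptr) last = last->next;
  // end is "one past the last item of the last leaf", as in the post's end()
  LeafIterator end(last, static_cast<int>(last->items.size()));
  std::vector<int> keys;
  for (LeafIterator it(first, 0); it != end; ++it) keys.push_back((*it).first);
  return keys;
}
```

The loop ends because ++ leaves the iterator at (last leaf, size) when there is no further sibling, which is exactly the position that end describes.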

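5. Implementing the concurrency mechanism

0. First, a review of the read-write lock mechanism

Multiple threads can share a latch for reading, whereas a write must hold it exclusively
The MaxReader limit caps how many readers hold the latch at once; in the implementation shown later, it is the writer_entered_ flag that turns away new readers once a writer is waiting, which is what keeps waiting writers from starving

First, let's see what goes wrong with multiple threads when there is no locking at all

Thread T1 wants to delete 44.
Thread T2 wants to find 41.

Suppose T2 has reached node D when the scheduler switches to T1
T1 then rebalances the tree and moves 41 over to node I
When T1 finishes and T2 resumes, T2 looks for 41 where it was originally headed and cannot find it

The result is that T2 reports 41 as missing even though the key is still in the tree.

This is why we need read-write locks

For the find operation

Because find is read-only, as soon as we have reached the next node we can release the latch on the previous node

The remaining steps all work the same way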
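
The hand-over-hand pattern for find can be sketched as follows. This is a simplified illustration, not the project's code: `Node` is a hypothetical in-memory node, `std::shared_mutex` (C++17) stands in for the page latch, and keys are plain `int`.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <shared_mutex>
#include <vector>

// Hypothetical in-memory node; the real project keeps nodes in buffer-pool pages.
struct Node {
  std::shared_mutex latch;
  bool is_leaf = true;
  std::vector<int> keys;                        // leaf: stored keys; inner: separator keys
  std::vector<std::unique_ptr<Node>> children;  // inner only: keys.size() + 1 children
};

// Looks up `key` with read-latch crabbing: latch the child, then release the parent.
bool Find(Node *root, int key) {
  Node *cur = root;
  cur->latch.lock_shared();
  while (!cur->is_leaf) {
    // pick the child whose key range contains `key`
    std::size_t i = 0;
    while (i < cur->keys.size() && key >= cur->keys[i]) ++i;
    Node *child = cur->children[i].get();
    child->latch.lock_shared();  // acquire the child first...
    cur->latch.unlock_shared();  // ...then the parent is no longer needed
    cur = child;
  }
  bool found = false;
  for (int k : cur->keys) found = found || (k == key);
  cur->latch.unlock_shared();
  return found;
}
```

At any moment the reader holds at most two latches, and it never lets go of a node before it has latched the node it is moving to.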

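Delete is different

because it has to write

Here we cannot release the latch on node A, because the delete may cause a merge that reaches the root.

When we reach D, we can see that removing 38 from D needs no merge, so the write latches on A and B can now be released safely

For the Insert operation

Here we can safely release the latch on A: B still has a free slot, so the insert cannot affect A

When we reach D we find that D is already full, so we keep the latch on B, because the split will write to B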
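
The rule used in both walkthroughs (release every ancestor as soon as the newly latched node is "safe") can be sketched like this. `Node`, `IsSafe` and `DescendForWrite` are hypothetical names, `std::mutex` stands in for the page write latch, and the half-full threshold `(max_size + 1) / 2` is an assumption; the real minimum size depends on the page layout.

```cpp
#include <cassert>
#include <mutex>
#include <vector>

enum class Op { kInsert, kDelete };

// Hypothetical node summary: only what the safety check needs.
struct Node {
  std::mutex latch;  // stands in for the page's write latch
  int size = 0;      // current number of entries
  int max_size = 0;  // capacity
};

// A node is safe if this operation cannot force a structural change above it:
// an insert will not split it, a delete will not leave it under half full.
bool IsSafe(const Node &n, Op op) {
  if (op == Op::kInsert) return n.size < n.max_size;
  return n.size > (n.max_size + 1) / 2;  // assumed minimum size
}

// Descends a root-to-leaf path taking write latches. Whenever the node just
// latched is safe, all ancestor latches are released. Returns the nodes that
// are still latched; the caller unlocks them after modifying the tree.
std::vector<Node *> DescendForWrite(const std::vector<Node *> &path, Op op) {
  std::vector<Node *> held;
  for (Node *n : path) {
    n->latch.lock();
    if (IsSafe(*n, op)) {
      for (Node *ancestor : held) ancestor->latch.unlock();
      held.clear();
    }
    held.push_back(n);
  }
  return held;
}
```

For the insert example, a path A, B, D where only D is full ends with B and D latched; for the delete example, a path where only D is above the minimum ends with D alone latched.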

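The algorithm above is correct, but it has a bottleneck: a write latch can be held by only one thread at a time, and every insert and delete starts by write-latching the root. With many concurrent inserts or deletes, performance drops sharply

This is where optimistic locking comes in

Optimistically assume that most operations need neither a merge nor a split. On the way down we therefore take read latches instead of write latches; only the leaf gets a write latch

Read latches all the way from the top down, released step by step
Only at the leaf, which is what we modify, do we take a write latch. This delete is safe, so we are done

If at the last step we find that the leaf is not safe, we have to run the operation again from the root, with write latches as in the version before optimistic locking was introduced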
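
A minimal sketch of the optimistic pass for an insert, under simplifying assumptions: the root-to-leaf path is given up front, `Node` and `TryOptimisticInsert` are hypothetical names, `std::shared_mutex` (C++17) plays the latch, and "safe" is taken to mean the leaf still has a free slot.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <shared_mutex>
#include <vector>

// Hypothetical node: inner nodes only need a latch here, leaves also hold keys.
struct Node {
  std::shared_mutex latch;
  std::vector<int> keys;
  std::size_t max_size = 4;
};

// Optimistic pass: read latches on the way down, a write latch only on the leaf.
// Returns true if the key was inserted. Returns false if the leaf was not safe;
// the caller then restarts from the root with write latches all the way down.
bool TryOptimisticInsert(const std::vector<Node *> &path, int key) {
  Node *prev = nullptr;
  for (std::size_t i = 0; i + 1 < path.size(); ++i) {
    path[i]->latch.lock_shared();
    if (prev != nullptr) prev->latch.unlock_shared();
    prev = path[i];
  }
  Node *leaf = path.back();
  leaf->latch.lock();  // the only write latch taken in this pass
  if (prev != nullptr) prev->latch.unlock_shared();

  bool safe = leaf->keys.size() < leaf->max_size;  // assumed: a free slot means no split
  if (safe) {
    leaf->keys.insert(std::upper_bound(leaf->keys.begin(), leaf->keys.end(), key), key);
  }
  leaf->latch.unlock();
  return safe;
}
```

The wasted work in the unsafe case is one extra traversal, which is the price paid for not write-latching the root on every insert and delete.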

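A brief introduction to the B-Link Tree

Deferring updates to the parent node

Here a 🌟 marks that the parent needs an update that has not been carried out yet

Other operations still run correctly in the meantime, for example find 31

Now we execute insert 33

When this reaches node C, a thread now holds the write latch on C, so the pending 🌟 update has to be applied at this point. After that, 33 is inserted

One last note, about scans

Thread 1 holds the write latch on node C
Thread 2 has finished scanning node B and wants the read latch on node C

This is a problem, because thread 2 cannot get the read latch

There are several ways to handle it

T2 can wait until T1's write completes
T2 can be restarted
T2 can stop T1 and take the latch for itself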
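
The "restart T2" option is often implemented in a no-wait style: try to take the sibling's latch, and give up immediately if it is unavailable. The sketch below assumes `std::shared_mutex` (C++17) as the latch; `TryAdvance` and `ScanStepOnOtherThread` are hypothetical helpers, and the second one exists only so the scan step runs on a different thread from the writer.

```cpp
#include <cassert>
#include <shared_mutex>
#include <thread>

// Tries to move a scan from the current leaf to its sibling without waiting.
// On success the caller holds the sibling's read latch. On failure the caller
// holds nothing and is expected to restart the scan.
bool TryAdvance(std::shared_mutex &current, std::shared_mutex &sibling) {
  bool got_sibling = sibling.try_lock_shared();
  current.unlock_shared();  // released in both cases
  return got_sibling;
}

// Runs one scan step on its own thread and reports whether it advanced.
bool ScanStepOnOtherThread(std::shared_mutex &current, std::shared_mutex &sibling) {
  bool advanced = false;
  std::thread scanner([&] {
    current.lock_shared();
    advanced = TryAdvance(current, sibling);
    if (advanced) sibling.unlock_shared();
  });
  scanner.join();
  return advanced;
}
```

Because the scanner never blocks while holding a latch, it cannot take part in a deadlock with a writer moving in the opposite direction.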

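Note that a Latch is not the same thing as a Lock here: a latch protects an in-memory data structure for the duration of one operation, while a lock protects logical database contents for the duration of a transaction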

1. Implementing the helper function `UnlockUnpinPages`

For a read operation, release the read latch
Otherwise, release the write latch

INDEX_TEMPLATE_ARGUMENTS
void BPLUSTREE_TYPE::
UnlockUnpinPages(Operation op, Transaction *transaction) {
  if (transaction == nullptr) {
    return;
  }

  for (auto page:*transaction->GetPageSet()) {
    if (op == Operation::READ) {
      page->RUnlatch();
      buffer_pool_manager_->UnpinPage(page->GetPageId(), false);
    } else {
      page->WUnlatch();
      buffer_pool_manager_->UnpinPage(page->GetPageId(), true);
    }
  }
  transaction->GetPageSet()->clear();

  for (const auto &page_id: *transaction->GetDeletedPageSet()) {
    buffer_pool_manager_->DeletePage(page_id);
  }
  transaction->GetDeletedPageSet()->clear();

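  // Release the root mutex. This assumes the calling thread locked node_mutex_
  // earlier in the same operation; if node_mutex_ is a std::mutex, unlocking it
  // without owning it is undefined behavior.
  node_mutex_.unlock();
}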

The four built-in latch and unlatch operations

/** Acquire the page write latch. */
inline void WLatch() { rwlatch_.WLock(); }

/** Release the page write latch. */
inline void WUnlatch() { rwlatch_.WUnlock(); }

/** Acquire the page read latch. */
inline void RLatch() { rwlatch_.RLock(); }

/** Release the page read latch. */
inline void RUnlatch() { rwlatch_.RUnlock(); }

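The rwlatch here is a read-write lock class implemented in the project itself; let's take a look at it

I'm not yet fluent in C++ concurrent programming, so this is only a quick look; I'll add more once I've studied it

The WLock function
1. First acquire the mutex
2. The flag writer_entered_ records whether a writer has already entered
3. If one has, the current caller has to wait (this thread blocks)
4. If other threads are still reading, keep blocking (you cannot write while others are reading)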
```
void WLock() {
  std::unique_lock<mutex_t> latch(mutex_);
  while (writer_entered_) {
    reader_.wait(latch);
  }
  writer_entered_ = true;
  while (reader_count_ > 0) {
    writer_.wait(latch);
  }
}
```

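The WUnlock function

Set the writer flag to false
Then wake every thread waiting on reader_ (blocked readers, and writers queued behind this one)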

void WUnlock() {
  std::lock_guard<mutex_t> guard(mutex_);
  writer_entered_ = false;
  reader_.notify_all();
}

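The RLock function
1. If someone is writing, or the maximum number of readers has already been reached, block
2. Otherwise just increment the reader count
Several threads are allowed to read together, so this causes no errors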
```
void RLock() {
  std::unique_lock<mutex_t> latch(mutex_);
  while (writer_entered_ || reader_count_ == MAX_READERS) {
    reader_.wait(latch);
  }
  reader_count_++;
}
```

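The RUnlock function

Decrement the count
If a writer is waiting and no readers remain, wake that writer
If the count was at the maximum before the decrement, wake one waiting thread after releasing, because reading is possible again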

void RUnlock() {
  std::lock_guard<mutex_t> guard(mutex_);
  reader_count_--;
  if (writer_entered_) {
    if (reader_count_ == 0) {
      writer_.notify_one();
    }
  } else {
    if (reader_count_ == MAX_READERS - 1) {
      reader_.notify_one();
    }
  }
}
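
Putting the four functions together, the class might look like the sketch below. The four function bodies are the ones shown above; the member declarations, the value of `MAX_READERS`, and the two accessors `ReaderCount` and `WriterEntered` are assumptions added so the example is self-contained.

```cpp
#include <cassert>
#include <climits>
#include <condition_variable>
#include <cstdint>
#include <mutex>

class ReaderWriterLatch {
  using mutex_t = std::mutex;
  using cond_t = std::condition_variable;
  static constexpr uint32_t MAX_READERS = UINT_MAX;  // assumed value

 public:
  void WLock() {
    std::unique_lock<mutex_t> latch(mutex_);
    while (writer_entered_) reader_.wait(latch);    // queue behind an earlier writer
    writer_entered_ = true;                         // from now on new readers block
    while (reader_count_ > 0) writer_.wait(latch);  // wait for current readers to drain
  }

  void WUnlock() {
    std::lock_guard<mutex_t> guard(mutex_);
    writer_entered_ = false;
    reader_.notify_all();
  }

  void RLock() {
    std::unique_lock<mutex_t> latch(mutex_);
    while (writer_entered_ || reader_count_ == MAX_READERS) reader_.wait(latch);
    reader_count_++;
  }

  void RUnlock() {
    std::lock_guard<mutex_t> guard(mutex_);
    reader_count_--;
    if (writer_entered_) {
      if (reader_count_ == 0) writer_.notify_one();
    } else if (reader_count_ == MAX_READERS - 1) {
      reader_.notify_one();
    }
  }

  // Hypothetical accessors, present only so the sketch can be inspected.
  uint32_t ReaderCount() {
    std::lock_guard<mutex_t> guard(mutex_);
    return reader_count_;
  }
  bool WriterEntered() {
    std::lock_guard<mutex_t> guard(mutex_);
    return writer_entered_;
  }

 private:
  mutex_t mutex_;
  cond_t writer_;  // a writer waits here for readers to drain
  cond_t reader_;  // readers, and writers queued behind a writer, wait here
  uint32_t reader_count_{0};
  bool writer_entered_{false};
};
```

Two readers can hold the latch together, and a writer that arrives after both have released it enters without blocking.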