STL hashtable reading notes

Date: 2019-12-12


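Summary of unordered_map, unordered_set, and related containers:

unordered_map and unordered_set are STL data structures commonly used in development. Both are built on a hashtable. In SGI STL (the code read here is GCC's libstdc++), the hashtable resolves collisions by separate chaining. The following are notes from reading the hashtable-related code in the STL.

Related files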

gcc/libstdc++-v3/include/bits/hashtable.h

gcc/libstdc++-v3/include/bits/hashtable_policy.h

gcc/libstdc++-v3/include/bits/functional_hash.h

gcc/libstdc++-v3/src/c++11/hashtable_c++0x.cc

Part 1: Basic data organization

Key data members of the hashtable, with a figure to aid understanding (one possible state of the hash data structure)

_Bucket[] _M_buckets

_Hash_node_base _M_before_begin

size_type _M_bucket_count // initial bucket count is 1

size_type _M_element_count

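From the figure above we can roughly summarize how hashtable data is organized in SGI STL. Key points:

① The hashtable actually maintains one singly linked list. Its head is a special member, _M_before_begin, which holds no actual data. As the name suggests, it is the node before the first element.

Here is the corresponding comment in the source: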

The non-empty buckets contain the node before the first node; this design makes it possible to implement something like a std::forward_list::insert_after on a container insertion and std::forward_list::erase_after on container erase calls. _M_before_begin is equivalent to std::forward_list::before_begin. Note that one of the non-empty buckets contains &_M_before_begin which is not a dereferenceable node so the node pointer in a bucket shall never be dereferenced, only its next node can be.

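② What is the purpose of _M_buckets, a pointer to an array of node pointers? It gives a uniform way to reach the first node of each bucket (A and D in the figure above) through the next pointer. For example, _M_buckets[1]->_M_nxt is A, the first node of bucket 1, and _M_buckets[2]->_M_nxt is D, the first node of bucket 2. In the figure's example, _M_buckets[1] should also equal &_M_before_begin (see the solid black line).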
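
The layout described in ① and ② can be sketched with a toy table. This is a minimal illustration, not the libstdc++ code: NodeBase, Node, MiniTable, and their members are hypothetical names, the keys are plain ints, and the bucket index is simply the value modulo the bucket count. It shows one list threaded through all elements, with each non-empty bucket storing the node before its first node.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-ins for the node types described above.
struct NodeBase {
    NodeBase* next = nullptr;  // plays the role of the next pointer
};

struct Node : NodeBase {
    int value;
    explicit Node(int v) : value(v) {}
};

// Toy table: one singly linked list plus, per bucket, a pointer to the
// node *before* that bucket's first node (nullptr for an empty bucket).
struct MiniTable {
    NodeBase before_begin;           // sentinel, holds no data
    std::vector<NodeBase*> buckets;  // plays the role of _M_buckets
    std::size_t element_count = 0;

    explicit MiniTable(std::size_t n) : buckets(n, nullptr) { assert(n > 0); }
    MiniTable(const MiniTable&) = delete;
    MiniTable& operator=(const MiniTable&) = delete;

    ~MiniTable() {
        NodeBase* p = before_begin.next;
        while (p) {
            NodeBase* nxt = p->next;
            delete static_cast<Node*>(p);
            p = nxt;
        }
    }

    std::size_t bucket_index(int v) const {
        return static_cast<std::size_t>(v) % buckets.size();
    }

    void insert(int v) {
        Node* n = new Node(v);
        const std::size_t b = bucket_index(v);
        if (buckets[b]) {
            // Non-empty bucket: splice in right after the "before" node.
            n->next = buckets[b]->next;
            buckets[b]->next = n;
        } else {
            // Empty bucket: n becomes the head of the whole list, so this
            // bucket's "before" node is the sentinel itself.
            n->next = before_begin.next;
            before_begin.next = n;
            if (n->next) {
                // The old head's bucket now has n as its "before" node.
                const int old_head = static_cast<Node*>(n->next)->value;
                buckets[bucket_index(old_head)] = n;
            }
            buckets[b] = &before_begin;
        }
        ++element_count;
    }

    // First node of bucket b, or nullptr if the bucket is empty.
    Node* bucket_begin(std::size_t b) const {
        return buckets[b] ? static_cast<Node*>(buckets[b]->next) : nullptr;
    }
};
```

With three buckets, inserting 1, 2, and 4 leaves the list as sentinel, 2, 4, 1. Bucket 2 then holds the address of the sentinel, and bucket 1 holds the node for 2, whose next node (4) is the first element of bucket 1.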

Part 2: Call flow and call stacks of several common interfaces

Definition of hashtable

Some using declarations and typedefs, for easy reference

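① How the key of an inserted element is obtained

1. For simple types (int and so on), the key is obtained by: const key_type& __k = this->_M_extract()(__node->_M_v())

The key is obtained through _M_extract(), which returns the _ExtractKey template parameter.

For unordered_map, _ExtractKey is defined as __detail::_Select1st. With unordered_map<int, node> and insert(std::make_pair(0, node(a))), it returns the pair's first member, that is, 0.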
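
The extraction step can be sketched as a small policy object. SelectFirst, Identity, extract_key, and extract_key_example below are hypothetical names used only for illustration; they mimic the shape of this->_M_extract()(__node->_M_v()) but are not the libstdc++ definitions. Identity is an assumed counterpart for set-like containers, where the stored value is itself the key.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for a "select first" policy: given a stored
// pair, return a reference to its first member (the key).
struct SelectFirst {
    template <typename Pair>
    const typename Pair::first_type& operator()(const Pair& p) const {
        return p.first;
    }
};

// Hypothetical stand-in for the set-like case: the value is the key.
struct Identity {
    template <typename T>
    const T& operator()(const T& v) const {
        return v;
    }
};

// Applies the extraction policy to a stored value.
template <typename ExtractKey, typename Value>
auto extract_key(const Value& v) -> decltype(ExtractKey{}(v)) {
    return ExtractKey{}(v);
}

// Usage sketch: call from your own main().
inline void extract_key_example() {
    const std::pair<int, std::string> entry(0, "a");
    assert(extract_key<SelectFirst>(entry) == 0);  // map-like: key is .first

    const int element = 42;
    assert(extract_key<Identity>(element) == 42);  // set-like: key is the value
}
```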

If the key is not a simple type, you must supply your own hash function, otherwise the code does not compile. An example, from https://www.zhihu.com/question/30921173:

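#include <cstddef>
#include <functional>
#include <unordered_map>
#include <utility>

// Hash functor for std::pair keys: combines the hashes of both members.
// Note: plain XOR is symmetric, so (a, b) and (b, a) hash to the same value.
struct pairhash {
    template <typename T, typename U>
    std::size_t operator()(const std::pair<T, U>& x) const {
        return std::hash<T>()(x.first) ^ std::hash<U>()(x.second);
    }
};

class abc {
    std::unordered_map<std::pair<int, int>, int, pairhash> rules;
};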

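In this case, the key that __detail::_Select1st extracts is the std::pair<int,int> itself.

② How the hash value is produced

__hash_code __code = this->_M_hash_code(__key); ======> return _M_h1()(__k); ======> when the key is int, the hasher passed in is std::hash<int>

This ends up at the macro _Cxx_hashtable_define_trivial_hash(int) in functional_hash.h, a file that contains hash specializations for many different types and defines how these simple types are hashed. According to the macro's definition, the hash value in this case is simply the key cast to size_t.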
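
The behavior described above, where the hash of a simple type is the value itself converted to size_t, can be sketched as follows. TrivialHash and trivial_hash_example are hypothetical names; this is not the macro's expansion, only an illustration of the same idea.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a "trivial" hash: the hash of an integral
// value is just that value converted to std::size_t.
template <typename T>
struct TrivialHash {
    std::size_t operator()(T v) const noexcept {
        return static_cast<std::size_t>(v);
    }
};

// Usage sketch: call from your own main().
inline void trivial_hash_example() {
    assert(TrivialHash<int>{}(42) == 42u);
    assert(TrivialHash<short>{}(7) == 7u);
    // Negative values wrap around modulo 2^N, N being the width of size_t.
    assert(TrivialHash<int>{}(-1) == static_cast<std::size_t>(-1));
}
```

One consequence of such a scheme is that consecutive integer keys get consecutive hash values, so how they spread over buckets depends entirely on the bucket-index step that follows.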

③ How the bucket index for a pair is determined: size_type __bkt = _M_bucket_index(__k, __code) ====> return __hash_code_base::_M_bucket_index(__k, __c, _M_bucket_count); ====> return _M_h2()(__c, __n);

_M_h2 returns the template parameter _H2; for unordered_map this is __detail::_Mod_range_hashing, whose body is very simple.

So the bucket index is the hash value modulo the number of buckets.
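
The mapping from hash code to bucket index can be sketched in a few lines. ModRangeHashing is a hypothetical stand-in written for illustration, under the assumption stated above that the index is the hash code modulo the bucket count.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a mod-based range hashing policy:
// maps a hash code onto the range [0, bucket_count).
struct ModRangeHashing {
    std::size_t operator()(std::size_t hash_code,
                           std::size_t bucket_count) const noexcept {
        assert(bucket_count > 0);  // modulo by zero is undefined
        return hash_code % bucket_count;
    }
};
```

For example, a hash code of 10 with 7 buckets lands in bucket 3, and with a single bucket every hash code lands in bucket 0.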

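④ Rehash flow: the main idea is to take the current bucket count and element count and use lower_bound on a large table of primes to find the next suitable bucket count.

The policy first decides whether a rehash is needed and, if so, computes the next reasonable bucket count. The actual work is then done in _M_rehash_aux, whose main job is to relink the node pointers while ensuring that the new new_buckets structure has the same shape as in the earlier figure.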
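
The decision step can be sketched as below. This is a simplified illustration, not the libstdc++ policy: kPrimes, next_bucket_count, and need_rehash are hypothetical names, the prime table is tiny, and the growth factor of 2 and default max load factor of 1.0 are assumptions of the sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <iterator>
#include <utility>

// A tiny illustrative prime table (assumption: a real table is far larger).
static const std::size_t kPrimes[] = {2,  3,  5,  7,  11, 13, 17, 19, 23,
                                      29, 31, 37, 41, 43, 47, 53, 59, 61,
                                      67, 71, 73, 79, 83, 89, 97};

// Smallest tabulated prime >= n, clamped to the largest entry.
inline std::size_t next_bucket_count(std::size_t n) {
    const std::size_t* first = std::begin(kPrimes);
    const std::size_t* last = std::end(kPrimes);
    const std::size_t* it = std::lower_bound(first, last, n);
    return it == last ? *(last - 1) : *it;
}

// Returns {true, new_bucket_count} if inserting n_ins more elements would
// push the load factor above max_load_factor, else {false, n_bkt}.
inline std::pair<bool, std::size_t> need_rehash(std::size_t n_bkt,
                                                std::size_t n_elt,
                                                std::size_t n_ins,
                                                double max_load_factor = 1.0) {
    assert(n_bkt > 0 && max_load_factor > 0.0);
    const double min_bkts =
        static_cast<double>(n_elt + n_ins) / max_load_factor;
    if (min_bkts <= static_cast<double>(n_bkt))
        return {false, n_bkt};
    // Grow at least geometrically (factor 2 is an assumption of this sketch).
    const std::size_t wanted = std::max(
        static_cast<std::size_t>(std::ceil(min_bkts)), n_bkt * 2);
    return {true, next_bucket_count(wanted)};
}
```

With 13 buckets already holding 13 elements, one more insertion exceeds a load factor of 1.0; the sketch asks for at least 26 buckets and lower_bound picks 29, the next prime in the table.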
