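Last modified: 2014-11-09 14:07:00
Bullet is an open-source physics engine that provides collision detection, gravity simulation, and related features. Many 3D games and 3D design tools (such as 3D Mark) use it as their physics engine.
A physics engine has to meet very demanding speed requirements; Bullet has come as far as it has largely because of its excellent performance.
Browsing the Bullet source reveals many source-level optimizations, and the HashMap discussed in this article is a typical example.
Bullet project home page: http://bulletphysics.org/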
Note: Bullet defines both a Debug and a Release version of many functions; this article only covers the Release version.
///The btAlignedAllocator is a portable class for aligned memory allocations.
///Default implementations for unaligned and aligned allocations can be overridden by a custom allocator
// using btAlignedAllocSetCustom and btAlignedAllocSetCustomAligned.
template < typename T , unsigned Alignment >
class btAlignedAllocator {

    typedef btAlignedAllocator< T , Alignment > self_type;

public:
    //just going down a list:
    btAlignedAllocator() {}
    /*
    btAlignedAllocator( const self_type & ) {}
    */

    template < typename Other >
    btAlignedAllocator( const btAlignedAllocator< Other , Alignment > & ) {}

    typedef const T*         const_pointer;
    typedef const T&         const_reference;
    typedef T*               pointer;
    typedef T&               reference;
    typedef T                value_type;

    pointer       address   ( reference        ref ) const { return &ref; }
    const_pointer address   ( const_reference  ref ) const { return &ref; }
    pointer       allocate  ( size_type        n   , const_pointer *      hint = 0 ) {
        (void)hint;
        return reinterpret_cast< pointer >(btAlignedAlloc( sizeof(value_type) * n , Alignment ));
    }
    void          construct ( pointer          ptr , const value_type &   value ) { new (ptr) value_type( value ); }
    void          deallocate( pointer          ptr ) {
        btAlignedFree( reinterpret_cast< void * >( ptr ) );
    }
    void          destroy   ( pointer          ptr ) { ptr->~value_type(); }

    template < typename O > struct rebind {
        typedef btAlignedAllocator< O , Alignment > other;
    };
    template < typename O >
    self_type & operator=( const btAlignedAllocator< O , Alignment > & ) { return *this; }

    friend bool operator==( const self_type & , const self_type & ) { return true; }
};
void* btAlignedAllocInternal (size_t size, int alignment);
void  btAlignedFreeInternal  (void* ptr);

#define btAlignedAlloc(size,alignment) btAlignedAllocInternal(size,alignment)
#define btAlignedFree(ptr) btAlignedFreeInternal(ptr)

btAlignedAllocInternal/btAlignedFreeInternal and their customization hook are implemented as follows:

static btAlignedAllocFunc *sAlignedAllocFunc = btAlignedAllocDefault;
static btAlignedFreeFunc *sAlignedFreeFunc = btAlignedFreeDefault;

void btAlignedAllocSetCustomAligned(btAlignedAllocFunc *allocFunc, btAlignedFreeFunc *freeFunc)
{
    sAlignedAllocFunc = allocFunc ? allocFunc : btAlignedAllocDefault;
    sAlignedFreeFunc = freeFunc ? freeFunc : btAlignedFreeDefault;
}

void* btAlignedAllocInternal (size_t size, int alignment)
{
    gNumAlignedAllocs++; // used together with gNumAlignedFree to check for memory leaks
    void* ptr;
    ptr = sAlignedAllocFunc(size, alignment);
    // printf("btAlignedAllocInternal %d, %x\n",size,ptr);
    return ptr;
}

void btAlignedFreeInternal (void* ptr)
{
    if (!ptr)
    {
        return;
    }

    gNumAlignedFree++; // used together with gNumAlignedAllocs to check for memory leaks
    // printf("btAlignedFreeInternal %x\n",ptr);
    sAlignedFreeFunc(ptr);
}

// The developer can let all Bullet memory allocations go through a custom memory allocator, using btAlignedAllocSetCustom
void btAlignedAllocSetCustom(btAllocFunc *allocFunc, btFreeFunc *freeFunc);

// If the developer has already an custom aligned allocator, then btAlignedAllocSetCustomAligned can be used.
// The default aligned allocator pre-allocates extra memory using the non-aligned allocator, and instruments it.
void btAlignedAllocSetCustomAligned(btAlignedAllocFunc *allocFunc, btAlignedFreeFunc *freeFunc);
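Whether or not you customize your own Alloc/Free (or AlignedAlloc/AlignedFree), every other data structure in Bullet uses btAlignedAllocator as its interface for allocating (and releasing) memory. As we will see, the customization design of btAlignedAllocator differs from that of std::allocator; this is discussed in detail at the end of the article.
Besides its customization mechanism, btAlignedAllocator differs from std::allocator in that it adds memory alignment (as its name suggests). Looking further at the definitions of btAlignedAllocDefault/btAlignedFreeDefault (btAlignedAllocator.{h|cpp}), we see: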
#if defined (BT_HAS_ALIGNED_ALLOCATOR)
#include <malloc.h>
static void *btAlignedAllocDefault(size_t size, int alignment)
{
    return _aligned_malloc(size, (size_t)alignment); // _aligned_malloc comes from the Microsoft C runtime (also usable from MinGW gcc)
}

static void btAlignedFreeDefault(void *ptr)
{
    _aligned_free(ptr);
}
#elif defined(__CELLOS_LV2__)
#include <stdlib.h>

static inline void *btAlignedAllocDefault(size_t size, int alignment)
{
    return memalign(alignment, size);
}

static inline void btAlignedFreeDefault(void *ptr)
{
    free(ptr);
}
#else // the build environment has no aligned allocation function
static inline void *btAlignedAllocDefault(size_t size, int alignment)
{
    void *ret;
    char *real;
    real = (char *)sAllocFunc(size + sizeof(void *) + (alignment-1)); // 1. allocate a little extra memory
    if (real) {
        ret = btAlignPointer(real + sizeof(void *),alignment);        // 2. adjust the pointer
        *((void **)(ret)-1) = (void *)(real);                         // 3. record the real block address
    } else {
        ret = (void *)(real);
    }
    return (ret);
}

static inline void btAlignedFreeDefault(void *ptr)
{
    void* real;

    if (ptr) {
        real = *((void **)(ptr)-1); // fetch the address of the real memory block
        sFreeFunc(real);
    }
}
#endif
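Bullet also implements its own aligned allocation function, so that even on a system without one, the address returned by btAlignedAllocator::allocate is still aligned to the requested number of bytes.
Let us now analyze how btAlignedAllocDefault / btAlignedFreeDefault implement aligned allocation and free. First, the definition and initialization of sAllocFunc/sFreeFunc: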
static void *btAllocDefault(size_t size)
{
    return malloc(size);
}

static void btFreeDefault(void *ptr)
{
    free(ptr);
}

static btAllocFunc *sAllocFunc = btAllocDefault;
static btFreeFunc *sFreeFunc = btFreeDefault;

Bullet also lets you customize AllocFunc/FreeFunc:

void btAlignedAllocSetCustom(btAllocFunc *allocFunc, btFreeFunc *freeFunc)
{
    sAllocFunc = allocFunc ? allocFunc : btAllocDefault;
    sFreeFunc = freeFunc ? freeFunc : btFreeDefault;
}

By default sAllocFunc/sFreeFunc are simply malloc/free. What may be puzzling in btAlignedAllocDefault is this: why allocate a little extra memory, and what is the later call to btAlignPointer for?
Now let us see how Bullet implements pointer alignment (btScalar.h):

///align a pointer to the provided alignment, upwards
template <typename T>T* btAlignPointer(T* unalignedPtr, size_t alignment)
{
    struct btConvertPointerSizeT
    {
        union
        {
            T* ptr;
            size_t integer;
        };
    };
    btConvertPointerSizeT converter;

    const size_t bit_mask = ~(alignment - 1);
    converter.ptr = unalignedPtr;
    converter.integer += alignment-1;
    converter.integer &= bit_mask;
    return converter.ptr;
}
Next, how does btAlignPointer adjust the pointer?
In practice, btAlignPointer is always called with an alignment that is a power of two; btAlignedObjectArray, for example, uses 16. The analysis below uses 16.
First suppose unalignedPtr is already a multiple of alignment (16). After converter.integer += alignment-1 followed by converter.integer &= bit_mask, the value is unchanged and remains a multiple of 16.
Now suppose unalignedPtr is not a multiple of alignment (16). After the same two steps, the value is rounded up to the next multiple of 16.
So btAlignPointer rounds unalignedPtr up to a multiple of alignment.
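The two cases above can be checked with plain integers. The following is a minimal sketch, not Bullet code: `align_up` is a hypothetical helper that applies the same add-then-mask arithmetic to a `std::uintptr_t` instead of going through a union, and it assumes the alignment is a power of two.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not part of Bullet): round `address` up to the next
// multiple of `alignment`, using the same add-then-mask steps as btAlignPointer.
inline std::uintptr_t align_up(std::uintptr_t address, std::size_t alignment) {
    // The mask trick is only valid when alignment is a power of two.
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    const std::uintptr_t mask = ~(static_cast<std::uintptr_t>(alignment) - 1);
    return (address + (alignment - 1)) & mask;
}
```

For example, with an alignment of 16 the address 0x1000 stays at 0x1000, while 0x1003 becomes 0x1003 + 0xF = 0x1012, which the mask truncates to 0x1010.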
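Once the role of btAlignPointer is clear, it is easy to see why btAlignedAllocDefault requests a little extra memory. The requested size is size + sizeof(void *) + (alignment-1):
If the address returned by sAllocFunc is already aligned to alignment (and sizeof(void*) <= alignment, as with 16), then real + sizeof(void*) is rounded up to real + alignment. That is the return value of btAlignedAllocDefault, and the void* that records the real address sits in the sizeof(void*) bytes immediately before it.
In this case the alignment-sizeof(void*) bytes in front of the void* and the sizeof(void*)-1 bytes at the tail are wasted, but that is tiny compared with the size of main memory, so it hardly matters.
If the address returned by sAllocFunc is not aligned to alignment, the pieces are arranged in the same order, but the split changes: btAlignPointer skips p bytes (0 <= p <= alignment-1) after real + sizeof(void*), so p bytes in front of the void* are wasted and the remaining alignment-1-p bytes are left unused at the tail.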
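The over-allocate, align, and record steps can be put together in a small standalone sketch. The names `aligned_alloc_sketch` and `aligned_free_sketch` are hypothetical; the code uses `std::malloc`/`std::free` directly rather than Bullet's function pointers, and it assumes the alignment is a power of two no smaller than `alignof(void*)`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical sketch of the fallback scheme; not Bullet's actual code.
inline void* aligned_alloc_sketch(std::size_t size, std::size_t alignment) {
    assert(alignment >= alignof(void*) && (alignment & (alignment - 1)) == 0);
    // 1. Over-allocate: payload + one back-pointer + worst-case padding.
    char* real = static_cast<char*>(
        std::malloc(size + sizeof(void*) + (alignment - 1)));
    if (!real) return nullptr;
    // 2. Skip the back-pointer slot, then round up to the alignment.
    std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(real + sizeof(void*));
    addr = (addr + (alignment - 1)) & ~(static_cast<std::uintptr_t>(alignment) - 1);
    void* ret = reinterpret_cast<void*>(addr);
    // 3. Record the real block address just below the aligned pointer.
    static_cast<void**>(ret)[-1] = real;
    return ret;
}

inline void aligned_free_sketch(void* ptr) {
    if (!ptr) return;
    // Read back the real block address stored in front of the payload.
    std::free(static_cast<void**>(ptr)[-1]);
}
```

Because the slot for the back-pointer is reserved before rounding up, there is always room for it between the start of the real block and the returned pointer, whichever of the two cases above applies.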
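PS: Why is memory alignment needed at all? Put simply, the CPU accesses memory faster when it is aligned to a multiple of the machine word size. The precise answer depends on the vendor documentation of the particular CPU and bus controller, which involves many platform and hardware details, so this article does not dwell on the topic.
btAlignedObjectArray plays the same role as the STL vector (std::vector below): both are dynamic arrays. Its data members are declared as follows: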
template <typename T>
class btAlignedObjectArray
{
    btAlignedAllocator<T , 16> m_allocator; // stateless (no data members of its own), though as a member it still occupies at least one byte

    int m_size;
    int m_capacity;
    T* m_data;
    //PCK: added this line
    bool m_ownsMemory;

    // ... omitted
};
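On the cost of the `m_allocator` member: a class without data members still has a non-zero size when it is stored as a member, whereas inheriting from it lets the compiler apply the empty base optimization. The sketch below uses hypothetical stand-in types (none of them are Bullet classes) to show the difference at compile time.

```cpp
#include <cstddef>

struct EmptyAllocator {};               // stand-in for a stateless allocator

struct HoldsAsMember {                  // allocator stored as a data member
    EmptyAllocator alloc;
    int* data;
};

struct HoldsAsBase : EmptyAllocator {   // allocator used as an empty base class
    int* data;
};

// An empty class is never zero-sized on its own.
static_assert(sizeof(EmptyAllocator) >= 1, "empty classes still have a size");
// As a member it takes space (plus padding before the pointer).
static_assert(sizeof(HoldsAsMember) > sizeof(int*), "member costs space");
// As a base of a standard-layout class it adds nothing.
static_assert(sizeof(HoldsAsBase) == sizeof(int*), "empty base is free");
```

The overhead is a few bytes per container at most, so it is a minor point next to the layout of the elements themselves.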
btAlignedObjectArray also wraps QuickSort, HeapSort, BinarySearch and LinearSearch functions for sorting and searching. All of its member functions are declared as follows:

template <typename T>
//template <class T>
class btAlignedObjectArray
{
    btAlignedAllocator<T , 16> m_allocator;

    int m_size;
    int m_capacity;
    T* m_data;
    //PCK: added this line
    bool m_ownsMemory;

#ifdef BT_ALLOW_ARRAY_COPY_OPERATOR
public:
    SIMD_FORCE_INLINE btAlignedObjectArray<T>& operator=(const btAlignedObjectArray<T> &other);
#else//BT_ALLOW_ARRAY_COPY_OPERATOR
private:
    SIMD_FORCE_INLINE btAlignedObjectArray<T>& operator=(const btAlignedObjectArray<T> &other);
#endif//BT_ALLOW_ARRAY_COPY_OPERATOR

protected:
    SIMD_FORCE_INLINE int allocSize(int size);
    SIMD_FORCE_INLINE void copy(int start,int end, T* dest) const;
    SIMD_FORCE_INLINE void init();
    SIMD_FORCE_INLINE void destroy(int first,int last);
    SIMD_FORCE_INLINE void* allocate(int size);
    SIMD_FORCE_INLINE void deallocate();

public:
    btAlignedObjectArray();
    ~btAlignedObjectArray();

    ///Generally it is best to avoid using the copy constructor of an btAlignedObjectArray,
    // and use a (const) reference to the array instead.
    btAlignedObjectArray(const btAlignedObjectArray& otherArray);

    /// return the number of elements in the array
    SIMD_FORCE_INLINE int size() const;

    SIMD_FORCE_INLINE const T& at(int n) const;
    SIMD_FORCE_INLINE T& at(int n);
    SIMD_FORCE_INLINE const T& operator[](int n) const;
    SIMD_FORCE_INLINE T& operator[](int n);

    ///clear the array, deallocated memory. Generally it is better to use array.resize(0),
    // to reduce performance overhead of run-time memory (de)allocations.
    SIMD_FORCE_INLINE void clear();
    SIMD_FORCE_INLINE void pop_back();

    ///resize changes the number of elements in the array. If the new size is larger,
    // the new elements will be constructed using the optional second argument.
    ///when the new number of elements is smaller, the destructor will be called,
    // but memory will not be freed, to reduce performance overhead of run-time memory (de)allocations.
    SIMD_FORCE_INLINE void resizeNoInitialize(int newsize);
    SIMD_FORCE_INLINE void resize(int newsize, const T& fillData=T());
    SIMD_FORCE_INLINE T& expandNonInitializing( );
    SIMD_FORCE_INLINE T& expand( const T& fillValue=T());
    SIMD_FORCE_INLINE void push_back(const T& _Val);

    /// return the pre-allocated (reserved) elements, this is at least
    // as large as the total number of elements,see size() and reserve()
    SIMD_FORCE_INLINE int capacity() const;
    SIMD_FORCE_INLINE void reserve(int _Count);

    class less
    {
    public:
        bool operator() ( const T& a, const T& b ) { return ( a < b ); }
    };

    template <typename L>
    void quickSortInternal(const L& CompareFunc,int lo, int hi);
    template <typename L>
    void quickSort(const L& CompareFunc);

    ///heap sort from http://www.csse.monash.edu.au/~lloyd/tildeAlgDS/Sort/Heap/
    template <typename L>
    void downHeap(T *pArr, int k, int n, const L& CompareFunc);
    void swap(int index0,int index1);
    template <typename L>
    void heapSort(const L& CompareFunc);

    ///non-recursive binary search, assumes sorted array
    int findBinarySearch(const T& key) const;
    int findLinearSearch(const T& key) const;

    void remove(const T& key);

    //PCK: whole function
    void initializeFromBuffer(void *buffer, int size, int capacity);
    void copyFromArray(const btAlignedObjectArray& otherArray);
};
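btAlignedObjectArray is similar to std::vector, so the implementations of its member functions are not listed here.
The memory layout of btHashMap differs greatly from that of the hash maps we usually see. For comparison, let us first look at the memory layout of std::unordered_map.
In GCC, std::unordered_map is just a thin wrapper around _Hashtable, whose data members are defined as follows: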
__bucket_type* _M_buckets;
size_type _M_bucket_count;
__before_begin _M_bbegin;
size_type _M_element_count;
_RehashPolicy _M_rehash_policy;

Here size_type is a typedef of std::size_t, and _RehashPolicy is the concrete rehash policy class. It consists mostly of member functions, although the default _Prime_rehash_policy also keeps a little state, such as the maximum load factor. (This is the design idiom known as policy-based design; see Modern C++ Design, whose Chinese edition, 《C++設計新思維》, was translated by Hou Jie.)
Following __bucket_type further, we find (_Hashtable):

using __bucket_type = typename __hashtable_base::__bucket_type;

and (__hashtable_base):

using __node_base = __detail::_Hash_node_base;
using __bucket_type = __node_base*;

Only now do we know that the type of _M_buckets is _Hash_node_base**.
Tracing further gives the definition of _Hash_node_base:

/**
 * struct _Hash_node_base
 *
 * Nodes, used to wrap elements stored in the hash table. A policy
 * template parameter of class template _Hashtable controls whether
 * nodes also store a hash code. In some cases (e.g. strings) this
 * may be a performance win.
 */
struct _Hash_node_base
{
    _Hash_node_base* _M_nxt;

    _Hash_node_base() : _M_nxt() { }
    _Hash_node_base(_Hash_node_base* __next) : _M_nxt(__next) { }
};
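From the type of _Hashtable::_M_buckets (a pointer to pointer) and of _Hash_node_base::_M_nxt (a pointer), we can guess the memory layout of the Hashtable: the buckets array holds the head pointers of lists of nodes that hash to the same bucket, with one linked list hanging off each bucket.
Next, the type of __before_begin (_Hashtable):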
using __before_begin = __detail::_Before_begin<_Node_allocator_type>;

Tracing further:

/**
 * This type is to combine a _Hash_node_base instance with an allocator
 * instance through inheritance to benefit from EBO when possible.
 */
template<typename _NodeAlloc>
struct _Before_begin : public _NodeAlloc
{
    _Hash_node_base _M_node;

    _Before_begin(const _Before_begin&) = default;
    _Before_begin(_Before_begin&&) = default;

    template<typename _Alloc>
    _Before_begin(_Alloc&& __a)
    : _NodeAlloc(std::forward<_Alloc>(__a))
    { }
};

From what we know about the STL doubly linked list std::list, we can guess that _Before_begin plays a role much like the extra sentinel node at the head of that list: it exists only to make iteration convenient. _Hashtable::begin() confirms this:

iterator
begin() noexcept
{ return iterator(_M_begin()); }

__node_type*
_M_begin() const
{ return static_cast<__node_type*>(_M_before_begin()._M_nxt); }

const __node_base&
_M_before_begin() const
{ return _M_bbegin._M_node; }
The node type that actually stores the Value is one of the following two (according to the comment on _Hash_node_base, the first may be used when the Key is a string, to improve performance):

/**
 * Specialization for nodes with caches, struct _Hash_node.
 *
 * Base class is __detail::_Hash_node_base.
 */
template<typename _Value>
struct _Hash_node<_Value, true> : _Hash_node_base
{
    _Value _M_v;
    std::size_t _M_hash_code;

    template<typename... _Args>
    _Hash_node(_Args&&... __args)
    : _M_v(std::forward<_Args>(__args)...), _M_hash_code() { }

    _Hash_node*
    _M_next() const { return static_cast<_Hash_node*>(_M_nxt); }
};

/**
 * Specialization for nodes without caches, struct _Hash_node.
 *
 * Base class is __detail::_Hash_node_base.
 */
template<typename _Value>
struct _Hash_node<_Value, false> : _Hash_node_base
{
    _Value _M_v;

    template<typename... _Args>
    _Hash_node(_Args&&... __args)
    : _M_v(std::forward<_Args>(__args)...) { }

    _Hash_node*
    _M_next() const { return static_cast<_Hash_node*>(_M_nxt); }
};

Next we trace the source of insert to confirm our guess about the hashtable's memory layout:
_Hashtable::insert:

template<typename _Pair, typename = _IFconsp<_Pair>>
__ireturn_type
insert(_Pair&& __v)
{
    __hashtable& __h = this->_M_conjure_hashtable();
    return __h._M_emplace(__unique_keys(), std::forward<_Pair>(__v));
}

_Hashtable::_M_emplace (the return type is too verbose and has been removed):

_M_emplace(std::true_type, _Args&&... __args)
{
    // First build the node to get access to the hash code
    __node_type* __node = _M_allocate_node(std::forward<_Args>(__args)...); // allocate a list node; __args is a pair<Key, Value>
    const key_type& __k = this->_M_extract()(__node->_M_v);                 // extract the key from the node
    __hash_code __code;
    __try
    {
        __code = this->_M_hash_code(__k);
    }
    __catch(...)
    {
        _M_deallocate_node(__node);
        __throw_exception_again;
    }

    size_type __bkt = _M_bucket_index(__k, __code);             // bucket index for this hash code
    if (__node_type* __p = _M_find_node(__bkt, __k, __code))    // search that bucket's chain for an equivalent node
    {
        // There is already an equivalent node, no insertion
        _M_deallocate_node(__node);
        return std::make_pair(iterator(__p), false);
    }

    // Insert the node
    return std::make_pair(_M_insert_unique_node(__bkt, __code, __node), true);
}

__node_type*
_M_find_node(size_type __bkt, const key_type& __key, __hash_code __c) const
{
    __node_base* __before_n = _M_find_before_node(__bkt, __key, __c);
    if (__before_n)
        return static_cast<__node_type*>(__before_n->_M_nxt);
    return nullptr;
}

_M_find_before_node(size_type __n, const key_type& __k, __hash_code __code) const
{
    __node_base* __prev_p = _M_buckets[__n];    // node just before the first node of bucket __n
    if (!__prev_p)
        return nullptr;
    __node_type* __p = static_cast<__node_type*>(__prev_p->_M_nxt);
    for (;; __p = __p->_M_next())               // walk the list
    {
        if (this->_M_equals(__k, __code, __p))  // does the key match?
            return __prev_p;
        if (!__p->_M_nxt || _M_bucket_index(__p->_M_next()) != __n)
            break;
        __prev_p = __p;
    }
    return nullptr;
}
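The code of _Hashtable::_M_find_before_node confirms our earlier guess about the Hashtable's memory layout: buckets that lead to chains of nodes, essentially the layout of hashtable, the implementation behind SGI hash_map (see 《STL源碼剖析》 by Hou Jie). One detail differs from the textbook picture: the loop stops as soon as _M_bucket_index(__p->_M_next()) != __n, which suggests that the nodes of different buckets are chained one after another on a single list, and that _M_buckets[__n] refers to the node just before the bucket's first node rather than to the first node itself.
(PS: tracing this code is not easy; an IDE such as Eclipse helps.)
[Figure: one possible memory layout of the _Hashtable behind std::unordered_map<int, int*>]
The memory layout of std::unordered_map is the "standard approach" given by most data structures and algorithms textbooks, and it is also the most common implementation.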
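For reference, the textbook layout can be sketched in a few lines. This is a deliberately simplified illustration with hypothetical names (`ChainNode`, `ChainedTable`), fixed `int` keys and values, and no rehashing; it is not how libstdc++ implements `_Hashtable`. The point to notice is that every entry is a separate heap allocation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Textbook separate chaining: every entry is its own heap node.
struct ChainNode {
    int key;
    int value;
    ChainNode* next;
};

class ChainedTable {
    std::vector<ChainNode*> buckets_;               // head pointer of each chain
    ChainedTable(const ChainedTable&);              // copying disabled
    ChainedTable& operator=(const ChainedTable&);

    std::size_t bucketOf(int key) const {
        return static_cast<unsigned>(key) % buckets_.size();
    }
public:
    explicit ChainedTable(std::size_t bucketCount) : buckets_(bucketCount) {
        assert(bucketCount > 0);
    }
    ~ChainedTable() {
        for (std::size_t b = 0; b < buckets_.size(); ++b) {
            ChainNode* n = buckets_[b];
            while (n) { ChainNode* next = n->next; delete n; n = next; }
        }
    }
    // Walk the chain of the key's bucket; returns a null pointer if absent.
    ChainNode* find(int key) const {
        for (ChainNode* n = buckets_[bucketOf(key)]; n; n = n->next)
            if (n->key == key) return n;
        return 0;
    }
    // Update an existing entry or push a new node at the head of its chain.
    void insert(int key, int value) {
        if (ChainNode* hit = find(key)) { hit->value = value; return; }
        std::size_t b = bucketOf(key);
        ChainNode* n = new ChainNode;               // one allocation per entry
        n->key = key;
        n->value = value;
        n->next = buckets_[b];
        buckets_[b] = n;
    }
};
```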
The memory layout of btHashMap is very different from this "standard approach", as its data member definitions show:

template <class Key, class Value>
class btHashMap
{
protected:
    btAlignedObjectArray<int> m_hashTable;
    btAlignedObjectArray<int> m_next;

    btAlignedObjectArray<Value> m_valueArray;
    btAlignedObjectArray<Key> m_keyArray;

    // ... omitted
};

As can be seen, btHashMap keeps the buckets, the keys, and the values all in flat arrays.
[Figure: a possible memory layout of btHashMap]
Next we analyze the implementation of several btHashMap methods to determine the exact role of each of its four btAlignedObjectArray members.
First, the implementation of btHashMap::findIndex:

int findIndex(const Key& key) const
{
    unsigned int hash = key.getHash() & (m_valueArray.capacity()-1); // relies on Key::getHash()

    if (hash >= (unsigned int)m_hashTable.size())
    {
        return BT_HASH_NULL;
    }

    int index = m_hashTable[hash]; // index plays the role of the chain head pointer in unordered_map's buckets[hash]
    while ((index != BT_HASH_NULL) && key.equals(m_keyArray[index]) == false) // walk the chain until a match; relies on Key::equals(Key)
    {
        index = m_next[index];
    }
    return index;
}

btHashMap::findIndex uses m_hashTable, which plays the role of unordered_map's buckets array; m_next corresponds to the next pointer of unordered_map's list nodes.
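The expression `key.getHash() & (m_valueArray.capacity()-1)` only behaves like a modulo when the capacity is a power of two. That appears to hold as long as the arrays grow by doubling through push_back, but that is an assumption this article does not verify. A minimal sketch with a hypothetical helper, `bucket_index`:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: map a hash code to a bucket index by masking.
// Valid only when `capacity` is a power of two.
inline unsigned bucket_index(unsigned hash, unsigned capacity) {
    assert(capacity != 0 && (capacity & (capacity - 1)) == 0);
    return hash & (capacity - 1);   // same result as hash % capacity
}
```

With a capacity of 8, the mask is binary 111, so only the three low bits of the hash are kept; a hash of 13 (binary 1101) lands in bucket 5.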
Next, btHashMap::insert:

void insert(const Key& key, const Value& value)
{
    int hash = key.getHash() & (m_valueArray.capacity()-1);

    //replace value if the key is already there
    int index = findIndex(key); // look for an existing <Key, Value> entry
    if (index != BT_HASH_NULL)
    {
        m_valueArray[index]=value; // found: update the value
        return;
    }

    int count = m_valueArray.size(); // number of entries currently stored
    int oldCapacity = m_valueArray.capacity();
    m_valueArray.push_back(value);   // append value to m_valueArray; capacity may grow
    m_keyArray.push_back(key);       // append key to m_keyArray

    int newCapacity = m_valueArray.capacity();
    if (oldCapacity < newCapacity)
    {
        growTables(key); // on growth, resize the other two arrays and recompute the chain heads
        //hash with new capacity
        hash = key.getHash() & (m_valueArray.capacity()-1);
    }
    m_next[count] = m_hashTable[hash]; // together with the next line: insert the new node at the head of the m_hashTable[hash] chain
    m_hashTable[hash] = count;
}

This confirms our claims about the roles of m_hashTable and m_next.
Where btHashMap differs from an ordinary hash table is that it has to manage node memory itself. For example, after a node in the middle is removed, how does it make sure the next insert can reuse that node's memory? btHashMap::remove shows how Bullet does it:

void remove(const Key& key)
{
    int hash = key.getHash() & (m_valueArray.capacity()-1);

    int pairIndex = findIndex(key); // index of the <Key, Value> pair
    if (pairIndex ==BT_HASH_NULL)
    {
        return;
    }

    // Remove the pair from the hash table.
    int index = m_hashTable[hash]; // head of the chain
    btAssert(index != BT_HASH_NULL);

    int previous = BT_HASH_NULL;
    while (index != pairIndex) // find the predecessor of pairIndex
    {
        previous = index;
        index = m_next[index];
    }

    if (previous != BT_HASH_NULL) // unlink the node from the chain
    {
        btAssert(m_next[previous] == pairIndex);
        m_next[previous] = m_next[pairIndex]; // the node is in the middle of the chain
    }
    else
    {
        m_hashTable[hash] = m_next[pairIndex]; // the node is the first one in the chain
    }

    // We now move the last pair into spot of the
    // pair being removed. We need to fix the hash
    // table indices to support the move.

    int lastPairIndex = m_valueArray.size() - 1;

    // If the removed pair is the last pair, we are done.
    if (lastPairIndex == pairIndex) // already the last element of the arrays: pop_back shrinks size (capacity is unchanged)
    {
        m_valueArray.pop_back();
        m_keyArray.pop_back();
        return;
    }

    // Remove the last pair from the hash table.
    int lastHash = m_keyArray[lastPairIndex].getHash() & (m_valueArray.capacity()-1);

    index = m_hashTable[lastHash];
    btAssert(index != BT_HASH_NULL);

    previous = BT_HASH_NULL;
    while (index != lastPairIndex)
    {
        previous = index;
        index = m_next[index];
    }

    if (previous != BT_HASH_NULL)
    {
        btAssert(m_next[previous] == lastPairIndex);
        m_next[previous] = m_next[lastPairIndex];
    }
    else
    {
        m_hashTable[lastHash] = m_next[lastPairIndex];
    }

    // Copy the last pair into the remove pair's spot.
    m_valueArray[pairIndex] = m_valueArray[lastPairIndex];
    m_keyArray[pairIndex] = m_keyArray[lastPairIndex];

    // Insert the last pair into the hash table:
    // the moved pair, now at pairIndex, goes to the head of the m_hashTable[lastHash] chain
    m_next[pairIndex] = m_hashTable[lastHash];
    m_hashTable[lastHash] = pairIndex;

    m_valueArray.pop_back();
    m_keyArray.pop_back();
}
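The same ideas (index-based chains, plus filling the hole with the last entry on removal) fit in a compact standalone sketch. `IndexChainedMap` is a hypothetical simplification, not btHashMap: it uses `std::vector`, fixed `int` keys and values, the key itself as the hash, and a simple rebuild of all chains whenever the bucket array doubles.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

const int kNullIndex = -1;                // plays the role of BT_HASH_NULL

// Hypothetical simplified analogue of the layout discussed above.
class IndexChainedMap {
    std::vector<int> buckets_;            // head index of each chain
    std::vector<int> next_;               // next index in the chain, per entry
    std::vector<int> keys_;               // densely packed keys
    std::vector<int> values_;             // densely packed values

    std::size_t bucketOf(int key) const { // bucket count is a power of two
        return static_cast<unsigned>(key) & (buckets_.size() - 1);
    }
    // Unlink entry `idx` from the chain that starts at bucket `b`.
    void unlink(std::size_t b, int idx) {
        int prev = kNullIndex;
        int cur = buckets_[b];
        while (cur != idx) { prev = cur; cur = next_[cur]; }
        if (prev == kNullIndex) buckets_[b] = next_[idx];
        else next_[prev] = next_[idx];
    }
    // Rebuild every chain for a new bucket count.
    void rehash(std::size_t bucketCount) {
        buckets_.assign(bucketCount, kNullIndex);
        for (std::size_t i = 0; i < keys_.size(); ++i) {
            std::size_t b = bucketOf(keys_[i]);
            next_[i] = buckets_[b];
            buckets_[b] = static_cast<int>(i);
        }
    }
public:
    IndexChainedMap() : buckets_(4, kNullIndex) {}

    int findIndex(int key) const {
        int i = buckets_[bucketOf(key)];
        while (i != kNullIndex && keys_[i] != key) i = next_[i];
        return i;
    }
    const int* find(int key) const {
        int i = findIndex(key);
        return i == kNullIndex ? 0 : &values_[i];
    }
    void insert(int key, int value) {
        int i = findIndex(key);
        if (i != kNullIndex) { values_[i] = value; return; }
        keys_.push_back(key);
        values_.push_back(value);
        next_.push_back(kNullIndex);
        if (keys_.size() > buckets_.size()) {
            rehash(buckets_.size() * 2);  // relinks every entry, including the new one
            return;
        }
        int idx = static_cast<int>(keys_.size()) - 1;
        std::size_t b = bucketOf(key);
        next_[idx] = buckets_[b];         // push at the head of the chain
        buckets_[b] = idx;
    }
    void remove(int key) {
        int idx = findIndex(key);
        if (idx == kNullIndex) return;
        unlink(bucketOf(key), idx);
        int last = static_cast<int>(keys_.size()) - 1;
        if (idx != last) {
            // Move the last entry into the hole and relink it under its new index.
            std::size_t lb = bucketOf(keys_[last]);
            unlink(lb, last);
            keys_[idx] = keys_[last];
            values_[idx] = values_[last];
            next_[idx] = buckets_[lb];
            buckets_[lb] = idx;
        }
        keys_.pop_back();
        values_.pop_back();
        next_.pop_back();
    }
    std::size_t size() const { return keys_.size(); }
    const std::vector<int>& values() const { return values_; }
};
```

After any sequence of inserts and removes, `values()` is a gap-free array of exactly `size()` elements, which is the property the removal logic is designed to preserve.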
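This design keeps the whole hash table's memory compact (contiguous). The main benefits of that contiguity are:
First, it is compatible with array (pointer) style APIs, such as many OpenGL APIs. Since the Values and the Keys stored in a btHashMap are each contiguous in memory, this is easy to see.
Second, it improves the cache hit rate (when the table holds few elements). The nodes of an ordinary linked list are allocated one at a time as they are needed, so they are generally not contiguous and often do not even share a memory page. As a result, even repeated accesses to the list nodes within a short time may be slowed down, because the scattered nodes cannot all be kept in the cache. The node memory of a btHashMap is always contiguous, which tends to give a higher cache hit rate and some performance gain.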
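The customization interface of btAlignedAllocator is completely different from that of std::allocator. The std::allocator approach is to implement an allocator first and then pass it as a template argument to the concrete data structure, e.g. vector<int, allocator<int> >.
Although this approach does allow customization, it has some problems:
First, since everything in the standard library uses std::allocator by default, using a different allocator can leave more than one kind of memory management working side by side in the same program. In particular, when the standard library uses the allocator SGI once implemented, which only returns its memory when the program exits (see 《STL源碼剖析》), the two are bound to compete for memory.
Second, this design quietly adds to the complexity of coding and debugging, as anyone who has debugged GCC's STL code will know.
btAlignedAllocator has none of these problems:
First, its allocate/deallocate behaviour is delegated to global function pointers, so there cannot be two or more low-level memory management schemes in use at the same time.
Second, data structures that use btAlignedAllocator have simpler template parameters, so coding and debugging are correspondingly simpler.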
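The function-pointer approach can be sketched in isolation. All names below (`setCustomAllocator`, `poolAlloc`, `poolFree`, `countingAlloc`, `countingFree`) are hypothetical and only mirror the shape of btAlignedAllocSetCustom; they are not Bullet APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

typedef void* (*AllocFunc)(std::size_t);
typedef void  (*FreeFunc)(void*);

static void* defaultAlloc(std::size_t n) { return std::malloc(n); }
static void  defaultFree(void* p)        { std::free(p); }

// One global pair of hooks shared by every container.
static AllocFunc gAlloc = defaultAlloc;
static FreeFunc  gFree  = defaultFree;

// Passing a null pointer restores the default for that hook.
static void setCustomAllocator(AllocFunc a, FreeFunc f) {
    gAlloc = a ? a : defaultAlloc;
    gFree  = f ? f : defaultFree;
}

// Every allocation in the library would funnel through these two functions.
static void* poolAlloc(std::size_t n) {
    assert(gAlloc != 0);
    return gAlloc(n);
}
static void poolFree(void* p) {
    if (p) gFree(p);
}

// Example custom pair: counts the blocks that are currently live.
static int gLiveBlocks = 0;
static void* countingAlloc(std::size_t n) { ++gLiveBlocks; return std::malloc(n); }
static void  countingFree(void* p)        { --gLiveBlocks; std::free(p); }
```

The trade-off is that the choice is global: a program cannot give one container a different allocator from another, which is exactly the flexibility the template-parameter design of std::allocator provides.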
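In my humble opinion, the STL is somewhat over-engineered. Policy-based design does bring flexibility, but it costs a great deal of readability (perhaps the people who develop libstdc++ never intended anyone else to read their code ☺).
Two books are mentioned in this article:
Modern C++ Design (Chinese edition: 《C++設計新思維》, translated by Hou Jie), which describes policy-based design in detail.
《STL源碼剖析》 (by Hou Jie), which analyzes the implementation of the SGI hashtable in detail.
Source versions discussed in this article:
bullet 2.81
gcc 4.6.1 (MinGW)
Comments and email (xusiwei1236@163.com) are welcome. Please credit the source when reposting; no commercial use.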