Bullet HashMap: a memory-compact hash table

Date: 2019-12-05

Tags: bullet, hashmap, memory-compact hash table

Last modified: 2014-11-09 14:07:00

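Bullet is an open-source physics engine that provides collision detection, gravity simulation, and related features. Many 3D games and 3D applications (such as 3DMark) use it as their physics engine.

For a physics engine, the demands on speed are extremely strict; that the Bullet project has made it this far is due in large part to its excellent performance.

Browsing Bullet's source turns up many source-level optimizations, and the HashMap covered in this article is a typical example.

Bullet project home page: http://bulletphysics.org/

Note: many Bullet functions are defined in both a Debug and a Release version; this article looks only at the Release versions.

btAlignedAllocator interface definition

btAlignedAllocator is a memory allocator interface defined by Bullet; Bullet's other data structures all use it to manage memory. Its definition resembles the STL allocator (referred to below as std::allocator):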

///The btAlignedAllocator is a portable class for aligned memory allocations.
///Default implementations for unaligned and aligned allocations can be overridden by a custom allocator 
//  using btAlignedAllocSetCustom and btAlignedAllocSetCustomAligned.
template < typename T , unsigned Alignment >
class btAlignedAllocator {
	
	typedef btAlignedAllocator< T , Alignment > self_type;
	
public:
	//just going down a list:
	btAlignedAllocator() {}
	/*
	btAlignedAllocator( const self_type & ) {}
	*/

	template < typename Other >
	btAlignedAllocator( const btAlignedAllocator< Other , Alignment > & ) {}

	typedef const T*         const_pointer;
	typedef const T&         const_reference;
	typedef T*               pointer;
	typedef T&               reference;
	typedef T                value_type;

	pointer       address   ( reference        ref ) const                           { return &ref; }
	const_pointer address   ( const_reference  ref ) const                           { return &ref; }
	pointer       allocate  ( size_type        n   , const_pointer *      hint = 0 ) {
		(void)hint;
		return reinterpret_cast< pointer >(btAlignedAlloc( sizeof(value_type) * n , Alignment ));
	}
	void          construct ( pointer          ptr , const value_type &   value    ) { new (ptr) value_type( value ); }
	void          deallocate( pointer          ptr ) {
		btAlignedFree( reinterpret_cast< void * >( ptr ) );
	}
	void          destroy   ( pointer          ptr )                                 { ptr->~value_type(); }
	

	template < typename O > struct rebind {
		typedef btAlignedAllocator< O , Alignment > other;
	};
	template < typename O >
	self_type & operator=( const btAlignedAllocator< O , Alignment > & ) { return *this; }

	friend bool operator==( const self_type & , const self_type & ) { return true; }
};

Like std::allocator, btAlignedAllocator's allocate and deallocate request and release memory. In a Release build of Bullet, btAlignedAlloc and btAlignedFree are defined as:

void*	btAlignedAllocInternal	(size_t size, int alignment);
	void	btAlignedFreeInternal	(void* ptr);

	#define btAlignedAlloc(size,alignment) btAlignedAllocInternal(size,alignment)
	#define btAlignedFree(ptr) btAlignedFreeInternal(ptr)

btAlignedAllocInternal/btAlignedFreeInternal, together with their customization hook, are implemented as:

static btAlignedAllocFunc *sAlignedAllocFunc = btAlignedAllocDefault;
static btAlignedFreeFunc *sAlignedFreeFunc = btAlignedFreeDefault;

void btAlignedAllocSetCustomAligned(btAlignedAllocFunc *allocFunc, btAlignedFreeFunc *freeFunc)
{
  sAlignedAllocFunc = allocFunc ? allocFunc : btAlignedAllocDefault;
  sAlignedFreeFunc = freeFunc ? freeFunc : btAlignedFreeDefault;
}

void*	btAlignedAllocInternal	(size_t size, int alignment)
{
	gNumAlignedAllocs++; // used together with gNumAlignedFree to check for memory leaks
	void* ptr;
	ptr = sAlignedAllocFunc(size, alignment);
//	printf("btAlignedAllocInternal %d, %x\n",size,ptr);
	return ptr;
}

void	btAlignedFreeInternal	(void* ptr)
{
	if (!ptr)
	{
		return;
	}

	gNumAlignedFree++; // used together with gNumAlignedAllocs to check for memory leaks
//	printf("btAlignedFreeInternal %x\n",ptr);
	sAlignedFreeFunc(ptr);
}

As shown above, customizing Bullet's memory allocation is not complicated; it only takes a call to one of the following two functions:

// The developer can let all Bullet memory allocations go through a custom memory allocator, using btAlignedAllocSetCustom
void btAlignedAllocSetCustom(btAllocFunc *allocFunc, btFreeFunc *freeFunc);

// If the developer has already an custom aligned allocator, then btAlignedAllocSetCustomAligned can be used. 
// The default aligned allocator pre-allocates extra memory using the non-aligned allocator, and instruments it.
void btAlignedAllocSetCustomAligned(btAlignedAllocFunc *allocFunc, btAlignedFreeFunc *freeFunc);

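Whether or not you supply your own Alloc/Free (or AlignedAlloc/AlignedFree), every other data structure in Bullet uses btAlignedAllocator as its interface for allocating and releasing memory. As will become clear, btAlignedAllocator's approach to customization differs from std::allocator's; the end of the article discusses this in detail.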
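The hook pattern can be tried out without Bullet's headers. The following is a minimal sketch, not Bullet's code: AllocFunc, FreeFunc, setCustomAlloc, hookedAlloc, hookedFree and the counting allocator are hypothetical names that only mirror the shape of btAllocFunc, btFreeFunc and btAlignedAllocSetCustom.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-ins for btAllocFunc / btFreeFunc.
typedef void* (AllocFunc)(std::size_t size);
typedef void  (FreeFunc)(void* ptr);

static void* defaultAlloc(std::size_t size) { return std::malloc(size); }
static void  defaultFree(void* ptr)         { std::free(ptr); }

// Process-wide hooks: every container allocates through these two pointers.
static AllocFunc* sAlloc = defaultAlloc;
static FreeFunc*  sFree  = defaultFree;

// Passing a null pointer restores the default, as in the Bullet code above.
void setCustomAlloc(AllocFunc* allocFunc, FreeFunc* freeFunc)
{
    sAlloc = allocFunc ? allocFunc : defaultAlloc;
    sFree  = freeFunc  ? freeFunc  : defaultFree;
}

void* hookedAlloc(std::size_t size) { return sAlloc(size); }
void  hookedFree(void* ptr)         { if (ptr) sFree(ptr); }

// Example custom allocator (an assumption for illustration):
// counts live blocks on top of malloc/free.
static int gLiveBlocks = 0;
void* countingAlloc(std::size_t size) { ++gLiveBlocks; return std::malloc(size); }
void  countingFree(void* ptr)         { --gLiveBlocks; std::free(ptr); }
int   liveBlocks()                    { return gLiveBlocks; }
```

Calling setCustomAlloc(countingAlloc, countingFree) once at startup routes every later hookedAlloc/hookedFree through the counting pair; no container type has to change, which is the point of the design.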

btAlignedAllocator and memory alignment

Besides customizing differently from std::allocator, btAlignedAllocator also adds memory alignment (as its name suggests). The definitions of btAlignedAllocDefault/btAlignedFreeDefault (btAlignedAllocator.{h|cpp}) show how:

#if defined (BT_HAS_ALIGNED_ALLOCATOR)
#include <malloc.h>
static void *btAlignedAllocDefault(size_t size, int alignment)
{
	return _aligned_malloc(size, (size_t)alignment);  // _aligned_malloc comes from the Microsoft C runtime (<malloc.h>)
}

static void btAlignedFreeDefault(void *ptr)
{
	_aligned_free(ptr);
}
#elif defined(__CELLOS_LV2__)
#include <stdlib.h>

static inline void *btAlignedAllocDefault(size_t size, int alignment)
{
	return memalign(alignment, size);
}

static inline void btAlignedFreeDefault(void *ptr)
{
	free(ptr);
}
#else // the current build environment has no aligned allocation function
static inline void *btAlignedAllocDefault(size_t size, int alignment)
{
  void *ret;
  char *real;
  real = (char *)sAllocFunc(size + sizeof(void *) + (alignment-1)); // 1. allocate a little extra memory
  if (real) {
    ret = btAlignPointer(real + sizeof(void *),alignment);      // 2. adjust the pointer
    *((void **)(ret)-1) = (void *)(real);                       // 3. record the real address
  } else {
    ret = (void *)(real);
  }
  return (ret);
}

static inline void btAlignedFreeDefault(void *ptr)
{
  void* real;

  if (ptr) {
    real = *((void **)(ptr)-1); // fetch the address of the real memory block
    sFreeFunc(real);
  }
}
#endif

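Bullet also implements an aligned allocation function of its own, so even when the system provides none, the address returned by btAlignedAllocator::allocate is still aligned to the requested number of bytes.

The following analyzes how btAlignedAllocDefault / btAlignedFreeDefault implement aligned allocation and free. First, the definition and initialization of sAllocFunc/sFreeFunc: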

static void *btAllocDefault(size_t size)
{
	return malloc(size);
}

static void btFreeDefault(void *ptr)
{
	free(ptr);
}

static btAllocFunc *sAllocFunc = btAllocDefault;
static btFreeFunc *sFreeFunc = btFreeDefault;

Bullet also lets you customize AllocFunc/FreeFunc:

void btAlignedAllocSetCustom(btAllocFunc *allocFunc, btFreeFunc *freeFunc)
{
  sAllocFunc = allocFunc ? allocFunc : btAllocDefault;
  sFreeFunc = freeFunc ? freeFunc : btFreeDefault;
}

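By default sAllocFunc/sFreeFunc are simply malloc/free. What may be puzzling in btAlignedAllocDefault is: why allocate a bit of extra memory, and what is the later btAlignPointer call for?

Here is how Bullet aligns a pointer (btScalar.h):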

///align a pointer to the provided alignment, upwards
template <typename T>T* btAlignPointer(T* unalignedPtr, size_t alignment)
{
		
	struct btConvertPointerSizeT
	{
		union 
		{
				T* ptr;
				size_t integer;
		};
	};
    btConvertPointerSizeT converter;
    
    
	const size_t bit_mask = ~(alignment - 1);
    converter.ptr = unalignedPtr;
	converter.integer += alignment-1;
	converter.integer &= bit_mask;
	return converter.ptr;
}

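So how does btAlignPointer adjust the pointer?

In practice, every alignment passed to btAlignPointer is a power of two; btAlignedObjectArray, for example, uses 16, so the analysis below uses 16.

First suppose unalignedPtr is already a multiple of alignment (16). After converter.integer += alignment-1 followed by converter.integer &= bit_mask, the value is unchanged and is still a multiple of alignment (16): the addition only sets the low four bits, and the mask clears them again.

Now suppose unalignedPtr is not a multiple of alignment (16). After the same two steps, the value has been rounded up to the next multiple of alignment (16): the addition carries into the higher bits, and the mask then clears the low four bits.

So btAlignPointer aligns unalignedPtr upward to a multiple of alignment.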
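The two cases can be checked with plain integers. This is a small sketch of the same add-then-mask arithmetic; alignUp is a hypothetical helper, not a Bullet function, and it works on std::uintptr_t instead of going through a union.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Round addr up to the next multiple of alignment.
// alignment must be a power of two, otherwise the mask trick does not work.
std::uintptr_t alignUp(std::uintptr_t addr, std::size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    const std::uintptr_t mask = ~(static_cast<std::uintptr_t>(alignment) - 1);
    return (addr + (alignment - 1)) & mask;
}
```

With alignment 16, alignUp(0x1000, 16) stays at 0x1000, while alignUp(0x1001, 16) and alignUp(0x100F, 16) both move up to 0x1010.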

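With btAlignPointer understood, it is easy to see why btAlignedAllocDefault requests extra memory. The requested size is size + sizeof(void *) + (alignment-1):

If the address returned by sAllocFunc is already aligned to alignment, the sizeof(void*) bytes, the (alignment-1) bytes, and the return value of btAlignedAllocDefault are related as follows:

[Figure: layout of the padding, the stored void*, and the returned pointer when sAllocFunc returns an aligned address]

The alignment - sizeof(void*) bytes in front of the void* and the sizeof(void*) - 1 bytes at the tail are wasted (assuming alignment >= sizeof(void*)), but that is tiny compared with the size of a memory module, so who cares!

If the address returned by sAllocFunc is not aligned to alignment, the sizeof(void*) bytes, the (alignment-1) bytes, and the return value of btAlignedAllocDefault are related as follows:

[Figure: layout of the padding, the stored void*, and the returned pointer when sAllocFunc returns an unaligned address]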
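The over-allocate, align, and record steps can be reproduced on top of malloc. The sketch below is an illustration under assumptions, not Bullet's implementation: alignedAllocSketch and alignedFreeSketch are hypothetical names, and the back-pointer is stored with memcpy rather than through a void** cast so that the store stays valid even when alignment is smaller than a pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Allocate size bytes aligned to alignment (a power of two), on top of malloc.
void* alignedAllocSketch(std::size_t size, std::size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    // 1. Over-allocate: payload + one back-pointer + worst-case padding.
    char* real = static_cast<char*>(std::malloc(size + sizeof(void*) + (alignment - 1)));
    if (!real) return nullptr;
    // 2. Skip the back-pointer slot, then round up to the alignment.
    std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(real + sizeof(void*));
    addr = (addr + (alignment - 1)) & ~(static_cast<std::uintptr_t>(alignment) - 1);
    char* ret = reinterpret_cast<char*>(addr);
    // 3. Record the address malloc really returned just below the aligned block.
    std::memcpy(ret - sizeof(void*), &real, sizeof(real));
    return ret;
}

void alignedFreeSketch(void* ptr)
{
    if (!ptr) return;
    // Read back the real block address stored just below ptr and free that.
    void* real = nullptr;
    std::memcpy(&real, static_cast<char*>(ptr) - sizeof(void*), sizeof(real));
    std::free(real);
}
```

Because the aligned pointer is at least sizeof(void*) bytes past the real block start and at most sizeof(void*) + alignment - 1 bytes past it, both the back-pointer and the size payload bytes always fit inside the block that malloc returned.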

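PS: By the way, why is memory alignment needed at all? Put simply, the CPU accesses memory faster when it is aligned to a multiple of the machine word size. The details depend on the vendor documentation for the specific CPU and bus controller and involve many platform and hardware specifics, so this article does not dwell on the topic.

btAlignedObjectArray: Bullet's dynamic array

btAlignedObjectArray plays a role similar to the STL vector (referred to below as std::vector): both are dynamic arrays. Its data members are declared as follows: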

template <typename T> 
class btAlignedObjectArray
{
	btAlignedAllocator<T , 16>	m_allocator; // stateless (no data members of its own); as a member it still occupies at least one byte

	int					m_size;
	int					m_capacity;
	T*					m_data;
	//PCK: added this line
	bool				m_ownsMemory;
// ... omitted
};
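Whether a stateless allocator costs memory depends on how it is embedded: as a base class it can vanish through the empty base optimization, while as an ordinary data member it still needs an address of its own. The sketch below uses hypothetical structs (EmptyAlloc, Plain, WithMember, WithBase) that mirror the member list above so the sizes can be compared; the exact numbers are implementation-defined.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

struct EmptyAlloc {};  // stateless, like btAlignedAllocator

// The same four fields, with the allocator absent, held as a member, or inherited.
struct Plain                 { int size; int capacity; int* data; bool owns; };
struct WithMember            { EmptyAlloc alloc; int size; int capacity; int* data; bool owns; };
struct WithBase : EmptyAlloc { int size; int capacity; int* data; bool owns; };

// Print the three sizes so they can be compared on the current compiler.
void printSizes()
{
    std::printf("Plain=%zu WithMember=%zu WithBase=%zu\n",
                sizeof(Plain), sizeof(WithMember), sizeof(WithBase));
}
```

On common 64-bit ABIs WithMember typically comes out larger than Plain because of the extra byte plus padding, while WithBase usually matches Plain.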

btAlignedObjectArray also wraps QuickSort, HeapSort, BinarySearch and LinearSearch functions for sorting and searching. All of its member function declarations are listed below:

template <typename T> 
//template <class T> 
class btAlignedObjectArray
{
	btAlignedAllocator<T , 16>	m_allocator;

	int					m_size;
	int					m_capacity;
	T*					m_data;
	//PCK: added this line
	bool				m_ownsMemory;

#ifdef BT_ALLOW_ARRAY_COPY_OPERATOR
public:
	SIMD_FORCE_INLINE btAlignedObjectArray<T>& operator=(const btAlignedObjectArray<T> &other);
#else//BT_ALLOW_ARRAY_COPY_OPERATOR
private:
		SIMD_FORCE_INLINE btAlignedObjectArray<T>& operator=(const btAlignedObjectArray<T> &other);
#endif//BT_ALLOW_ARRAY_COPY_OPERATOR

protected:
		SIMD_FORCE_INLINE	int	allocSize(int size);
		SIMD_FORCE_INLINE	void	copy(int start,int end, T* dest) const;
		SIMD_FORCE_INLINE	void	init();
		SIMD_FORCE_INLINE	void	destroy(int first,int last);
		SIMD_FORCE_INLINE	void* allocate(int size);
		SIMD_FORCE_INLINE	void	deallocate();

	public:		
		btAlignedObjectArray();

		~btAlignedObjectArray();

		///Generally it is best to avoid using the copy constructor of an btAlignedObjectArray,
		//  and use a (const) reference to the array instead.
		btAlignedObjectArray(const btAlignedObjectArray& otherArray);		
		
		/// return the number of elements in the array
		SIMD_FORCE_INLINE	int size() const;
		
		SIMD_FORCE_INLINE const T& at(int n) const;

		SIMD_FORCE_INLINE T& at(int n);

		SIMD_FORCE_INLINE const T& operator[](int n) const;

		SIMD_FORCE_INLINE T& operator[](int n);
		
		///clear the array, deallocated memory. Generally it is better to use array.resize(0), 
		//  to reduce performance overhead of run-time memory (de)allocations.
		SIMD_FORCE_INLINE	void	clear();

		SIMD_FORCE_INLINE	void	pop_back();

		///resize changes the number of elements in the array. If the new size is larger, 
		//  the new elements will be constructed using the optional second argument.
		///when the new number of elements is smaller, the destructor will be called,
		//  but memory will not be freed, to reduce performance overhead of run-time memory (de)allocations.
		SIMD_FORCE_INLINE	void	resizeNoInitialize(int newsize);
	
		SIMD_FORCE_INLINE	void	resize(int newsize, const T& fillData=T());

		SIMD_FORCE_INLINE	T&  expandNonInitializing( );

		SIMD_FORCE_INLINE	T&  expand( const T& fillValue=T());

		SIMD_FORCE_INLINE	void push_back(const T& _Val);

		/// return the pre-allocated (reserved) elements, this is at least 
		//   as large as the total number of elements,see size() and reserve()
		SIMD_FORCE_INLINE	int capacity() const;
		
		SIMD_FORCE_INLINE	void reserve(int _Count);

		class less
		{ public:
				bool operator() ( const T& a, const T& b ) { return ( a < b ); }
		};

		template <typename L>
		void quickSortInternal(const L& CompareFunc,int lo, int hi);

		template <typename L>
		void quickSort(const L& CompareFunc);

		///heap sort from http://www.csse.monash.edu.au/~lloyd/tildeAlgDS/Sort/Heap/
		template <typename L>
		void downHeap(T *pArr, int k, int n, const L& CompareFunc);

		void	swap(int index0,int index1);

	template <typename L>
	void heapSort(const L& CompareFunc);

	///non-recursive binary search, assumes sorted array
	int	findBinarySearch(const T& key) const;

	int	findLinearSearch(const T& key) const;

	void	remove(const T& key);

	//PCK: whole function
	void initializeFromBuffer(void *buffer, int size, int capacity);

	void copyFromArray(const btAlignedObjectArray& otherArray);
};

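btAlignedObjectArray is similar to std::vector, so the implementations of its member functions are not listed here.

Memory layout of std::unordered_map

The memory layout of btHashMap is very different from that of the hash maps we usually see. To have something to compare it with, here is the memory layout of std::unordered_map first.

In GCC, std::unordered_map is just a thin wrapper around _Hashtable, whose data members are defined as follows: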

__bucket_type*		_M_buckets;
      size_type			_M_bucket_count;
      __before_begin		_M_bbegin;
      size_type			_M_element_count;
      _RehashPolicy		_M_rehash_policy;

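Here size_type is a typedef of std::size_t, and _RehashPolicy is the concrete policy class; it mostly supplies member functions and carries very little state of its own (this is the design paradigm known as policy-based design; see 《Modern C++ Design》, whose Chinese edition 《C++設計新思維》 was translated by Hou Jie).

Following __bucket_type further, we find (_Hashtable):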

using __bucket_type = typename __hashtable_base::__bucket_type;

and (__hashtable_base):

using __node_base = __detail::_Hash_node_base;
    using __bucket_type = __node_base*;

Only now do we know that the type of _M_buckets is _Hash_node_base**.

Following it further gives the definition of _Hash_node_base:

/**
   *  struct _Hash_node_base
   *
   *  Nodes, used to wrap elements stored in the hash table.  A policy
   *  template parameter of class template _Hashtable controls whether
   *  nodes also store a hash code. In some cases (e.g. strings) this
   *  may be a performance win.
   */
  struct _Hash_node_base
  {
    _Hash_node_base* _M_nxt;

    _Hash_node_base() : _M_nxt() { }

    _Hash_node_base(_Hash_node_base* __next) : _M_nxt(__next) { }
  };

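From the type of _Hashtable::_M_buckets (a pointer to pointer) and the type of _Hash_node_base's _M_nxt (a pointer), we can guess the memory layout of the Hashtable: the buckets array holds the head pointers of linked lists of nodes that hash to the same bucket, so a linked list hangs off each bucket.

Next, the type of __before_begin (_Hashtable):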

using __before_begin = __detail::_Before_begin<_Node_allocator_type>;

Following it further:

/**
   * This type is to combine a _Hash_node_base instance with an allocator
   * instance through inheritance to benefit from EBO when possible.
   */
  template<typename _NodeAlloc>
    struct _Before_begin : public _NodeAlloc
    {
      _Hash_node_base _M_node;

      _Before_begin(const _Before_begin&) = default;
      _Before_begin(_Before_begin&&) = default;

      template<typename _Alloc>
	_Before_begin(_Alloc&& __a)
	  : _NodeAlloc(std::forward<_Alloc>(__a))
	{ }
    };

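From what we know of the STL doubly linked list std::list, we can guess the purpose of _Before_begin: it is most likely similar to the "one extra node at the head" of a doubly linked list, there only to make iteration easier for iterators. _Hashtable::begin() confirms this: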

iterator
      begin() noexcept
      { return iterator(_M_begin()); }

      __node_type*
      _M_begin() const
      { return static_cast<__node_type*>(_M_before_begin()._M_nxt); }

      const __node_base&
      _M_before_begin() const
      { return _M_bbegin._M_node; }

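The node type that actually stores the Value is one of the following two (according to the comment on _Hash_node_base, the first may be used when the Key is a string, to improve performance):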

/**
   *  Specialization for nodes with caches, struct _Hash_node.
   *
   *  Base class is __detail::_Hash_node_base.
   */
  template<typename _Value>
    struct _Hash_node<_Value, true> : _Hash_node_base
    {
      _Value       _M_v;
      std::size_t  _M_hash_code;

      template<typename... _Args>
	_Hash_node(_Args&&... __args)
	: _M_v(std::forward<_Args>(__args)...), _M_hash_code() { }

      _Hash_node*
      _M_next() const { return static_cast<_Hash_node*>(_M_nxt); }
    };

  /**
   *  Specialization for nodes without caches, struct _Hash_node.
   *
   *  Base class is __detail::_Hash_node_base.
   */
  template<typename _Value>
    struct _Hash_node<_Value, false> : _Hash_node_base
    {
      _Value       _M_v;

      template<typename... _Args>
	_Hash_node(_Args&&... __args)
	: _M_v(std::forward<_Args>(__args)...) { }

      _Hash_node*
      _M_next() const { return static_cast<_Hash_node*>(_M_nxt); }
    };

Next, tracing the insert source confirms our guess about the hashtable's memory layout:

_Hashtable::insert:

template<typename _Pair, typename = _IFconsp<_Pair>>
	__ireturn_type
	insert(_Pair&& __v)
	{
	  __hashtable& __h = this->_M_conjure_hashtable();
	  return __h._M_emplace(__unique_keys(), std::forward<_Pair>(__v));
	}

_Hashtable::_M_emplace (the return type is too complicated and has been removed):

_M_emplace(std::true_type, _Args&&... __args)
      {
	// First build the node to get access to the hash code
	__node_type* __node = _M_allocate_node(std::forward<_Args>(__args)...); // allocate a list node; __args is of type pair<Key, Value>
	const key_type& __k = this->_M_extract()(__node->_M_v); // extract the key from the node
	__hash_code __code; 
	__try
	  {
	    __code = this->_M_hash_code(__k);
	  }
	__catch(...)
	  {
	    _M_deallocate_node(__node);
	    __throw_exception_again;
	  }

	size_type __bkt = _M_bucket_index(__k, __code); // find the index in buckets that corresponds to the hash code
	if (__node_type* __p = _M_find_node(__bkt, __k, __code)) // look for the actual node on the list the bucket points to
	  {
	    // There is already an equivalent node, no insertion
	    _M_deallocate_node(__node);
	    return std::make_pair(iterator(__p), false);
	  }

	// Insert the node
	return std::make_pair(_M_insert_unique_node(__bkt, __code, __node),
			      true);
      }

_Hashtable::_M_find_node：

__node_type*
      _M_find_node(size_type __bkt, const key_type& __key,
		   __hash_code __c) const
      {
	__node_base* __before_n = _M_find_before_node(__bkt, __key, __c);
	if (__before_n)
	  return static_cast<__node_type*>(__before_n->_M_nxt);
	return nullptr;
      }

_Hashtable::_M_find_before_node (the return type is too complicated and has been removed):

_M_find_before_node(size_type __n, const key_type& __k,
			__hash_code __code) const
    {
      __node_base* __prev_p = _M_buckets[__n]; // fetch the bucket's head pointer (the node before its first element)
      if (!__prev_p)
	return nullptr;
      __node_type* __p = static_cast<__node_type*>(__prev_p->_M_nxt);
      for (;; __p = __p->_M_next()) // walk the linked list
	{
	  if (this->_M_equals(__k, __code, __p)) // does the key match?
	    return __prev_p;
	  if (!__p->_M_nxt || _M_bucket_index(__p->_M_next()) != __n)
	    break;
	  __prev_p = __p;
	}
      return nullptr;
    }

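Seeing the code of _Hashtable::_M_find_before_node confirms our earlier guess about the Hashtable memory layout: it is the same as that of hashtable, the implementation behind SGI's hash_map (for details see Hou Jie's 《STL源碼剖析》).

(PS: the tracing is not easy; an IDE such as Eclipse helps.)

For example, one possible memory layout of the Hashtable behind std::unordered_map<int, int*> is:

[Figure: a possible memory layout of the Hashtable behind std::unordered_map<int, int*>]

The memory layout of std::unordered_map is the "standard approach" given in most data structures and algorithms textbooks, and is also a fairly common implementation.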
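For comparison with what follows, here is a minimal sketch of that textbook layout: a bucket array of head pointers with a separately allocated node per element. Node and ChainedMap are hypothetical names for illustration; this is the classic scheme, not libstdc++'s actual code, which links its nodes somewhat differently as the _M_find_before_node listing shows.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Node {
    int   key;
    int   value;
    Node* next;   // next node in the same bucket's chain
};

class ChainedMap {
public:
    explicit ChainedMap(std::size_t bucketCount) : buckets_(bucketCount, nullptr) {
        assert(bucketCount > 0);
    }
    ~ChainedMap() {
        for (Node* head : buckets_)
            while (head) { Node* n = head->next; delete head; head = n; }
    }
    ChainedMap(const ChainedMap&) = delete;
    ChainedMap& operator=(const ChainedMap&) = delete;

    void insert(int key, int value) {
        if (Node* n = findNode(key)) { n->value = value; return; }
        std::size_t b = bucketOf(key);
        // Every element costs one separate heap allocation.
        buckets_[b] = new Node{key, value, buckets_[b]};
    }
    const int* find(int key) const {
        const Node* n = findNode(key);
        return n ? &n->value : nullptr;
    }

private:
    std::size_t bucketOf(int key) const {
        return static_cast<unsigned>(key) % buckets_.size();
    }
    Node* findNode(int key) const {
        for (Node* n = buckets_[bucketOf(key)]; n; n = n->next)
            if (n->key == key) return n;
        return nullptr;
    }
    std::vector<Node*> buckets_;  // one chain head per bucket
};
```

The nodes end up wherever the heap places them, which is the property the btHashMap layout avoids.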

btHashMap

btHashMap's memory layout differs greatly from the "standard approach", as its data member definitions show:

template <class Key, class Value>
class btHashMap
{

protected:
	btAlignedObjectArray<int>		m_hashTable;
	btAlignedObjectArray<int>		m_next;
	
	btAlignedObjectArray<Value>		m_valueArray;
	btAlignedObjectArray<Key>		m_keyArray;
// ... omitted
};

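As can be seen, btHashMap keeps the buckets, keys and values together in flat arrays. Its memory layout may look like this:

[Figure: a possible memory layout of btHashMap's m_hashTable, m_next, m_valueArray and m_keyArray]

Next, analyzing the implementation of a few btHashMap methods pins down what each of its four btAlignedObjectArray members is for.

btHashMap::findIndex

Here is the implementation of btHashMap::findIndex: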

int	findIndex(const Key& key) const
	{
		unsigned int hash = key.getHash() & (m_valueArray.capacity()-1); // relies on Key::getHash()

		if (hash >= (unsigned int)m_hashTable.size())
		{
			return BT_HASH_NULL;
		}

		int index = m_hashTable[hash]; // index plays the role of the chain head pointer in unordered_map's buckets[hash]
		while ((index != BT_HASH_NULL) && key.equals(m_keyArray[index]) == false) // walk the chain until a match; relies on Key::equals(Key)
		{
			index = m_next[index]; 
		}
		return index;
	}

btHashMap::findIndex uses m_hashTable, whose role is similar to unordered_map's buckets array; m_next is similar to the next pointer of an unordered_map list node.

btHashMap::insert

Next, btHashMap::insert:

void insert(const Key& key, const Value& value) {
		int hash = key.getHash() & (m_valueArray.capacity()-1);

		//replace value if the key is already there
		int index = findIndex(key); // look up the <Key, Value> entry
		if (index != BT_HASH_NULL)
		{
			m_valueArray[index]=value; // found: update the value
			return;
		}

		int count = m_valueArray.size(); // number of entries currently stored
		int oldCapacity = m_valueArray.capacity();
		m_valueArray.push_back(value); // append value to m_valueArray; capacity may grow
		m_keyArray.push_back(key);     // append key to m_keyArray

		int newCapacity = m_valueArray.capacity();
		if (oldCapacity < newCapacity) 
		{
			growTables(key); // capacity grew: resize the other two arrays and rehash the chain heads
			//hash with new capacity
			hash = key.getHash() & (m_valueArray.capacity()-1);
		}
		m_next[count] = m_hashTable[hash]; // with the next line: insert the new entry at the head of the m_hashTable[hash] chain
		m_hashTable[hash] = count;
	}

This confirms our claims about the roles of m_hashTable and m_next.
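The lookup and insertion logic can be condensed into a small stand-alone class. This is a hedged sketch, not btHashMap itself: CompactMap and kNull are hypothetical names, keys and values are plain ints, std::vector stands in for btAlignedObjectArray, and the bucket count is tracked explicitly as a power of two instead of being read from capacity().

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

const int kNull = -1;  // plays the role of BT_HASH_NULL

// int -> int map laid out as four flat arrays, with no per-node allocation.
class CompactMap {
public:
    // Returns the slot of key in keys_/values_, or kNull if absent.
    int findIndex(int key) const {
        if (hashTable_.empty()) return kNull;
        int index = hashTable_[bucketOf(key)];          // chain head for this bucket
        while (index != kNull && keys_[index] != key)   // walk the chain via next_
            index = next_[index];
        return index;
    }

    void insert(int key, int value) {
        int index = findIndex(key);
        if (index != kNull) { values_[index] = value; return; }  // key exists: overwrite

        if (keys_.size() == hashTable_.size()) grow();  // keep one bucket per slot
        int slot = static_cast<int>(keys_.size());
        keys_.push_back(key);
        values_.push_back(value);
        std::size_t b = bucketOf(key);
        next_.push_back(hashTable_[b]);   // new entry points at the old chain head
        hashTable_[b] = slot;             // and becomes the new head
    }

    std::size_t size() const { return keys_.size(); }
    const std::vector<int>& values() const { return values_; }  // dense, contiguous

private:
    std::size_t bucketOf(int key) const {
        // hashTable_.size() is always a power of two, so masking works.
        return static_cast<unsigned>(key) & (hashTable_.size() - 1);
    }
    // Double the bucket array and relink every entry (the job of growTables).
    void grow() {
        std::size_t newSize = hashTable_.empty() ? 1 : hashTable_.size() * 2;
        hashTable_.assign(newSize, kNull);
        for (std::size_t i = 0; i < keys_.size(); ++i) {
            std::size_t b = bucketOf(keys_[i]);
            next_[i] = hashTable_[b];
            hashTable_[b] = static_cast<int>(i);
        }
    }

    std::vector<int> hashTable_;  // bucket -> first slot (like m_hashTable)
    std::vector<int> next_;       // slot -> next slot in chain (like m_next)
    std::vector<int> keys_;       // like m_keyArray
    std::vector<int> values_;     // like m_valueArray
};
```

Entries keep their insertion order in keys_ and values_, so the third key inserted always lives in slot 2 until something is removed.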

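btHashMap::remove

Where btHashMap differs from an ordinary hash table is that it may have to manage node memory itself. For example, once a node in the middle has been removed, how can the next insert be sure to reuse that node's memory? btHashMap::remove shows how Bullet does it: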

void remove(const Key& key) {

		int hash = key.getHash() & (m_valueArray.capacity()-1);

		int pairIndex = findIndex(key); // find the index of the <Key, Value> pair
		
		if (pairIndex ==BT_HASH_NULL)
		{
			return;
		}

		// Remove the pair from the hash table.
		int index = m_hashTable[hash];   // fetch the chain head
		btAssert(index != BT_HASH_NULL);

		int previous = BT_HASH_NULL;
		while (index != pairIndex)   // find the predecessor of pairIndex
		{
			previous = index;
			index = m_next[index];
		}

		if (previous != BT_HASH_NULL)  // unlink the current entry from its chain
		{
			btAssert(m_next[previous] == pairIndex);
			m_next[previous] = m_next[pairIndex];  // the entry is in the middle of the chain
		}
		else 
		{
			m_hashTable[hash] = m_next[pairIndex]; // the entry is the first one in the chain
		}

		// We now move the last pair into spot of the
		// pair being removed. We need to fix the hash
		// table indices to support the move.

		int lastPairIndex = m_valueArray.size() - 1; 

		// If the removed pair is the last pair, we are done.
		if (lastPairIndex == pairIndex) // the pair is already the last array element: pop_back shrinks size (capacity unchanged)
		{
			m_valueArray.pop_back();
			m_keyArray.pop_back();
			return;
		}

		// Remove the last pair from the hash table (unlink it from its chain).
		int lastHash = m_keyArray[lastPairIndex].getHash() & (m_valueArray.capacity()-1);

		index = m_hashTable[lastHash];
		btAssert(index != BT_HASH_NULL);

		previous = BT_HASH_NULL;
		while (index != lastPairIndex)
		{
			previous = index;
			index = m_next[index];
		}

		if (previous != BT_HASH_NULL)
		{
			btAssert(m_next[previous] == lastPairIndex);
			m_next[previous] = m_next[lastPairIndex];
		}
		else
		{
			m_hashTable[lastHash] = m_next[lastPairIndex];
		}

		// Copy the last pair into the remove pair's spot.
		m_valueArray[pairIndex] = m_valueArray[lastPairIndex];
		m_keyArray[pairIndex] = m_keyArray[lastPairIndex];

		// Insert the last pair (now stored at pairIndex) at the head of the m_hashTable[lastHash] chain.
		m_next[pairIndex] = m_hashTable[lastHash];
		m_hashTable[lastHash] = pairIndex;

		m_valueArray.pop_back();
		m_keyArray.pop_back();

	}
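// The swap-with-last removal above can be sketched in isolation. This is an
// illustrative stand-in, not Bullet's code: Table, findIndex, insert,
// unlinkEntry and removeKey are hypothetical names, keys and values are
// plain ints, and the bucket count is fixed (no growth) to keep it short.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

const int kNil = -1;  // plays the role of BT_HASH_NULL

struct Table {
    std::vector<int> heads;   // bucket -> first entry (like m_hashTable)
    std::vector<int> next;    // entry -> next entry in chain (like m_next)
    std::vector<int> keys;    // dense key storage
    std::vector<int> values;  // dense value storage
    explicit Table(std::size_t buckets) : heads(buckets, kNil) {}  // power of two
    std::size_t bucketOf(int key) const {
        return static_cast<unsigned>(key) & (heads.size() - 1);
    }
};

int findIndex(const Table& t, int key) {
    int i = t.heads[t.bucketOf(key)];
    while (i != kNil && t.keys[i] != key) i = t.next[i];
    return i;
}

void insert(Table& t, int key, int value) {
    int i = findIndex(t, key);
    if (i != kNil) { t.values[i] = value; return; }
    std::size_t b = t.bucketOf(key);
    t.keys.push_back(key);
    t.values.push_back(value);
    t.next.push_back(t.heads[b]);                      // link in at the chain head
    t.heads[b] = static_cast<int>(t.keys.size()) - 1;
}

// Unlink entry `target` from the chain of bucket b; target must be on that chain.
static void unlinkEntry(Table& t, std::size_t b, int target) {
    int prev = kNil;
    int i = t.heads[b];
    while (i != target) { prev = i; i = t.next[i]; }
    if (prev != kNil) t.next[prev] = t.next[target];
    else              t.heads[b]   = t.next[target];
}

void removeKey(Table& t, int key) {
    int hole = findIndex(t, key);
    if (hole == kNil) return;
    unlinkEntry(t, t.bucketOf(key), hole);             // 1. take the victim off its chain
    int last = static_cast<int>(t.keys.size()) - 1;
    if (hole != last) {
        std::size_t lb = t.bucketOf(t.keys[last]);
        unlinkEntry(t, lb, last);                      // 2. take the last entry off its chain
        t.keys[hole]   = t.keys[last];                 // 3. move it into the hole
        t.values[hole] = t.values[last];
        t.next[hole]   = t.heads[lb];                  // 4. relink it under its new index
        t.heads[lb]    = hole;
    }
    t.keys.pop_back(); t.values.pop_back(); t.next.pop_back();  // arrays stay dense
}
```

// Because the last entry always fills the hole, the key and value arrays never
// contain gaps, at the cost of entries changing index when something else is removed.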

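Benefits of compact (contiguous) memory

btHashMap's design keeps the memory of the whole hash table compact (contiguous). The main benefits are:

First, compatibility with array (pointer) style APIs, such as many OpenGL APIs. Because the Values and Keys stored in a btHashMap are each contiguous in memory, this is easy to see.

Second, a better cache hit rate (when the table holds few elements). The nodes of an ordinary linked list are allocated one at a time as needed, so they are generally not contiguous and often not even in the same memory page. Even when the nodes are visited repeatedly within a short time, their scattered memory may keep them from all fitting in the cache, which slows access down. btHashMap's node memory is always contiguous, so it keeps a high cache hit rate and can bring some performance gain.

Comments on btAlignedAllocator

btAlignedAllocator's customization interface is completely different from std::allocator's. The std::allocator approach is to implement an allocator first and then pass it as a template argument to the concrete data structure, as in vector<int, allocator<int> >.

This does achieve customization, but it has some problems:

First, since the standard library uses std::allocator everywhere, using a different allocator means more than one kind of memory management may be at work in the same program. In particular, when the standard library uses the allocator SGI implemented back then, which returns all its memory only when the program exits (see 《STL源碼剖析》), contention for memory is unavoidable.

Second, this design quietly adds to the complexity of coding and debugging, as anyone who has debugged GCC's STL code knows well.

btAlignedAllocator has neither problem:

First, its allocate/deallocate behavior is delegated to global function pointers, so two or more underlying memory-management schemes cannot be in use at the same time.

Second, data structures that use btAlignedAllocator have simpler template parameters, which naturally lowers the complexity of coding and debugging.

In my humble opinion the STL is somewhat over-designed. Policy-based design does bring flexibility, but the code becomes much harder to read (perhaps the people who develop libstdc++ never meant for others to read their code ☺).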