最通俗易懂的 HashMap 源碼分析解讀

時間 2020-07-06

標籤通俗易懂 hashmap 源碼分析解讀简体版

原文原文鏈接

文章已經收錄在 Github.com/niumoo/JavaNotes ，更有 Java 程序員所須要掌握的核心知識，歡迎Star和指教。
歡迎關注個人公衆號，文章每週更新。

HashMap 做爲最經常使用的集合類之一，有必要深刻淺出的瞭解一下。這篇文章會深刻到 HashMap 源碼，刨析它的存儲結構以及工做機制。java

1. HashMap 的存儲結構

HashMap 的數據存儲結構是一個 Node<K,V> 數組，在（Java 7 中是 Entry<K,V> 數組，但結構相同）node

public class HashMap<K,V> extends AbstractMap<K,V> implements Map<K,V>, Cloneable, Serializable {
    // 數組
    transient Node<K,V>[] table;    
    static class Node<K,V> implements Map.Entry<K,V> {
        final int hash;
        final K key;
        V value;
        // 鏈表
        Node<K,V> next;
        ....
    }
    .....
}

存儲結構主要是數組加鏈表，像下面的圖。git

2. HashMap 的 put()

在 Java 8 中 HashMap 的 put 方法以下，我已經詳細註釋了重要代碼。程序員

public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}
// 計算哈希值 與（&）、非（~）、或（|）、異或（^）
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
final V putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict) {
   
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    // 若是數組爲空，進行 resize() 初始化
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    // 若是計算的位置上Node不存在，直接建立節點插入
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        // 若是計算的位置上Node 存在，鏈表處理
        Node<K,V> e; K k;
        // 若是 hash 值，k 值徹底相同，直接覆蓋
        if (p.hash == hash &&((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        // 若是 index 位置元素已經存在，且是紅黑樹
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            // 若是此次要放入的值不存在
            for (int binCount = 0; ; ++binCount) {
                // 尾插法
                if ((e = p.next) == null) {
                    // 找到節點鏈表中next爲空的節點，建立新的節點插入
                    p.next = newNode(hash, key, value, null);
                    // 若是節點鏈表中數量超過TREEIFY_THRESHOLD（8）個，轉化爲紅黑樹
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                // 若是節點鏈表中有發現已有相同key
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        // 若是節點 e 有值，放入數組 table[]
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
 
    ++modCount;
    // 當前大小大於臨界大小，擴容
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}

舉個例子，若是 put 的 key 爲字母 a，當前 HashMap 容量是初始容量 16，計算出位置是 1。github

# int hash = key.hashCode()
# hash = hash ^ (hash >>> 16)
# 公式 index = (n - 1) & hash // n 是容量
    
hash HEX(97)  = 0110 0001‬
n-1  HEX(15)  = 0000 1111
--------------------------
         結果  = 0000 0001
# 計算獲得位置是 1

總結 HashMap put 過程。面試

計算 key 的 hash 值。
計算方式是 (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);shell
檢查當前數組是否爲空，爲空須要進行初始化，初始化容量是 16 ，負載因子默認 0.75。
計算 key 在數組中的座標。
計算方式：(容量 - 1) & hash. 數組

由於容量老是2的次方，因此-1的值的二進制老是全1。方便與 hash 值進行與運算。安全
若是計算出的座標元素爲空，建立節點加入，put 結束。網絡
1. 若是當前數組容量大於負載因子設置的容量，進行擴容。
若是計算出的座標元素有值。
1. 若是座標上的元素值和要加入的值 key 徹底同樣，覆蓋原有值。
2. 若是座標上的元素是紅黑樹，把要加入的值和 key 加入到紅黑樹。
3. 若是座標上的元素和要加入的元素不一樣（尾插法增長）。
  1. 若是 next 節點爲空，把要加入的值和 key 加入 next 節點。
  2. 若是 next 節點不爲空，循環查看 next 節點。
    若是發現有 next 節點的 key 和要加入的 key 同樣，對應的值替換爲新值。
  3. 若是循環 next 節點查找超過8層還不爲空，把這個位置元素轉換爲紅黑樹。

3. HashMap 的 get()

在 Java 8 中 get 方法源碼以下，我已經作了註釋說明。

public V get(Object key) {
    Node<K,V> e;
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}
final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    // 只有在存儲數組已經存在的狀況下進入這個 if
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        // first 是獲取的座標上元素
        if (first.hash == hash && // always check first node
            ((k = first.key) == key || (key != null && key.equals(k))))
            // key 相同，說明first是想要的元素，返回
            return first;
        if ((e = first.next) != null) {
            if (first instanceof TreeNode)
                // 若是是紅黑樹，從紅黑樹中查找結果
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            do {
                // 循環遍歷查找
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
 }

get 方法流程總結。

計算 key 的 hash 值。
若是存儲數組不爲空，且計算獲得的位置上的元素不爲空。繼續，不然，返回 Null。
若是獲取到的元素的 key 值相等，說明查找到了，返回元素。
若是獲取到的元素的 key 值不相等，查找 next 節點的元素。
1. 若是元素是紅黑樹，在紅黑樹中查找。
2. 不是紅黑樹，遍歷 next 節點查找，找到則返回。

4. HashMap 的 Hash 規則

計算 hash 值 int hash = key.hashCode()。
與或上 hash 值無符號右移16 位。 hash = hash ^ (hash >>> 16)。
位置計算公式 index = (n - 1) & hash ，其中 n 是容量。

這裏可能會有同窗對 hash ^ (hash >>> 16) 有疑惑，很好奇爲何這裏要拿 hash 值異或上 hash 值無符號右移 16 位呢？下面經過一個例子演示其中道理所在。

假設 hash 值是 0001 0100 1100 0010 0110 0001‬ 0010 0000，當前容量是 16。

hash        = 0001 0100 1100 0010 0110 0001‬ 0010 0000  ---
                                                         | 與或計算
hash >>> 16 = 0000 0000 0000 0000 0001 0100 1100 0010  ---
------------------------------------------------------ 
hash 結果    = 0001 0100 1100 0010 0111 0101 1110 0100 ---
                                                         | & 與運算
容量 -1      = 0000 0000 0000 0000 0000 0000 0000 1111 ---
------------------------------------------------------
# 獲得位置    = 0000 0000 0000 0000 0000 0000 0000 0100 獲得位置是 4

若是又新增一個數據，獲得 hash 值是 0100 0000 1110 0010 1010 0010‬ 0001 0000 ，容量仍是16，計算他的位置應該是什麼呢？

hash        = 0100 0000 1110 0010 1010 0010‬ 0001 0000  ---
                                                         | 與或計算
hash >>> 16 = 0000 0000 0000 0000 0001 0100 1100 0010  ---
------------------------------------------------------ 
hash 結果    = 0100 0000 1110 0010 1011 0110 1101 0010 ---
                                                         | & 與運算
容量 -1      = 0000 0000 0000 0000 0000 0000 0000 1111 ---
------------------------------------------------------
# 獲得位置    = 0000 0000 0000 0000 0000 0000 0000 0010 獲得位置是 2

上面兩個例子，獲得位置一個是 4，一個是 2，上面只是我隨便輸入的兩個二進制數，那麼這兩個數若是不通過 hash ^ (hash >>> 16) 運算，位置會有什麼變化呢？

hash        = 0001 0100 1100 0010 0110 0001‬ 0010 0000
容量 -1      = 0000 0000 0000 0000 0000 0000 0000 1111
------------------------------------------------------
        結果 = 0000 0000 0000 0000 0000 0000 0000 0000
# 獲得位置是 0 
hash        = 0100 0000 1110 0010 1010 0010‬ 0001 0000
容量 -1      = 0000 0000 0000 0000 0000 0000 0000 1111
------------------------------------------------------
        結果 = 0000 0000 0000 0000 0000 0000 0000 0000
# 獲得位置是 0

能夠發現位置都是 0 ，衝突機率提升了。可見 hash ^ (hash >>> 16) 讓數據的 hash 值的高 16 位與低 16 位進行與或混合，能夠減小低位相同時數據插入衝突的機率。

5. HashMap 的初始化大小

初始化大小是 16，爲何是 16 呢？
這多是由於每次擴容都是 2 倍。而選擇 2 的次方值 16 做爲初始容量，有利於擴容時從新 Hash 計算位置。爲何是 16 我想是一個經驗值，理論上說只要是 2 的次方都沒有問題。

6. HashMap 的擴容方式

負載因子是多少？負載因子是 0.75。

擴容方式是什麼？看源碼說明。

/**
     * Initializes or doubles table size.  If null, allocates in
     * accord with initial capacity target held in field threshold.
     * Otherwise, because we are using power-of-two expansion, the
     * elements from each bin must either stay at same index, or move
     * with a power of two offset in the new table.
     *
     * @return the table
     */
    final Node<K,V>[] resize() {
        Node<K,V>[] oldTab = table;
        // 現有容量
        int oldCap = (oldTab == null) ? 0 : oldTab.length;
        // 現有擴容閥值
        int oldThr = threshold;
        int newCap, newThr = 0;
        if (oldCap > 0) {
            // 若是當前長度已經大於最大容量。結束擴容
            if (oldCap >= MAXIMUM_CAPACITY) {
                threshold = Integer.MAX_VALUE;
                return oldTab;
            }
            // 若是擴大兩倍以後小於最大容量，且現有容量大於等於初始容量，就擴大兩倍
            else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY && oldCap >= DEFAULT_INITIAL_CAPACITY)
               // 擴容閥值擴大爲兩倍
                newThr = oldThr << 1; // double threshold
        }
        // 當前容量 = 0 ，可是當前記錄容量 > 0 ，獲取當前記錄容量。
       else if (oldThr > 0) // initial capacity was placed in threshold
            // 進入這裏，說明是經過指定容量和負載因子的構造函數
            newCap = oldThr;
        else {                   // zero initial threshold signifies using defaults
            // 進入這裏說明是經過無參構造
            // 新的容量
            newCap = DEFAULT_INITIAL_CAPACITY;
            // 新的閥值
            newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
        }
        if (newThr == 0) {
            float ft = (float)newCap * loadFactor;
            // 計算擴容閥值
            newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                      (int)ft : Integer.MAX_VALUE);
        }
        threshold = newThr;
        @SuppressWarnings({"rawtypes","unchecked"})
            Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
        table = newTab;
        // 若是 oldTab != null，說明是擴容，不然是初始化，直接返回
        if (oldTab != null) {
            for (int j = 0; j < oldCap; ++j) {
                Node<K,V> e;
                if ((e = oldTab[j]) != null) {
                    oldTab[j] = null;
                    // 若是當前元素 next節點沒有元素，當前元素從新計算位置直接放入
                    if (e.next == null)
                        newTab[e.hash & (newCap - 1)] = e;
                    else if (e instanceof TreeNode)
                        // 若是當前節點是紅黑樹
                        ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                    else { // preserve order
                        Node<K,V> loHead = null, loTail = null;
                        Node<K,V> hiHead = null, hiTail = null;
                        Node<K,V> next;
                        do {
                            next = e.next;
                            // == 0 ，位置不變
                            if ((e.hash & oldCap) == 0) {
                                if (loTail == null)
                                    loHead = e;
                                else
                                    loTail.next = e;
                                loTail = e;
                            }
                            // e.hash & oldCap != 0 ,位置變爲：位置+擴容前容量
                            else {
                                if (hiTail == null)
                                    hiHead = e;
                                else
                                    hiTail.next = e;
                                hiTail = e;
                            }
                        } while ((e = next) != null);
                        if (loTail != null) {
                            loTail.next = null;
                            newTab[j] = loHead;
                        }
                        if (hiTail != null) {
                            hiTail.next = null;
                            newTab[j + oldCap] = hiHead;
                        }
                    }
                }
            }
        }
        return newTab;
    }

擴容時候怎麼從新肯定元素在數組中的位置，咱們看到是由 if ((e.hash & oldCap) == 0) 肯定的。

hash HEX(97)  = 0110 0001‬
n    HEX(16)  = 0001 0000
--------------------------
         結果  = 0000 0000
# e.hash & oldCap = 0 計算獲得位置仍是擴容前位置
    
hash HEX(17)  = 0001 0001‬
n    HEX(16)  = 0001 0000
--------------------------
         結果  = 0001 0000
#  e.hash & oldCap != 0 計算獲得位置是擴容前位置+擴容前容量

經過上面的分析也能夠看出，只有在每次容量都是2的次方的狀況下才能使用 if ((e.hash & oldCap) == 0) 判斷擴容後的位置。

7. HashMap 中的紅黑樹

HashMap 在 Java 8 中的實現增長了紅黑樹，當鏈表節點達到 8 個的時候，會把鏈表轉換成紅黑樹，低於 6 個的時候，會退回鏈表。究其緣由是由於當節點過多時，使用紅黑樹能夠更高效的查找到節點。畢竟紅黑樹是一種二叉查找樹。

節點個數是多少的時候，鏈表會轉變成紅黑樹。
鏈表節點個數大於等於 8 時，鏈表會轉換成樹結構。
節點個數是多少的時候，紅黑樹會退回鏈表。
節點個數小於等於 6 時，樹會轉變成鏈表。
爲何轉變條件 8 和 6 有一個差值。
若是沒有差值，都是 8 ，那麼若是頻繁的插入刪除元素，鏈表個數又恰好在 8 徘徊，那麼就會頻繁的發生鏈表轉樹，樹轉鏈表。

8. 爲啥容量都是2的冪？

容量是2的冪時，key 的 hash 值而後 & (容量-1) 肯定位置時碰撞機率會比較低，由於容量爲 2 的冪時，減 1 以後的二進制數爲全1，這樣與運算的結果就等於 hasｈ值後面與 1 進行與運算的幾位。

下面是個例子。

hash HEX(97)  = 0110 0001‬
n-1  HEX(15)  = 0000 1111
--------------------------
         結果  = 0000 0001
# 計算獲得位置是 1
hash HEX(99)  = 0110 0011‬
n-1  HEX(15)  = 0000 1111
--------------------------
         結果  = 0000 0011
# 計算獲得位置是 3
hash HEX(101)  = 0110 0101‬
n-1  HEX(15)   = 0000 1111
--------------------------
         結果   = 0000 0101
# 計算獲得位置是 5

若是是其餘的容量值，假設是9，進行與運算結果碰撞的機率就比較大。

hash HEX(97)  = 0110 0001‬
n-1  HEX(09)  = 0000 1001
--------------------------
         結果  = 0000 0001
# 計算獲得位置是 1
hash HEX(99)  = 0110 0011‬
n-1  HEX(09)  = 0000 1001
--------------------------
         結果  = 0000 0001
# 計算獲得位置是 1
hash HEX(101)  = 0110 0101‬
n-1  HEX(09)   = 0000 1001
--------------------------
         結果   = 0000 0001
# 計算獲得位置是 1

另外，每次都是 2 的冪也可讓 HashMap 擴容時能夠方便的從新計算位置。

hash HEX(97)  = 0110 0001‬
n-1  HEX(15)  = 0000 1111
--------------------------
         結果  = 0000 0001
# 計算獲得位置是 1
    
hash HEX(97)  = 0110 0001‬
n-1  HEX(31)  = 0001 1111
--------------------------
         結果  = 0000 0001
# 計算獲得位置是 1

9. 快速失敗（fail—fast）

HashMap 遍歷使用的是一種快速失敗機制，它是 Java 非安全集合中的一種廣泛機制，這種機制可讓集合在遍歷時，若是有線程對集合進行了修改、刪除、增長操做，會觸發併發修改異常。

它的實現機制是在遍歷前保存一份 modCount ，在每次獲取下一個要遍歷的元素時會對比當前的 modCount 和保存的 modCount 是否相等。

快速失敗也能夠看做是一種安全機制，這樣在多線程操做不安全的集合時，因爲快速失敗的機制，會拋出異常。

10. 線程安全的 Map

使用 Collections.synchronizedMap(Map) 建立線程安全的 Map.
實現原理：有一個變量 final Object mutex; ，操做方法都加了這個 synchronized (mutex) 排它鎖。
使用 Hashtable
使用 ConcurrentHashMap

最後的話