HashMap Source Code Reading

Date: 2019-12-08

Tags: hashmap, source code reading


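HashMap is the most frequently used member of the Map family. This article walks through its source code to explain how HashMap works.

1. Data structure

HashMap's data structure consists of an array, linked lists, and red-black trees (the trees were added in JDK 1.8).

The array is the hash table. Each array element is the head node of a singly linked list. When different keys map to the same array position, they are placed in that list, which resolves collisions between the keys' hash values.

When a list grows beyond 8 nodes, JDK 1.8 converts it to a red-black tree (if the table still has fewer than 64 buckets, treeifyBin() resizes the table instead). Red-black trees support fast insert, delete, update, and lookup, which improves HashMap's performance: lookup in a list is O(N), while lookup in a red-black tree is O(log N).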

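When keys collide in a hash table, the collision can be resolved by open addressing or by separate chaining. Java's HashMap uses separate chaining: every array element has a linked list behind it. The hash algorithm maps a key to an array index, and the key-value pair is stored in the list at that index.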
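Separate chaining can be sketched in a few lines. The class below is a toy written for this article, not JDK code: `ChainedTable`, its fixed capacity of 16, and the prepend-on-insert choice are all assumptions made for illustration, and it has no resizing and no trees.

```java
import java.util.Objects;

/** Toy separate-chaining table (hypothetical: fixed capacity, no resize, no trees). */
class ChainedTable<K, V> {
    private static final class Entry<K, V> {
        final K key;
        V value;
        Entry<K, V> next;

        Entry(K key, V value, Entry<K, V> next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    @SuppressWarnings("unchecked")
    private final Entry<K, V>[] buckets = (Entry<K, V>[]) new Entry[16];

    private int indexOf(Object key) {
        // The capacity is a power of two, so masking selects a bucket in [0, 15].
        return Objects.hashCode(key) & (buckets.length - 1);
    }

    /** Returns the previous value for the key, or null if there was none. */
    V put(K key, V value) {
        int i = indexOf(key);
        for (Entry<K, V> e = buckets[i]; e != null; e = e.next) {
            if (Objects.equals(e.key, key)) {
                V old = e.value;
                e.value = value;
                return old;
            }
        }
        buckets[i] = new Entry<>(key, value, buckets[i]); // prepend to the chain
        return null;
    }

    V get(Object key) {
        for (Entry<K, V> e = buckets[indexOf(key)]; e != null; e = e.next) {
            if (Objects.equals(e.key, key)) {
                return e.value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        ChainedTable<Integer, String> t = new ChainedTable<>();
        t.put(1, "one");
        t.put(17, "seventeen"); // 1 and 17 share bucket 1 when the capacity is 16
        System.out.println(t.get(1) + ", " + t.get(17));
    }
}
```

Keys 1 and 17 land in the same bucket, yet both remain retrievable because the lookup walks the chain and compares keys with equals().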

First, a look at several of HashMap's fields:

/* ---------------- Fields -------------- */
 
/**
 * The table, initialized on first use, and resized as
 * necessary. When allocated, length is always a power of two.
 * (We also tolerate length zero in some operations to allow
 * bootstrapping mechanics that are currently not needed.)
 */
transient Node<K,V>[] table;
 
/**
 * Holds cached entrySet(). Note that AbstractMap fields are used
 * for keySet() and values().
 */
transient Set<Map.Entry<K,V>> entrySet;
 
/**
 * The number of key-value mappings contained in this map.
 */
transient int size;
 
/**
 * The number of times this HashMap has been structurally modified
 * Structural modifications are those that change the number of mappings in
 * the HashMap or otherwise modify its internal structure (e.g.,
 * rehash).  This field is used to make iterators on Collection-views of
 * the HashMap fail-fast.  (See ConcurrentModificationException).
 */
transient int modCount;
 
/**
 * The next size value at which to resize (capacity * load factor).
 */
int threshold;     
 
/**
 * The load factor for the hash table.
 */
final float loadFactor;

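size: the number of Nodes (key-value pairs) actually stored in the HashMap.
modCount: counts how many times the HashMap's internal structure has changed; it is mainly used to make iterators fail-fast. When a new key-value pair is put, adding a new Node is a structural change, while overwriting the value of an existing key is not.
threshold: threshold = capacity * loadFactor, the maximum number of elements the table may hold. Once size exceeds it, the map resizes, and the new capacity is twice the old one. For a given capacity, a larger load factor allows more key-value pairs.
loadFactor: the load factor, 0.75 by default. It is a trade-off between space and time efficiency, and changing it is not recommended.
Node[] table: the hash bucket array, one of the most important fields in the HashMap class.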
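The difference between a structural change and a plain value overwrite can be observed through the public API. The following is a small sketch (the class and method names are made up for this example): overwriting an existing key during iteration is tolerated, while inserting a new key makes the iterator throw ConcurrentModificationException on its next step.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

class FailFastDemo {
    private static Map<String, Integer> sample() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        return map;
    }

    /** Overwrites an existing key while iterating; returns true if iteration completed. */
    static boolean overwriteDuringIteration() {
        Map<String, Integer> map = sample();
        for (String key : map.keySet()) {
            map.put("a", 100); // existing key: value replaced, modCount unchanged
        }
        return map.get("a") == 100;
    }

    /** Adds a new key while iterating; returns true if the iterator failed fast. */
    static boolean insertDuringIteration() {
        Map<String, Integer> map = sample();
        try {
            for (String key : map.keySet()) {
                map.put("new-key", 0); // new Node: structural change, modCount is incremented
            }
            return false;
        } catch (ConcurrentModificationException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("overwrite tolerated: " + overwriteDuringIteration());
        System.out.println("insert fails fast:   " + insertDuringIteration());
    }
}
```

The insert happens on the first iteration step, so the iterator still has elements left and checks modCount on the following next() call.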

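HashMap's default initial capacity is 16 and its default load factor is 0.75. In other words, a HashMap created with the default constructor can hold at most threshold = 16 * 0.75 = 12 elements before it grows. As entries are added, size and modCount increase, and once the number of entries exceeds 12 the HashMap resizes.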
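The arithmetic above can be replayed with a small simulation. This is only a model of the growth rule (`ThresholdDemo` and `capacityAfter` are hypothetical names): it assumes the default capacity and load factor, assumes at least one entry has been put, and ignores MAXIMUM_CAPACITY. It does not inspect a real HashMap.

```java
class ThresholdDemo {
    /** Simulated table capacity after `entries` puts into a default HashMap (16, 0.75f). */
    static int capacityAfter(int entries) {
        int capacity = 16;
        float loadFactor = 0.75f;
        int threshold = (int) (capacity * loadFactor); // 12
        for (int size = 1; size <= entries; size++) {
            if (size > threshold) { // mirrors `if (++size > threshold) resize();`
                capacity <<= 1;     // capacity doubles
                threshold = (int) (capacity * loadFactor);
            }
        }
        return capacity;
    }

    public static void main(String[] args) {
        for (int n : new int[] {12, 13, 24, 25}) {
            System.out.println(n + " entries -> capacity " + capacityAfter(n));
        }
    }
}
```

Under these assumptions the 13th entry triggers the first resize (16 to 32) and the 25th triggers the second (32 to 64).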

The source of Node is as follows:

static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;     // used to locate the index in the array
    final K key;
    V value;
    Node<K,V> next;       // the next node in the linked list
 
    Node(int hash, K key, V value, Node<K,V> next) {
        this.hash = hash;
        this.key = key;
        this.value = value;
        this.next = next;
    }
 
    public final K getKey()        { return key; }
    public final V getValue()      { return value; }
    public final String toString() { return key + "=" + value; }
 
    public final int hashCode() {
        return Objects.hashCode(key) ^ Objects.hashCode(value);
    }
 
    public final V setValue(V newValue) {
        V oldValue = value;
        value = newValue;
        return oldValue;
    }
 
    public final boolean equals(Object o) {
        if (o == this)
            return true;
        if (o instanceof Map.Entry) {
            Map.Entry<?,?> e = (Map.Entry<?,?>)o;
            if (Objects.equals(key, e.getKey()) &&
                Objects.equals(value, e.getValue()))
                return true;
        }
        return false;
    }
}

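Node is a nested class of HashMap. It implements the Map.Entry interface and stores one key-value pair. Every node in a bucket's linked list is a Node object.

2. Hash algorithm

Looking up, adding, or removing a key-value pair always starts by locating the index in the hash bucket array. Sometimes two keys get the same index, which is a hash collision. The more evenly the hash algorithm spreads its results, the lower the probability of collisions and the more efficient the map's reads and writes.

The source that locates the array index is as follows: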

static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
 
// JDK 1.7 source
static int indexFor(int h, int length) { 
     return h & (length-1);
}
  
// JDK 1.8 has no indexFor() method, but the principle is the same; the array index is normally computed as tab[(n - 1) & hash]
/**
 * Implements Map.get and related methods
 *
 * @param hash hash for key
 * @param key the key
 * @return the node, or null if none
 */
final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        ...
    }
}

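The hash algorithm essentially has three steps:

Take the key's hashCode: h = key.hashCode()
Mix in the high bits: h ^ (h >>> 16)
Reduce to an index (the modulo step): table[(table.length - 1) & hash]

The hash value is computed by XORing the high 16 bits of hashCode() into the low 16 bits. When table.length is small, this lets both the high and the low bits take part in the index calculation.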
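The effect of the high-bit step is easiest to see with two hash codes that differ only above bit 15. The sketch below copies the two expressions from the source into standalone helpers (`SpreadDemo`, `spread`, and `index` are names chosen for this example; the sample hash codes are arbitrary).

```java
class SpreadDemo {
    /** Same expression as HashMap.hash() for a non-null key's hashCode. */
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    /** Same expression as tab[(n - 1) & hash], returning just the index. */
    static int index(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int h1 = 0x10000; // differs from h2 only in the high 16 bits
        int h2 = 0x20000;
        int n = 16;
        System.out.println("raw:    " + index(h1, n) + " and " + index(h2, n));
        System.out.println("spread: " + index(spread(h1), n) + " and " + index(spread(h2), n));
    }
}
```

With a table of 16 buckets, the raw hash codes both map to index 0, while the spread values map to indexes 1 and 2.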

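In HashMap, the length of the hash bucket array table must be a power of two. This design mainly optimizes the modulo step and resizing. Taking the hash value modulo the array length would also distribute elements fairly evenly, but the modulo operation is comparatively expensive. When length is always a power of two, (table.length - 1) & hash gives the same result as hash modulo length (for negative hash values it matches Math.floorMod(hash, length), not Java's % operator), and & is cheaper than %.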
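A quick check of that equivalence, as a sketch with made-up helper names (`MaskVsModDemo`, `byMask`, `byFloorMod`) and arbitrary sample values:

```java
class MaskVsModDemo {
    static int byMask(int hash, int n) {
        return (n - 1) & hash;          // valid only when n is a power of two
    }

    static int byFloorMod(int hash, int n) {
        return Math.floorMod(hash, n);  // always in [0, n) for positive n
    }

    public static void main(String[] args) {
        int n = 16;
        for (int hash : new int[] {37, -7}) {
            System.out.println("hash=" + hash
                    + " mask=" + byMask(hash, n)
                    + " floorMod=" + byFloorMod(hash, n)
                    + " %=" + (hash % n));
        }
    }
}
```

For hash = 37 all three expressions give 5. For hash = -7 the mask and floorMod both give 9, while % gives -7, which would not be a usable array index.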


3. The put method

The source of HashMap's put method is as follows:

public V put(K key, V value) {
    // compute the hash of the key
    return putVal(hash(key), key, value, false, true);
}
  
/**
 * Implements Map.put and related methods
 *
 * @param hash hash for key
 * @param key the key
 * @param value the value to put
 * @param onlyIfAbsent if true, don't change existing value
 * @param evict if false, the table is in creation mode.
 * @return previous value, or null if none
 */
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    // table is empty: call resize() to allocate it
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    // compute the key's index in table; if that slot is null, create a new Node in table[index]
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        // the first node in table[index] has the same key: remember it so its value is overwritten below
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        // if table[index] is a red-black tree, insert the key-value pair into the tree
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        // table[index] is a linked list: traverse it
        else {
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    // the list held 8 nodes before this append: convert it to a red-black tree (treeifyBin resizes instead if the table is still small)
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                // the key already exists: stop here, its value is overwritten below
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    // after a Node has been inserted, check whether the number of key-value pairs exceeds threshold; if so, resize()
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}
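From the caller's side, the two branches of putVal (new mapping versus existing mapping) show up in the return value of put, and the onlyIfAbsent flag is what putIfAbsent passes as true. A minimal sketch using only the public java.util.HashMap API (`PutDemo` and `run` are names invented for this example):

```java
import java.util.HashMap;
import java.util.Map;

class PutDemo {
    static String run() {
        Map<String, Integer> map = new HashMap<>();
        Integer first = map.put("a", 1);         // new mapping: returns null
        Integer second = map.put("a", 2);        // existing key: returns old value 1, stores 2
        Integer third = map.putIfAbsent("a", 3); // onlyIfAbsent: returns 2, value stays 2
        return first + "," + second + "," + third + "," + map.get("a") + "," + map.size();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Only the first call adds a Node, so the map ends with a single entry.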

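4. Resizing

Resizing (resize) means recomputing the capacity. As elements keep being added to a HashMap, the internal array eventually cannot hold any more, and the array has to grow to make room. This is done by replacing the existing, smaller array with a new one.

In JDK 1.7, resize() allocates a larger Entry array and moves the elements of the old Entry array to it through transfer(). It walks every element of the old array (the array plus its linked lists), uses the indexFor() method mentioned above to determine the index in the new array, and inserts the element at the head of the list in that bucket. Resizing involves a series of operations (allocating a new array, re-indexing every existing element), so it is expensive.

The JDK 1.7 source of resize is as follows: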

void resize(int newCapacity) {   // newCapacity is the new array length
   // Get the old Entry array and its length before resizing
   Entry[] oldTable = table;
   int oldCapacity = oldTable.length;
   // The old array has already reached the maximum length (2^30)
   if (oldCapacity == MAXIMUM_CAPACITY) {
       threshold = Integer.MAX_VALUE;   // Set the threshold to the int maximum (2^31-1) so that no further resize happens
       return;
   }

   Entry[] newTable = new Entry[newCapacity];   // Allocate a new Entry array
   transfer(newTable);                          // Move the data into the new Entry array
   table = newTable;                            // Point the table field at the new Entry array
   threshold = (int)(newCapacity * loadFactor); // Update the threshold
}
  
void transfer(Entry[] newTable) {
     Entry[] src = table;                   // src refers to the old Entry array
     int newCapacity = newTable.length;
     for (int j = 0; j < src.length; j++) {
         Entry<K,V> e = src[j];             // Take each bucket of the old Entry array in turn
         if (e != null) {
             src[j] = null;                 // Release the old array's reference (after the for loop the old array references nothing)
             do {
                 Entry<K,V> next = e.next;
                 int i = indexFor(e.hash, newCapacity);  // Recompute the element's index in the new array
                 e.next = newTable[i]; // Head insertion: link the element in front of the new bucket's current list
                 newTable[i] = e;
                 e = next;             // Move on to the next element in the old chain
             } while (e != null);
         }
     }
}
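One consequence of head insertion is that nodes which stay together in the same new bucket end up in reverse order. The sketch below isolates the inner do-while loop on a hypothetical `Node` type and assumes, for simplicity, that every node of the chain lands in the same new bucket.

```java
import java.util.ArrayList;
import java.util.List;

class HeadInsertDemo {
    static final class Node {
        final int id;
        Node next;

        Node(int id, Node next) {
            this.id = id;
            this.next = next;
        }
    }

    /** Builds the chain ids[0] -> ids[1] -> ... */
    static Node chain(int... ids) {
        Node head = null;
        for (int i = ids.length - 1; i >= 0; i--) {
            head = new Node(ids[i], head);
        }
        return head;
    }

    /** Moves every node to one new bucket by head insertion, as transfer() does per node. */
    static Node transferHeadInsert(Node oldHead) {
        Node newBucket = null;
        Node e = oldHead;
        while (e != null) {
            Node next = e.next;
            e.next = newBucket; // corresponds to e.next = newTable[i]
            newBucket = e;      // corresponds to newTable[i] = e
            e = next;
        }
        return newBucket;
    }

    static List<Integer> ids(Node head) {
        List<Integer> out = new ArrayList<>();
        for (Node e = head; e != null; e = e.next) {
            out.add(e.id);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(ids(transferHeadInsert(chain(1, 2, 3))));
    }
}
```

A chain 1 -> 2 -> 3 comes out as 3 -> 2 -> 1.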

The JDK 1.8 source of resize is as follows:

final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    if (oldCap > 0) {
        // capacity is already at the maximum: stop growing
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        // below the maximum: double the capacity
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    else {               // zero initial threshold signifies using defaults
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    // compute the new resize threshold
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    @SuppressWarnings({"rawtypes","unchecked"})
        Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    if (oldTab != null) {
        // move every bucket into the new table
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else { // linked list: optimized rehash that splits the list into a lo list and a hi list
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        // stays at the original index
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        // moves to original index + oldCap
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    // put the lo list into the bucket at the original index
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    // put the hi list into the bucket at original index + oldCap
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}

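Because the table length doubles, each element either stays at its original index or moves by exactly oldCap, which is a power of two.

Let the old table length be n. An element's old position is (n - 1) & hash; after the length doubles, its new position is (n * 2 - 1) & hash. For example, with an original table length of n = 16, take two keys key1 and key2 whose hash values from the hash algorithm are hash1 and hash2, and compare the index each key gets before and after the resize:

key1's original position is 00101 = 5, and its position after the resize is still 00101 = 5; key2's original position is also 00101 = 5, but its position after the resize is 10101 = 5 + 16 (original position + oldCap).

This design has two benefits. The new index does not have to be recomputed from scratch: testing the single bit hash & oldCap is enough. And because that newly significant bit is effectively random (0 or 1), resizing spreads the nodes that previously collided in one list fairly evenly across the two new buckets.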
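The key1/key2 example can be checked numerically. The sketch below picks the smallest hash values consistent with the positions given above (hash1 = 0b00101 and hash2 = 0b10101 are assumptions; the original hashes are not stated) and compares the one-bit test used by the JDK 1.8 resize() with the full mask (`SplitDemo` and its method names are hypothetical).

```java
class SplitDemo {
    /** New index chosen the way the JDK 1.8 resize() splits a bucket: test one bit. */
    static int newIndexBySplit(int hash, int oldCap) {
        int j = hash & (oldCap - 1);                   // index in the old table
        return (hash & oldCap) == 0 ? j : j + oldCap;  // lo list or hi list
    }

    /** New index computed directly with the doubled table's mask. */
    static int newIndexByMask(int hash, int oldCap) {
        return hash & (oldCap * 2 - 1);
    }

    public static void main(String[] args) {
        int oldCap = 16;
        int hash1 = 0b00101; // assumed value: bit 4 is 0, so key1 stays at 5
        int hash2 = 0b10101; // assumed value: bit 4 is 1, so key2 moves to 5 + 16
        System.out.println(newIndexBySplit(hash1, oldCap) + " " + newIndexByMask(hash1, oldCap));
        System.out.println(newIndexBySplit(hash2, oldCap) + " " + newIndexByMask(hash2, oldCap));
    }
}
```

Both methods agree: key1 stays at index 5 and key2 moves to index 21.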

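5. Thread safety

HashMap is not thread-safe. Avoid sharing it across threads in multithreaded code and use the thread-safe ConcurrentHashMap instead. Using HashMap from multiple threads can lead to an infinite loop, which drives CPU usage to 100% and eventually brings the program down.

When a put pushes the total number of elements past threshold, HashMap resizes, and every element in the table is rehashed and redistributed into the new table. If several threads rehash concurrently (with the JDK 1.7 head-insertion transfer() shown above), a circular linked list can form. When another thread then calls HashMap.get() and walks into that circular list, it loops forever and the program becomes unusable. The details of how the circular list arises are explained concisely in this article: https://coolshell.cn/articles/9606.html.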
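As a minimal sketch of the recommended alternative, the following counter is updated from several threads through ConcurrentHashMap.merge, which performs the read-modify-write atomically. The class name, the "hits" key, and the thread counts are arbitrary choices for this example.

```java
import java.util.concurrent.ConcurrentHashMap;

class ConcurrentCountDemo {
    /** Increments one shared counter from several threads and returns the final count. */
    static int countWithThreads(int threads, int incrementsPerThread) {
        ConcurrentHashMap<String, Integer> counts = new ConcurrentHashMap<>();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    counts.merge("hits", 1, Integer::sum); // atomic update of the mapping
                }
            });
            workers[t].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join(); // wait for every worker before reading the result
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return counts.get("hits");
    }

    public static void main(String[] args) {
        System.out.println(countWithThreads(4, 1000));
    }
}
```

With 4 threads and 1000 increments each, the final count is 4000. The same loop against a plain HashMap could lose updates or corrupt the table.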