An In-Depth Analysis of the HashMap Source Code

Date: 2019-11-16


1. HashMap data structure

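In JDK 1.8, HashMap is built from an array, linked lists, and red-black trees.
When a key-value pair is stored in the map, HashMap computes a hash from the key and uses that hash to pick a slot (bucket) in the array. If two keys collide on the same bucket, the entries are chained in a linked list; when a list grows too long, HashMap converts it into a red-black tree.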
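To make the idea of a collision concrete, here is a minimal sketch. The class name CollisionDemo and the helpers spread and indexFor are made up for illustration; they mirror the hash() and (n - 1) & hash steps from the JDK 1.8 source rather than calling into HashMap internals.

```java
import java.util.HashMap;
import java.util.Map;

public class CollisionDemo {
    // Same spreading step as HashMap.hash(Object) in JDK 1.8 (copied for illustration)
    static int spread(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // Bucket index for a table whose length is a power of two
    static int indexFor(Object key, int tableLength) {
        return (tableLength - 1) & spread(key);
    }

    public static void main(String[] args) {
        // "Aa" and "BB" are different strings with the same hashCode
        System.out.println("Aa".hashCode() + " " + "BB".hashCode());
        // so they land in the same bucket of a 16-slot table
        System.out.println(indexFor("Aa", 16) + " " + indexFor("BB", 16));

        // HashMap still keeps both entries, chained in that bucket
        Map<String, Integer> map = new HashMap<>();
        map.put("Aa", 1);
        map.put("BB", 2);
        System.out.println(map.get("Aa") + " " + map.get("BB"));
    }
}
```

Both keys hash to the same bucket, yet both values can still be retrieved, because equals() tells the chained entries apart.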

2. HashMap characteristics

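HashMap is backed by an array plus linked lists plus red-black trees (JDK 1.8).
HashMap stores key-value pairs. A null key is allowed, but only one, and keys cannot be duplicated.
HashMap is not thread-safe.
The iteration order of a HashMap may differ from the insertion order.
When storing an entry, HashMap computes the key's hash to decide where the entry goes.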
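The null-key and no-duplicate-key rules can be checked with a few lines against the public API. This is only a sketch; the class name HashMapTraitsDemo and the method build are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class HashMapTraitsDemo {
    static Map<String, String> build() {
        Map<String, String> map = new HashMap<>();
        map.put(null, "first null");
        map.put(null, "second null"); // same (null) key: the value is replaced
        map.put("k", "v1");
        map.put("k", "v2");           // duplicate key: the value is replaced
        return map;
    }

    public static void main(String[] args) {
        Map<String, String> map = build();
        System.out.println(map.size());    // two entries: null and "k"
        System.out.println(map.get(null));
        System.out.println(map.get("k"));
    }
}
```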

3. Source code analysis

Fields

public class HashMap<K,V> extends AbstractMap<K,V>
              implements Map<K,V>, Cloneable, Serializable {
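// default initial capacity: 16
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4;
// maximum capacity
static final int MAXIMUM_CAPACITY = 1 << 30;
// default load factor: 0.75
static final float DEFAULT_LOAD_FACTOR = 0.75f;
// the hash array (allocated in resize())
transient Node<K,V>[] table;
// cached set view of the key-value entries
transient Set<Map.Entry<K,V>> entrySet;
// number of entries
transient int size;
// number of structural modifications
transient int modCount;
// resize threshold (the map resizes once the entry count exceeds this value)
int threshold;
// load factor
final float loadFactor;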

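Summary

The default initial capacity is 16 and the default load factor is 0.75.
threshold = array length * loadFactor. When the number of entries exceeds threshold, HashMap resizes.
Each slot of the table array holds a reference to the head of a linked list.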
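As a quick worked example of the threshold formula, the sketch below evaluates it for a few capacities at the default load factor. ThresholdDemo and its threshold method are illustrative names, not part of the JDK.

```java
public class ThresholdDemo {
    // threshold = capacity * loadFactor, truncated to int as in resize()
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        for (int cap = 16; cap <= 128; cap <<= 1) {
            System.out.println("capacity " + cap + " -> threshold " + threshold(cap, 0.75f));
        }
    }
}
```

With the defaults, a 16-slot table resizes when the 13th entry is inserted, since 13 is the first count that exceeds 12.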

Constructors

/* No arguments */
public HashMap() {
    this.loadFactor = DEFAULT_LOAD_FACTOR;// default load factor
}
/* With an initial capacity */
public HashMap(int initialCapacity) {
    this(initialCapacity, DEFAULT_LOAD_FACTOR);
}
/* With an initial capacity and a load factor */
public HashMap(int initialCapacity, float loadFactor) {
    
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " +initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " +loadFactor);
        
    this.loadFactor = loadFactor;
    this.threshold = tableSizeFor(initialCapacity);
}

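As the code shows, threshold is set from tableSizeFor(initialCapacity). Let's look at its implementation:

/* Returns the smallest power of two greater than or equal to cap; the result is kept in threshold */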
static final int tableSizeFor(int cap) {
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}

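tableSizeFor(int cap) finds the smallest power of two greater than or equal to cap. The result is kept in threshold, and resize() later uses it as the initial array capacity.
The idea is to set every bit below the highest set bit to 1, and then add 1 to get the smallest power of two.
Subtracting 1 at the start handles the case where cap is already a power of two: without it, the result would be twice cap rather than cap itself.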
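To see the bit trick at work, the method can be copied into a small standalone class and run on a few inputs. The wrapper class TableSizeForDemo is made up for this sketch; the method body is the one quoted above.

```java
public class TableSizeForDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Copy of HashMap.tableSizeFor from JDK 1.8
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        int[] caps = {0, 1, 2, 3, 16, 17, 1000};
        for (int cap : caps) {
            System.out.println(cap + " -> " + tableSizeFor(cap));
        }
    }
}
```

Note that 16 maps to 16 while 17 maps to 32, which is exactly what the initial cap - 1 is for.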

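Resizing

(1) How resizing works

HashMap's resizing differs a little from that of other growable collections: HashMap grows to twice the current bucket array length, and the threshold doubles as well. After resizing, the positions of the entries must be recomputed and the entries moved to their new positions.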

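Because the table grows by powers of two (the length doubles), an element either stays at its original index or moves by exactly the old capacity. So when HashMap grows, it does not need to recompute the hash as the JDK 1.7 implementation did; it only needs to check whether the newly significant bit of the existing hash is 1 or 0. If it is 0, the index is unchanged; if it is 1, the index becomes "original index + oldCap".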
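The claim can be checked numerically. The sketch below compares the shortcut (test one bit of the hash) with recomputing the index against the doubled table; ResizeIndexDemo and newIndex are hypothetical names used only here.

```java
public class ResizeIndexDemo {
    // Index after doubling, derived from the old index and a single bit of the hash
    static int newIndex(int hash, int oldCap) {
        int oldIndex = hash & (oldCap - 1);
        return (hash & oldCap) == 0 ? oldIndex : oldIndex + oldCap;
    }

    public static void main(String[] args) {
        int oldCap = 16;
        int[] hashes = {5, 21, 37, 53}; // all share old index 5
        for (int h : hashes) {
            int direct = h & (2 * oldCap - 1); // recomputed against the new table
            System.out.println("hash " + h + ": old index " + (h & (oldCap - 1))
                    + ", new index " + newIndex(h, oldCap) + ", recomputed " + direct);
        }
    }
}
```

Hashes 5 and 37 stay at index 5, while 21 and 53 move to 5 + 16 = 21, and the shortcut always agrees with the recomputed index.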

That is the general resizing process. Now let's look at the implementation:

/* Resize */
final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    
    // 1. oldCap > 0 means the hash array table has already been initialized
    if (oldCap > 0) {
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }// grow to twice the current bucket array length; the threshold doubles too
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY && oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; 
    }// 2. The array is not initialized but threshold > 0: HashMap(initialCapacity) or HashMap(initialCapacity, loadFactor) was called
    else if (oldThr > 0)
        newCap = oldThr;// the new capacity is the value stored in threshold
    else { // 3. The table is not initialized and threshold is 0: HashMap() was called
        newCap = DEFAULT_INITIAL_CAPACITY;// 16 by default
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);//16*0.75
    }
    
    // newThr is still 0 in case 2, or when the doubling branch above did not set it: compute it from the threshold formula
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    // create the new hash array; the initial allocation of the table also happens here
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    // if the old hash array is not null, walk it and remap its nodes into the new array
    if (oldTab != null) {
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;//GC
                if (e.next == null)// a single node in the bucket: recompute its index and place it in the new array
                    newTab[e.hash & (newCap - 1)] = e;
                // a red-black tree has to be split
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else { 
                    // rehash: remap the list into the new array
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        /* Note the test is e.hash & oldCap: if it is 0 the index is unchanged, otherwise new index = old index + old array length */
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}

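The code above does three things:

Computes the new array capacity newCap and the new threshold newThr
Creates a new bucket array of size newCap
Remaps the Node entries into the new bucket array. If a node is a TreeNode, the red-black tree has to be split

(2) How red-black tree splitting works

Remapping a red-black tree follows basically the same logic as remapping a linked list. The difference is that after remapping, the tree is split into two linked lists made of TreeNode objects (or all elements may stay in place).
If a list's length is less than or equal to UNTREEIFY_THRESHOLD, it is converted back into a plain linked list. Otherwise the TreeNode list is re-treeified, depending on conditions.

// a TreeNode list of this length or shorter is converted back to a Node list
static final int UNTREEIFY_THRESHOLD = 6;

/* Split the red-black tree into TreeNode lists and remap them into the new array */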
final void split(HashMap<K,V> map, Node<K,V>[] tab, int index, int bit) {
    TreeNode<K,V> b = this;
    TreeNode<K,V> loHead = null, loTail = null;
    TreeNode<K,V> hiHead = null, hiTail = null;
    int lc = 0, hc = 0;
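     /* Red-black tree nodes still keep their next references, so the tree can be traversed as a linked list */
     /* The loop below groups the tree nodes, much like the list case above */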
    for (TreeNode<K,V> e = b, next; e != null; e = next) {
        next = (TreeNode<K,V>)e.next;
        e.next = null;
        if ((e.hash & bit) == 0) {
            if ((e.prev = loTail) == null)
                loHead = e;
            else
                loTail.next = e;
            loTail = e;
            ++lc;
        }
        else {
            if ((e.prev = hiTail) == null)
                hiHead = e;
            else
                hiTail.next = e;
            hiTail = e;
            ++hc;
        }
    }

    if (loHead != null) {
        // loHead is not null and the list length is at most 6: convert the tree back to a linked list
        if (lc <= UNTREEIFY_THRESHOLD)
            tab[index] = loHead.untreeify(map);
        else {
            tab[index] = loHead;          
            // hiHead == null means every node stayed in place after the resize; the tree is unchanged and needs no re-treeifying
            if (hiHead != null) 
                loHead.treeify(tab);
        }
    }
    // same as above, for the high list
    if (hiHead != null) {
        if (hc <= UNTREEIFY_THRESHOLD)
            tab[index + bit] = hiHead.untreeify(map);
        else {
            tab[index + bit] = hiHead;
            if (loHead != null)
                hiHead.treeify(tab);
        }
    }
}

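(3) Treeifying a linked list

A linked list is converted into a tree only when two conditions are met:

The list length is greater than or equal to TREEIFY_THRESHOLD (8 by default)
The bucket array capacity is greater than or equal to MIN_TREEIFY_CAPACITY (64 by default)

// a list shorter than this value is not treeified
static final int TREEIFY_THRESHOLD = 8;

// while the bucket array capacity is below this value, resize instead of treeifying
static final int MIN_TREEIFY_CAPACITY = 64;

// TreeNode (indirectly extends Node, so it carries a next reference)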
static final class TreeNode<K,V> extends LinkedHashMap.Entry<K,V> {
    TreeNode<K,V> parent;  
    TreeNode<K,V> left;
    TreeNode<K,V> right;
    TreeNode<K,V> prev;   
    boolean red;
    TreeNode(int hash, K key, V val, Node<K,V> next) {
        super(hash, key, val, next);
    }
}

/* Convert a list of plain nodes into a list of tree nodes */
final void treeifyBin(Node<K,V>[] tab, int hash) {
    int n, index;
    Node<K,V> e;
    // bucket array capacity below 64: resize instead of treeifying
    if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY)
        resize();
    else if ((e = tab[index = (n - 1) & hash]) != null) {
        // hd points to the head of the tree-node list, tl to its tail
        TreeNode<K,V> hd = null, tl = null;
        do {
            // convert each Node in the list into a TreeNode
            TreeNode<K,V> p = replacementTreeNode(e, null);// e is the current node
            if (tl == null)
                hd = p;
            else {
                p.prev = tl;
                tl.next = p;
            }
            tl = p;
        } while ((e = e.next) != null);
        if ((tab[index] = hd) != null)
            // convert the list of TreeNodes into a red-black tree
            hd.treeify(tab);
    }
}
// Node -> TreeNode
TreeNode<K,V> replacementTreeNode(Node<K,V> p, Node<K,V> next) {
    return new TreeNode<>(p.hash, p.key, p.value, next);
}

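Why treeify only when the bucket array capacity is at least 64?
When the bucket array is small, key hashes are more likely to collide, which makes the lists long. In that situation it is better to resize first rather than treeify right away.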
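The two conditions can be summarized as a small decision function. This is a simplified sketch of the rule as stated above, not the JDK code path: TreeifyDecisionDemo and decide are hypothetical, and binLength is assumed to be the length of the list in the bucket.

```java
public class TreeifyDecisionDemo {
    static final int TREEIFY_THRESHOLD = 8;
    static final int MIN_TREEIFY_CAPACITY = 64;

    // Hypothetical helper (not a JDK method): what happens to a long bucket
    static String decide(int binLength, int tableLength) {
        if (binLength < TREEIFY_THRESHOLD)
            return "keep list";   // list is still short
        if (tableLength < MIN_TREEIFY_CAPACITY)
            return "resize";      // small table: spreading entries out is cheaper
        return "treeify";         // long list in a large enough table
    }

    public static void main(String[] args) {
        System.out.println(decide(5, 16));
        System.out.println(decide(8, 16));
        System.out.println(decide(8, 64));
    }
}
```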

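Lookup

HashMap does not use the result of the key's hashCode() directly as the hash; it computes the hash through its own internal hash method.
Let's look at the implementation of hash():

/**
 * Compute the hash of key
 */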
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

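(h = key.hashCode()) ^ (h >>> 16) XORs the high bits into the low bits, so the high bits also take part in the index computation. An int has 32 bits, so shifting right by 16 lets the high 16 bits be XORed with the low 16 bits.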
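A small experiment shows why this matters. The two hash codes below differ only in their high 16 bits, so without spreading they fall into the same bucket of a 16-slot table. The class HashSpreadDemo and the chosen constants are illustrative only.

```java
public class HashSpreadDemo {
    // The spreading step from HashMap.hash, applied to a raw hash code
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int n = 16;                  // table length
        int a = 0x10000, b = 0x20000; // differ only above bit 15

        // Without spreading, only the low 4 bits are used: both map to bucket 0
        System.out.println(((n - 1) & a) + " " + ((n - 1) & b));
        // With spreading, the high bits influence the index
        System.out.println(((n - 1) & spread(a)) + " " + ((n - 1) & spread(b)));
    }
}
```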

Now let's look at the get method:

/**
 * Return the value mapped to key
 */
public V get(Object key) {
    Node<K,V> e;
    return (e = getNode(hash(key), key)) == null ? null : e.value;// hash(key) is not the same as key.hashCode()
}

/* Look up key */
final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; // the hash array
    Node<K,V> first, e; // first is the head node of the bucket, e is the node after it
    int n;// length of the hash array
    K k;
    /* (n - 1) & hash computes the array index from the hash (equivalent to hash modulo the array length, done with a bitwise AND) */
    if ((tab = table) != null && (n = tab.length) > 0 && (first = tab[(n - 1) & hash]) != null) {
        // the same reference matches via ==; otherwise fall back to equals
        if (first.hash == hash && ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        if ((e = first.next) != null) {
            // if first is a TreeNode, use the red-black tree lookup
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            do {// walk the rest of the list
                if (e.hash == hash && ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}

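Note: HashMap uses (n - 1) & hash to compute the index for a key. Because n is a power of two, this is equivalent to taking the hash modulo the array length, implemented with a bitwise AND.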
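The equivalence is easy to verify, including for negative hashes, where the % operator would give a negative result. The sketch below compares the mask with Math.floorMod; IndexMaskDemo and maskIndex are made-up names.

```java
public class IndexMaskDemo {
    // Index computation as used by HashMap; n must be a power of two
    static int maskIndex(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int n = 16;
        int[] hashes = {7, 16, 35, -1, Integer.MIN_VALUE};
        for (int h : hashes) {
            System.out.println(h + ": mask " + maskIndex(h, n)
                    + ", floorMod " + Math.floorMod(h, n)
                    + ", % " + (h % n));
        }
    }
}
```

The mask and floorMod columns always agree; the plain % column goes negative for -1, which would not be a valid array index.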

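Insertion

HashMap's insertion logic:

1. If the bucket array table is empty, initialize it by resizing
2. Check whether the key-value pair to insert already exists; if it does, decide based on the flags whether to replace the old value
3. If it does not exist, link the pair into the list, and decide from the list length whether to convert the list into a red-black tree
4. Check whether the number of entries exceeds the threshold; if so, resize

/*
 * Insert a key-value pair
 */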
public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,boolean evict) {
    Node<K,V>[] tab;// the hash array
    Node<K,V> p;// set to the first node in the bucket
    int n, i;// n is the array length, i is the index
    
    // the table is initialized lazily, on the first insertion
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    // if the bucket is empty, create a new Node and store it there
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);//new Node<>(hash, key, value, next)
    else {
        Node<K,V> e; // if the key already exists, e points to its node
        K k;
        // if the first node holds the key being inserted, point e at it (p is the first node here)
        if (p.hash == hash && ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        // if p is a TreeNode, use the red-black tree insertion (note: TreeNode is a subclass of Node)
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
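            // walk the list, using binCount to count the nodes visited
            for (int binCount = 0; ; ++binCount) {
                // the key is not in the list: append a new node at the tail
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    // if the list length has reached the treeify threshold, treeify the bin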
                    if (binCount >= TREEIFY_THRESHOLD - 1)
                        treeifyBin(tab, hash);
                    break;
                }
                // stop if the key already exists; otherwise keep walking
                if (e.hash == hash && ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        // e != null means the key already exists
        if (e != null) {
            V oldValue = e.value;
            // onlyIfAbsent decides whether the old value is overwritten
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);// callback hook: empty in HashMap, overridden by LinkedHashMap
            return oldValue;
        }
    }
    ++modCount;
    // resize once the number of entries exceeds the threshold
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);// callback hook: empty in HashMap, overridden by LinkedHashMap
    return null;
}
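The return values and the onlyIfAbsent flag of putVal can be observed through the public API: put passes onlyIfAbsent = false, while putIfAbsent passes true. A minimal sketch, with the class name PutDemo and method run chosen only for this example:

```java
import java.util.HashMap;
import java.util.Map;

public class PutDemo {
    static String run() {
        Map<String, Integer> map = new HashMap<>();
        Integer r1 = map.put("a", 1);         // no previous mapping: returns null
        Integer r2 = map.put("a", 2);         // replaces 1 and returns it
        Integer r3 = map.putIfAbsent("a", 3); // key present: value kept, returns 2
        return r1 + " " + r2 + " " + r3 + " " + map.get("a");
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```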

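Removal

Removing from a HashMap is not complicated; it takes three steps. First locate the bucket, then walk the list to find the node whose key is equal, then unlink that node. The source is as follows:

/*
 * Remove an element
 */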
public V remove(Object key) {
    Node<K,V> e;
    return (e = removeNode(hash(key), key, null, false, true)) == null ? null : e.value;
}

final Node<K,V> removeNode(int hash, Object key, Object value,boolean matchValue, boolean movable) {
    Node<K,V>[] tab; 
    Node<K,V> p; 
    int n, index;
    // 1. locate the bucket for the element
    if ((tab = table) != null && (n = tab.length) > 0 && (p = tab[index = (n - 1) & hash]) != null) {
        Node<K,V> node = null, e; 
        K k; 
        V v;
        // if the key equals that of the first node in the bucket, point node at it
        if (p.hash == hash && ((k = p.key) == key || (key != null && key.equals(k))))
            node = p;
        else if ((e = p.next) != null) {  
            // for a TreeNode, use the red-black tree lookup to locate the node to remove
            if (p instanceof TreeNode)
                node = ((TreeNode<K,V>)p).getTreeNode(hash, key);
            else {
                // 2. walk the list to find the node to remove
                do {
                    if (e.hash == hash && ((k = e.key) == key || (key != null && key.equals(k)))) {
                        node = e;
                        break;
                    }
                    p = e;
                } while ((e = e.next) != null);
            }
        }
        
        // 3. remove the node and repair the list or red-black tree
        if (node != null && (!matchValue || (v = node.value) == value || (value != null && value.equals(v)))) {
            if (node instanceof TreeNode)
                ((TreeNode<K,V>)node).removeTreeNode(this, tab, movable);
            else if (node == p)
                tab[index] = node.next;
            else
                p.next = node.next;
            ++modCount;
            --size;
            afterNodeRemoval(node);
            return node;
        }
    }
    return null;
}
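The matchValue parameter of removeNode is what distinguishes remove(key) from the two-argument remove(key, value), which removes the entry only when the stored value also matches. A short sketch against the public API (RemoveDemo and run are illustrative names):

```java
import java.util.HashMap;
import java.util.Map;

public class RemoveDemo {
    static String run() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        Integer removed = map.remove("a");   // returns the removed value
        Integer missing = map.remove("zzz"); // no such key: returns null
        boolean wrong = map.remove("b", 99); // value does not match: nothing removed
        boolean right = map.remove("b", 2);  // key and value match: removed
        return removed + " " + missing + " " + wrong + " " + right + " " + map.size();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```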

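Note: removing a node may break the balance of the red-black tree. removeTreeNode recolors and rotates nodes to keep the tree balanced. That part is fairly involved; interested readers can refer to the following article:
Red-Black Trees Explained

Iteration

The most common way to iterate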

for(Object key : map.keySet()) {
    // do something
}

is equivalent to

Set keys = map.keySet();
Iterator ite = keys.iterator();
while (ite.hasNext()) {
    Object key = ite.next();
    // do something
}

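When iterating over a HashMap, we find that the iteration order differs from the insertion order. Why is that?

Taking keySet as an example, let's first look at some of the relevant source: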

public Set<K> keySet() {
    Set<K> ks = keySet;
    if (ks == null) {
        ks = new KeySet();
        keySet = ks;
    }
    return ks;
}

/**
 * Key set
 */
final class KeySet extends AbstractSet<K> {
    public final Iterator<K> iterator()     { return new KeyIterator(); }
    // part of the code omitted
}


/**
 * Key iterator
 */
final class KeyIterator extends HashIterator implements Iterator<K> {
    public final K next() { return nextNode().key; }
}

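/* Base class of HashMap's iterators; subclasses include KeyIterator, ValueIterator, etc. */
abstract class HashIterator {
    Node<K,V> next;        // next node to return
    Node<K,V> current;     // current node
    int expectedModCount;  // modCount expected by this iterator (for fail-fast)
    int index;             // current bucket index
    // no-arg constructor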
    HashIterator() {
        expectedModCount = modCount;
        Node<K,V>[] t = table;
        current = next = null;
        index = 0;
        // advance to the first non-empty bucket
        if (t != null && size > 0) {
            do {} while (index < t.length && (next = t[index++]) == null);
        }
    }
    // whether there is a next node
    public final boolean hasNext() {
        return next != null;
    }
    // return the next node
    final Node<K,V> nextNode() {
        Node<K,V>[] t;
        Node<K,V> e = next;
        if (modCount != expectedModCount)
            throw new ConcurrentModificationException();//fail-fast
        if (e == null)
            throw new NoSuchElementException();
        // the current bucket is exhausted: move on to the next non-empty bucket
        if ((next = (current = e).next) == null && (t = table) != null) {
            do {} while (index < t.length && (next = t[index++]) == null);
        }
        return e;
    }
    // remove the current element
    public final void remove() {
        Node<K,V> p = current;
        if (p == null)
            throw new IllegalStateException();
        if (modCount != expectedModCount)
            throw new ConcurrentModificationException();
        current = null;
        K key = p.key;
        removeNode(hash(key), key, null, false, false);// calls the enclosing HashMap's removeNode
        expectedModCount = modCount;
    }
}

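As the code shows, HashIterator first finds a bucket in the array that holds a node reference and walks the list that bucket points to. When that list is done, it looks for the next non-empty bucket and walks it, and so on; when no such bucket remains, iteration ends. This explains why the iteration order differs from the insertion order. If this is unclear, see the figure below: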
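The bucket-by-bucket traversal can be observed with small Integer keys, whose hash equals their value. The sketch below assumes the OpenJDK 8 (or later) HashMap described in this article with the default 16-slot table; IterationOrderDemo is a made-up class name.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class IterationOrderDemo {
    static List<Integer> keysInIterationOrder() {
        Map<Integer, String> map = new HashMap<>(); // default capacity 16
        // Insertion order: 17, 1, 16, 0
        map.put(17, "a"); // 17 & 15 = 1 -> bucket 1
        map.put(1, "b");  // bucket 1, appended after 17
        map.put(16, "c"); // 16 & 15 = 0 -> bucket 0
        map.put(0, "d");  // bucket 0, appended after 16
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        // Under the assumptions above this prints [16, 0, 17, 1]:
        // bucket 0 first (16, then 0), then bucket 1 (17, then 1)
        System.out.println(keysInIterationOrder());
    }
}
```

The keys come out grouped by bucket index, and within a bucket in the order they were chained, rather than in insertion order.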

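4. Summary

This article described how HashMap is implemented and analyzed it further against the source code, touching on the reasons behind some design details. I hope it helps; discussion and corrections are welcome. Thanks for your support!

Addendum

That concludes the walkthrough of the HashMap source. Now let's discuss why objects added to a HashMap as keys need to override equals() and hashCode().

Take Person as an example: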

import java.util.HashMap;

public class Person {

    Integer id;

    String name;

    public Person(Integer id, String name) {
        this.id = id;
        this.name = name;
    }

    @Override
    public boolean equals(Object obj) {
        if (obj == null) return false;
        if (obj == this) return true;
        if (obj instanceof Person) {
            Person person = (Person) obj;
            // compare the Integer values, not the references
            if (java.util.Objects.equals(this.id, person.id))
                return true;
        }
        return false;
    }

    public static void main(String[] args) {
        Person p1 = new Person(1, "aaa");
        Person p2 = new Person(1, "bbb");

        HashMap<Person, String> map = new HashMap<>();
        map.put(p1, "this is p1");
        System.out.println(map.get(p2));
    }
}
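In the example above, p1 and p2 are equal according to equals(), but Person inherits Object's hashCode(), so the two objects will almost always hash to different values and the lookup with p2 fails to find the entry stored under p1. A possible fix, sketched here as an assumption about where the example is heading, is to override hashCode() consistently with equals(). The wrapper class PersonKeyDemo and the lookup method are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class PersonKeyDemo {
    static final class Person {
        final Integer id;
        final String name;

        Person(Integer id, String name) {
            this.id = id;
            this.name = name;
        }

        @Override
        public boolean equals(Object obj) {
            if (obj == this) return true;
            if (!(obj instanceof Person)) return false;
            return Objects.equals(id, ((Person) obj).id);
        }

        // Equal objects must produce equal hash codes, so hash the same field equals() uses
        @Override
        public int hashCode() {
            return Objects.hashCode(id);
        }
    }

    static String lookup() {
        Map<Person, String> map = new HashMap<>();
        map.put(new Person(1, "aaa"), "this is p1");
        // A different object with the same id now finds the entry
        return map.get(new Person(1, "bbb"));
    }

    public static void main(String[] args) {
        System.out.println(lookup());
    }
}
```

With both methods overridden, the two Person objects land in the same bucket and equals() then confirms the match, which is exactly the sequence getNode follows.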