Java Collections: HashMap Internal Structure

Date: 2019-11-17

First, let's look at the inheritance hierarchy of the Map interface.

Notes

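Map is the top-level interface. The abstract class AbstractMap implements Map, and TreeMap, HashMap, and ConcurrentHashMap all extend AbstractMap, each providing different behavior. ConcurrentHashMap additionally implements ConcurrentMap, an interface that extends Map with operations aimed at concurrent use (as the name suggests).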

Overview

Next, we read through the HashMap source code to understand its internal structure. The main topics are:

Map interface
Map.Entry
HashMap internal structure
get operation
put operation
resize
hash perturbation function

The Map interface

First, what is a Map? Map is an interface. The API documentation defines it as:

An object that maps keys to values. A map cannot contain duplicate keys; each key can map to at most one value.

In other words, a Map is an object that holds key-value mappings. It cannot contain duplicate keys, and each key maps to at most one value.

Here are the main methods of the Map interface:

public interface Map<K,V> {
    // Query Operations

    int size();
    boolean isEmpty();
    boolean containsKey(Object key);
    boolean containsValue(Object value);
    V get(Object key);


    // Modification Operations

    V put(K key, V value);
    V remove(Object key);


    // Bulk Operations

    void putAll(Map<? extends K, ? extends V> m);
    void clear();


    // Views

    Set<K> keySet();
    Collection<V> values();
    Set<Map.Entry<K, V>> entrySet();

    interface Entry<K,V> {

        K getKey();

        V getValue();

        V setValue(V value);

        boolean equals(Object o);

        int hashCode();

        ...
    }

    // Comparison and hashing

    boolean equals(Object o);
    int hashCode();


    // Defaultable methods

    ...
}

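The comments make the grouping clear: the interface declares operations for querying, modifying, and bulk-updating a map and for viewing its contents (Query Operations, Modification Operations, Bulk Operations, Views), plus default methods introduced in Java 8 (not shown here). The Views section provides ways to look inside the map: keySet() returns a Set of all keys, values() returns a Collection of all values, and entrySet() returns a Set of all key-value pairs.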
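As a small illustration of the three view methods, here is a minimal sketch. The class name MapViewsDemo and the sample keys are made up for this example. It also shows that the views are backed by the map, so removing a key through keySet() removes the mapping itself.

```java
import java.util.HashMap;
import java.util.Map;

class MapViewsDemo {
    // Builds a small sample map (the contents are arbitrary, for illustration only).
    static Map<String, Integer> sample() {
        Map<String, Integer> m = new HashMap<>();
        m.put("a", 1);
        m.put("b", 2);
        m.put("c", 3);
        return m;
    }

    public static void main(String[] args) {
        Map<String, Integer> m = sample();

        System.out.println("keys:   " + m.keySet());
        System.out.println("values: " + m.values());
        for (Map.Entry<String, Integer> e : m.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }

        // The views are backed by the map: removing through a view
        // removes the mapping from the map as well.
        m.keySet().remove("a");
        System.out.println(m.containsKey("a")); // false
    }
}
```

HashMap makes no promise about iteration order, so the printed order of keys and values should not be relied on.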

Map.Entry

The Map interface declares a nested interface, Entry<K, V>. This interface is very important: it is what we usually mean by a "key-value pair".

The methods it provides are simple:

interface Entry<K,V> {

    K getKey();

    V getValue();

    V setValue(V value);

    boolean equals(Object o);

    int hashCode();

    ...
}

It can get the key, get the value, and set the value, and it declares equals and hashCode.
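The sketch below shows these methods in use. EntryDemo and doubleValues are hypothetical names for this example. The point is that setValue writes through to the underlying map, and because it does not change the number of mappings it is safe to call while iterating.

```java
import java.util.HashMap;
import java.util.Map;

class EntryDemo {
    // Doubles every value in place by writing through Map.Entry.setValue.
    static void doubleValues(Map<String, Integer> m) {
        for (Map.Entry<String, Integer> e : m.entrySet()) {
            e.setValue(e.getValue() * 2);
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> m = new HashMap<>();
        m.put("x", 1);
        m.put("y", 5);
        doubleValues(m);
        System.out.println(m.get("x")); // 2
        System.out.println(m.get("y")); // 10
    }
}
```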

HashMap internal structure

Definition

public class HashMap<K,V> extends AbstractMap<K,V> implements Map<K,V>, Cloneable, Serializable

HashMap extends AbstractMap and implements the Map interface.

Now look at the definition of AbstractMap:

public abstract class AbstractMap<K,V> implements Map<K,V> {

AbstractMap is an abstract class that also implements the Map interface.

This looks odd: if AbstractMap already implements Map, why does HashMap implement Map again?

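After searching around, the explanation I found is that this is reportedly a redundancy left by the author: HashMap does not actually need to declare implements Map<K, V> again. The Stack Overflow question linked below raises the same doubt.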

https://stackoverflow.com/questions/2165204/why-does-linkedhashsete-extend-hashsete-and-implement-sete
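The repeated declaration can be observed with reflection. The following is a small sketch; InterfacesDemo and declared are names invented for this example. Class.getInterfaces() lists only the interfaces named in a class's own implements clause, so Map shows up for both HashMap and AbstractMap.

```java
import java.util.AbstractMap;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class InterfacesDemo {
    // Interfaces named directly in the class's own "implements" clause.
    static List<Class<?>> declared(Class<?> c) {
        return Arrays.asList(c.getInterfaces());
    }

    public static void main(String[] args) {
        System.out.println(declared(HashMap.class));     // includes java.util.Map
        System.out.println(declared(AbstractMap.class)); // includes java.util.Map
        // HashMap would still be a Map through AbstractMap alone.
        System.out.println(Map.class.isAssignableFrom(AbstractMap.class)); // true
    }
}
```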

Now let's look at the main fields and constants defined in HashMap:

public class HashMap<K,V> extends AbstractMap<K,V> implements Map<K,V>, Cloneable, Serializable {
    ...

    /**
     * The default initial capacity - MUST be a power of two.
     */
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

    /**
     * The maximum capacity, used if a higher value is implicitly specified
     * by either of the constructors with arguments.
     * MUST be a power of two <= 1<<30.
     */
    static final int MAXIMUM_CAPACITY = 1 << 30;

    /**
     * The load factor used when none specified in constructor.
     */
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

    /**
     * The bin count threshold for using a tree rather than list for a
     * bin.  Bins are converted to trees when adding an element to a
     * bin with at least this many nodes. The value must be greater
     * than 2 and should be at least 8 to mesh with assumptions in
     * tree removal about conversion back to plain bins upon
     * shrinkage.
     */
    static final int TREEIFY_THRESHOLD = 8;

    /**
     * The bin count threshold for untreeifying a (split) bin during a
     * resize operation. Should be less than TREEIFY_THRESHOLD, and at
     * most 6 to mesh with shrinkage detection under removal.
     */
    static final int UNTREEIFY_THRESHOLD = 6;

    /**
     * The smallest table capacity for which bins may be treeified.
     * (Otherwise the table is resized if too many nodes in a bin.)
     * Should be at least 4 * TREEIFY_THRESHOLD to avoid conflicts
     * between resizing and treeification thresholds.
     */
    static final int MIN_TREEIFY_CAPACITY = 64;

    /**
     * Basic hash bin node, used for most entries.  (See below for
     * TreeNode subclass, and in LinkedHashMap for its Entry subclass.)
     */

    static class Node<K,V> implements Map.Entry<K,V> {
        final int hash;
        final K key;
        V value;
        Node<K,V> next;

        Node(int hash, K key, V value, Node<K,V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }

        public final K getKey()        { return key; }
        public final V getValue()      { return value; }
        public final String toString() { return key + "=" + value; }

        public final int hashCode() {
            return Objects.hashCode(key) ^ Objects.hashCode(value);
        }

        public final V setValue(V newValue) {
            V oldValue = value;
            value = newValue;
            return oldValue;
        }

        public final boolean equals(Object o) {
            if (o == this)
                return true;
            if (o instanceof Map.Entry) {
                Map.Entry<?,?> e = (Map.Entry<?,?>)o;
                if (Objects.equals(key, e.getKey()) &&
                    Objects.equals(value, e.getValue()))
                    return true;
            }
            return false;
        }
    }

    ...

    /**
     * The table, initialized on first use, and resized as
     * necessary. When allocated, length is always a power of two.
     * (We also tolerate length zero in some operations to allow
     * bootstrapping mechanics that are currently not needed.)
     */
    transient Node<K,V>[] table;

    /**
     * Holds cached entrySet(). Note that AbstractMap fields are used
     * for keySet() and values().
     */
    transient Set<Map.Entry<K,V>> entrySet;

    /**
     * The number of key-value mappings contained in this map.
     */
    transient int size;

    /**
     * The number of times this HashMap has been structurally modified
     * Structural modifications are those that change the number of mappings in
     * the HashMap or otherwise modify its internal structure (e.g.,
     * rehash).  This field is used to make iterators on Collection-views of
     * the HashMap fail-fast.  (See ConcurrentModificationException).
     */
    transient int modCount;

    /**
     * The next size value at which to resize (capacity * load factor).
     *
     * @serial
     */
    // (The javadoc description is true upon serialization.
    // Additionally, if the table array has not been allocated, this
    // field holds the initial array capacity, or zero signifying
    // DEFAULT_INITIAL_CAPACITY.)
    int threshold;

    /**
     * The load factor for the hash table.
     *
     * @serial
     */
    final float loadFactor;

    ...

}

The comments from the source are kept here; reading them gives a good idea of what each field is for.

DEFAULT_INITIAL_CAPACITY defines the default initial capacity. A map created with the no-argument constructor gets a table of 1 << 4 (16) buckets, allocated on first use.

DEFAULT_LOAD_FACTOR is the default load factor, 0.75. It is very important because it determines when the table is resized, as discussed later.

public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " + initialCapacity);

    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;

    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " + loadFactor);

    this.loadFactor = loadFactor;
    this.threshold = tableSizeFor(initialCapacity);
}

public HashMap(int initialCapacity) {
    this(initialCapacity, DEFAULT_LOAD_FACTOR);
}

public HashMap() {
    this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
}

public HashMap(Map<? extends K, ? extends V> m) {
    this.loadFactor = DEFAULT_LOAD_FACTOR;
    putMapEntries(m, false);
}

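HashMap provides four constructors, which let you set the initial capacity and the load factor. In most cases it is best to leave the defaults alone, because poorly chosen values can hurt performance.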
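The two-argument constructor stores tableSizeFor(initialCapacity) in threshold. That helper is not shown in the listing above; the sketch below re-implements it with the bit-smearing approach used in the JDK 8 source, to show that any requested capacity is rounded up to the next power of two. TableSizeDemo is a made-up class name, and newer JDK versions may compute the same result differently.

```java
class TableSizeDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Rounds cap up to the nearest power of two (capped at MAXIMUM_CAPACITY).
    // Subtracting 1 first keeps exact powers of two unchanged; the shifts then
    // copy the highest set bit into every lower position.
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        for (int cap : new int[] {1, 10, 16, 17, 1000}) {
            System.out.println(cap + " -> " + tableSizeFor(cap));
        }
        // 1 -> 1, 10 -> 16, 16 -> 16, 17 -> 32, 1000 -> 1024
    }
}
```

So new HashMap<>(10) ends up with a 16-slot table once it is first used.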

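MAXIMUM_CAPACITY is the maximum capacity, 1 << 30. Shifting 1 left by 30 bits gives the binary value 0100 0000 0000 0000 0000 0000 0000 0000, that is, 2 to the power of 30, or 1073741824 in decimal. The comment says it MUST be a power of two, and shifting one more bit (1 << 31) would overflow into a negative int.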
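The shift arithmetic is easy to check directly. In this short sketch, ShiftDemo and isPowerOfTwo are names invented for the example.

```java
class ShiftDemo {
    // A positive int is a power of two when exactly one bit is set.
    static boolean isPowerOfTwo(int x) {
        return x > 0 && (x & (x - 1)) == 0;
    }

    public static void main(String[] args) {
        System.out.println(1 << 30);               // 1073741824
        System.out.println(1 << 31);               // -2147483648 (Integer.MIN_VALUE)
        System.out.println(isPowerOfTwo(1 << 30)); // true
        System.out.println(isPowerOfTwo(1 << 31)); // false: the sign bit is set
    }
}
```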

TREEIFY_THRESHOLD, UNTREEIFY_THRESHOLD, and MIN_TREEIFY_CAPACITY are the parameters that control when a bin is converted to or from a red-black tree, which comes up later.

Structure

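Next come two things: static class Node<K,V> implements Map.Entry<K,V> and transient Node<K,V>[] table. These two are the essence of HashMap. A HashMap is a one-dimensional array of Node references in which each slot (bucket) heads a chain of nodes, which is why it is sometimes loosely pictured as two-dimensional. Node is the concrete implementation of Map.Entry.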

class Node<K,V>

It defines four fields: the hash value, the generic key, the generic value, and a reference to the next Node.

Node<K,V>[] table

The table, initialized on first use, and resized as necessary. When allocated, length is always a power of two.

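The table is initialized on first use and resized when necessary, that is, when the number of mappings exceeds the threshold (capacity × load factor). Whenever the table is allocated, its length is always a power of two. (Why it must be a power of two rather than some other value is explained later.)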
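To make the resize trigger concrete, the sketch below computes the threshold for a few capacities the same way the source does (capacity multiplied by load factor, truncated to int). ThresholdDemo is a hypothetical name for this example.

```java
class ThresholdDemo {
    // threshold = capacity * loadFactor, truncated to int.
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        float lf = 0.75f; // DEFAULT_LOAD_FACTOR
        for (int cap = 16; cap <= 64; cap <<= 1) {
            System.out.println("capacity " + cap + " -> threshold " + threshold(cap, lf));
        }
        // capacity 16 -> threshold 12
        // capacity 32 -> threshold 24
        // capacity 64 -> threshold 48
    }
}
```

With the defaults, the table therefore grows from 16 to 32 slots when the 13th mapping is added, because the check in the source is ++size > threshold.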

Diagram of the internal structure

(Image taken from the internet)

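A map looks roughly like this. When an object is put into the map, its position in the array is computed from the key's hash (the hash is first passed through the perturbation function, covered later). If that position is empty, the new node is stored there directly. If it is already occupied, the new node is linked after the existing nodes through their next references, forming a linked list. Once a list grows past a certain length, it is converted to a red-black tree (a change introduced in Java 8 to speed up lookups). So a HashMap is essentially a combination of an array, linked lists, and red-black trees.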

Basic operations

Get

The get method of HashMap is as follows:

transient Node<K,V>[] table;

public V get(Object key) {
    Node<K,V> e;
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}

static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    if ((tab = table) != null && 
        (n = tab.length) > 0 && (first = tab[(n - 1) & hash]) != null) {
        // always check first node
        if (first.hash == hash && 
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        if ((e = first.next) != null) {
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            do {
                if (e.hash == hash && 
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}

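First the hash is derived from the key: hash() XORs the key's hashCode with its own high 16 bits. This is the "perturbation function", intended to reduce the chance that different keys land in the same array slot; it is described in detail in a separate article. The hash then selects the slot to read from (first = tab[(n - 1) & hash]). If that slot is empty, get returns null. Otherwise the first node is checked: a node matches when its stored hash equals the computed hash and its key is the same reference or equals() the key. If the first node does not match, the rest of the bin is searched, by tree lookup if the bin is a red-black tree or by walking the linked list in order otherwise. The matching node is returned if found; otherwise the result is null.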
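The two steps, spreading the hash and masking it into an index, can be tried in isolation. In this minimal sketch, spread copies the body of hash() from the listing above, while HashIndexDemo and indexFor are names invented for the example. It uses Integer keys because their hashCode is simply the int value, which makes the effect of the high bits easy to see.

```java
class HashIndexDemo {
    // Same computation as HashMap.hash(): XOR the high 16 bits into the low 16 bits.
    static int spread(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // Bucket index for a table whose length n is a power of two.
    static int indexFor(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int n = 16;
        Integer k1 = 0x10000, k2 = 0x20000; // differ only above bit 15

        // Without spreading, both raw hash codes mask down to slot 0.
        System.out.println(indexFor(k1.hashCode(), n)); // 0
        System.out.println(indexFor(k2.hashCode(), n)); // 0

        // With spreading, the high bits influence the slot.
        System.out.println(indexFor(spread(k1), n)); // 1
        System.out.println(indexFor(spread(k2), n)); // 2
    }
}
```

Because n is a power of two, (n - 1) & hash gives the same result as a non-negative modulo by n, which is one reason the table length is kept a power of two.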

Put

The put method of HashMap is as follows:

transient Node<K,V>[] table;

public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}

final V putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}

Node<K,V> newNode(int hash, K key, V value, Node<K,V> next) {
    return new Node<>(hash, key, value, next);
}

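First, putVal checks whether the table is empty; if so it calls resize (explained below), which means the table is allocated on the first put. It then computes the key's slot with (n - 1) & hash. If the slot is empty, a new node is created (newNode) and stored there. If the slot is occupied, the first node is compared with the incoming key. If it does not match and the bin is a red-black tree, the tree insertion path (putTreeVal) is used; otherwise the linked list is walked. If the whole list is traversed without finding an equal key, a new node is appended to the end of the list, and treeifyBin is called if the list has grown long enough (TREEIFY_THRESHOLD). If an equal key is found, its value is overwritten (unless onlyIfAbsent is set and the existing value is non-null) and the old value is returned. Finally, after a new mapping is added, the number of mappings is compared with the threshold, and the table is resized if the threshold is exceeded.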
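The return values of put follow directly from the code above: null when a new mapping is added, otherwise the previous value. The sketch below shows this from the caller's side, together with putIfAbsent, which in HashMap goes through the same putVal with onlyIfAbsent set to true. PutDemo and putResults are hypothetical names for this example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PutDemo {
    // Returns the results of three insert calls plus the final stored value.
    static List<Integer> putResults() {
        Map<String, Integer> m = new HashMap<>();
        Integer first = m.put("k", 1);          // null: no previous mapping
        Integer second = m.put("k", 2);         // 1: old value returned, value overwritten
        Integer third = m.putIfAbsent("k", 3);  // 2: existing value returned, not overwritten
        return Arrays.asList(first, second, third, m.get("k"));
    }

    public static void main(String[] args) {
        System.out.println(putResults()); // [null, 1, 2, 2]
    }
}
```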

Resize

The resize method is as follows:

final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    if (oldCap > 0) {
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    else {               // zero initial threshold signifies using defaults
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    @SuppressWarnings({"rawtypes","unchecked"})
        Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    if (oldTab != null) {
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else { // preserve order
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}

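resize first decides the new capacity from the old one. When the table already exists, the new capacity is twice the old one (newCap = oldCap << 1; with oldCap = 16, shifting left by one bit gives 32, and the next resize gives 64), and in the usual case the threshold is doubled along with it (newThr = oldThr << 1). Once the new capacity is known, an empty array of that size is created and the entries of the old array are moved into it by iterating over the old buckets. A bucket holding a single node is placed directly at e.hash & (newCap - 1). A linked list is split into a low list and a high list by testing e.hash & oldCap, with no rehashing: nodes stay at index j or move to j + oldCap. A red-black tree bin is split by its own method (split).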
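The split rule can be verified with plain arithmetic. In the sketch below, ResizeSplitDemo and newIndex are names invented for this example. It derives a node's position in the doubled table the way the do/while loop in resize() does, by testing the single bit oldCap, and compares it with masking the hash against the new table length.

```java
class ResizeSplitDemo {
    // Position in the doubled table: stay at j if the oldCap bit of the hash
    // is 0, otherwise move to j + oldCap. No rehash is needed.
    static int newIndex(int hash, int oldCap) {
        int j = hash & (oldCap - 1);
        return ((hash & oldCap) == 0) ? j : j + oldCap;
    }

    public static void main(String[] args) {
        int oldCap = 16, newCap = oldCap << 1;
        // These four hashes all share slot 5 in a 16-slot table.
        for (int hash : new int[] {5, 21, 37, 53}) {
            System.out.println(hash + ": old slot " + (hash & (oldCap - 1))
                    + ", new slot " + newIndex(hash, oldCap)
                    + ", direct mask " + (hash & (newCap - 1)));
        }
        // 5 and 37 stay in slot 5; 21 and 53 move to slot 21 (5 + 16).
    }
}
```

Doubling the table adds exactly one bit to the index mask, so each node either keeps its index or moves by oldCap, which is why the list can be split while preserving its relative order.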