Java Container Framework Analysis (3): HashMap Source Code Analysis

Date: 2019-11-30

Tags: java, container framework analysis, hashmap, source code. Column: Java


Overview

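Before analyzing the HashMap source, it helps to see where HashMap sits among data structures, which are commonly classified both by logical structure and by storage structure.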

First, the class comment from the source:

Hash table based implementation of the Map interface. This
implementation provides all of the optional map operations, and permits
null values and the null key. (The HashMap
class is roughly equivalent to Hashtable, except that it is
unsynchronized and permits nulls.) This class makes no guarantees as to
the order of the map; in particular, it does not guarantee that the order
will remain constant over time.
From this comment we can see that HashMap is not thread-safe, permits a null key and null values, and makes no guarantee about the order of its entries (the order may even change over time). Structurally, HashMap is neither a plain linked list nor a plain array: it is an array of linked lists.
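The two properties quoted above (null keys and values are allowed, unlike Hashtable) are easy to check. The sketch below runs against the standard java.util classes of a desktop JDK rather than the Android source discussed in this article; `NullKeyDemo` and `acceptsNulls` are hypothetical names used only for illustration.

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

public class NullKeyDemo {
    // Tries to store a null key and a null value; returns false if the map rejects them.
    static boolean acceptsNulls(Map<String, String> map) {
        try {
            map.put(null, "value for the null key");
            map.put("k", null);
            return true;
        } catch (NullPointerException e) {
            // Hashtable throws NullPointerException for a null key or a null value
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(acceptsNulls(new HashMap<>()));   // true
        System.out.println(acceptsNulls(new Hashtable<>())); // false
    }
}
```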

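We analyzed ArrayList and LinkedList earlier, and each has trade-offs: ArrayList offers fast access by index but slow insertion and deletion in the middle, while LinkedList inserts and deletes cheaply once it has reached the right node but must walk the list to find it. In practice we want both lookup and modification to be fast, and neither meets that expectation, which is the motivation for a structure like HashMap. Next, let's look at HashMap's inheritance hierarchy.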

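The hierarchy is straightforward: HashMap extends AbstractMap and implements Map, Cloneable and Serializable. Now let's analyze HashMap's internal structure.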

Source walkthrough

Member variables

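// Default initial capacity; must be a power of two (4 in this Android version)
static final int DEFAULT_INITIAL_CAPACITY = 4;
// Maximum capacity; must be a power of two <= 1 << 30
static final int MAXIMUM_CAPACITY = 1 << 30;
// Load factor used when the constructor does not specify one
static final float DEFAULT_LOAD_FACTOR = 0.75f;
// Shared empty array, assigned to table until the table is allocated
static final HashMapEntry<?,?>[] EMPTY_TABLE = {};
// The bucket array; its length must always be a power of two
transient HashMapEntry<K,V>[] table = (HashMapEntry<K,V>[]) EMPTY_TABLE;
// Number of key-value mappings in the map
transient int size;
// Resize threshold (capacity * loadFactor); the table grows once size reaches it
int threshold;
// Load factor actually used by the table (fixed at the default here)
final float loadFactor = DEFAULT_LOAD_FACTOR;
// Number of structural modifications, used by the fail-fast iterators
transient int modCount;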

Next, the inner class HashMapEntry, which represents a single mapping:

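static class HashMapEntry<K,V> implements Map.Entry<K,V> {
    final K key;               // the key
    V value;                   // the value
    HashMapEntry<K,V> next;    // next entry in the same bucket's chain
    int hash;                  // cached hash of the key

    /**
     * Creates new entry.
     */
    HashMapEntry(int h, K k, V v, HashMapEntry<K,V> n) {
        value = v;
        next = n;
        key = k;
        hash = h;
    }

    // getKey(), getValue(), setValue(), equals(), hashCode(),
    // recordAccess(), recordRemoval() ... omitted
}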

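Among the member variables is an array of HashMapEntry, and each HashMapEntry holds a next pointer, so every slot of the array is the head of a linked list. In other words, the underlying structure of HashMap is an array whose elements are linked lists, commonly called an array of linked lists.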

Constructors

Construct a HashMap with the default capacity and the default load factor:

public HashMap() {
    this(DEFAULT_INITIAL_CAPACITY, DEFAULT_LOAD_FACTOR);
}

Construct a HashMap with a custom initial capacity and the default load factor:

public HashMap(int initialCapacity) {
    this(initialCapacity, DEFAULT_LOAD_FACTOR);
}

Construct a HashMap from an existing Map:

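public HashMap(Map<? extends K, ? extends V> m) {
    // pick a capacity large enough to hold m's mappings without an immediate resize
    this(Math.max((int) (m.size() / DEFAULT_LOAD_FACTOR) + 1, DEFAULT_INITIAL_CAPACITY), DEFAULT_LOAD_FACTOR);
    inflateTable(threshold);   // allocate the table right away
    putAllForCreate(m);        // copy all mappings from m
}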

Construct a HashMap with a custom initial capacity and load factor:

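public HashMap(int initialCapacity, float loadFactor) {
    // validate the initial capacity and clamp it to [DEFAULT_INITIAL_CAPACITY, MAXIMUM_CAPACITY]
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " + initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY) {
        initialCapacity = MAXIMUM_CAPACITY;
    } else if (initialCapacity < DEFAULT_INITIAL_CAPACITY) {
        initialCapacity = DEFAULT_INITIAL_CAPACITY;
    }
    // validate the load factor (only checked: the loadFactor field is fixed at DEFAULT_LOAD_FACTOR)
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " + loadFactor);
    // until the table is allocated, threshold temporarily holds the requested capacity;
    // inflateTable() later replaces it with capacity * loadFactor
    threshold = initialCapacity;
    init(); // empty hook for subclasses (LinkedHashMap overrides it)
}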

void init() { }

Storing elements

public V put(K key, V value) {
    if (table == EMPTY_TABLE) {
        inflateTable(threshold); // first insertion: allocate the table lazily
    }
    // a null key is handled separately
    if (key == null)
        return putForNullKey(value);
    // spread the key's hashCode() and map it to a bucket
    int hash = sun.misc.Hashing.singleWordWangJenkinsHash(key);
    int i = indexFor(hash, table.length);
    // if the key is already in this bucket's chain, replace the value and return the old one
    for (HashMapEntry<K,V> e = table[i]; e != null; e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }

    modCount++;
    addEntry(hash, key, value, i); // new key: add an entry
    return null;
}

Compute the entry's position in the array from the hash and the table length:

static int indexFor(int h, int length) {
    // length is a power of two, so this keeps the low bits of h (a cheap h mod length)
    return h & (length-1);
}
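Because table.length is always a power of two, `h & (length - 1)` keeps only the low bits of the hash and behaves like a modulo that never goes negative, even for negative hashes. A small check, with `IndexForDemo` as a hypothetical class name and `Math.floorMod` used only as the reference value:

```java
public class IndexForDemo {
    // Same expression as indexFor(): keep the low bits of h selected by (length - 1)
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int length = 16; // a power of two, as the table length always is
        int[] hashes = {0, 5, 16, 17, 12345, -1, -17, Integer.MIN_VALUE};
        for (int h : hashes) {
            // For a power-of-two length the mask equals a non-negative modulo
            System.out.println(h + " -> " + indexFor(h, length)
                    + " (floorMod: " + Math.floorMod(h, length) + ")");
        }
    }
}
```

With a length that is not a power of two, say 10, the mask 9 (binary 1001) could only produce the indices 0, 1, 8 and 9, leaving most buckets unused; this is one reason the capacity is always rounded up to a power of two.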

Allocating the table for the first time:

private void inflateTable(int toSize) {
    // round toSize up to the smallest power of two >= toSize
    int capacity = roundUpToPowerOf2(toSize);
    // threshold = capacity * loadFactor, capped at MAXIMUM_CAPACITY + 1
    float thresholdFloat = capacity * loadFactor;
    if (thresholdFloat > MAXIMUM_CAPACITY + 1) {
        thresholdFloat = MAXIMUM_CAPACITY + 1;
    }
    // threshold now holds the real threshold instead of the requested capacity
    threshold = (int) thresholdFloat;
    // allocate the bucket array with the computed capacity
    table = new HashMapEntry[capacity];
}
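The source calls roundUpToPowerOf2 without showing it. Below is a sketch of such a helper built on `Integer.highestOneBit`, together with the threshold calculation from inflateTable. The helper body is an assumption modeled on OpenJDK 7 rather than copied from the Android source, and `InflateDemo` and `thresholdFor` are hypothetical names.

```java
public class InflateDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Smallest power of two >= number, clamped to [1, MAXIMUM_CAPACITY]
    static int roundUpToPowerOf2(int number) {
        return number >= MAXIMUM_CAPACITY
                ? MAXIMUM_CAPACITY
                : (number > 1) ? Integer.highestOneBit((number - 1) << 1) : 1;
    }

    // threshold = capacity * loadFactor, capped at MAXIMUM_CAPACITY + 1 (as in inflateTable)
    static int thresholdFor(int capacity, float loadFactor) {
        float t = capacity * loadFactor;
        if (t > MAXIMUM_CAPACITY + 1) {
            t = MAXIMUM_CAPACITY + 1;
        }
        return (int) t;
    }

    public static void main(String[] args) {
        for (int requested : new int[]{3, 4, 5, 17}) {
            int capacity = roundUpToPowerOf2(requested);
            System.out.println(requested + " -> capacity " + capacity
                    + ", threshold " + thresholdFor(capacity, 0.75f));
        }
        // 3 -> capacity 4, threshold 3
        // 4 -> capacity 4, threshold 3
        // 5 -> capacity 8, threshold 6
        // 17 -> capacity 32, threshold 24
    }
}
```

With the default capacity of 4 and load factor 0.75, the first threshold is 3, so the table is considered for growth once three mappings are stored.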

Handling a null key:

private V putForNullKey(V value) {
    // the null key always lives in bucket 0, so only that chain is searched
    for (HashMapEntry<K,V> e = table[0]; e != null; e = e.next) {
        if (e.key == null) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }
    modCount++;
    addEntry(0, null, value, 0); // add an entry; a null key gets hash 0 and bucket index 0
    return null;
}

Whether the key is null or not, a new mapping always ends up in addEntry, so let's analyze addEntry next:

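void addEntry(int hash, K key, V value, int bucketIndex) {
    // grow only if size has reached the threshold and the target bucket is already occupied
    if ((size >= threshold) && (null != table[bucketIndex])) {
        // double the capacity (growth factor 2)
        resize(2 * table.length);
        // recompute the key's hash (0 for a null key)
        hash = (null != key) ? sun.misc.Hashing.singleWordWangJenkinsHash(key) : 0;
        // recompute the bucket index, i.e. the element's position in the new array
        bucketIndex = indexFor(hash, table.length);
    }
    // link a new entry into the bucket
    createEntry(hash, key, value, bucketIndex);
}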

Resizing HashMap

void resize(int newCapacity) {
    HashMapEntry[] oldTable = table;
    int oldCapacity = oldTable.length;
    // already at the maximum capacity: make the threshold unreachable and stop growing
    if (oldCapacity == MAXIMUM_CAPACITY) {
        threshold = Integer.MAX_VALUE;
        return;
    }
    // allocate the new table (addEntry passes twice the old length)
    HashMapEntry[] newTable = new HashMapEntry[newCapacity];
    // rehash every entry from the old table into the new one
    transfer(newTable);
    // switch to the new table
    table = newTable;
    // recompute the threshold for the new capacity
    threshold = (int)Math.min(newCapacity * loadFactor, MAXIMUM_CAPACITY + 1);
}
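Since capacities are powers of two and resize doubles them, each entry's new index can be predicted from a single bit of its hash: it either stays at the old index or moves up by exactly the old capacity. The JDK 7-style transfer discussed here simply recomputes indexFor for every entry, while JDK 8's HashMap.resize uses this property to split each bucket into a "low" and a "high" list. The sketch below only checks the arithmetic; `ResizeSplitDemo` and `predictedNewIndex` are hypothetical names.

```java
import java.util.Random;

public class ResizeSplitDemo {
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    // Doubling oldCap to 2 * oldCap adds exactly one bit (oldCap) to the mask.
    // If that bit of h is 0 the entry keeps its index, otherwise it moves up by oldCap.
    static int predictedNewIndex(int h, int oldCap) {
        int oldIndex = indexFor(h, oldCap);
        return (h & oldCap) == 0 ? oldIndex : oldIndex + oldCap;
    }

    public static void main(String[] args) {
        int oldCap = 4;
        Random random = new Random(42);
        for (int i = 0; i < 5; i++) {
            int h = random.nextInt();
            System.out.println("hash " + h + ": " + indexFor(h, oldCap)
                    + " -> " + indexFor(h, 2 * oldCap)
                    + " (predicted " + predictedNewIndex(h, oldCap) + ")");
        }
    }
}
```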

Computing an element's index:

static int indexFor(int h, int length) {
    // assert Integer.bitCount(length) == 1 : "length must be a non-zero power of 2";
    return h & (length-1);
}

Creating a new entry:

void createEntry(int hash, K key, V value, int bucketIndex) {
    // current head of the bucket (null if the bucket is empty)
    HashMapEntry<K,V> e = table[bucketIndex];
    // head insertion: the new entry becomes the head and its next points to the old head,
    // so on a hash collision the earlier entries stay in the chain
    table[bucketIndex] = new HashMapEntry<>(hash, key, value, e);
    // one more mapping
    size++;
}

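At this point the put method is fairly clear: compute the key's hash, AND it with table.length - 1 to get the element's index, and store the element in the array at that index. On a hash collision the existing element is not removed; instead the newest entry is placed in that slot and its next pointer points to the previous entry. The figure below illustrates this.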

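The figure is drawn according to how HashMap works: it shows an Entry array of capacity 4 (a power of two) in which every Entry has a next pointer. On a hash collision the newly added entry's next points to the previously added entry; otherwise next is null.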
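The scheme described above (an array of chains, with new entries placed at the head of the bucket) can be reproduced in a few dozen lines. This is a simplified sketch under stated assumptions, not the Android implementation: `MiniChainedMap` is a hypothetical class, it uses plain `hashCode()` instead of `singleWordWangJenkinsHash`, keeps a fixed capacity of 4, and never resizes or removes.

```java
import java.util.ArrayList;
import java.util.List;

public class MiniChainedMap<K, V> {
    // One entry in a bucket's singly linked list (plays the role of HashMapEntry)
    static final class Node<K, V> {
        final int hash;
        final K key;
        V value;
        Node<K, V> next;

        Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    // Fixed capacity of 4 (a power of two); this sketch never resizes
    @SuppressWarnings("unchecked")
    private final Node<K, V>[] table = (Node<K, V>[]) new Node[4];

    // Assumption: plain hashCode(), not Android's Wang/Jenkins mixing; a null key hashes to 0
    private static int hash(Object key) {
        return key == null ? 0 : key.hashCode();
    }

    private int indexFor(int h) {
        return h & (table.length - 1);
    }

    public V put(K key, V value) {
        int h = hash(key);
        int i = indexFor(h);
        // Key already present in the chain: overwrite and return the old value
        for (Node<K, V> e = table[i]; e != null; e = e.next) {
            if (e.hash == h && (e.key == key || (key != null && key.equals(e.key)))) {
                V old = e.value;
                e.value = value;
                return old;
            }
        }
        // New key: head insertion, the new node points at the previous head
        table[i] = new Node<>(h, key, value, table[i]);
        return null;
    }

    public V get(Object key) {
        int h = hash(key);
        for (Node<K, V> e = table[indexFor(h)]; e != null; e = e.next) {
            if (e.hash == h && (e.key == key || (key != null && key.equals(e.key)))) {
                return e.value;
            }
        }
        return null;
    }

    // Keys stored in bucket i, listed from head to tail
    public List<K> bucketKeys(int i) {
        List<K> keys = new ArrayList<>();
        for (Node<K, V> e = table[i]; e != null; e = e.next) {
            keys.add(e.key);
        }
        return keys;
    }

    public static void main(String[] args) {
        MiniChainedMap<Integer, String> map = new MiniChainedMap<>();
        // Integer.hashCode() is the value itself; 1, 5 and 9 all map to index 1 when length is 4
        map.put(1, "one");
        map.put(5, "five");
        map.put(9, "nine");
        System.out.println(map.bucketKeys(1)); // [9, 5, 1]
        System.out.println(map.get(5));        // five
    }
}
```

The printed chain [9, 5, 1] is the reverse of insertion order, which is exactly the head-insertion behavior of createEntry.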

Reading elements

public V get(Object key) {
    // a null key is handled separately
    if (key == null)
        return getForNullKey();
    // otherwise look up the entry with getEntry and return its value
    Entry<K,V> entry = getEntry(key);
    return null == entry ? null : entry.getValue();
}

getForNullKey

private V getForNullKey() {
    if (size == 0) {
        return null;
    }
    // the null key can only be in bucket 0, so only that chain is scanned
    for (HashMapEntry<K,V> e = table[0]; e != null; e = e.next) {
        if (e.key == null)
            return e.value;
    }
    return null;
}

getEntry

final Entry<K,V> getEntry(Object key) {
    if (size == 0) {
        return null;
    }
    // compute the hash first
    int hash = (key == null) ? 0 : sun.misc.Hashing.singleWordWangJenkinsHash(key);
    // locate the bucket from the hash, then walk that bucket's chain
    for (HashMapEntry<K,V> e = table[indexFor(hash, table.length)]; e != null; e = e.next) {
        Object k;
        if (e.hash == hash &&
            ((k = e.key) == key || (key != null && key.equals(k))))
            return e;
    }
    return null;
}

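The routine mirrors put. Note that for a non-null key every entry in the chain has to be examined, because a bucket can hold several keys: different keys can land in the same bucket and can even have exactly the same hash value. That is why both the hash and the key itself (== or equals) are compared.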
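Distinct keys really can collide completely. The strings "Aa" and "BB" are a well-known pair with identical `hashCode()` values, so in any HashMap they share a hash and therefore a bucket. The sketch below (hypothetical class `CollisionDemo`, run against java.util.HashMap) shows that both mappings survive and are found correctly because equals tells the keys apart.

```java
import java.util.HashMap;
import java.util.Map;

public class CollisionDemo {
    // "Aa" and "BB" are different strings with the same hashCode() (2112),
    // so they always share a hash and a bucket
    static Map<String, Integer> build() {
        Map<String, Integer> map = new HashMap<>();
        map.put("Aa", 1);
        map.put("BB", 2);
        return map;
    }

    public static void main(String[] args) {
        System.out.println("Aa".hashCode() + " " + "BB".hashCode()); // 2112 2112
        Map<String, Integer> map = build();
        // Equal hashes, but equals() distinguishes the keys, so neither overwrites the other
        System.out.println(map.get("Aa")); // 1
        System.out.println(map.get("BB")); // 2
        System.out.println(map.size());    // 2
    }
}
```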

Removing elements

public V remove(Object key) {
    Entry<K,V> e = removeEntryForKey(key);
    return (e == null ? null : e.getValue());
}

removeEntryForKey

final Entry<K,V> removeEntryForKey(Object key) {
    if (size == 0) {
        return null;
    }
    int hash = (key == null) ? 0 : sun.misc.Hashing.singleWordWangJenkinsHash(key);
    int i = indexFor(hash, table.length);
    // entry at the head of this key's bucket
    HashMapEntry<K,V> prev = table[i];
    // e starts at the head as well
    HashMapEntry<K,V> e = prev;
    // walk the bucket's chain
    while (e != null) {
        HashMapEntry<K,V> next = e.next;
        Object k;
        if (e.hash == hash && ((k = e.key) == key || (key != null && key.equals(k)))) {
            modCount++;
            size--;
            if (prev == e)
                // removing the head: the bucket now starts at next
                table[i] = next;
            else
                // removing a later entry: the predecessor's next skips over it
                prev.next = next;
            e.recordRemoval(this);
            return e;
        }
        // remember the current entry as prev
        prev = e;
        // move on to the next entry
        e = next;
    }

    return e; // not found: e is null here
}
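The prev/e walk in removeEntryForKey is ordinary singly linked list deletion. The sketch below isolates it on a bare chain of int keys; `ChainRemoveDemo`, `chain` and `render` are hypothetical helpers, and unlike the real method it does not update size, modCount or call recordRemoval.

```java
public class ChainRemoveDemo {
    // Minimal singly linked node; only the key matters for this sketch
    static final class Node {
        final int key;
        Node next;

        Node(int key, Node next) {
            this.key = key;
            this.next = next;
        }
    }

    // Builds a chain whose head holds keys[0]
    static Node chain(int... keys) {
        Node head = null;
        for (int i = keys.length - 1; i >= 0; i--) {
            head = new Node(keys[i], head);
        }
        return head;
    }

    // Same prev/e walk as removeEntryForKey; returns the (possibly new) head of the bucket
    static Node remove(Node head, int key) {
        Node prev = head;
        Node e = head;
        while (e != null) {
            Node next = e.next;
            if (e.key == key) {
                if (prev == e) {
                    return next;   // removed the head: the bucket now starts at next
                }
                prev.next = next;  // removed a later node: the predecessor skips over it
                return head;
            }
            prev = e;
            e = next;
        }
        return head;               // key not found: chain unchanged
    }

    // Renders a chain as "9 -> 5 -> 1"
    static String render(Node head) {
        StringBuilder sb = new StringBuilder();
        for (Node e = head; e != null; e = e.next) {
            if (sb.length() > 0) {
                sb.append(" -> ");
            }
            sb.append(e.key);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(render(remove(chain(9, 5, 1), 9))); // 5 -> 1
        System.out.println(render(remove(chain(9, 5, 1), 5))); // 9 -> 1
        System.out.println(render(remove(chain(9, 5, 1), 7))); // 9 -> 5 -> 1
    }
}
```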

Checking for a key

public boolean containsKey(Object key) {
    // same as get: delegate to getEntry
    return getEntry(key) != null;
}

Checking for a value

public boolean containsValue(Object value) {
    // null is handled separately
    if (value == null)
        return containsNullValue();
    HashMapEntry[] tab = table;
    // scan every bucket of the table...
    for (int i = 0; i < tab.length ; i++)
        // ...and every entry in that bucket's chain
        for (HashMapEntry e = tab[i] ; e != null ; e = e.next)
            if (value.equals(e.value))
                return true;
    return false;
}

private boolean containsNullValue() {
    // same full scan as above, but compares with == null instead of equals
    HashMapEntry[] tab = table;
    for (int i = 0; i < tab.length ; i++)
        for (HashMapEntry e = tab[i] ; e != null ; e = e.next)
            if (e.value == null)
                return true;
    return false;
}
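Because HashMap permits null values, a null result from get is ambiguous: the key may be absent or mapped to null. containsKey resolves the ambiguity, while containsValue has to scan the whole table (in the code above, containsValue(null) goes through containsNullValue). A quick check with the standard java.util.HashMap; `NullValueDemo` is a hypothetical name.

```java
import java.util.HashMap;
import java.util.Map;

public class NullValueDemo {
    static Map<String, String> sample() {
        Map<String, String> map = new HashMap<>();
        map.put("a", null); // key present, value null
        return map;
    }

    public static void main(String[] args) {
        Map<String, String> map = sample();
        // get() returns null both for "a" (null value) and "b" (absent key)
        System.out.println(map.get("a"));            // null
        System.out.println(map.get("b"));            // null
        // containsKey() tells the two cases apart
        System.out.println(map.containsKey("a"));    // true
        System.out.println(map.containsKey("b"));    // false
        // a null value is found by a scan over all entries
        System.out.println(map.containsValue(null)); // true
    }
}
```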

Clearing the table

clear() nulls out every bucket with Arrays.fill and then resets size to 0:

public void clear() {
    modCount++;
    Arrays.fill(table, null);
    size = 0;
}

Iteration

Iterating keys

public Set<K> keySet() {
    Set<K> ks = keySet;
    return (ks != null ? ks : (keySet = new KeySet()));
}

Iterating values

public Collection<V> values() {
    Collection<V> vs = values;
    return (vs != null ? vs : (values = new Values()));
}

Iterating entries

public Set<Map.Entry<K,V>> entrySet() {
    return entrySet0();
}

private Set<Map.Entry<K,V>> entrySet0() {
    Set<Map.Entry<K,V>> es = entrySet;
    // created lazily on the first call (new EntrySet), then cached
    return es != null ? es : (entrySet = new EntrySet());
}
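keySet, values and entrySet return views that are created once and cached, and they read and write through to the map instead of copying it. A sketch on java.util.HashMap (`ViewsDemo` is a hypothetical name) showing both directions: the views see later puts, and removing through a view removes the mapping from the map.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class ViewsDemo {
    static Map<String, Integer> run() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        // The views are obtained once and keep reflecting the map
        Set<String> keys = map.keySet();
        Collection<Integer> values = map.values();
        map.put("c", 3);
        System.out.println(keys.size()); // 3: the view sees the later put
        keys.remove("a");                // removes the mapping a=1 from the map
        values.remove(2);                // removes the mapping b=2 from the map
        return map;
    }

    public static void main(String[] args) {
        System.out.println(run()); // {c=3}
    }
}
```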

All three views are ultimately traversed through their iterator() methods, so let's look at the iterator each one creates:

The key set uses KeyIterator:

Iterator<K> newKeyIterator() {
    return new KeyIterator();
}

The values collection uses ValueIterator:

Iterator<V> newValueIterator() {
    return new ValueIterator();
}

private final class ValueIterator extends HashIterator<V> {
    public V next() {
        return nextEntry().getValue();
    }
}

The entry set uses EntryIterator:

Iterator<Map.Entry<K,V>> newEntryIterator() {
    return new EntryIterator();
}

private final class EntryIterator extends HashIterator<Map.Entry<K,V>> {
    public Map.Entry<K,V> next() {
        return nextEntry();
    }
}

All three iterators extend HashIterator and only override next(), so HashIterator is the only one we need to study:

HashIterator

private abstract class HashIterator<E> implements Iterator<E> {
    HashMapEntry<K,V> next;        // next entry to return
    int expectedModCount;          // modCount snapshot; a mismatch triggers fail-fast
    int index;                     // index of the next bucket to scan
    HashMapEntry<K,V> current;     // entry most recently returned

    HashIterator() {
        expectedModCount = modCount;
        if (size > 0) { // advance to first entry
            HashMapEntry[] t = table;
            // skip empty buckets; the loop stops with next pointing at the first entry
            while (index < t.length && (next = t[index++]) == null)
                ;
        }
    }

    public final boolean hasNext() {
        return next != null;
    }

    // return the next entry
    final Entry<K,V> nextEntry() {
        // a mismatch means the map was structurally modified outside this iterator,
        // either by another thread or by the same thread calling put/remove directly
        if (modCount != expectedModCount)
            throw new ConcurrentModificationException();
        HashMapEntry<K,V> e = next;
        if (e == null)
            throw new NoSuchElementException();
        if ((next = e.next) == null) {
            // end of this chain: advance to the next non-empty bucket
            HashMapEntry[] t = table;
            while (index < t.length && (next = t[index++]) == null)
                ;
        }
        current = e;
        return e;
    }

    // remove the entry most recently returned by next()
    public void remove() {
        if (current == null)
            throw new IllegalStateException();
        if (modCount != expectedModCount)
            throw new ConcurrentModificationException();
        Object k = current.key;
        current = null;
        HashMap.this.removeEntryForKey(k);
        // resync, so the iterator's own removal does not trip the fail-fast check
        expectedModCount = modCount;
    }
}

One point deserves attention:

Fail-fast

Note that fail-fast behavior cannot be guaranteed as it is, generally speaking, impossible to make any hard guarantees in the presence of unsynchronized concurrent modification. Fail-fast operations throw ConcurrentModificationException on a best-effort basis.
Therefore, it would be wrong to write a program that depended on this exception for its correctness: fail-fast behavior should be used only to detect bugs.
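The fail-fast check does not require a second thread: any structural change made outside the iterator bumps modCount and is caught on the next call to next(). The sketch below (hypothetical class `FailFastDemo`, run against a desktop JDK's HashMap rather than the Android source) contrasts removing through the map, which throws in this scenario on current OpenJDK, with removing through Iterator.remove, which resynchronizes expectedModCount. As the quote says, the exception itself is best effort and must not be relied on for correctness.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

public class FailFastDemo {
    static Map<String, Integer> newMap() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        return map;
    }

    // Single-threaded, yet the iterator still detects the change made behind its back
    static boolean removeViaMapThrows() {
        Map<String, Integer> map = newMap();
        try {
            for (String key : map.keySet()) {
                map.remove(key); // bumps modCount; the iterator's next() call then fails
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Removing through the iterator keeps expectedModCount in sync
    static Map<String, Integer> removeEvenValues() {
        Map<String, Integer> map = newMap();
        Iterator<Map.Entry<String, Integer>> it = map.entrySet().iterator();
        while (it.hasNext()) {
            if (it.next().getValue() % 2 == 0) {
                it.remove();
            }
        }
        return map;
    }

    public static void main(String[] args) {
        System.out.println(removeViaMapThrows()); // true
        System.out.println(removeEvenValues());   // {a=1, c=3}
    }
}
```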

Summary

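HashMap grows on demand; each resize doubles the capacity (growth factor 2).
HashMap stores entries in no particular order; a null key always goes into bucket 0, the first slot of the table.
HashMap accepts a null key; a class used as a non-null key should override hashCode and equals consistently (equal objects must have equal hash codes).
HashMap is not thread-safe. For thread safety use Collections.synchronizedMap() or ConcurrentHashMap; Hashtable is not recommended because it synchronizes every method and is less efficient.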
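The hashCode/equals rule above follows from how put and get compare `e.hash == hash` and then `key.equals(k)`. In the sketch below, `KeyDemo`, `RawPoint` and `Point` are hypothetical names: RawPoint inherits identity-based equals/hashCode from Object, so a logically equal key created later cannot find the entry, while Point overrides both consistently.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class KeyDemo {
    // No equals/hashCode override: identity semantics inherited from Object
    static final class RawPoint {
        final int x, y;

        RawPoint(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    // equals and hashCode overridden consistently: equal points have equal hashes
    static final class Point {
        final int x, y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Point)) return false;
            Point p = (Point) o;
            return x == p.x && y == p.y;
        }

        @Override
        public int hashCode() {
            return Objects.hash(x, y);
        }
    }

    static String lookupRaw() {
        Map<RawPoint, String> map = new HashMap<>();
        map.put(new RawPoint(1, 2), "found");
        return map.get(new RawPoint(1, 2)); // a different object, so equals() fails: null
    }

    static String lookupPoint() {
        Map<Point, String> map = new HashMap<>();
        map.put(new Point(1, 2), "found");
        return map.get(new Point(1, 2));    // equal hash and equals() true: "found"
    }

    public static void main(String[] args) {
        System.out.println(lookupRaw());   // null
        System.out.println(lookupPoint()); // found
    }
}
```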