HashMap implementation: differences between JDK 1.7 and 1.8

Date: 2020-06-02

Tags: hashmap, implementation, jdk1.7, jdk 1.8, differences. Category: Java


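HashMap is one of the most commonly used collection classes, yet its underlying implementation differs considerably between JDK 1.7 and JDK 1.8. In this article we study those differences by reading the source code.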

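HashMap stores key-value data. Its underlying data structure is an array whose elements are linked lists or (in 1.8 only) red-black trees. The key is hashed and further processed to obtain an array index, and the value and related information are stored in the linked list or red-black tree at that position. If two different keys yield the same array index, a hash collision occurs. The default array length is 16; once the number of stored entries exceeds a certain value (the threshold, capacity × load factor, which is 16 × 0.75 = 12 by default), the array is resized. In my view, 1.7 and 1.8 differ mainly in how they handle hash collisions and resizing.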
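The index and threshold arithmetic described above can be sketched as follows. This is a minimal illustration, not the JDK source: `BucketIndexDemo`, `indexFor` and `threshold` are hypothetical names here (the bit trick matches JDK 1.7's `indexFor` and 1.8's inline `(n - 1) & hash`), and the hash values are plain ints rather than the output of HashMap's own `hash()` step.

```java
// Minimal sketch of how a HashMap-style table maps a hash to a bucket index
// and derives its resize threshold. Names are illustrative, not the JDK API.
final class BucketIndexDemo {
    static final int DEFAULT_CAPACITY = 16;          // default table length
    static final float DEFAULT_LOAD_FACTOR = 0.75f;  // default load factor

    // For a power-of-two length, hash & (length - 1) keeps the low bits,
    // i.e. a non-negative "hash mod length" even for negative hashes.
    static int indexFor(int hash, int length) {
        return hash & (length - 1);
    }

    // Resize threshold = capacity * load factor (16 * 0.75 = 12 by default).
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        System.out.println(threshold(DEFAULT_CAPACITY, DEFAULT_LOAD_FACTOR)); // 12
        System.out.println(indexFor(17, DEFAULT_CAPACITY)); // 1
        System.out.println(indexFor(33, DEFAULT_CAPACITY)); // 1: collides with 17
    }
}
```

Hashes 17 and 33 differ only above the low four bits, so with 16 buckets they land in the same slot: that is exactly the collision case the rest of the article is about.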

First, let's look at the JDK 1.7 data structure.

[Figure: the array that stores the data]
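In JDK 1.7 the table is an array of `Entry` objects, and each occupied slot holds a singly linked list chained through `next`. The sketch below is a hypothetical, simplified model of that layout (the class `Jdk7Layout` and its static `get` helper are illustrative; the real `HashMap.Entry` also implements `Map.Entry` and has more methods). It shows how a lookup walks one bucket's list, comparing the cached hash before calling `equals`.

```java
// Hypothetical, simplified model of the JDK 1.7 layout: an array ("table")
// whose slots hold singly linked lists of entries. Not the real HashMap code.
final class Jdk7Layout {
    static final class Entry<K, V> {
        final K key;
        V value;
        Entry<K, V> next;  // next entry in the same bucket
        final int hash;    // cached hash of the key

        Entry(int hash, K key, V value, Entry<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    // Walk one bucket's list, comparing the cached hash first, then the key.
    static <K, V> V get(Entry<K, V>[] table, K key, int hash) {
        int i = hash & (table.length - 1);
        for (Entry<K, V> e = table[i]; e != null; e = e.next) {
            if (e.hash == hash && (e.key == key || (key != null && key.equals(e.key)))) {
                return e.value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        @SuppressWarnings("unchecked")
        Entry<String, Integer>[] table = (Entry<String, Integer>[]) new Entry[16];
        // Two entries sharing bucket 1 (hashes 17 and 33), chained via next.
        table[1] = new Entry<>(33, "b", 2, new Entry<>(17, "a", 1, null));
        System.out.println(get(table, "a", 17)); // 1
        System.out.println(get(table, "b", 33)); // 2
        System.out.println(get(table, "c", 49)); // null (same bucket, no match)
    }
}
```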

The source of put, with my comments added:

public V put(K key, V value) {
    // Initialize the table if it is still empty
    if (table == EMPTY_TABLE) {
        inflateTable(threshold);
    }
    // A null key is handled separately
    if (key == null)
        return putForNullKey(value);
    // Hash the key
    int hash = hash(key);
    // Compute the array index
    int i = indexFor(hash, table.length);
    // If this slot is occupied, walk its linked list; if the key already exists, replace its value
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }
    modCount++;
    // Add a new entry for the key
    addEntry(hash, key, value, i);
    return null;
}

The addEntry source:

void addEntry(int hash, K key, V value, int bucketIndex) {
    // If the number of entries has reached the threshold AND the target slot is
    // already occupied (a hash collision), double the array length
    if ((size >= threshold) && (null != table[bucketIndex])) {
        resize(2 * table.length);
        hash = (null != key) ? hash(key) : 0;
        bucketIndex = indexFor(hash, table.length);
    }
    createEntry(hash, key, value, bucketIndex);
}

Next, the createEntry source:

void createEntry(int hash, K key, V value, int bucketIndex) {
    // If this slot already holds entries, the new entry is inserted at the head of the list (head insertion)
    Entry<K,V> e = table[bucketIndex];
    table[bucketIndex] = new Entry<>(hash, key, value, e);
    size++;
}
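createEntry places the new entry at the head of the bucket's list, while JDK 1.8's putVal appends at the tail. The sketch below contrasts the two orders on a single bucket. It is a minimal illustration under the assumption that every key is new; `InsertionOrderDemo`, `Node`, `insertAtHead` and `insertAtTail` are hypothetical names, not JDK code.

```java
// Minimal sketch contrasting JDK 1.7-style head insertion with JDK 1.8-style
// tail insertion in one bucket. Node and method names are hypothetical.
final class InsertionOrderDemo {
    static final class Node {
        final String key;
        Node next;
        Node(String key, Node next) { this.key = key; this.next = next; }
    }

    // 1.7 createEntry style: the new node becomes the bucket head.
    static Node insertAtHead(Node head, String key) {
        return new Node(key, head);
    }

    // 1.8 putVal style: walk to the last node and append after it.
    static Node insertAtTail(Node head, String key) {
        Node node = new Node(key, null);
        if (head == null) return node;
        Node p = head;
        while (p.next != null) p = p.next;
        p.next = node;
        return head;
    }

    // Concatenates the keys in list order.
    static String order(Node head) {
        StringBuilder sb = new StringBuilder();
        for (Node p = head; p != null; p = p.next) sb.append(p.key);
        return sb.toString();
    }

    public static void main(String[] args) {
        Node headList = null, tailList = null;
        for (String k : new String[] {"a", "b", "c"}) {
            headList = insertAtHead(headList, k);
            tailList = insertAtTail(tailList, k);
        }
        System.out.println(order(headList)); // cba
        System.out.println(order(tailList)); // abc
    }
}
```

Head insertion is O(1) per insert but reverses arrival order; tail insertion keeps arrival order, and 1.8 walks the list anyway to check for an existing key, so reaching the tail costs nothing extra.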

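So in JDK 1.7 a resize happens only when the number of entries has reached the threshold AND the new entry causes a hash collision (its slot is already occupied). With the default length of 16, the table can therefore hold as many as 27 entries without resizing: put 12 entries into one slot (the threshold is 12), then one entry into each of the remaining 15 slots, for 12 + 15 = 27. Only the 28th new key, which must land in an occupied slot, triggers a resize. Of course this is a contrived case, but it shows that in JDK 1.7 the threshold is not decisive: exceeding it does not necessarily trigger a resize.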
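The 27-entry scenario can be checked with a small model of the addEntry condition. This is a hypothetical simplification: `Jdk7ResizeModel` tracks only per-slot counts for the default table (length 16, threshold 12), assumes every key is new (no value replacement), and stops at the first insertion that would resize instead of actually resizing.

```java
// Simplified model of JDK 1.7's addEntry resize check for the default table
// (length 16, threshold 12). Hypothetical illustration, not the real HashMap.
final class Jdk7ResizeModel {
    final int[] bucketSizes = new int[16];
    final int threshold = 12;
    int size = 0;

    // Returns true if adding a new key to this bucket would call resize(),
    // mirroring: (size >= threshold) && (null != table[bucketIndex]).
    // When it returns true the model does not insert the key.
    boolean add(int bucketIndex) {
        boolean resize = size >= threshold && bucketSizes[bucketIndex] > 0;
        if (!resize) {
            bucketSizes[bucketIndex]++;
            size++;
        }
        return resize;
    }

    public static void main(String[] args) {
        Jdk7ResizeModel m = new Jdk7ResizeModel();
        for (int i = 0; i < 12; i++) m.add(0);  // 12 keys collide in bucket 0
        for (int b = 1; b < 16; b++) m.add(b);  // one key in each other bucket
        System.out.println(m.size);             // 27, no resize so far
        System.out.println(m.add(5));           // true: the 28th key collides
    }
}
```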

Now let's look at the JDK 1.8 source.

[Figure: the array that stores the data]

The hash algorithm has changed here, but that is not the focus; let's continue with the put source.
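For reference, both versions "spread" the key's hashCode() before masking it: 1.7 applies several shifts and XORs, while 1.8 XORs the high 16 bits into the low 16 bits once. The sketch below reproduces the two steps side by side; `HashSpreadDemo`, `hash17` and `hash18` are hypothetical names, and `hash17` assumes the default case by omitting 1.7's alternative-hashing (`hashSeed`) branch and leaving null keys to separate handling, as 1.7's put does.

```java
// Side-by-side sketch of the JDK 1.7 and 1.8 hash "spreading" steps.
// hash17 omits the hashSeed branch of the real 1.7 method and assumes a
// non-null key (1.7's put handles null keys separately).
final class HashSpreadDemo {
    // JDK 1.7 style: several shifts and XORs mix high bits into low bits.
    static int hash17(Object k) {
        int h = k.hashCode();
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    // JDK 1.8 style: a single XOR of the high 16 bits into the low 16 bits.
    static int hash18(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public static void main(String[] args) {
        Integer a = 0x10000, b = 0x20000; // hashCodes differ only in high bits
        int mask = 16 - 1;                // default table length 16
        System.out.println((a.hashCode() & mask) + " " + (b.hashCode() & mask)); // 0 0
        System.out.println((hash17(a) & mask) + " " + (hash17(b) & mask));       // 1 2
        System.out.println((hash18(a) & mask) + " " + (hash18(b) & mask));       // 1 2
    }
}
```

Without spreading, the two keys collide because a 16-slot table only looks at the low four bits; both versions fold the high bits in, and 1.8 does it more cheaply.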

public V put(K key, V value) { return putVal(hash(key), key, value, false, true); }

The putVal source:

final V putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    // Initialize the table if it is empty
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    // If the slot at this index is empty, insert directly
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        // The first node has the same key: its value is overwritten below
        if (p.hash == hash && ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        // Tree bin: insert into the red-black tree
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            for (int binCount = 0; ; ++binCount) {
                // Linked list: append at the tail (tail insertion)
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    // The list now has more than 8 nodes: convert it to a red-black tree
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                // Same key found in the list: its value is overwritten below
                if (e.hash == hash && ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    // If the number of entries exceeds the threshold, resize
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}
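The `binCount >= TREEIFY_THRESHOLD - 1` check is easy to misread, so here is a small model of the tail-insertion loop that reports the list length at which treeifyBin is first called. It is a hypothetical sketch: `TreeifyPointDemo` and its helpers are illustrative, keys are omitted on the assumption that every insert is a new key, and only the constant `TREEIFY_THRESHOLD = 8` is taken from JDK 1.8.

```java
// Simplified model of the tail-insertion loop in JDK 1.8's putVal, used only
// to see at which list length treeifyBin(...) would be called. Hypothetical code.
final class TreeifyPointDemo {
    static final int TREEIFY_THRESHOLD = 8; // same constant value as JDK 1.8

    static final class Node { Node next; } // keys omitted: every insert is a new key

    // Mirrors the loop's structure: walk to the tail, append, and report
    // whether the loop would call treeifyBin.
    static boolean appendTriggersTreeify(Node head) {
        Node p = head, e;
        for (int binCount = 0; ; ++binCount) {
            if ((e = p.next) == null) {
                p.next = new Node();
                return binCount >= TREEIFY_THRESHOLD - 1; // "-1 for 1st"
            }
            p = e;
        }
    }

    // Returns the list length right after the first append that triggers treeifyBin.
    static int firstTreeifyLength() {
        Node head = new Node(); // the first node goes straight into the empty slot
        int length = 1;
        while (true) {
            boolean treeify = appendTriggersTreeify(head);
            length++;
            if (treeify) return length;
        }
    }

    public static void main(String[] args) {
        System.out.println(firstTreeifyLength()); // 9
    }
}
```

Appending to a list of L nodes happens at `binCount = L - 1`, so the check fires when an 8-node list receives its 9th node, i.e. when the length exceeds 8.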

Next, the treeifyBin source:

final void treeifyBin(Node<K,V>[] tab, int hash) {
    int n, index; Node<K,V> e;
    // When converting a list to a red-black tree, resize instead if the table
    // length is below 64 (MIN_TREEIFY_CAPACITY)
    if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY)
        resize();
    else if ((e = tab[index = (n - 1) & hash]) != null) {
        TreeNode<K,V> hd = null, tl = null;
        // Convert the list nodes to TreeNodes, then build the tree structure
        do {
            TreeNode<K,V> p = replacementTreeNode(e, null);
            if (tl == null)
                hd = p;
            else {
                p.prev = tl;
                tl.next = p;
            }
            tl = p;
        } while ((e = e.next) != null);
        if ((tab[index] = hd) != null)
            hd.treeify(tab);
    }
}

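So in 1.8 the array is resized in two situations: when the number of entries exceeds the threshold, or when a list is about to be converted to a red-black tree while the array length is below 64. Thus, with the default length of 16, either all entries keep landing in the same slot and the array is resized when the 9th one arrives, or the entries are spread out and the array is resized only when the count exceeds the threshold of 12 (that is, on the 13th entry).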
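Both triggers can be combined in one small model for the default table. This is a hypothetical sketch, not the JDK: `Jdk8ResizeModel` tracks only per-bin counts (length 16, threshold 12), assumes every key is new, and reports the first insertion that would resize; `TREEIFY_THRESHOLD` and `MIN_TREEIFY_CAPACITY` carry the JDK 1.8 values.

```java
import java.util.function.IntUnaryOperator;

// Hypothetical model of JDK 1.8's two resize triggers for a default table
// (length 16, threshold 12). Only bin sizes are tracked; no real keys.
final class Jdk8ResizeModel {
    static final int TREEIFY_THRESHOLD = 8;     // same values as in JDK 1.8
    static final int MIN_TREEIFY_CAPACITY = 64;

    final int[] binSizes = new int[16];
    final int threshold = 12;
    int size = 0;

    // Adds one new key to `bin`; returns true if this insertion would resize.
    boolean add(int bin) {
        binSizes[bin]++;
        // Trigger 2: the bin grows past 8 nodes while the table is shorter
        // than 64, so treeifyBin calls resize() instead of building a tree.
        boolean resize = binSizes[bin] > TREEIFY_THRESHOLD
                && binSizes.length < MIN_TREEIFY_CAPACITY;
        // Trigger 1: ++size > threshold at the end of putVal.
        if (++size > threshold) resize = true;
        return resize;
    }

    // Counts insertions until the first resize; binOf maps the n-th insert (1-based) to a bin.
    static int insertsUntilResize(IntUnaryOperator binOf) {
        Jdk8ResizeModel m = new Jdk8ResizeModel();
        for (int n = 1; ; n++) {
            if (m.add(binOf.applyAsInt(n))) return n;
        }
    }

    public static void main(String[] args) {
        System.out.println(insertsUntilResize(n -> 0));            // 9: all keys in one bin
        System.out.println(insertsUntilResize(n -> (n - 1) % 16)); // 13: keys spread out
    }
}
```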

From the analysis above, the main differences between the JDK 1.7 and 1.8 HashMap implementations are:

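1. On a hash collision, 1.7 stores the entries in a linked list; 1.8 also starts with a linked list but converts it to a red-black tree once its length exceeds 8 (provided the array length is at least 64).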

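2. 1.7 resizes when the number of entries has reached the threshold and a hash collision occurs; 1.8 resizes when the number of entries exceeds the threshold, or when a list is being converted to a red-black tree while the array length is below 64.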

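This article only roughly analyzes how the HashMap implementations differ between the two JDK versions. Many details, such as how a list is converted into a red-black tree and how resizing works, are not explained thoroughly, mainly because they involve data-structure topics I don't yet understand deeply enough. The reason lists are converted into red-black trees, though, is access efficiency: lookups in a long list are slow (linear in its length), whereas a red-black tree makes insertion somewhat slower but keeps lookups fast (logarithmic).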

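When using HashMap, it is best to specify an initial capacity up front, because every resize has to recompute each entry's array index and move the entries, which noticeably hurts performance.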
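A minimal sketch of pre-sizing, assuming the default load factor of 0.75: pass `new HashMap<>(initialCapacity)` a capacity large enough that capacity × 0.75 covers the expected entry count. `capacityFor` is a hypothetical helper, not a JDK method; on Java 19 and later, `HashMap.newHashMap(int)` performs a similar calculation for you.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of pre-sizing a HashMap. capacityFor is a hypothetical helper
// that assumes the default load factor of 0.75.
final class PresizeDemo {
    // Smallest initial capacity c with c * 0.75 >= expectedSize. HashMap rounds
    // c up to a power of two, so its threshold is at least expectedSize and
    // inserting expectedSize entries never triggers a resize.
    static int capacityFor(int expectedSize) {
        return (int) Math.ceil(expectedSize / 0.75);
    }

    public static void main(String[] args) {
        int expected = 100;
        // 134 is requested; HashMap rounds it up to a table of 256 (threshold 192),
        // because a table of 128 would only allow 96 entries.
        Map<String, Integer> scores = new HashMap<>(capacityFor(expected));
        for (int i = 0; i < expected; i++) {
            scores.put("user" + i, i);
        }
        System.out.println(capacityFor(expected)); // 134
        System.out.println(scores.size());         // 100
    }
}
```

Passing the expected count itself (`new HashMap<>(100)`) is a common mistake: it yields a table of 128 with threshold 96, so the map still resizes once before reaching 100 entries.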