HashMap

Date: 2019-11-21


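HashMap implements the Map interface and extends the AbstractMap abstract class. The Map interface defines the rules for key-value mapping, while AbstractMap provides a skeletal implementation of Map to minimize the work needed to implement that interface.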

public class HashMap<K,V>
    extends AbstractMap<K,V>
    implements Map<K,V>, Cloneable, Serializable{
...
}

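Initial capacity and load factor (0.75 by default) are the two parameters that most affect HashMap performance. Capacity is the number of buckets in the hash table (the length of the table array), and the initial capacity is the number of buckets when the table is created. The load factor measures how full the hash table may become before its capacity is automatically increased; it reflects how much of the table's space is in use. A larger load factor means the table is more densely filled, and a smaller one means it is more sparsely filled.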
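As a quick illustration of how the two parameters interact, the sketch below computes the resize threshold as capacity times load factor, truncated to an int, which is the formula used in the constructor quoted in this article. The class and method names (ThresholdDemo, thresholdFor) are hypothetical and exist only for this example.

```java
// Illustrative only: ThresholdDemo and thresholdFor are made-up names,
// not part of java.util.
class ThresholdDemo {
    // Entry count at which a table of the given capacity becomes due for a resize.
    static int thresholdFor(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        // Capacities double on every resize, so walk a few powers of two.
        for (int capacity = 16; capacity <= 64; capacity <<= 1) {
            System.out.println("capacity " + capacity
                    + " -> threshold " + thresholdFor(capacity, 0.75f));
        }
    }
}
```

With the default load factor, a 16-bucket table has a threshold of 12, so a quarter of the buckets' worth of headroom is traded for shorter chains.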


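Hash concepts

Hashing transforms an input of arbitrary length (also called the pre-image) into a fixed-length output (usually an integer) by means of a hash algorithm; that output is the hash value. The transformation is a compressing map: the space of hash values is usually much smaller than the space of inputs, so different inputs may hash to the same output, and the input cannot be uniquely recovered from its hash value. Put simply, a hash is a message-digest function that compresses a message of arbitrary length to a fixed length.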
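A small example makes the "different inputs, same output" point concrete. It relies only on String.hashCode(), whose formula is fixed by the Java specification; the class name HashCollisionDemo and the helper sameHash are invented for this sketch.

```java
// Illustrative only: HashCollisionDemo is a made-up class name.
class HashCollisionDemo {
    // True when two distinct strings produce the same 32-bit hash code.
    static boolean sameHash(String a, String b) {
        return !a.equals(b) && a.hashCode() == b.hashCode();
    }

    public static void main(String[] args) {
        // 'A'*31 + 'a' = 65*31 + 97 and 'B'*31 + 'B' = 66*31 + 66 are both 2112.
        System.out.println("Aa".hashCode()); // 2112
        System.out.println("BB".hashCode()); // 2112
        System.out.println(sameHash("Aa", "BB")); // true
    }
}
```

Because such collisions cannot be ruled out, a hash table has to compare keys with equals() after the hash codes match, which is what the put and get listings in this article do.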

/**
 * Constructs an empty HashMap with the default initial capacity
 * (16) and the default load factor (0.75).
 */
public HashMap() {

    // Load factor: measures how full the hash table is allowed to get
    this.loadFactor = DEFAULT_LOAD_FACTOR;

    // Resize threshold: the HashMap capacity multiplied by the load factor
    threshold = (int)(DEFAULT_INITIAL_CAPACITY * DEFAULT_LOAD_FACTOR);

    // The backing store is still an array; each slot holds the head of a chain
    table = new Entry[DEFAULT_INITIAL_CAPACITY];

    init();
}

static class Entry<K,V> implements Map.Entry<K,V> {

    final K key;        // key of the mapping
    V value;            // value of the mapping
    Entry<K,V> next;    // next node in the chain
    final int hash;     // return value of hash(key.hashCode())

    /**
     * Creates new entry.
     */
    Entry(int h, K k, V v, Entry<K,V> n) {     // Entry constructor
        value = v;
        next = n;
        key = k;
        hash = h;
    }

    ......

}


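Entry is a static nested class of HashMap that implements the Map.Entry interface. It has four fields: key, value, next (the next node in the chain) and hash. Entry is the building block of the hash table and the concrete form of the elements it stores.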

How HashMap stores entries

In HashMap, key-value pairs are stored through the put(key, value) method, whose source is as follows:

public V put(K key, V value) {

    // A null key is handled by putForNullKey, which stores the pair in the first slot of table
    if (key == null)
        return putForNullKey(value);

    // Compute the hash from the key's hashCode
    int hash = hash(key.hashCode());             //  ------- (1)

    // Compute which bucket of the array the pair belongs to
    int i = indexFor(hash, table.length);              // ------- (2)

    // Walk the chain in bucket i, looking for the key
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {      // ------- (3)
        Object k;
        // If the chain already holds a mapping with the same hash and an equal key,
        // overwrite its value and return the old one
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;    // return the old value
        }
    }

    modCount++; // bump the modification count (used by the fail-fast mechanism)

    // No existing mapping: add the new entry at the head of the chain
    addEntry(hash, key, value, i);
    return null;
}
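Seen from the caller's side, the listing above means that put returns null for a new key and the previous value when it overwrites one. The following sketch checks this against java.util.HashMap; the class name PutReturnDemo and its run method are made up for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: PutReturnDemo is a made-up class name.
class PutReturnDemo {
    // Collects what each call observed so the behaviour is easy to inspect.
    static String run() {
        Map<String, Integer> map = new HashMap<>();
        Integer first = map.put("a", 1);   // key absent: returns null
        Integer second = map.put("a", 2);  // key present: value replaced, old value returned
        return first + "," + second + "," + map.get("a") + "," + map.size();
    }

    public static void main(String[] args) {
        System.out.println(run()); // null,1,2,1
    }
}
```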


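In the put(key, value) source above, HashMap's hash strategy is marked at (1) and (2): hash() re-computes the key's hashCode, and indexFor() produces the position at which the Entry is inserted. ANDing the computed hash with (length - 1) of the table yields a value in the range [0, length - 1]. The more evenly these values are distributed, the better HashMap's space utilization and access efficiency.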

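Re-hashing an object's hashCode with hash() guards against poor-quality hashCode() implementations. Because the length of HashMap's backing array is always a power of two, only the low bits of the hash select the bucket; the right shifts fold higher bits into the low bits so that the low bits differ as much as possible and the hash values are spread more evenly. For more on the hash(int h) method, see 《HashMap hash方法分析》 ("An analysis of the HashMap hash method").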
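To make the effect of the supplemental hash visible, the sketch below reuses the hash(int) and indexFor bodies from the source quoted in this article and feeds them two hash codes that differ only in their high bits. The class name SpreadDemo and the sample values are assumptions chosen for the demonstration.

```java
// Illustrative only: SpreadDemo is a made-up class; hash and indexFor are
// copied from the older JDK source quoted in this article.
class SpreadDemo {
    static int hash(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int a = 0x10000, b = 0x20000; // identical low 16 bits, different high bits
        // Without spreading, both land in bucket 0 of a 16-slot table.
        System.out.println(indexFor(a, 16) + " " + indexFor(b, 16));             // 0 0
        // After spreading, the high bits influence the bucket index.
        System.out.println(indexFor(hash(a), 16) + " " + indexFor(hash(b), 16)); // 1 2
    }
}
```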

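As we know, the length of HashMap's backing array is always a power of two (2^n). When length is 2^n, h & (length - 1) is equivalent to taking h modulo length, but it is much faster than a direct modulo operation. This is one of HashMap's speed optimizations.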
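The equivalence can be checked directly. One nuance worth stating as an assumption of this sketch: for negative h, Java's % operator yields a negative remainder, so the mask matches Math.floorMod(h, length) rather than h % length. The class name MaskVsModDemo is invented for the example.

```java
// Illustrative only: MaskVsModDemo is a made-up class name.
class MaskVsModDemo {
    // Same expression as indexFor in the quoted HashMap source.
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int length = 16; // must be a power of two for the mask trick to work
        int[] samples = {0, 1, 15, 16, 17, 12345, -1, -17, Integer.MIN_VALUE};
        for (int h : samples) {
            System.out.println(h + ": mask=" + indexFor(h, length)
                    + " floorMod=" + Math.floorMod(h, length)
                    + " %=" + (h % length));
        }
    }
}
```

For every sample the mask and floorMod columns agree, while the % column differs for the negative inputs (for example, -17 % 16 is -1 but the mask gives 15).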

The internal source is as follows:

/**
 * Offloaded version of put for null keys
 */
private V putForNullKey(V value) {
    // A null key always goes into the first bucket, table[0]
    for (Entry<K,V> e = table[0]; e != null; e = e.next) {
        if (e.key == null) {   // a null-key mapping already exists: replace its value and return the old one
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }
    modCount++;        // fail-fast bookkeeping
    addEntry(0, null, value, 0);       // otherwise add it to the table[0] bucket
    return null;
}

/**
 * Applies a supplemental hash function to a given hashCode, which
 * defends against poor quality hash functions.  This is critical
 * because HashMap uses power-of-two length hash tables, that
 * otherwise encounter collisions for hashCodes that do not differ
 * in lower bits.
 *
 * Note: Null keys always map to hash 0, thus index 0.
 */
static int hash(int h) {
    // This function ensures that hashCodes that differ only by
    // constant multiples at each bit position have a bounded
    // number of collisions (approximately 8 at default load factor).
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}

/**
 * Returns index for hash code h.
 */
static int indexFor(int h, int length) {
    return h & (length-1);  // equivalent to a modulo operation, but cheaper
}

/**
 * Adds a new entry with the specified key, value and hash code to
 * the specified bucket.  It is the responsibility of this
 * method to resize the table if appropriate.
 *
 * Subclass overrides this to alter the behavior of put method.
 *
 * New elements are always added at the head of the list.
 */
void addEntry(int hash, K key, V value, int bucketIndex) {

    // Fetch the current head of the list at bucketIndex
    Entry<K,V> e = table[bucketIndex];

    // Link the newly created Entry in as the new head of that list
    table[bucketIndex] = new Entry<K,V>(hash, key, value, e);

    // If the number of elements has reached threshold, double the capacity
    if (size++ >= threshold)
        resize(2 * table.length);
}

/**
 * Rehashes the contents of this map into a new array with a
 * larger capacity.  This method is called automatically when the
 * number of keys in this map reaches its threshold.
 *
 * If current capacity is MAXIMUM_CAPACITY, this method does not
 * resize the map, but sets threshold to Integer.MAX_VALUE.
 * This has the effect of preventing future calls.
 *
 * Note: as the number of elements in a HashMap grows, collisions become
 * more likely and the chains get longer, which inevitably slows down access.
 * To keep HashMap efficient, the table has to be expanded at a critical point:
 * when the number of elements reaches threshold (table length * load factor).
 * Resizing is expensive, however, because the position of every element in
 * the new table array has to be recomputed and the element moved. So if the
 * number of elements is known in advance, presizing the HashMap when it is
 * constructed can noticeably improve performance.
 *
 * @param newCapacity the new capacity, MUST be a power of two;
 *        must be greater than current capacity unless current
 *        capacity is MAXIMUM_CAPACITY (in which case value
 *        is irrelevant).
 */
void resize(int newCapacity) {
    Entry[] oldTable = table;
    int oldCapacity = oldTable.length;

    // If oldCapacity is already at the maximum, just set threshold to Integer.MAX_VALUE
    if (oldCapacity == MAXIMUM_CAPACITY) {
        threshold = Integer.MAX_VALUE;
        return;             // nothing more to do
    }

    // Otherwise create a larger array
    Entry[] newTable = new Entry[newCapacity];

    // Rehash every Entry into the new array
    transfer(newTable);

    table = newTable;
    threshold = (int)(newCapacity * loadFactor);  // recompute threshold
}

/**
 * Transfers all entries from current table to newTable.
 * Rehashing mainly consists of recomputing each element's position
 * in the new table array and moving the element there.
 */
void transfer(Entry[] newTable) {

    // Keep a reference to the old array as src
    Entry[] src = table;
    int newCapacity = newTable.length;

    // Re-add every chain in src to newTable
    for (int j = 0; j < src.length; j++) {
        Entry<K,V> e = src[j];
        if (e != null) {
            src[j] = null;   // drop the old reference so src can be reclaimed

            // Move each element of the chain into its bucket in newTable
            do {
                Entry<K,V> next = e.next;

                // e.hash is the return value of hash(key.hashCode()).
                // Compute the position in newTable; elements that shared a chain
                // may now end up in different chains.
                int i = indexFor(e.hash, newCapacity);
                e.next = newTable[i];
                newTable[i] = e;
                e = next;
            } while (e != null);
        }
    }
}
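
Following the note on presizing in the resize() listing, here is a minimal sketch of how a caller might pick an initial capacity so that a known number of entries fits without triggering a resize. The helper capacityFor and the class name PresizeDemo are hypothetical; the only library call used is the HashMap(int initialCapacity) constructor.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: PresizeDemo and capacityFor are made-up names.
class PresizeDemo {
    // Capacity whose threshold (capacity * loadFactor) covers expectedEntries.
    static int capacityFor(int expectedEntries, float loadFactor) {
        return (int) Math.ceil(expectedEntries / (double) loadFactor);
    }

    public static void main(String[] args) {
        int expected = 1000;
        int initialCapacity = capacityFor(expected, 0.75f);
        // HashMap rounds the requested capacity up to a power of two internally.
        Map<Integer, String> map = new HashMap<>(initialCapacity);
        for (int i = 0; i < expected; i++) {
            map.put(i, "v" + i);
        }
        System.out.println(initialCapacity + " requested, " + map.size() + " stored");
    }
}
```

Passing the expected element count itself as the initial capacity is a common slip: with a load factor of 0.75 the table would still be resized before it is full.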

In short, the hash() and indexFor() methods above serve a single purpose: to distribute elements evenly across the buckets of table so that space is fully used.

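In the source quoted here, HashMap always adds a new element at the head of the chain. In addition, once the number of elements reaches the threshold, the table is resized; normally the capacity is doubled.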
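The two behaviours just described, head insertion and doubling, can be reproduced in a stripped-down table. This is a simplified sketch and not the JDK implementation: keys are plain ints used directly as their own hash (no supplemental hash), and TinyChainedMap, Node and the starting capacity of 4 are all assumptions made for brevity.

```java
// Illustrative only: a toy chained hash table, not java.util.HashMap.
class TinyChainedMap {
    static final class Node {
        final int key;
        int value;
        Node next;
        Node(int key, int value, Node next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    Node[] table = new Node[4]; // capacity is kept at a power of two
    int size;
    int threshold = 3;          // 4 * 0.75

    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    void put(int key, int value) {
        int i = indexFor(key, table.length);
        for (Node e = table[i]; e != null; e = e.next) {
            if (e.key == key) { e.value = value; return; } // overwrite existing mapping
        }
        table[i] = new Node(key, value, table[i]); // new node becomes the head
        if (size++ >= threshold) resize(2 * table.length);
    }

    Integer get(int key) {
        for (Node e = table[indexFor(key, table.length)]; e != null; e = e.next) {
            if (e.key == key) return e.value;
        }
        return null;
    }

    void resize(int newCapacity) {
        Node[] newTable = new Node[newCapacity];
        for (Node head : table) {
            for (Node e = head; e != null; ) {
                Node next = e.next;
                int i = indexFor(e.key, newCapacity); // bucket may change
                e.next = newTable[i];
                newTable[i] = e;
                e = next;
            }
        }
        table = newTable;
        threshold = (int) (newCapacity * 0.75f);
    }

    public static void main(String[] args) {
        TinyChainedMap m = new TinyChainedMap();
        m.put(1, 10);
        m.put(5, 50); // 1 and 5 share bucket 1 while the capacity is 4
        System.out.println(m.table[1].key); // 5: the newest entry is the head
        m.put(9, 90);
        m.put(13, 130); // fourth insert finds size at threshold and doubles the table
        System.out.println(m.table.length); // 8
    }
}
```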

How HashMap reads entries

/**
 * Returns the value to which the specified key is mapped,
 * or {@code null} if this map contains no mapping for the key.
 *
 * <p>More formally, if this map contains a mapping from a key
 * {@code k} to a value {@code v} such that {@code (key==null ? k==null :
 * key.equals(k))}, then this method returns {@code v}; otherwise
 * it returns {@code null}.  (There can be at most one such mapping.)
 *
 * <p>A return value of {@code null} does not <i>necessarily</i>
 * indicate that the map contains no mapping for the key; it's also
 * possible that the map explicitly maps the key to {@code null}.
 * The {@link #containsKey containsKey} operation may be used to
 * distinguish these two cases.
 *
 * @see #put(Object, Object)
 */
public V get(Object key) {
    // A null key is handled by getForNullKey
    if (key == null)
        // Look in the first bucket of table for the null-key mapping; return null if there is none
        return getForNullKey();

    // Compute the hash from the key's hashCode
    int hash = hash(key.hashCode());
    // Locate the bucket in table and walk its chain
    for (Entry<K,V> e = table[indexFor(hash, table.length)];
         e != null;
         e = e.next) {
        Object k;
        // If the entry's key matches the requested key, return its value
        if (e.hash == hash && ((k = e.key) == key || key.equals(k)))
            return e.value;
    }
    return null;
}


/**
 * Offloaded version of get() to look up null keys.  Null keys map
 * to index 0.  This null case is split out into separate methods
 * for the sake of performance in the two most commonly used
 * operations (get and put), but incorporated with conditionals in
 * others.
 */
private V getForNullKey() {
    // If a mapping with a null key exists, it is always in the first bucket
    for (Entry<K,V> e = table[0]; e != null; e = e.next) {
        if (e.key == null)
            return e.value;
    }
    // No mapping with a null key: return null
    return null;
}
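
The Javadoc of get points out that a null result is ambiguous: the key may be absent, or it may be mapped to null. The sketch below shows the containsKey check it recommends, using java.util.HashMap; the class name NullLookupDemo and the helper describe are made up for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: NullLookupDemo is a made-up class name.
class NullLookupDemo {
    // Distinguishes "mapped to null" from "no mapping at all".
    static String describe(Map<String, String> map, String key) {
        if (map.containsKey(key)) {
            return "mapped to " + map.get(key);
        }
        return "absent";
    }

    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        map.put("a", null);  // explicit null value
        map.put(null, "n");  // HashMap accepts a null key

        System.out.println(map.get("a"));             // null
        System.out.println(map.get("missing"));       // null
        System.out.println(describe(map, "a"));       // mapped to null
        System.out.println(describe(map, "missing")); // absent
        System.out.println(map.get(null));            // n
    }
}
```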

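Why is the length of HashMap's backing array always a power of two?

HashMap's data structure is an array of linked lists, and we want elements to be spread as evenly as possible. Ideally, every slot of the Entry array holds exactly one element: lookups are then fastest, because there is no linked list to traverse and no series of equals comparisons on keys, and space utilization is highest.

So how is the most even distribution achieved? HashMap uses a two-step hash strategy:

1. Use the hash() method to re-compute the key's hashCode, guarding against poor-quality hashCode() implementations. Because the length of HashMap's backing array is always a power of two, the right shifts make the low bits differ as much as possible, so that the keys' hash values are spread as evenly as possible;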

// HashMap's capacity must be a power of two: the smallest 2^n that is not less than initialCapacity
int capacity = 1;
while (capacity < initialCapacity)
    capacity <<= 1;

2. Use the indexFor() method to take the remainder, so that the insertion positions of Entry objects are distributed as evenly as possible.
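
The capacity loop shown above can be wrapped in a method and tried on a few inputs. The clamp to MAXIMUM_CAPACITY (1 << 30) is included because, without an upper bound, the left shift would overflow for very large requests; the class name CapacityDemo and the method name roundUpToPowerOfTwo are assumptions of this sketch.

```java
// Illustrative only: CapacityDemo and roundUpToPowerOfTwo are made-up names.
class CapacityDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Smallest power of two that is not less than initialCapacity.
    static int roundUpToPowerOfTwo(int initialCapacity) {
        if (initialCapacity > MAXIMUM_CAPACITY) {
            initialCapacity = MAXIMUM_CAPACITY; // keeps the shift below from overflowing
        }
        int capacity = 1;
        while (capacity < initialCapacity) {
            capacity <<= 1;
        }
        return capacity;
    }

    public static void main(String[] args) {
        int[] requests = {1, 10, 16, 17, 1000};
        for (int r : requests) {
            System.out.println(r + " -> " + roundUpToPowerOfTwo(r));
        }
        // 1 -> 1, 10 -> 16, 16 -> 16, 17 -> 32, 1000 -> 1024
    }
}
```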

Reference: 《java提升篇（二三）HashMap》 (roughly, "Java Improvement Series, Part 23: HashMap")

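Summary:

When the array length is a power of two, distinct hash values are less likely to collide in the same bucket, so data is spread more evenly across the table array, space utilization is higher and lookups are faster;

h & (length - 1) is equivalent to taking h modulo length, but is much faster than a direct modulo operation. The two give the same result at different cost, and this is one of HashMap's speed and efficiency optimizations.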
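
The first point of the summary can be checked by counting how many buckets the mask h & (length - 1) can reach for different lengths. With a length that is not a power of two, some bits of length - 1 are zero, and every index with one of those bits set becomes unreachable. The class name BucketCoverageDemo and the sample lengths are assumptions of this sketch.

```java
// Illustrative only: BucketCoverageDemo is a made-up class name.
class BucketCoverageDemo {
    // Counts the distinct indexes h & (length - 1) produces for hashes 0..1023,
    // which covers every low-bit pattern for lengths up to 1024.
    static int reachableBuckets(int length) {
        boolean[] hit = new boolean[length];
        int count = 0;
        for (int h = 0; h < 1024; h++) {
            int i = h & (length - 1);
            if (!hit[i]) {
                hit[i] = true;
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(reachableBuckets(16)); // 16: mask 1111 reaches every bucket
        System.out.println(reachableBuckets(15)); // 8: mask 1110 never produces an odd index
        System.out.println(reachableBuckets(10)); // 4: mask 1001 reaches only 0, 1, 8, 9
    }
}
```

With length 15, almost half of the array would stay empty while the remaining buckets absorb all the collisions, which is why the capacity is always rounded to a power of two.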

http://www.javashuo.com/article/p-uizorzvn-a.html

Note: HashMap and ConcurrentHashMap are implemented differently in JDK 1.7 and 1.8

http://www.javashuo.com/article/p-aqrblyjw-n.html
