Learning How HashMap Works

Date: 2019-11-13


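Overview

HashMap is thoroughly familiar to anyone who writes Java, and most of us probably use it every day. But why is it called HashMap? How is it implemented internally? And why do we so often use String as its key? Let's explore HashMap with these questions in mind.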

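Introducing HashMap

1. Introduction

HashMap is a class that stores data as key-value pairs. You store a value under a key, and later use that key to get the value back, so you can think of a HashMap as a dictionary. Where does the name come from? It comes from hashing: converting a long string, or any object, into a small value that represents it. These shorter values are convenient for indexing and speed up lookups.

Before walking through how HashMap stores data, one more point is worth mentioning.
In Java every object has a hashCode() method that returns the object's hash value. HashMap calls the key's hashCode() method to obtain that value.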
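To make the role of hashCode() concrete, here is a minimal sketch. The class name HashCodeDemo and the helper hashOf are made up for this illustration; only hashCode() itself comes from the JDK. It shows that a String reports a deterministic hash and that two equal strings report the same one.

```java
class HashCodeDemo {
    // Hypothetical helper: returns whatever hashCode() reports for the key.
    static int hashOf(Object key) {
        return key.hashCode();
    }

    public static void main(String[] args) {
        // String.hashCode() is specified as s[0]*31^(n-1) + ... + s[n-1],
        // so "one" gives 111*961 + 110*31 + 101.
        System.out.println(hashOf("one")); // 110182

        // Equal objects must report equal hash codes, even if they are
        // distinct instances.
        System.out.println(hashOf(new String("one")) == hashOf("one")); // true
    }
}
```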

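2. Class relationships

Figure 1 shows the class hierarchy of HashMap.

HashMap extends AbstractMap and supports serialization and deserialization. Because it implements the Cloneable interface, it also supports clone() for copying the map. This article focuses on HashMap's internal implementation, so serialization and clone are not covered here.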

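3. Internals

The figure above illustrates how HashMap is implemented internally. Think of a basket holding many apples, where each apple carries its own information plus a reference to another apple.

1. As the figure shows, HashMap contains an array named table whose element type is Entry; every element in table is an Entry object.

2. Now look at the Entry type stored in table. An Entry holds a hash field, the key, the value, and a reference to another Entry. Why the extra Entry reference? If you have read the LinkedList source, you will recognize a linked-list structure: from one node you reach the next, and from that one the next again. The Entry here is not a LinkedList, though; it is an internal singly linked node class that exists only to serve HashMap.

3. So what is the point of making Entry a singly linked list? That becomes clear once we understand how HashMap stores data, so let's look at how HashMap works.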
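The Entry structure described above can be sketched roughly as follows. This is an illustration under assumptions, not the JDK source: the names ChainDemo, Node, and length are hypothetical, and the real Entry class has more methods. The sketch only shows the four fields and how one node leads to the next.

```java
class ChainDemo {
    // Hypothetical stand-in for HashMap's internal Entry class.
    static final class Node<K, V> {
        final int hash;
        final K key;
        V value;
        Node<K, V> next; // reference to the next node in the same bucket

        Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    // Walks the chain starting at head and counts its nodes.
    static int length(Node<?, ?> head) {
        int n = 0;
        for (Node<?, ?> e = head; e != null; e = e.next) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        Node<String, String> first =
                new Node<>("one".hashCode(), "one", "hello1", null);
        Node<String, String> second =
                new Node<>("three".hashCode(), "three", "hello3", first);
        System.out.println(length(second));  // 2
        System.out.println(second.next.key); // one
    }
}
```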

How HashMap stores entries

Let's trace how HashMap stores data for the following code. (The hash values used below are for demonstration only and were not actually computed; if you need the real values, you can obtain a key's hash with a debugger.)

HashMap<String, String> hashMap = new HashMap<>(); // line1
hashMap.put("one", "hello1");     // line2
hashMap.put("two", "hello2");     // line3
hashMap.put("three", "hello3");   // line4
hashMap.put("four", "hello4");    // line5
hashMap.put("five", "hello5");    // line6
hashMap.put("six", "hello6");     // line7
hashMap.put("seven", "hello7");   // line8

The put operation can be summarized in pseudocode:

public V put(K key, V value) {
    int hash = hash(key);
    int i = indexFor(hash, table.length);
    // Add an Entry holding hash, key, and value at table[i].
}

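Now let's step through the code above.
1. line1 creates a HashMap, so we start with the constructor:

/**
 * Constructs an empty <tt>HashMap</tt> with the default initial capacity
 * (16) and the default load factor (0.75).
 */
public HashMap() {
    this(DEFAULT_INITIAL_CAPACITY, DEFAULT_LOAD_FACTOR);
}

The no-argument constructor delegates to another constructor. As the Javadoc says, it builds an empty HashMap with the default initial capacity, so let's look at that other constructor.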

public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " +
                                           initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " +
                                           loadFactor);

    this.loadFactor = loadFactor;
    threshold = initialCapacity;
    init();
}

void init() {
}

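The code above just assigns loadFactor (the ratio that decides when the array is too full and must be expanded) and threshold (which at this stage holds the initial capacity of the internal array). init() is an empty method, so the table array is still empty at this point.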

/**
 * An empty table instance to share when the table is not inflated.
 */
static final Entry<?,?>[] EMPTY_TABLE = {};

/**
 * The table, resized as necessary. Length MUST Always be a power of two.
 */
transient Entry<K,V>[] table = (Entry<K,V>[]) EMPTY_TABLE;

2. Next comes line2, hashMap.put("one", "hello1");. Here is the source of the put method:

public V put(K key, V value) {
    if (table == EMPTY_TABLE) {
        inflateTable(threshold); // allocate the table if it is still empty
    }
    if (key == null)
        return putForNullKey(value);
    int hash = hash(key); // compute the hash
    int i = indexFor(hash, table.length); // derive the bucket index
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        Object k;
        // Walk the existing entries; if one has the same hash and key, overwrite its value.
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }

    modCount++;
    // Add a new node.
    addEntry(hash, key, value, i);
    return null;
}

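The source is straightforward. If table is empty, the array is initialized first. A null key is handled separately. Otherwise put computes the key's hash and turns it into an index. indexFor() is needed because a hash value is usually far larger than the array length and cannot be used as an index directly. Here is indexFor:

/**
 * Returns index for hash code h.
 */
static int indexFor(int h, int length) {
    // assert Integer.bitCount(length) == 1 : "length must be a non-zero power of 2";
    return h & (length-1);
}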

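It is a single & operation. Because length is always a power of two, h & (length-1) keeps only the low bits of the hash, so the result always falls within the bounds of the array.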
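The masking step is easy to check by hand. The sketch below copies the one-line body of indexFor into a hypothetical class, IndexForDemo, and applies it to the two illustrative hash values used in this walkthrough, assuming a table length of 16. It also compares the mask with Math.floorMod for a negative hash, which gives the same answer whenever length is a power of two.

```java
class IndexForDemo {
    // Same masking step as indexFor: valid only when length is a power of two.
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int length = 16;
        // The two illustrative hash values from the walkthrough.
        System.out.println(indexFor(2306996, length));  // 4
        System.out.println(indexFor(63281940, length)); // 4
        // For a power-of-two length the mask matches a non-negative modulo.
        System.out.println(indexFor(-7, length) == Math.floorMod(-7, length)); // true
    }
}
```

A plain h % length would return a negative index for a negative hash, which is one practical reason to prefer the mask.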

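Back to the put on line2. Because table starts out as the empty array, execution enters inflateTable(threshold). This method allocates the array; the initial length is 16, the value that the constructor stored in threshold.
So the empty array has now become an array of length 16, as the figure below shows.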
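The article does not show the body of inflateTable. As a rough sketch, and assuming the behavior of the JDK 7 implementation as I understand it, the method rounds the requested size up to a power of two and then derives the resize threshold from capacity times load factor. The class InflateDemo and the method thresholdFor are hypothetical names for this illustration.

```java
class InflateDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Smallest power of two that is >= number, capped at MAXIMUM_CAPACITY.
    static int roundUpToPowerOf2(int number) {
        if (number >= MAXIMUM_CAPACITY) {
            return MAXIMUM_CAPACITY;
        }
        return (number > 1) ? Integer.highestOneBit((number - 1) << 1) : 1;
    }

    // Entry count at which the table is considered full enough to grow
    // (assumed formula: capacity * loadFactor).
    static int thresholdFor(int capacity, float loadFactor) {
        return (int) Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);
    }

    public static void main(String[] args) {
        System.out.println(roundUpToPowerOf2(16));   // 16
        System.out.println(roundUpToPowerOf2(17));   // 32
        System.out.println(thresholdFor(16, 0.75f)); // 12
    }
}
```

Under these assumptions, a default map gets 16 buckets and becomes a candidate for resizing once it holds 12 entries.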

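Since our key is not null, the next step is to compute the hash and the index. Suppose int hash = hash(key) yields hash = 2306996 and int i = indexFor(hash, table.length) yields i = 4. A new Entry is then created at index 4 of table; the corresponding call is addEntry(hash, key, value, i). The result is shown in the figure below:

The new Entry holds hash, key, value, and next, the reference to the next Entry in the chain.

3. Now line3. It works the same way as line2, except that the array is no longer empty, so execution goes straight to hash(key) and the index calculation. See the figure:

4. line4 is a special case. Suppose the hash of the key on line4 also maps to index 4; for example, hash("three") is 63281940,
and hash & 15 gives index 4. That slot already holds an Entry, so put walks the chain: if an Entry with the same hash and key is found, its value is replaced; otherwise a new Entry is added. Here is the relevant source:

public V put(K key, V value) {
    ... // some code omitted
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }
    // The loop above replaces the value when both hash and key match.

    modCount++;
    addEntry(hash, key, value, i); // no matching key found, so add a new Entry
    return null;
}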

The code first checks whether an existing value should be replaced; if not, it calls addEntry:

void addEntry(int hash, K key, V value, int bucketIndex) {
    // If the map has reached its threshold and the target bucket is occupied,
    // double the table and recompute the bucket index.
    if ((size >= threshold) && (null != table[bucketIndex])) {
        resize(2 * table.length);
        hash = (null != key) ? hash(key) : 0;
        bucketIndex = indexFor(hash, table.length);
    }

    createEntry(hash, key, value, bucketIndex);
}
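To see why addEntry recomputes bucketIndex after resize, consider what doubling does to the mask. The sketch below is only an illustration: ResizeDemo and shouldResize are hypothetical names, and the hash is the illustrative value from this walkthrough. With length 16 the mask keeps four bits; with length 32 it keeps five, so an entry either stays at its old index or moves up by exactly the old length.

```java
class ResizeDemo {
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    // Mirrors the condition in addEntry: grow only when the map has reached
    // its threshold and the target bucket is already occupied.
    static boolean shouldResize(int size, int threshold, boolean bucketOccupied) {
        return size >= threshold && bucketOccupied;
    }

    public static void main(String[] args) {
        int hash = 2306996; // illustrative hash from the walkthrough
        int oldLength = 16;
        int newLength = 2 * oldLength;

        System.out.println(indexFor(hash, oldLength)); // 4
        System.out.println(indexFor(hash, newLength)); // 20, i.e. 4 + 16

        System.out.println(shouldResize(12, 12, true));  // true
        System.out.println(shouldResize(12, 12, false)); // false
    }
}
```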

addEntry in turn calls createEntry:

void createEntry(int hash, K key, V value, int bucketIndex) {
    Entry<K,V> e = table[bucketIndex];
    table[bucketIndex] = new Entry<>(hash, key, value, e);
    size++;
    // Take the current head of the bucket, create a node holding hash, key,
    // and value whose next points to that old head, store it in
    // table[bucketIndex] as the new head, and increment size.
}
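The head insertion performed by createEntry can be tried in isolation. The following is a simplified sketch, not the JDK class: HeadInsertDemo, Node, and keysInBucket are hypothetical, the hash field is left out, and bucket 4 is simply the index this walkthrough assumes for "one" and "three".

```java
import java.util.ArrayList;
import java.util.List;

class HeadInsertDemo {
    // Hypothetical, stripped-down node: key, value, and the next reference.
    static final class Node {
        final String key;
        final String value;
        final Node next;

        Node(String key, String value, Node next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    final Node[] table = new Node[16];

    // Same shape as createEntry: the new node points at the old head and
    // then replaces it as the first node of the bucket.
    void createEntry(String key, String value, int bucketIndex) {
        Node e = table[bucketIndex];
        table[bucketIndex] = new Node(key, value, e);
    }

    // Lists the keys of one bucket from head to tail.
    List<String> keysInBucket(int bucketIndex) {
        List<String> keys = new ArrayList<>();
        for (Node e = table[bucketIndex]; e != null; e = e.next) {
            keys.add(e.key);
        }
        return keys;
    }

    public static void main(String[] args) {
        HeadInsertDemo demo = new HeadInsertDemo();
        demo.createEntry("one", "hello1", 4);
        demo.createEntry("three", "hello3", 4);
        System.out.println(demo.keysInBucket(4)); // [three, one]
    }
}
```

The most recently added key sits at the head of the chain, which matches the description of the new Entry pointing at the previous one.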

At this point the process should be clear. Here is the figure:

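5. The remaining statements all fall under the cases analyzed above, so I won't draw each one; here is the state after the program finishes.
line5 to line8 (as before, the hash and index values are illustrative):

| Code | hash | index | key | value | next |
| --- | :-: | --: | --- | --- | --- |
| hashMap.put("four","hello4"); | 54378290 | 9 | four | hello4 | null |
| hashMap.put("five","hello5"); | 39821723 | 8 | five | hello5 | null |
| hashMap.put("six","hello6"); | 86726537 | 4 | six | hello6 | Entry created by line4 |
| hashMap.put("seven","hello7"); | 28789082 | 2 | seven | hello7 | Entry created by line3 |

The resulting structure is shown below:

That completes the put operation. Now let's look at retrieval.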

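How HashMap retrieves values

We call hashMap.get(key) to retrieve a stored value, and the lookup is simple: use the array index to find the bucket's first Entry, then walk the chain; the Entry whose hash and key both match holds the value we stored. Here is the source:

public V get(Object key) {
    if (key == null)
        return getForNullKey();
    Entry<K,V> entry = getEntry(key);

    return null == entry ? null : entry.getValue();
}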

get calls getEntry(key) to obtain the entry and returns its value. Here is getEntry(key):

final Entry<K,V> getEntry(Object key) {
    if (size == 0) {
        return null;
    }

    int hash = (key == null) ? 0 : hash(key);
    for (Entry<K,V> e = table[indexFor(hash, table.length)];
         e != null;
         e = e.next) {
        Object k;
        if (e.hash == hash &&
            ((k = e.key) == key || (key != null && key.equals(k))))
            return e;
    }
    return null;
}

Values are retrieved by the same rules used to store them: compute the hash, derive the index, fetch the Entry at that index, and walk the chain. When the hash matches and the key is equal, we have found the value we want.
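Putting the store and retrieve paths side by side, here is a deliberately small sketch of the same idea. It is not the JDK implementation: TinyMap, Node, and sameKey are hypothetical names, the table is fixed at 16 buckets with no resizing, and hash() simply uses hashCode() without the extra bit mixing the real class applies.

```java
class TinyMap<K, V> {
    // Hypothetical stand-in for HashMap's Entry.
    private static final class Node<K, V> {
        final int hash;
        final K key;
        V value;
        final Node<K, V> next;

        Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    @SuppressWarnings("unchecked")
    private final Node<K, V>[] table = (Node<K, V>[]) new Node[16];
    private int size;

    // Simplified hash: null maps to 0, everything else to hashCode().
    private static int hash(Object key) {
        return key == null ? 0 : key.hashCode();
    }

    private int indexFor(int h) {
        return h & (table.length - 1);
    }

    private static boolean sameKey(Object a, Object b) {
        return a == b || (a != null && a.equals(b));
    }

    // Replaces the value if the key exists, otherwise inserts at the head.
    V put(K key, V value) {
        int h = hash(key);
        int i = indexFor(h);
        for (Node<K, V> e = table[i]; e != null; e = e.next) {
            if (e.hash == h && sameKey(key, e.key)) {
                V old = e.value;
                e.value = value;
                return old;
            }
        }
        table[i] = new Node<>(h, key, value, table[i]);
        size++;
        return null;
    }

    // Follows the same path as put: hash, index, then walk the chain.
    V get(Object key) {
        int h = hash(key);
        for (Node<K, V> e = table[indexFor(h)]; e != null; e = e.next) {
            if (e.hash == h && sameKey(key, e.key)) {
                return e.value;
            }
        }
        return null;
    }

    int size() {
        return size;
    }

    public static void main(String[] args) {
        TinyMap<String, String> map = new TinyMap<>();
        map.put("one", "hello1");
        map.put("two", "hello2");
        System.out.println(map.get("one"));     // hello1
        System.out.println(map.get("missing")); // null
    }
}
```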

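The get method checks for a null key. The hash of null is always 0, so a null key always lands in bucket 0, and getForNullKey() finds its value by walking that chain in the same way.

By now it should be clear why HashMap can use null as a key.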
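A quick check with the real java.util.HashMap confirms that a null key behaves like any other key. The class NullKeyDemo and its build method are made up for this example.

```java
import java.util.HashMap;
import java.util.Map;

class NullKeyDemo {
    // Builds a small map that contains a null key next to a regular key.
    static Map<String, String> build() {
        Map<String, String> map = new HashMap<>();
        map.put(null, "value for null");
        map.put("one", "hello1");
        return map;
    }

    public static void main(String[] args) {
        Map<String, String> map = build();
        System.out.println(map.get(null));         // value for null
        System.out.println(map.containsKey(null)); // true
    }
}
```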

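With the storage and retrieval processes and the internal structure understood, the remaining methods are easy to follow in the source, so they are not explained one by one here.

A few questions

Question 1. HashMap stores entries by the key's hash code. What happens on lookup if two different keys produce the same hash code?
As the analysis above shows, each array slot holds a linked chain of Entry objects; the lookup walks the Entry objects in that chain and compares keys to determine which value is wanted.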
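A well-known pair of colliding strings makes this easy to try: "Aa" and "BB" both have hash code 2112 (65*31 + 97 and 66*31 + 66). The sketch below, with the hypothetical class name CollisionDemo, stores both in a real HashMap and shows that each key still retrieves its own value.

```java
import java.util.HashMap;
import java.util.Map;

class CollisionDemo {
    // Stores two different keys that share the same hash code.
    static Map<String, String> build() {
        Map<String, String> map = new HashMap<>();
        map.put("Aa", "first");
        map.put("BB", "second");
        return map;
    }

    public static void main(String[] args) {
        System.out.println("Aa".hashCode() == "BB".hashCode()); // true
        Map<String, String> map = build();
        System.out.println(map.size());    // 2
        System.out.println(map.get("Aa")); // first
        System.out.println(map.get("BB")); // second
    }
}
```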

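Question 2. HashMap stores entries by the key's hash code. What if two keys are equal and therefore produce the same hash code?
During put, the chain is traversed and, if an equal key is found, its value is replaced. So get will only ever find one entry for that key.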
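The replacement behavior can be observed through the return value of put, which is the previous value for the key (or null if there was none). ReplaceDemo and putTwice are hypothetical names used only for this sketch.

```java
import java.util.HashMap;
import java.util.Map;

class ReplaceDemo {
    // Puts the same key twice and returns what the second put reports.
    static String putTwice(Map<String, String> map) {
        map.put("one", "hello1");
        return map.put("one", "hello1-updated"); // returns the previous value
    }

    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        System.out.println(putTwice(map));  // hello1
        System.out.println(map.size());     // 1
        System.out.println(map.get("one")); // hello1-updated
    }
}
```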

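Question 3. We habitually use String as the key of a HashMap. Why? Can other classes be used as keys?
String is immutable, and Java caches its hash code. Both put and get need the key's hash code, so their efficiency depends heavily on how cheaply it can be obtained. Hence the two reasons for choosing String: 1. it is immutable; 2. its hash code is cached, which makes lookups more efficient. Other classes can be keys as well, as Question 4 shows. If you are not familiar with String, see the related post "Java你可能不知道的事－String" ("Things You May Not Know About Java: String").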

Question 4. Can a mutable object be used as a HashMap key?
Yes, as long as you make sure that changes to its mutable fields do not change its hash code. Consider the following code:

import java.util.HashMap;

public class TestMemory {

    public static void main(String[] args) {
        HashMap<TestKey, String> hashMap = new HashMap<>();
        TestKey testKey = new TestKey();
        testKey.setAddress("sdfdsf"); // line3
        hashMap.put(testKey, "hello");
        testKey.setAddress("sdfsdffds"); // line5
        System.out.println(hashMap.get(testKey));
    }
}

public class TestKey {  
    String name;  
    String address;

    public String getName() {  
        return name;  
    }

    public void setName(String name) {  
        this.name = name;  
    }

    public String getAddress() {  
        return address;  
    }

    public void setAddress(String address) {  
        this.address = address;  
    }

    @Override  
    public int hashCode() {  
        if (name==null){  
            return 0;  
        }  
        return name.hashCode();  
    }  
}

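Between line3 and line5 the object's address changes, but because hashCode is derived from name and name stays the same, the value is still found. If the setAddress call on line5 were replaced with setName, however, get would return null. This is why we prefer String as a key.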
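The failing case is worth seeing once. The sketch below is separate from the TestKey class above and uses hypothetical names (KeyMutationDemo, MutableKey, ImmutableKey). It mutates the field that hashCode depends on after the put, then contrasts that with a key whose field is final.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class KeyMutationDemo {
    // Mutable key: hashCode depends on a field that can change after insertion.
    static final class MutableKey {
        String name;

        MutableKey(String name) {
            this.name = name;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof MutableKey
                    && Objects.equals(name, ((MutableKey) o).name);
        }

        @Override
        public int hashCode() {
            return Objects.hashCode(name);
        }
    }

    // Immutable key: the field is final, so the hash can never drift.
    static final class ImmutableKey {
        private final String name;

        ImmutableKey(String name) {
            this.name = name;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof ImmutableKey
                    && Objects.equals(name, ((ImmutableKey) o).name);
        }

        @Override
        public int hashCode() {
            return Objects.hashCode(name);
        }
    }

    // Puts a value under a mutable key, mutates the key, then looks it up.
    static String lookupAfterMutation() {
        Map<MutableKey, String> map = new HashMap<>();
        MutableKey key = new MutableKey("a");
        map.put(key, "hello");
        key.name = "b"; // the stored entry still carries the hash of "a"
        return map.get(key);
    }

    // Looks up a value with a fresh but equal immutable key.
    static String lookupWithImmutableKey() {
        Map<ImmutableKey, String> map = new HashMap<>();
        map.put(new ImmutableKey("a"), "hello");
        return map.get(new ImmutableKey("a"));
    }

    public static void main(String[] args) {
        System.out.println(lookupAfterMutation());    // null
        System.out.println(lookupWithImmutableKey()); // hello
    }
}
```

The entry is still inside the map after the mutation; it simply can no longer be reached through the key, because the lookup hash no longer matches the hash recorded at insertion time.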