Analysis of the Underlying Data Structures of Map (HashMap, TreeMap)

Date: 2019-11-07

Tags: map, underlying data structure analysis, hashmap, treemap


HashMap

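Structural characteristics

1. table is an array of type Entry[], and each Entry is a node of a singly linked list. All of the hash table's key-value pairs are stored in this Entry array.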

static class Entry<K,V> implements Map.Entry<K,V> {
final K key;
V value;
Entry<K,V> next;
int hash;
}

2. size is the size of the HashMap, that is, the number of key-value pairs it holds.

/**
    * The number of key-value mappings contained in this map.
*/
transient int size;

void createEntry(int hash, K key, V value, int bucketIndex) {
    Entry<K,V> e = table[bucketIndex];
    table[bucketIndex] = new Entry<>(hash, key, value, e);
    size++;
}
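To make points 1 and 2 concrete, here is a minimal sketch of a chained hash table. The class name SimpleChainedMap and its Node type are hypothetical and only mirror the fields of Entry shown above; the table is fixed at 16 buckets and never resizes, so this is an illustration rather than the JDK implementation.

```java
import java.util.Objects;

// Minimal sketch of a chained hash table (hypothetical class, fixed 16 buckets, no resizing)
class SimpleChainedMap<K, V> {
    // One node of a bucket's singly linked list, mirroring the Entry fields above
    static final class Node<K, V> {
        final K key;
        V value;
        Node<K, V> next;
        final int hash;

        Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    @SuppressWarnings("unchecked")
    private final Node<K, V>[] table = (Node<K, V>[]) new Node[16];
    private int size;

    // Inserts or replaces a mapping; returns the previous value, or null if there was none
    V put(K key, V value) {
        int hash = (key == null) ? 0 : key.hashCode();
        int index = hash & (table.length - 1);
        for (Node<K, V> e = table[index]; e != null; e = e.next) {
            if (e.hash == hash && Objects.equals(e.key, key)) {
                V old = e.value;
                e.value = value;
                return old;
            }
        }
        // A new node is linked in at the head of the bucket's chain
        table[index] = new Node<>(hash, key, value, table[index]);
        size++;
        return null;
    }

    // Walks the chain of the key's bucket and returns the value, or null if absent
    V get(Object key) {
        int hash = (key == null) ? 0 : key.hashCode();
        for (Node<K, V> e = table[hash & (table.length - 1)]; e != null; e = e.next) {
            if (e.hash == hash && Objects.equals(e.key, key)) {
                return e.value;
            }
        }
        return null;
    }

    int size() {
        return size;
    }
}
```

Note that size only grows when a new node is created; replacing the value of an existing key leaves it unchanged, just as createEntry above is the only place that increments size.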

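3. threshold is the HashMap's resize threshold, used to decide whether the capacity needs to be adjusted. When the number of stored entries reaches threshold, the capacity of the HashMap is doubled.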

/**
 * The next size value at which to resize (capacity * load factor).
 * @serial
 */
// If table == EMPTY_TABLE then this is the initial capacity at which the
// table will be created when inflated.
int threshold;

private void inflateTable(int toSize) {
    // Find a power of 2 >= toSize
    int capacity = roundUpToPowerOf2(toSize);
    // In the normal case, threshold = capacity * loadFactor
    threshold = (int) Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);
    table = new Entry[capacity];
    initHashSeedAsNeeded(capacity);
}

4. The loadFactor in the code above is the load factor.

/**
 * The load factor for the hash table.
 *
 * @serial
 */
final float loadFactor;

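The larger the load factor, the better the space utilization, but lookups get slower because the chains grow longer. If the load factor is too small, the table becomes too sparse (resizing starts while much of the space is still unused), which wastes a lot of space. If no load factor is given in the constructor, the default is 0.75, a reasonable value that normally does not need to be changed.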
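The relationship between capacity, load factor and threshold can be checked with a few lines of arithmetic. The sketch below reuses the formula from inflateTable; the class name ThresholdDemo is made up for this example.

```java
class ThresholdDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Same formula as in inflateTable above
    static int threshold(int capacity, float loadFactor) {
        return (int) Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);
    }

    public static void main(String[] args) {
        System.out.println(threshold(16, 0.75f)); // 12: resize once 12 entries are stored
        System.out.println(threshold(16, 1.0f));  // 16: denser table, longer chains
        System.out.println(threshold(16, 0.5f));  // 8: sparser table, earlier resize
    }
}
```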

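Resizing

1. Capacity: whatever initial capacity we specify, the table is actually created with the smallest power of two that is not less than the requested capacity, and never with more than 2^30 buckets.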

public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " +
                                           initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " +
                                           loadFactor);

    this.loadFactor = loadFactor;
    threshold = initialCapacity;
    init();
}

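The constructor only performs simple initialization: it stores the requested capacity in threshold, and the real capacity is not determined here.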

public V put(K key, V value) {
// On the first insertion, initialize the table
if (table == EMPTY_TABLE) {
    inflateTable(threshold);
}
if (key == null)
    return putForNullKey(value);
int hash = hash(key);
int i = indexFor(hash, table.length);
for (Entry<K,V> e = table[i]; e != null; e = e.next) {
    Object k;
    if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
        V oldValue = e.value;
        e.value = value;
        e.recordAccess(this);
        return oldValue;
    }
}
modCount++;
addEntry(hash, key, value, i);
return null;
}

On its first call, put() detects that the table has not been created yet and initializes the array by calling inflateTable(threshold).

private void inflateTable(int toSize) {
// Find a power of 2 >= toSize
int capacity = roundUpToPowerOf2(toSize);
threshold = (int) Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);
table = new Entry[capacity];
initHashSeedAsNeeded(capacity);
}
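The body of roundUpToPowerOf2 is not shown above. The following is one possible way to write such a helper, assuming the behavior described in the text (smallest power of two not less than the argument, capped at 2^30); the class name PowerOfTwo is hypothetical.

```java
class PowerOfTwo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Returns the smallest power of two >= number, capped at MAXIMUM_CAPACITY
    static int roundUpToPowerOf2(int number) {
        if (number >= MAXIMUM_CAPACITY) {
            return MAXIMUM_CAPACITY;
        }
        if (number <= 1) {
            return 1;
        }
        // (number - 1) << 1 has its highest set bit exactly at the wanted power of two
        return Integer.highestOneBit((number - 1) << 1);
    }

    public static void main(String[] args) {
        System.out.println(roundUpToPowerOf2(10)); // 16
        System.out.println(roundUpToPowerOf2(16)); // 16
        System.out.println(roundUpToPowerOf2(17)); // 32
    }
}
```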

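Load factor: the larger the load factor, the fuller the array becomes, which uses space efficiently, but the drawback is more collisions and longer chains; conversely, a load factor that is too small wastes memory. A balance between space and time therefore has to be found by choosing a suitable load factor. Improving lookup speed usually means trading space for time, and the right trade-off depends on the specific workload.

hashCode() and equals()

1. hashCode exists mainly to make lookups fast, as in Hashtable and HashMap: it is used to determine where an object is stored in a hash-based structure.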

int hash = hash(key);
int i = indexFor(hash, table.length);
static int indexFor(int h, int length) {
// assert Integer.bitCount(length) == 1 : "length must be a non-zero power of 2";
return h & (length-1);
}

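Because the array length is guaranteed to be a power of two (not merely an even number), length-1 is a mask of all ones in the low bits, so h & (length-1) keeps the low-order bits of the hash and can reach every index from 0 to length-1. With well-distributed hash values, the entries are therefore spread evenly over the table.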
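A small experiment shows why the length has to be a power of two. The sketch below (class and method names are made up) counts how many bucket indices can be reached for a length of 16 and for a length of 10, which is even but not a power of two.

```java
class IndexForDemo {
    // Same expression as indexFor above
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    // Counts how many distinct bucket indices the hashes 0..samples-1 can reach
    static int reachableBuckets(int length, int samples) {
        boolean[] seen = new boolean[length];
        int count = 0;
        for (int h = 0; h < samples; h++) {
            int i = indexFor(h, length);
            if (!seen[i]) {
                seen[i] = true;
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(reachableBuckets(16, 1000)); // 16: every bucket is used
        System.out.println(reachableBuckets(10, 1000)); // 4: mask 9 = 0b1001 only reaches 0, 1, 8, 9
        System.out.println(indexFor(-7, 16) == Math.floorMod(-7, 16)); // true
    }
}
```

For a power-of-two length the mask gives the same result as a non-negative modulo, but with a single bitwise operation.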

2. If two objects are equal according to equals(java.lang.Object), their hashCode values must also be equal.

// Handling of a null key; HashMap allows the key to be null
if (key == null)
   return putForNullKey(value);
// How the hash value and equals() are used in put()
int hash = hash(key);
int i = indexFor(hash, table.length);
for (Entry<K,V> e = table[i]; e != null; e = e.next) {
Object k;
if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
    V oldValue = e.value;
    e.value = value;
    e.recordAccess(this);
    return oldValue;
}
}

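3. If a class overrides equals, it should also override hashCode, and the fields used to compute hashCode must be consistent with the fields used in equals; otherwise point 2 above is violated.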
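Point 3 can be illustrated with two hypothetical key classes: GoodKey overrides equals and hashCode using the same fields, while BadKey overrides only equals. All names in this sketch are invented for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class EqualsHashCodeDemo {
    // equals and hashCode are both based on x and y
    static final class GoodKey {
        final int x;
        final int y;

        GoodKey(int x, int y) {
            this.x = x;
            this.y = y;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof GoodKey)) return false;
            GoodKey other = (GoodKey) o;
            return x == other.x && y == other.y;
        }

        @Override
        public int hashCode() {
            return Objects.hash(x, y);
        }
    }

    // equals is overridden, but hashCode is still the identity-based one from Object
    static final class BadKey {
        final int x;

        BadKey(int x) {
            this.x = x;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).x == x;
        }
    }

    // An equal but distinct GoodKey instance finds the stored mapping
    static boolean lookupWithGoodKey() {
        Map<GoodKey, String> map = new HashMap<>();
        map.put(new GoodKey(1, 2), "value");
        return map.containsKey(new GoodKey(1, 2));
    }

    // An equal but distinct BadKey instance almost always lands in a different bucket
    static boolean lookupWithBadKey() {
        Map<BadKey, String> map = new HashMap<>();
        map.put(new BadKey(1), "value");
        return map.containsKey(new BadKey(1));
    }
}
```

lookupWithGoodKey() returns true. lookupWithBadKey() will almost always return false, because the two equal BadKey instances have unrelated hash codes and the lookup never reaches the equals() comparison.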

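4. Two objects with the same hashCode are not necessarily equal, that is, equals(java.lang.Object) may still return false for them. It only means that they map to the same index in a hash-based structure such as Hashtable: they are "stored in the same basket".

LinkedHashMap overview

1. LinkedHashMap is a subclass of HashMap and uses the same storage structure, but it adds the header node of a doubly linked list and chains every entry that is put into the LinkedHashMap into a circular doubly linked list. It therefore preserves insertion order, so entries are iterated in the order in which they were inserted.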

/**
 * The head of the doubly linked list.
 */
// Header of the doubly linked list that records insertion order
private transient Entry<K,V> header;

// Entry type that adds the doubly linked list pointers
private static class Entry<K,V> extends HashMap.Entry<K,V> {
// These fields comprise the doubly linked list used for iteration.
Entry<K,V> before, after;
}

// Method that adds a new entry
void addEntry(int hash, K key, V value, int bucketIndex) {
super.addEntry(hash, key, value, bucketIndex);

// Remove eldest entry if instructed
// header.after is the eldest entry in the list
Entry<K,V> eldest = header.after;
// If removeEldestEntry(eldest) returns true, remove the eldest entry; it returns false by default
if (removeEldestEntry(eldest)) {
    removeEntryForKey(eldest.key);
}
}
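The insertion-order guarantee described in point 1 is easy to observe through the public API. The class name InsertionOrderDemo is made up for this short sketch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class InsertionOrderDemo {
    // Returns the keys in iteration order
    static List<String> keysInIterationOrder() {
        Map<String, Integer> map = new LinkedHashMap<>();
        map.put("banana", 2);
        map.put("apple", 1);
        map.put("cherry", 3);
        // Updating an existing key does not change its position in insertion-order mode
        map.put("banana", 20);
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        System.out.println(keysInIterationOrder()); // [banana, apple, cherry]
    }
}
```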

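2. LinkedHashMap can be used to implement the LRU algorithm.

3. Like HashMap, LinkedHashMap is not thread-safe; without external synchronization it should only be used from a single thread.

LRU algorithm

The LRU (least recently used) algorithm evicts data based on its access history. Its core idea is that "data accessed recently is more likely to be accessed again in the future".

1. New data is inserted at the head of the list.
2. On every cache hit (the cached data is accessed), the data is moved to the head of the list.
3. When the list is full, the data at the tail of the list is discarded.

Of the three rules above, LinkedHashMap implements the first by default, but not rules 2 and 3.

Start with rule 2: what does LinkedHashMap do when data is accessed (looked up or updated)?

First the update operation. Updating a LinkedHashMap simply uses HashMap's put() method: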

public V put(K key, V value) {
if (table == EMPTY_TABLE) {
    inflateTable(threshold);
}
if (key == null)
    return putForNullKey(value);
int hash = hash(key);
int i = indexFor(hash, table.length);
for (Entry<K,V> e = table[i]; e != null; e = e.next) {
    Object k;
    // The key already exists in the map
    if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
        V oldValue = e.value;
        e.value = value;
        // This hook is called on every update of an existing entry
        e.recordAccess(this);
        return oldValue;
    }
}
modCount++;
addEntry(hash, key, value, i);
return null;
}

We can see that recordAccess(this) is called when an existing entry is updated. In HashMap this method is an empty implementation; the real implementation is in LinkedHashMap:

void recordAccess(HashMap<K,V> m) {
LinkedHashMap<K,V> lm = (LinkedHashMap<K,V>)m;
// accessOrder defaults to false
if (lm.accessOrder) {
    lm.modCount++;
    // remove() followed by addBefore(lm.header) moves this entry to the most recently used end of the list
    remove();
    addBefore(lm.header);
}
}

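As we can see, when accessOrder is true the entry is moved to the most recently used end of the list (the "head" in the rules above), which is exactly LRU rule 2. Unfortunately, accessOrder defaults to false.

Next, let's look at LinkedHashMap's lookup method.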

public V get(Object key) {
Entry<K,V> e = (Entry<K,V>)getEntry(key);
if (e == null)
    return null;
e.recordAccess(this);
return e.value;
}

This method also calls recordAccess(this). As described above, however, accessOrder defaults to false, and it has to be set to true to satisfy LRU rule 2.

In fact, LinkedHashMap has a constructor that sets accessOrder.

public LinkedHashMap(int initialCapacity, float loadFactor, boolean accessOrder) {
super(initialCapacity, loadFactor);
// true: access order; false: insertion order
this.accessOrder = accessOrder;
}
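The effect of this flag can be seen with the public three-argument constructor. In the sketch below (the class name AccessOrderDemo is hypothetical), both get() and an update through put() count as accesses.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class AccessOrderDemo {
    // Returns the keys from least recently used to most recently used
    static List<String> keysAfterAccess() {
        Map<String, Integer> map = new LinkedHashMap<>(16, 0.75f, true);
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        map.get("a");     // order is now b, c, a
        map.put("b", 20); // updating an existing key is also an access: c, a, b
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        System.out.println(keysAfterAccess()); // [c, a, b]
    }
}
```

Iteration starts at the least recently used entry, which is the one removeEldestEntry() is asked about.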

When writing an LRU version of LinkedHashMap, we can set accessOrder by calling this constructor.

Finally, let's look at LinkedHashMap's add operation.

void addEntry(int hash, K key, V value, int bucketIndex) {
super.addEntry(hash, key, value, bucketIndex);

// Remove eldest entry if instructed
// header.after is the eldest entry in the list
Entry<K,V> eldest = header.after;
// If removeEldestEntry(eldest) returns true, remove that entry; it returns false by default
if (removeEldestEntry(eldest)) {
    removeEntryForKey(eldest.key);
}
}

We can see that LRU rule 3 is satisfied when removeEldestEntry(eldest) returns true.

Putting it all together, we get the following LinkedHashMap that implements the LRU algorithm:

import java.util.LinkedHashMap;
import java.util.Map;

public class LRULinkedHashMap<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = -5933045562735378538L;
    // Default maximum number of entries in the LRU cache
    private static final int LRU_MAX_CAPACITY = 1024;
    // Maximum number of entries kept before the eldest one is evicted
    private final int capacity;

    public LRULinkedHashMap() {
        // 16 and 0.75f are HashMap's defaults; true turns on access order
        super(16, 0.75f, true);
        this.capacity = LRU_MAX_CAPACITY;
    }

    // Set accessOrder through the constructor
    public LRULinkedHashMap(int initialCapacity, float loadFactor, boolean isLRU) {
        super(initialCapacity, loadFactor, isLRU);
        this.capacity = LRU_MAX_CAPACITY;
    }

    // Set accessOrder and the maximum number of entries through the constructor
    public LRULinkedHashMap(int initialCapacity, float loadFactor,
            boolean isLRU, int lruCapacity) {
        super(initialCapacity, loadFactor, isLRU);
        this.capacity = lruCapacity;
    }

    // Override LinkedHashMap's removeEldestEntry()
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Evict the eldest entry once the map has grown beyond its capacity
        return size() > capacity;
    }
}
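The same idea can be tried out in a few lines. The sketch below is self-contained and uses an anonymous subclass instead of the class above; LruDemo, lruCache and maxEntries are names invented for this example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class LruDemo {
    // Builds an access-ordered LinkedHashMap that keeps at most maxEntries entries
    static <K, V> Map<K, V> lruCache(final int maxEntries) {
        return new LinkedHashMap<K, V>(16, 0.75f, true) {
            private static final long serialVersionUID = 1L;

            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries;
            }
        };
    }

    // Returns the keys that survive in a cache of size 2
    static List<String> survivingKeys() {
        Map<String, Integer> cache = lruCache(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");    // "a" becomes the most recently used entry
        cache.put("c", 3); // the cache is over capacity, so "b" is evicted
        return new ArrayList<>(cache.keySet());
    }

    public static void main(String[] args) {
        System.out.println(survivingKeys()); // [a, c]
    }
}
```

Like LinkedHashMap itself, such a cache is not thread-safe and needs external synchronization when shared between threads.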

TreeMap

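Custom comparator and natural ordering

TreeMap does not preserve insertion order; instead it keeps its entries sorted by key. It is implemented as a red-black tree (a tree structure that is well suited to range queries).

TreeSet, which is backed by a TreeMap, uses either natural ordering or custom ordering.

Natural ordering: the objects stored in the TreeSet must implement the java.lang.Comparable interface and override compareTo.
Custom ordering: a comparator object (implementing the java.util.Comparator interface) is passed in when the TreeSet is constructed. The comparison rule is written in the comparator's compare method.

TreeSet decides whether two elements are duplicates by checking whether compareTo/compare returns 0. If it returns 0, they are treated as the same element.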
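Both orderings, and the "compare returns 0 means duplicate" rule, can be seen in a short sketch. The class name TreeSetOrderingDemo and the sample strings are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

class TreeSetOrderingDemo {
    // Natural ordering: String implements Comparable<String>
    static List<String> naturalOrder() {
        TreeSet<String> set = new TreeSet<>();
        set.add("pear");
        set.add("apple");
        set.add("fig");
        return new ArrayList<>(set);
    }

    // Custom ordering: strings are compared by length only
    static List<String> byLength() {
        Comparator<String> byLength = Comparator.comparingInt(String::length);
        TreeSet<String> set = new TreeSet<>(byLength);
        set.add("pear");
        set.add("fig");
        set.add("kiwi"); // same length as "pear": compare returns 0, so it is rejected
        set.add("banana");
        return new ArrayList<>(set);
    }

    public static void main(String[] args) {
        System.out.println(naturalOrder()); // [apple, fig, pear]
        System.out.println(byLength());     // [fig, pear, banana]
    }
}
```

Note that "kiwi" is dropped even though it is not equal to "pear" in the equals() sense; only the comparator decides.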

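Let's look at the code in TreeMap that uses the comparator.

// If a custom comparator was supplied
Comparator<? super K> cpr = comparator;
if (cpr != null) {
// Walk down the tree comparing keys to find the insertion point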
do {
    parent = t;
    cmp = cpr.compare(key, t.key);
    if (cmp < 0)
        t = t.left;
    else if (cmp > 0)
        t = t.right;
    else
        return t.setValue(value);
} while (t != null);
}
else {
if (key == null)
    throw new NullPointerException();
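// Otherwise the key must implement Comparable (natural ordering)
Comparable<? super K> k = (Comparable<? super K>) key;
// Walk down the tree comparing keys to find the insertion point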
do {
    parent = t;
    cmp = k.compareTo(t.key);
    if (cmp < 0)
        t = t.left;
    else if (cmp > 0)
        t = t.right;
    else
        return t.setValue(value);
} while (t != null);
}

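From the code above we can see that TreeMap first checks for a custom comparator and only then falls back to natural ordering. If both are available, TreeMap uses the custom comparator; if neither is, the cast in Comparable<? super K> k = (Comparable<? super K>) key fails with a ClassCastException.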
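Both behaviors can be reproduced through the public API. In this sketch, Plain is a hypothetical key class that neither implements Comparable nor comes with a Comparator; the example assumes a Java 8 or later runtime.

```java
import java.util.Comparator;
import java.util.TreeMap;

class TreeMapOrderingDemo {
    // A key type with no natural ordering
    static final class Plain { }

    // Returns true if putting a non-comparable key fails with ClassCastException
    static boolean putWithoutOrderingThrows() {
        TreeMap<Plain, String> map = new TreeMap<>();
        try {
            map.put(new Plain(), "x");
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    // String is Comparable, but the supplied comparator takes precedence
    static String firstKeyWithReverseComparator() {
        TreeMap<String, Integer> map = new TreeMap<>(Comparator.<String>reverseOrder());
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        return map.firstKey();
    }

    public static void main(String[] args) {
        System.out.println(putWithoutOrderingThrows());      // true
        System.out.println(firstKeyWithReverseComparator()); // c
    }
}
```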

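Red-black tree overview

A red-black tree, also called a red-black binary tree, is first of all a binary tree and has all the properties of one. It is also a self-balancing binary search tree.

A binary search tree must satisfy one basic property: every node's key is greater than all keys in its left subtree and less than all keys in its right subtree. This property is what makes lookups efficient.

A binary search tree can easily become unbalanced while it is being built. In the worst case it degenerates into a single chain (only right or only left children), which greatly reduces lookup efficiency. To keep the tree balanced, various algorithms have been proposed, such as AVL trees, size balanced trees (SBT), splay trees, treaps, and red-black trees.

A strictly balanced binary tree (in the AVL sense) has the following property: it is either empty, or the heights of its left and right subtrees differ by at most 1 and both subtrees are themselves balanced binary trees. In other words, for every node of the tree, the heights of its left and right subtrees are close to each other.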
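The degenerate case is easy to reproduce with a plain, non-balancing binary search tree. The sketch below (all names are invented) inserts the same seven keys in two different orders and compares the resulting heights.

```java
class BstHeightDemo {
    static final class Node {
        final int key;
        Node left;
        Node right;

        Node(int key) {
            this.key = key;
        }
    }

    // Plain binary search tree insertion without any rebalancing
    static Node insert(Node root, int key) {
        if (root == null) {
            return new Node(key);
        }
        if (key < root.key) {
            root.left = insert(root.left, key);
        } else if (key > root.key) {
            root.right = insert(root.right, key);
        }
        return root;
    }

    // Number of nodes on the longest path from the root down to a leaf
    static int height(Node node) {
        return node == null ? 0 : 1 + Math.max(height(node.left), height(node.right));
    }

    static int heightAfterInserting(int... keys) {
        Node root = null;
        for (int key : keys) {
            root = insert(root, key);
        }
        return height(root);
    }

    public static void main(String[] args) {
        System.out.println(heightAfterInserting(1, 2, 3, 4, 5, 6, 7)); // 7: a single chain
        System.out.println(heightAfterInserting(4, 2, 6, 1, 3, 5, 7)); // 3: perfectly balanced
    }
}
```

A self-balancing tree such as a red-black tree keeps the height logarithmic in the number of nodes regardless of insertion order.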

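Red-black tree rules

As the name suggests, a red-black tree is a binary search tree whose nodes are each red or black; the color constraints keep the tree approximately balanced. A valid red-black tree must additionally satisfy the following rules:

(1) Every node is either red or black.
(2) The root is black.
(3) Every leaf (NIL node, empty node) is black.
(4) If a node is red, both of its children are black; in other words, no path contains two adjacent red nodes.
(5) Every path from a given node to any of its descendant leaves contains the same number of black nodes.

Basic red-black tree operations

After a node is added or removed, the tree changes and may no longer satisfy the five rules. Three operations are used to restore them: left rotation, right rotation, and recoloring.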
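The two rotations can be sketched on a simplified node type without parent pointers or colors. This is not TreeMap's rotateLeft/rotateRight, only an illustration of how the links change; all names are hypothetical.

```java
class RotationDemo {
    static final class Node {
        final int key;
        Node left;
        Node right;

        Node(int key) {
            this.key = key;
        }
    }

    // Left rotation: x's right child y becomes the subtree root and x becomes y's left child
    static Node rotateLeft(Node x) {
        Node y = x.right;
        x.right = y.left; // y's left subtree moves under x
        y.left = x;
        return y;
    }

    // Right rotation: the mirror image of rotateLeft
    static Node rotateRight(Node y) {
        Node x = y.left;
        y.left = x.right; // x's right subtree moves under y
        x.right = y;
        return x;
    }

    public static void main(String[] args) {
        // Right-leaning chain 1 -> 2 -> 3
        Node one = new Node(1);
        one.right = new Node(2);
        one.right.right = new Node(3);

        Node root = rotateLeft(one);
        System.out.println(root.key + " " + root.left.key + " " + root.right.key); // 2 1 3
    }
}
```

A rotation changes the shape of the tree but preserves the in-order sequence of keys, so the search-tree property still holds afterwards.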

Red-black tree insertion

Let's first look at the insertion code.

public V put(K key, V value) {
    Entry<K,V> t = root;
    if (t == null) {
        compare(key, key); // type (and possibly null) check

        root = new Entry<>(key, value, null);
        size = 1;
        modCount++;
        return null;
    }
    int cmp;
    Entry<K,V> parent;
    // split comparator and comparable paths
    Comparator<? super K> cpr = comparator;
    if (cpr != null) {
        // Walk down the tree comparing keys until the insertion point (a leaf position) is reached
        do {
            parent = t;
            cmp = cpr.compare(key, t.key);
            if (cmp < 0)
                t = t.left;
            else if (cmp > 0)
                t = t.right;
            else
                return t.setValue(value);
        } while (t != null);
    }
    else {
        if (key == null)
            throw new NullPointerException();
        Comparable<? super K> k = (Comparable<? super K>) key;
        // Walk down the tree comparing keys until the insertion point (a leaf position) is reached
        do {
            parent = t;
            cmp = k.compareTo(t.key);
            if (cmp < 0)
                t = t.left;
            else if (cmp > 0)
                t = t.right;
            else
                return t.setValue(value);
        } while (t != null);
    }
    Entry<K,V> e = new Entry<>(key, value, parent);
    if (cmp < 0)
        parent.left = e;
    else
        parent.right = e;
    // After the new leaf is linked in, the tree may violate the five red-black rules, so fixAfterInsertion(e) rebalances it.
    fixAfterInsertion(e);
    size++;
    modCount++;
    return null;
    }

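The node is first inserted as a leaf of the tree, and the tree is then rebalanced. The adjustment rules exist mainly to make the tree satisfy the five red-black rules again.

Now let's look at the red-black tree fix-up operation.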

private void fixAfterInsertion(Entry<K,V> x) {
    x.color = RED;
    while (x != null && x != root && x.parent.color == RED) {
        if (parentOf(x) == leftOf(parentOf(parentOf(x)))) {
            Entry<K,V> y = rightOf(parentOf(parentOf(x)));
            // Case 1: the uncle y is red, so recolor and continue from the grandparent
            if (colorOf(y) == RED) {
                setColor(parentOf(x), BLACK);
                setColor(y, BLACK);
                setColor(parentOf(parentOf(x)), RED);
                x = parentOf(parentOf(x));
            } else {
                // Case 2, sub-case 2: x is a right child, so rotate left first
                if (x == rightOf(parentOf(x))) {
                    x = parentOf(x);
                    rotateLeft(x);
                }
                // Case 2, sub-case 1: recolor, then rotate the grandparent right
                setColor(parentOf(x), BLACK);
                setColor(parentOf(parentOf(x)), RED);
                rotateRight(parentOf(parentOf(x)));
            }
        } else {
            Entry<K,V> y = leftOf(parentOf(parentOf(x)));
            // Mirror image of the if branch above
            if (colorOf(y) == RED) {
                setColor(parentOf(x), BLACK);
                setColor(y, BLACK);
                setColor(parentOf(parentOf(x)), RED);
                x = parentOf(parentOf(x));
            } else {
                if (x == leftOf(parentOf(x))) {
                    x = parentOf(x);
                    rotateRight(x);
                }
                setColor(parentOf(x), BLACK);
                setColor(parentOf(parentOf(x)), RED);
                rotateLeft(parentOf(parentOf(x)));
            }
        }
    }
    root.color = BLACK;
    }

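Based on the code above, case 1 and case 2 are broken down below; compare them against the code as you read.

(1) The new node is the root. It is simply set as the root, and no rotation or recoloring is needed.

(2) The parent of the new node is black. The new node is simply inserted, and rule (4) is not violated.

(3) The parent of the new node is red. This violates rule (4) and splits into the following sub-cases:

① The parent P and the uncle U of the new node N are both red. The fix is to color the grandparent G red and to color P and U black, after which this part of the tree looks balanced. However, if G's parent is also red, rule (4) is violated again; in that case the group G, P, U, N is treated as a single new node (in the code, x moves up to G) and the same procedure is applied again. If the root ends up red, rule (2) is violated; the fix is simply to color the root black (two consecutive black nodes are allowed).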
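Sub-case ① corresponds to "case 1" in fixAfterInsertion. The sketch below isolates just that recoloring step on a simplified node type; the classes and method names are hypothetical, and no rotation is involved.

```java
class RecolorDemo {
    enum Color { RED, BLACK }

    static final class Node {
        final String name;
        Color color;
        Node parent;
        Node left;
        Node right;

        Node(String name, Color color) {
            this.name = name;
            this.color = color;
        }
    }

    // Attaches the given children to parent and sets their parent pointers
    static void link(Node parent, Node left, Node right) {
        parent.left = left;
        parent.right = right;
        if (left != null) left.parent = parent;
        if (right != null) right.parent = parent;
    }

    // Parent and uncle of n are red: color them black, color the grandparent red,
    // and return the grandparent so the caller can continue the check from there
    static Node recolor(Node n) {
        Node p = n.parent;
        Node g = p.parent;
        Node u = (g.left == p) ? g.right : g.left;
        p.color = Color.BLACK;
        u.color = Color.BLACK;
        g.color = Color.RED;
        return g;
    }

    public static void main(String[] args) {
        Node g = new Node("G", Color.BLACK);
        Node p = new Node("P", Color.RED);
        Node u = new Node("U", Color.RED);
        Node n = new Node("N", Color.RED);
        link(g, p, u);
        link(p, n, null);

        Node next = recolor(n);
        // G is the root in this tiny example, so it is colored black again at the end
        if (next.parent == null) {
            next.color = Color.BLACK;
        }
        System.out.println(p.color + " " + u.color + " " + g.color); // BLACK BLACK BLACK
    }
}
```

When G is not the root, the loop in fixAfterInsertion continues with x set to G, which is the repetition described above.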