WeakHashMap Source Code Walkthrough

Date: 2019-12-14

1. Introduction

This article walks through how the main methods of WeakHashMap are implemented, based on the JDK 8u111 source.

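2. Data Structure

In terms of data structure, WeakHashMap works much like HashMap: both resolve hash collisions with separate chaining.
Below is the definition of the Entry class in WeakHashMap.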

/**
  * Some method implementations are omitted.
  */
private static class Entry<K,V> extends WeakReference<Object> implements Map.Entry<K,V> {
    V value;
    final int hash;
    Entry<K,V> next;

    Entry(Object key, V value,
            ReferenceQueue<Object> queue,
            int hash, Entry<K,V> next) {
        super(key, queue);
        this.value = value;
        this.hash  = hash;
        this.next  = next;
    }
    @SuppressWarnings("unchecked")
    public K getKey() {
        return (K) WeakHashMap.unmaskNull(get());
    }
    public V getValue() {
        return value;
    }

    public V setValue(V newValue) {
        V oldValue = value;
        value = newValue;
        return oldValue;
    }

}

In addition, every WeakHashMap holds a ReferenceQueue that collects the weak references whose referents have been garbage collected. It is defined as follows.

private final ReferenceQueue<Object> queue = new ReferenceQueue<>();

This queue is passed to the Entry constructor and used to instantiate the WeakReference. Its main purpose is to make it easy to clean up stale entries in the WeakHashMap.
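To make the role of the queue concrete, here is a minimal sketch. The names QueueDemo, TaggedRef and simulateCollection are made up for illustration and are not part of the JDK. Because real garbage collection timing is unpredictable, the sketch calls Reference.enqueue() by hand as a stand-in for the collector. The point is that the queue hands back the reference object itself, so any extra fields it carries (such as the hash) are still available for locating the bucket.

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;

class QueueDemo {
    /** Hypothetical stand-in for WeakHashMap.Entry: a weak reference that also stores a hash. */
    static final class TaggedRef extends WeakReference<Object> {
        final int hash;

        TaggedRef(Object key, ReferenceQueue<Object> queue) {
            super(key, queue);
            this.hash = key.hashCode();
        }
    }

    /** Simulates the collector enqueuing the reference, then polls the queue once. */
    static Object simulateCollection(ReferenceQueue<Object> queue, TaggedRef ref) {
        ref.enqueue();       // assumption: manual stand-in for what the GC does after clearing the key
        return queue.poll(); // non-blocking; returns null when the queue is empty
    }

    public static void main(String[] args) {
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        Object key = new Object();
        TaggedRef ref = new TaggedRef(key, queue);

        Object polled = simulateCollection(queue, ref);
        System.out.println(polled == ref);                               // true
        System.out.println(((TaggedRef) polled).hash == key.hashCode()); // true
    }
}
```

This is why Entry extends WeakReference instead of merely holding one: the object that comes out of the queue is the entry to be removed.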

3. Source of the Key Methods

3.1 Hash Function

First, let's look at how WeakHashMap hashes an object.

final int hash(Object k) {
    int h = k.hashCode();

    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}

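As you can see, the hash method of WeakHashMap uses essentially the same bit mixing as HashMap did in JDK 7.
Like HashMap, WeakHashMap keeps its capacity at a power of two. If the object's hashCode were used directly, computing the bucket index could produce serious collisions: hash codes that differ only in their high bits share the same low bits and land in the same bucket. To avoid this, the high bits are mixed into the low bits, which spreads the bucket indexes more evenly.

In the JDK 8 HashMap class, the hash method has been simplified to a single XOR perturbation step, h ^ (h >>> 16).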
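The effect of the mixing is easy to see with two hash codes that differ only in their high bits. The sketch below copies the shift-and-XOR steps from the hash method above into a standalone helper. The class name HashSpreadDemo and the method name spread are illustrative only; indexFor mirrors the usual power-of-two masking, h & (length - 1).

```java
class HashSpreadDemo {
    /** Same shift-and-XOR steps as the hash method quoted above, applied to a raw hash code. */
    static int spread(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    /** Bucket index for a table whose length is a power of two. */
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int a = 0x10000; // the two values differ only above bit 15
        int b = 0x20000;
        int length = 16;

        // Without mixing, both fall into bucket 0.
        System.out.println(indexFor(a, length) + " " + indexFor(b, length));                 // 0 0
        // With mixing, the high bits reach the low nibble and the buckets differ.
        System.out.println(indexFor(spread(a), length) + " " + indexFor(spread(b), length)); // 1 2
    }
}
```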

3.2 Inserting an Element

public V put(K key, V value) {
    Object k = maskNull(key);
    int h = hash(k);

    // getTable performs a cleanup pass before returning the table.
    Entry<K,V>[] tab = getTable();
    int i = indexFor(h, tab.length);

    // Walk the bucket and check whether the key is already present in the map.
    for (Entry<K,V> e = tab[i]; e != null; e = e.next) {
        if (h == e.hash && eq(k, e.get())) {
            V oldValue = e.value;
            if (value != oldValue)
                e.value = value;
            return oldValue;
        }
    }

    modCount++;
    Entry<K,V> e = tab[i];
    // Insert the new entry at the head of the bucket.
    tab[i] = new Entry<>(k, value, queue, h, e);

    // Double the capacity once size reaches the threshold.
    if (++size >= threshold)
        resize(tab.length * 2);
    return null;
}

private Entry<K,V>[] getTable() {
    expungeStaleEntries();
    return table;
}

Next, let's see how WeakHashMap cleans up stale entries.

private void expungeStaleEntries() {
    // Drain the weak references that the GC has placed on this map's reference queue.
    for (Object x; (x = queue.poll()) != null; ) {
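        /*
         * Note that the code below is wrapped in a block synchronized on queue.
         * Read methods such as get also trigger this cleanup, so without the lock,
         * even a WeakHashMap that is only read concurrently and never modified
         * could have its internal structure corrupted by cleanups racing across threads.
         * The lock is not taken at method level because ReferenceQueue.poll above
         * is already thread-safe and can be called concurrently (poll synchronizes internally).
         */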
        synchronized (queue) {
            @SuppressWarnings("unchecked")
                Entry<K,V> e = (Entry<K,V>) x;
            int i = indexFor(e.hash, table.length);

            Entry<K,V> prev = table[i];
            Entry<K,V> p = prev;
            // Walk the entries in the corresponding bucket.
            while (p != null) {
                Entry<K,V> next = p.next;
                if (p == e) {
                    // e is the head of the bucket (table[i] == e), so just advance table[i] to the next entry.
                    if (prev == e)
                        table[i] = next;
                    else
                        // Unlink p by connecting its predecessor to its successor.
                        prev.next = next;
                    // e.next is deliberately left intact, because a HashIterator may still be traversing through e.
                    e.value = null; // Help GC
                    size--;
                    break;
                }
                prev = p;
                p = next;
            }
        }
    }
}
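The prev/p walk used above is a common way to unlink a node from a singly linked chain when the only handle is the head. Here is a minimal standalone sketch of the same pattern. Node, unlink and render are illustrative names, and the table slot is modeled as a returned head reference instead of an array element.

```java
class ChainUnlinkDemo {
    static final class Node {
        final String name;
        Node next;

        Node(String name, Node next) {
            this.name = name;
            this.next = next;
        }
    }

    /** Removes target from the chain starting at head and returns the (possibly new) head. */
    static Node unlink(Node head, Node target) {
        Node prev = head;
        Node p = prev;
        while (p != null) {
            Node next = p.next;
            if (p == target) {
                if (prev == target) {
                    head = next;      // target was the head: move the head forward
                } else {
                    prev.next = next; // bypass target
                }
                // target.next is left untouched, as in expungeStaleEntries
                break;
            }
            prev = p;
            p = next;
        }
        return head;
    }

    /** Concatenates the node names in chain order. */
    static String render(Node head) {
        StringBuilder sb = new StringBuilder();
        for (Node n = head; n != null; n = n.next) {
            sb.append(n.name);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Node c = new Node("c", null);
        Node b = new Node("b", c);
        Node a = new Node("a", b);

        Node head = unlink(a, b);
        System.out.println(render(head)); // ac
        head = unlink(head, a);
        System.out.println(render(head)); // c
    }
}
```

Note that prev == target can only be true on the first iteration, when both prev and p still point at the head.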

3.2.1 Resizing

As with HashMap, WeakHashMap resizes when the number of entries reaches the threshold. Below is the implementation of its resize method.

void resize(int newCapacity) {
    Entry<K,V>[] oldTable = getTable();
    int oldCapacity = oldTable.length;
    if (oldCapacity == MAXIMUM_CAPACITY) {
        threshold = Integer.MAX_VALUE;
        return;
    }

    Entry<K,V>[] newTable = newTable(newCapacity);
    transfer(oldTable, newTable);
    table = newTable;

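    /*
     * A careful detail: after the transfer, the code checks again whether the
     * entry count still reaches half of the threshold. Both the getTable call at the
     * start and the transfer itself can shed entries (transfer does not copy
     * entries whose keys have already been collected).
     * If size is below half the threshold, the entries are copied back into the
     * original array so that the capacity does not keep growing without need.
     */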
    if (size >= threshold / 2) {
        threshold = (int)(newCapacity * loadFactor);
    } else {
        expungeStaleEntries();
        transfer(newTable, oldTable);
        table = oldTable;
    }
}


private void transfer(Entry<K,V>[] src, Entry<K,V>[] dest) {
    for (int j = 0; j < src.length; ++j) {
        Entry<K,V> e = src[j];
        src[j] = null;
        while (e != null) {
            Entry<K,V> next = e.next;
            Object key = e.get();
            if (key == null) {
                e.next = null;  // Help GC
                e.value = null; // Help GC
                size--;
            } else {
                int i = indexFor(e.hash, dest.length);
                e.next = dest[i];
                dest[i] = e;
            }
            e = next;
        }
    }
}

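That covers most of the put method. To recap, these are the points at which WeakHashMap cleans up stale entries while we put an element:

getTable at the start of put calls expungeStaleEntries once.
When a resize is needed, getTable at the start of resize calls expungeStaleEntries again.
transfer itself checks whether the object behind each weak reference has already been garbage collected.
If size turns out to be below half the threshold after the resize, expungeStaleEntries is called once more.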
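The resize decision in the last point can be worked through with the default settings (initial capacity 16, load factor 0.75, so a threshold of 12). The sketch below is a simplified model, not the JDK code: ResizeDecisionDemo and capacityAfterResize are hypothetical names, the MAXIMUM_CAPACITY check is left out, and liveSize stands for the value of size after the cleanup and transfer steps.

```java
class ResizeDecisionDemo {
    static final float LOAD_FACTOR = 0.75f; // WeakHashMap's default load factor

    /** Capacity the table ends up with, given how many live entries survive the transfer. */
    static int capacityAfterResize(int oldCapacity, int liveSize) {
        int threshold = (int) (oldCapacity * LOAD_FACTOR);
        // Mirrors "if (size >= threshold / 2)": keep the doubled table only if
        // enough live entries remain; otherwise fall back to the old capacity.
        return liveSize >= threshold / 2 ? oldCapacity * 2 : oldCapacity;
    }

    public static void main(String[] args) {
        // Capacity 16 gives threshold 12, so the cut-off is 6 live entries.
        System.out.println(capacityAfterResize(16, 6)); // 32
        System.out.println(capacityAfterResize(16, 5)); // 16
    }
}
```

In other words, a put that pushes size up to 12 only doubles the table if at least 6 entries are still alive once the stale ones have been dropped.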

3.3 Retrieving an Element

Looking up the value mapped to a key is relatively straightforward in WeakHashMap.

public V get(Object key) {
    Object k = maskNull(key);
    int h = hash(k);
    Entry<K,V>[] tab = getTable();
    int index = indexFor(h, tab.length);
    Entry<K,V> e = tab[index];
    // Walk the entries in the bucket.
    while (e != null) {
        if (e.hash == h && eq(k, e.get()))
            return e.value;
        e = e.next;
    }
    return null;
}

As you can see, get also calls getTable, so a lookup also cleans up entries whose keys have already been garbage collected.
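Both put and get start with maskNull(key). In the JDK source this replaces a null key with an internal sentinel object (NULL_KEY), and unmaskNull in getKey converts it back, which is how WeakHashMap supports a null key. A short check against the public API, where NullKeyDemo and build are illustrative names:

```java
import java.util.Map;
import java.util.WeakHashMap;

class NullKeyDemo {
    /** Builds a WeakHashMap containing a single mapping under the null key. */
    static Map<String, String> build() {
        Map<String, String> map = new WeakHashMap<>();
        map.put(null, "value for null");
        return map;
    }

    public static void main(String[] args) {
        Map<String, String> map = build();
        System.out.println(map.get(null));         // value for null
        System.out.println(map.containsKey(null)); // true
        System.out.println(map.size());            // 1
    }
}
```

Since the sentinel is held in a static field, it stays strongly reachable, so the mapping for the null key is not removed by garbage collection.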

3.4 Removing an Element

public V remove(Object key) {
    Object k = maskNull(key);
    int h = hash(k);
    Entry<K,V>[] tab = getTable();
    int i = indexFor(h, tab.length);
    Entry<K,V> prev = tab[i];
    Entry<K,V> e = prev;

    while (e != null) {
        Entry<K,V> next = e.next;
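        /*
         * The logic here is much like expungeStaleEntries:
         * if the entry is the head of the bucket, simply advance tab[i] to the next entry;
         * otherwise link the predecessor of the removed entry to its successor.
         */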
        if (h == e.hash && eq(k, e.get())) {
            modCount++;
            size--;
            if (prev == e)
                tab[i] = next;
            else
                prev.next = next;
            return e.value;
        }
        prev = e;
        e = next;
    }

    return null;
}

4. Using WeakHashMap

One use case for WeakHashMap is a cache that does not affect the lifetime of its keys. Tomcat's ConcurrentCache, for example, uses a WeakHashMap. Here is the code:

import java.util.Map;
import java.util.WeakHashMap;
import java.util.concurrent.ConcurrentHashMap;

public final class ConcurrentCache<K,V> {

    private final int size;

    private final Map<K,V> eden;

    private final Map<K,V> longterm;

    public ConcurrentCache(int size) {
        this.size = size;
        this.eden = new ConcurrentHashMap<>(size);
        this.longterm = new WeakHashMap<>(size);
    }

    public V get(K k) {
        V v = this.eden.get(k);
        if (v == null) {
            synchronized (longterm) {
                v = this.longterm.get(k);
            }
            if (v != null) {
                this.eden.put(k, v);
            }
        }
        return v;
    }

    public void put(K k, V v) {
        if (this.eden.size() >= size) {
            synchronized (longterm) {
                this.longterm.putAll(this.eden);
            }
            this.eden.clear();
        }
        this.eden.put(k, v);
    }
}

That said, in my own project work, situations that really call for WeakHashMap have been fairly rare.
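For a feel of what "does not affect the lifetime of its keys" means, here is a small sketch, with WeakCacheDemo and sizeAfterGc as illustrative names. System.gc() is only a hint to the JVM, so the entry whose key is no longer referenced may or may not be gone afterwards. The only guarantee is that the entry whose key is still strongly referenced stays.

```java
import java.util.Map;
import java.util.WeakHashMap;

class WeakCacheDemo {
    /** Requests a collection and reports how many entries the cache still holds. */
    static int sizeAfterGc(Map<Object, String> cache) {
        System.gc();         // only a request; the JVM is free to ignore it
        return cache.size(); // size() first drops entries whose keys were collected
    }

    public static void main(String[] args) {
        Map<Object, String> cache = new WeakHashMap<>();
        Object pinned = new Object();        // strongly referenced for the whole method
        cache.put(pinned, "pinned");
        cache.put(new Object(), "unpinned"); // nothing else references this key

        int size = sizeAfterGc(cache);
        System.out.println(cache.containsKey(pinned)); // true
        System.out.println("entries left: " + size);   // 1 or 2, depending on the collector
    }
}
```

One caveat with this kind of cache: only the key is weakly referenced. If a value holds a strong reference back to its own key, the key stays reachable through the map and the entry will never be cleaned up.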

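5. Summary

To wrap up, here is a comparison of WeakHashMap with HashMap and ThreadLocalMap.

Aspect	WeakHashMap	HashMap	ThreadLocalMap
Collision handling	Separate chaining	Separate chaining	Open addressing
Stored data	Any type; key and value may be null; each stored Entry is a weak reference to its key.	Any type; key and value may be null.	The key is a ThreadLocal object and is never null (callers cannot insert a null key themselves); the value may be of any type, including null; each stored Entry is a weak reference to its key.
Effect on GC of keys	Entry is a weak reference, so it does not keep the key alive	Strong reference, so it keeps the key alive	Entry is a weak reference, so it does not keep the key alive
Thread safety	No	No	Yes (each map is confined to its owning thread)
Other	Built-in cleanup of stale entries	Implementation optimized in JDK 8	Built-in cleanup of stale entries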