Cao Gong Talks JDK Source Code (3): ConcurrentHashMap, Hash Algorithm Optimization, and Bit Operations Demystified

Date: 2020-06-09

hashcode: there is more to it than you'd think

What makes a good hashcode? Generally speaking, a hashcode is represented as an int, 32 bits.

What do you think of the following two hashcodes?

0111 1111 1111 1111 1111 1111 1111 1111  ------A
1111 1111 1111 1111 1111 1111 1111 1111  ------B

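Only bit 32 (counting from right to left) differs, so there doesn't seem to be a better or worse one, right?

Well, let's think again: how is a hashcode usually used? A HashMap is made of an array + linked lists + red-black trees, and the array is the most important part. Assume the array length is 2 to the power n (HashMap requires its array length to be a power of two); here, assume it is 8.

As you know, taking the hashcode modulo 8 has the same effect as hashcode & (8 - 1). (For a negative hashcode the mask still yields a non-negative result, which is what an array index needs.)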
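To make the equivalence concrete, here is a minimal sketch that compares the mask with a modulo. The class and method names are made up for illustration. It uses Math.floorMod rather than the % operator, because % in Java can return a negative value for a negative hashcode, while the mask never does.

```java
public class MaskVsModulo {
    // Index via bit mask; only valid when length is a power of two.
    static int indexByMask(int hash, int length) {
        return hash & (length - 1);
    }

    // Index via a non-negative modulo, for comparison.
    static int indexByMod(int hash, int length) {
        return Math.floorMod(hash, length);
    }

    public static void main(String[] args) {
        int[] samples = {0, 7, 8, 13, 0x7fffffff, -1, -13};
        for (int h : samples) {
            System.out.println(h + " -> mask=" + indexByMask(h, 8)
                    + ", floorMod=" + indexByMod(h, 8));
        }
    }
}
```

Both columns should agree for every sample, including the negative ones.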

So what do we get when we AND the earlier A with (8 - 1)?

0111 1111 1111 1111 1111 1111 1111 1111  ------A
0000 0000 0000 0000 0000 0000 0000 0111  ------ 8 -1
    AND
0000 0000 0000 0000 0000 0000 0000 0111  ------ 7

The result is 7, so it goes into array[7].

Now look at the calculation for B:

1111 1111 1111 1111 1111 1111 1111 1111  ------B
0000 0000 0000 0000 0000 0000 0000 0111  ------ 8 -1
    AND
0000 0000 0000 0000 0000 0000 0000 0111  ------ 7

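Even though bit 32 of B is 1, the teammate it gets ANDed with, 7, is useless.

All of its high bits are 0.

OK, do you see it? The array is too small, only 8 long, so the top 29 bits of the mask are all 0. You might think a real capacity would never be that small, so suppose the capacity is 2 to the power 16, i.e. 65536. That is not so small anymore, but even then the top 16 bits of the mask are still 0.

So is the problem clear? The hashcodes we computed have the same low bits and different high bits, but because the mask we AND with is so weak, we end up with a hash collision.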
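The collision described above can be checked with a few lines. This is only a sketch; the class name and the indexFor helper are made up for illustration and simply apply the mask with no extra mixing.

```java
public class HighBitsIgnored {
    // Plain masking, with no mixing of the high bits.
    static int indexFor(int hash, int length) {
        return hash & (length - 1);
    }

    public static void main(String[] args) {
        int a = 0x7fffffff; // A from the text
        int b = 0xffffffff; // B from the text (all 32 bits set)
        for (int length : new int[]{8, 65536}) {
            int zeros = Integer.numberOfLeadingZeros(length - 1);
            System.out.println("length=" + length
                    + ", leading zero bits in mask=" + zeros
                    + ", A->" + indexFor(a, length)
                    + ", B->" + indexFor(b, length));
        }
    }
}
```

For both lengths A and B land in the same slot, because the only bit that distinguishes them sits above the mask.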

OK, so how do we solve this?

Can we make the high bits take part in the calculation too? Of course we can.

How HashMap optimizes it

static final int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

There are really 3 steps here:

Compute the hashcode; this is operand 1
```
h = key.hashCode()
```
Shift the hashcode from step 1 right by 16 bits (unsigned shift); this is operand 2
```
h >>> 16
```
XOR operand 1 with operand 2 to get the final hashcode

Let's work it through with the earlier values:

0111 1111 1111 1111 1111 1111 1111 1111  ------A
0000 0000 0000 0000 0111 1111 1111 1111   ----- A >>> 16
          XOR (0 if the bits are equal, otherwise 1)
0111 1111 1111 1111 1000 0000 0000 0000    --- 2147450880

The result here is 2147450880. Now AND it with 7:

0111 1111 1111 1111 1000 0000 0000 0000    --- 2147450880  
0000 0000 0000 0000 0000 0000 0000 0111  ------ 8 -1
          AND
0000 0000 0000 0000 0000 0000 0000 0000  ------ 0

So A now lands in array[0] (instead of array[7] as before).

Now let's do the same for B:

1111 1111 1111 1111 1111 1111 1111 1111  ------ B
0000 0000 0000 0000 1111 1111 1111 1111   ----- B >>> 16
          XOR (0 if the bits are equal, otherwise 1)
1111 1111 1111 1111 0000 0000 0000 0000    --- -65536
0000 0000 0000 0000 0000 0000 0000 0111  ------ 7   
         AND
0000 0000 0000 0000 0000 0000 0000 0000  ------- 0

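The final result is 0, so it should go into array[0].

Hm? The two still collide. All I can say is that I picked some truly impressive numbers; maybe I should go buy a lottery ticket... (A and B differ only in bit 32. The shift moves that difference down to bit 16, which is still above the 3-bit mask, so a table of length 8 cannot see it.)

Anyway, you can try a few more sets of numbers yourself. Here is the source code: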

public class BinaryTest {
    public static void main(String[] args) {
        int a = 0b00001111111111111111111111111011;
        int b = 0b10001101111111111111110111111011;

        int i = tabAt(32, a);
        System.out.println("index for a:" + i);

        i = tabAt(32, b);
        System.out.println("index for b:" + i);

    }

    static final int tabAt(int  arraySize, int hash) {

        int h = hash;
        int finalHashCode = h ^ (h >>> 16);
        int i = finalHashCode & (arraySize - 1);

        return i;
    }
}

Admittedly, a few of the numbers I tested still collide, but bringing the high 16 bits into the calculation is surely better than leaving them out.

You can also take a look at the comment on the hash method in HashMap:

/**
 * Computes key.hashCode() and spreads (XORs) higher bits of hash
 * to lower.  Because the table uses power-of-two masking, sets of
 * hashes that vary only in bits above the current mask will
 * always collide. (Among known examples are sets of Float keys
 * holding consecutive whole numbers in small tables.)  So we
 * apply a transform that spreads the impact of higher bits
 * downward. There is a tradeoff between speed, utility, and
 * quality of bit-spreading. Because many common sets of hashes
 * are already reasonably distributed (so don't benefit from
 * spreading), and because we use trees to handle large sets of
 * collisions in bins, we just XOR some shifted bits in the
 * cheapest possible way to reduce systematic lossage, as well as
 * to incorporate impact of the highest bits that would otherwise
 * never be used in index calculations because of table bounds.
 */

It makes 2 points:

So we apply a transform that spreads the impact of higher bits downward.

So we apply a transform that puts the high bits to use.

we just XOR some shifted bits in the cheapest possible way to reduce systematic lossage, as well as

to incorporate impact of the highest bits that would otherwise never be used in index calculations because of table bounds.

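We simply XOR in some bits shifted down from the high half, the cheapest possible way to reduce systematic loss (sets of hashes that would always collide). It also brings in the highest bits, which would otherwise never be used in index calculations because of the limited table size.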
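The Float example mentioned in the comment can be tried out directly. The sketch below is an illustration, not JDK code: the class name, the distinctBuckets helper, and the choice of the whole numbers 1 to 16 are assumptions made for the demo. It counts how many different slots those Float hashcodes occupy, with and without the XOR step.

```java
import java.util.HashSet;
import java.util.Set;

public class FloatKeySpread {
    // Same transform as HashMap.hash, applied to an already computed hashcode.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // Counts the distinct slots used by the Float hashcodes of 1.0f .. 16.0f.
    static int distinctBuckets(int tableLength, boolean useSpread) {
        Set<Integer> buckets = new HashSet<>();
        for (int i = 1; i <= 16; i++) {
            int h = Float.hashCode((float) i);
            int mixed = useSpread ? spread(h) : h;
            buckets.add(mixed & (tableLength - 1));
        }
        return buckets.size();
    }

    public static void main(String[] args) {
        for (int length : new int[]{16, 1024}) {
            System.out.println("length=" + length
                    + " plain=" + distinctBuckets(length, false)
                    + " spread=" + distinctBuckets(length, true));
        }
    }
}
```

The low 16 bits of these Float hashcodes are all zero, so without the XOR every key falls into slot 0. With the XOR, a table of length 1024 spreads them over 16 slots, while a table of length 16 still puts them all in one slot: the transform is cheap, not perfect.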

How ConcurrentHashMap optimizes it

In ConcurrentHashMap, the main entry point is:

final V putVal(K key, V value, boolean onlyIfAbsent) {
        if (key == null || value == null) throw new NullPointerException();
        int hash = spread(key.hashCode());

Here the hash value is computed mainly by the spread method:

static final int spread(int h) {
        return (h ^ (h >>> 16)) & HASH_BITS;
    }

If you want to look closely at the binary value at each step, you can use the demo below:

static final int spread(int h) {
    // 1: the original hashcode
    String s = Integer.toBinaryString(h);
    System.out.println("h:" + s);

    // 2: the high 16 bits, shifted down into the low half
    String upper16Bits = Integer.toBinaryString(h >>> 16);
    System.out.println("h >>> 16:" + upper16Bits);

    // 3: fold the high half into the low half
    int temp = h ^ (h >>> 16);
    System.out.println("h ^ (h >>> 16):" + Integer.toBinaryString(temp));

    // 4: clear the sign bit so the result is never negative
    int result = temp & HASH_BITS;
    System.out.println("final:" + Integer.toBinaryString(result));

    return result;
}

Compared with HashMap there is one extra piece here:

& HASH_BITS;

What is it for?

The hashcode computed by (h ^ (h >>> 16)) may be negative, so here it is ANDed with HASH_BITS:

static final int HASH_BITS = 0x7fffffff; // usable bits of normal node hash

1111 1111 1111 1111 1111 1111 1111 1111   suppose the computed hashcode is negative, because bit 32 is 1
0111 1111 1111 1111 1111 1111 1111 1111       0x7fffffff
    AND
0111 ..................................

Bit 32 of 0x7fffffff is always 0, so bit 32 of the ANDed result is always 0 as well. That way the hashcode is never negative.
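A small side-by-side sketch shows the effect of the extra mask. The two method names below are made up to tell the variants apart; the bodies follow the HashMap and ConcurrentHashMap snippets quoted above.

```java
public class SpreadSignDemo {
    static final int HASH_BITS = 0x7fffffff;

    // HashMap-style mixing: the sign bit can survive.
    static int spreadHashMapStyle(int h) {
        return h ^ (h >>> 16);
    }

    // ConcurrentHashMap-style mixing: the sign bit is always cleared.
    static int spreadChmStyle(int h) {
        return (h ^ (h >>> 16)) & HASH_BITS;
    }

    public static void main(String[] args) {
        int h = -1; // all 32 bits set
        System.out.println("HashMap style:           " + spreadHashMapStyle(h));
        System.out.println("ConcurrentHashMap style: " + spreadChmStyle(h));
    }
}
```

For h = -1 the first value is negative and the second is not, even though both share the same low 31 bits.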

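Why a node's hash must not be negative in ConcurrentHashMap

When the hash is non-negative, the bin is a normal linked list.

When the hash is negative, there are several cases:

ForwardingNode

In this case its hash value is: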

static final int MOVED     = -1; // hash for forwarding nodes

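When the node is a ForwardingNode, the table is in the middle of a resize and this bin has already been migrated to the new, temporary table. A get then has to look in that new table; a put cannot proceed in this bin and will help with the resize instead.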

TreeBin

static final int TREEBIN   = -2; // hash for roots of trees

This means the bin has already been converted into a red-black tree.
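The idea that the sign of the hash tells the node kinds apart can be sketched as follows. This is a toy illustration, not the JDK's code: the describe helper and its return strings are invented here, and only the two negative values quoted above are modeled.

```java
public class BinKindSketch {
    static final int MOVED = -1;     // hash for forwarding nodes
    static final int TREEBIN = -2;   // hash for roots of trees
    static final int HASH_BITS = 0x7fffffff;

    static int spread(int h) {
        return (h ^ (h >>> 16)) & HASH_BITS;
    }

    // Hypothetical helper: classify a bin by the hash stored in its head node.
    static String describe(int headHash) {
        if (headHash >= 0) return "linked list";
        if (headHash == MOVED) return "forwarding node";
        if (headHash == TREEBIN) return "tree bin";
        return "other special node";
    }

    public static void main(String[] args) {
        System.out.println(describe(spread("some key".hashCode())));
        System.out.println(describe(MOVED));
        System.out.println(describe(TREEBIN));
    }
}
```

Because spread clears the sign bit, a hash derived from a real key can never be mistaken for one of the special negative markers.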

Bit operations during a resize

/**
     * Returns the stamp bits for resizing a table of size n.
     * Must be negative when shifted left by RESIZE_STAMP_SHIFT.
     */
    static final int resizeStamp(int n) {
        return Integer.numberOfLeadingZeros(n) | (1 << (RESIZE_STAMP_BITS - 1));
    }

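Here, assume n is 4, i.e. the array capacity of the map is 4.

The expression below counts how many zeros come before the highest 1 bit in the binary representation of 4.

Integer.numberOfLeadingZeros(n)

Written out as 32 bits, it looks like this: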

0000 0000 0000 0000, 0000 0000 0000 0100

So there are 29 leading zeros, i.e. the result here is 29.

(1 << (RESIZE_STAMP_BITS - 1))

In this expression, RESIZE_STAMP_BITS is a constant equal to 16, so this shifts 1 left by 15 bits.

In binary:
```
1000 0000 0000 0000   -- 1 << 15
```

Final result:

0000 0000 0000 0000 0000 0000 0001 1101   -- 29
0000 0000 0000 0000 1000 0000 0000 0000   -- 1 << 15
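OR

0000 0000 0000 0000 1000 0000 0001 1101   --  this is 29 with bit 16 (the top bit of the low 16 bits) set to 1; nothing else changes.

So the final result is this number.

Converted to decimal it is 32797 (32768 + 29), a positive number.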
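The arithmetic above is easy to reproduce. The sketch below copies the resizeStamp body quoted earlier into a throwaway class (the class name is made up) so it can be run outside of ConcurrentHashMap.

```java
public class ResizeStampDemo {
    static final int RESIZE_STAMP_BITS = 16;

    // Same expression as the resizeStamp method quoted above.
    static int resizeStamp(int n) {
        return Integer.numberOfLeadingZeros(n) | (1 << (RESIZE_STAMP_BITS - 1));
    }

    public static void main(String[] args) {
        int n = 4;
        int rs = resizeStamp(n);
        System.out.println("leading zeros of " + n + ": " + Integer.numberOfLeadingZeros(n));
        System.out.println("stamp (decimal): " + rs);
        System.out.println("stamp (binary):  " + Integer.toBinaryString(rs));
    }
}
```

Different table sizes give different stamps, since each power of two has its own number of leading zeros.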

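What is this number used for?

In the addCount function, once the total number of key-value pairs in the table reaches sizeCtl (normally 0.75 * array length), a resize is triggered.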

java.util.concurrent.ConcurrentHashMap#addCount
    
int sc =  sizeCtl;
boolean bSumExteedSizeControl = newBaseCount >= (long) sc;
// 1
if (bContinue) {
    int rs = resizeStamp(n);
    // 2
    if (sc < 0) {
        if ((sc >>> RESIZE_STAMP_SHIFT) != rs || sc == rs + 1 ||
            sc == rs + MAX_RESIZERS || (nt = nextTable) == null ||
            transferIndex <= 0)
            break;
        if (U.compareAndSwapInt(this, SIZECTL, sc, sc + 1))
            transfer(tab, nt);
    }
    // 3
    else if (U.compareAndSwapInt(this, SIZECTL, sc,
                                   (rs << RESIZE_STAMP_SHIFT) + 2))
        transfer(tab, null);
    newBaseCount = sumCount();
} else {
    break;
}

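At 1: the resize condition is met.
At 2: is sc less than 0? sc is the sizeCtl mentioned above. On the first trigger it should equal 0.75 * array length and cannot be negative, so this branch is skipped.

At 3: sc (positive at this point) is changed by CAS to:

(rs << RESIZE_STAMP_SHIFT) + 2

This number is interesting. rs is the result of the resizeStamp call we looked at earlier.

Following the earlier demo, the result we got was: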

0000 0000 0000 0000 1000 0000 0001 1101   --  29 with bit 16 set to 1; nothing else changes.

Because

private static int RESIZE_STAMP_BITS = 16;
private static final int RESIZE_STAMP_SHIFT = 32 - RESIZE_STAMP_BITS;

So RESIZE_STAMP_SHIFT is 16.

0000 0000 0000 0000 1000 0000 0001 1101   --  29 with bit 16 set to 1; nothing else changes.
1000 0000 0001 1101 0000 0000 0000 0000   --  shifted left by 16 bits, i.e. rs << RESIZE_STAMP_SHIFT
1000 0000 0001 1101 0000 0000 0000 0010   --  (rs << RESIZE_STAMP_SHIFT) + 2

In the end, the highest bit of this number is 1, which means it is definitely negative.
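The same three rows can be produced in code. This is a sketch under the assumptions of the demo above (n = 4); the class name and the initialResizeSizeCtl helper are made up, while the constants and expressions are the ones quoted in the text.

```java
public class SizeCtlEncoding {
    static final int RESIZE_STAMP_BITS = 16;
    static final int RESIZE_STAMP_SHIFT = 32 - RESIZE_STAMP_BITS;

    static int resizeStamp(int n) {
        return Integer.numberOfLeadingZeros(n) | (1 << (RESIZE_STAMP_BITS - 1));
    }

    // Hypothetical helper: the value the first resizing thread tries to CAS into sizeCtl.
    static int initialResizeSizeCtl(int n) {
        return (resizeStamp(n) << RESIZE_STAMP_SHIFT) + 2;
    }

    public static void main(String[] args) {
        int sc = initialResizeSizeCtl(4);
        System.out.println("binary:       " + Integer.toBinaryString(sc));
        System.out.println("negative:     " + (sc < 0));
        System.out.println("high 16 bits: " + (sc >>> RESIZE_STAMP_SHIFT)); // the stamp again
        System.out.println("low 16 bits:  " + (sc & 0xffff));
    }
}
```

Shifting the value back with >>> recovers the stamp, which is exactly what the check `(sc >>> RESIZE_STAMP_SHIFT) != rs` in addCount relies on.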

If you have read other people's write-ups, you will know that a negative sizeCtl means a resize is in progress.

So, here:

if (U.compareAndSwapInt(this, SIZECTL, sc,
                            (rs << RESIZE_STAMP_SHIFT) + 2))

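This statement means: if the current thread succeeds in using CAS to change sizeCtl from a positive number to a negative one, it gets to carry out the resize.

What other threads do during a resize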

// 1
if (bContinue) {
    int rs = resizeStamp(n);
    // 2
    if (sc < 0) {
        // 2.1
        if ((sc >>> RESIZE_STAMP_SHIFT) != rs || sc == rs + 1 ||
            sc == rs + MAX_RESIZERS || (nt = nextTable) == null ||
            transferIndex <= 0)
            break;
        // 2.2
        if (U.compareAndSwapInt(this, SIZECTL, sc, sc + 1))
            transfer(tab, nt);
    }
    // 3
    else if (U.compareAndSwapInt(this, SIZECTL, sc,
                                   (rs << RESIZE_STAMP_SHIFT) + 2))
        transfer(tab, null);
    newBaseCount = sumCount();
} else {
    break;
}

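At this point the thread above has already triggered the resize, so sc has become negative. A new thread coming in will evaluate the check at 2.

The check at 2 passes, so the thread reaches the test at 2.1. I don't fully understand every condition here, but roughly: if the stamp in sc does not match, the resize has already finished, the limit on helper threads has been reached, nextTable is not available, or there is nothing left to claim (transferIndex <= 0), the thread does nothing more and simply breaks.

Otherwise it goes to 2.2 and helps with the resize, changing sc to sc + 1 at the same time, which increases the count of resizing threads.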
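The hand-off between the first thread and a helper can be imitated with an AtomicInteger. This is a simplified sketch, not the JDK implementation: tryStart and tryJoin are hypothetical helpers, AtomicInteger.compareAndSet stands in for U.compareAndSwapInt, and of the conditions at 2.1 only the stamp comparison is modeled.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class HelperJoinSketch {
    static final int RESIZE_STAMP_BITS = 16;
    static final int RESIZE_STAMP_SHIFT = 32 - RESIZE_STAMP_BITS;

    static int resizeStamp(int n) {
        return Integer.numberOfLeadingZeros(n) | (1 << (RESIZE_STAMP_BITS - 1));
    }

    // Step 3: the first thread flips sizeCtl from positive to negative.
    static boolean tryStart(AtomicInteger sizeCtl, int n) {
        int sc = sizeCtl.get();
        return sc >= 0
                && sizeCtl.compareAndSet(sc, (resizeStamp(n) << RESIZE_STAMP_SHIFT) + 2);
    }

    // Steps 2, 2.1 and 2.2: a later thread joins if the stamp matches (other checks omitted).
    static boolean tryJoin(AtomicInteger sizeCtl, int n) {
        int sc = sizeCtl.get();
        if (sc >= 0 || (sc >>> RESIZE_STAMP_SHIFT) != resizeStamp(n)) {
            return false;
        }
        return sizeCtl.compareAndSet(sc, sc + 1);
    }

    public static void main(String[] args) {
        AtomicInteger sizeCtl = new AtomicInteger(3); // 0.75 * 4
        System.out.println("start: " + tryStart(sizeCtl, 4)
                + ", low bits=" + (sizeCtl.get() & 0xffff));
        System.out.println("join:  " + tryJoin(sizeCtl, 4)
                + ", low bits=" + (sizeCtl.get() & 0xffff));
    }
}
```

In this encoding the low 16 bits start at 2 when the first thread begins and grow by one for every helper that joins, while the high 16 bits keep carrying the stamp.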