How is the hashCode value generated? Is it the object's memory address?

Date: 2021-06-10




Let's start with the simplest possible print statement:

System.out.println(new Object());

This prints the class's fully qualified name, followed by @ and a string of characters:

java.lang.Object@6659c656

What comes after the @ sign? Is it the hashcode, the object's memory address, or some other value?

The part after @ is simply the object's hashcode, rendered in hexadecimal. Let's verify:

Object o = new Object();
int hashcode = o.hashCode();
// toString: class name + "@" + hexadecimal hashcode
System.out.println(o);
// hashcode in hexadecimal
System.out.println(Integer.toHexString(hashcode));
// hashcode in decimal
System.out.println(hashcode);
// Also returns the object's hashcode, but unlike Object.hashCode() it ignores any overridden hashCode()
System.out.println(System.identityHashCode(o));

Output:

java.lang.Object@6659c656
6659c656
1717159510
1717159510
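To make the difference between hashCode() and System.identityHashCode() concrete, here is a minimal sketch. The classes ToStringDemo and Fixed and the helper describe() are made up for illustration; describe() rebuilds the string the way the Object.toString() Javadoc specifies it.

```java
// Hypothetical demo class, not part of the original article.
class ToStringDemo {
    // Overrides hashCode() with a constant to show which API honors the override.
    static class Fixed {
        @Override
        public int hashCode() {
            return 42; // overridden hash, unrelated to the identity hash
        }
    }

    // Rebuilds what Object.toString() returns: class name + "@" + hex hashCode.
    static String describe(Object o) {
        return o.getClass().getName() + "@" + Integer.toHexString(o.hashCode());
    }

    public static void main(String[] args) {
        Object plain = new Object();
        System.out.println(plain.toString().equals(describe(plain))); // true

        Fixed fixed = new Fixed();
        System.out.println(fixed);                          // ends with "@2a", since 42 == 0x2a
        System.out.println(fixed.hashCode());               // 42
        System.out.println(System.identityHashCode(fixed)); // identity hash, ignores the override
    }
}
```

The last line prints whatever the JVM generated for that object, which is the value the rest of the article is about.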

So how is an object's hashcode actually generated? Is it really the memory address?

This article is based on Java 8 HotSpot.

How hashCode is generated

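The JVM's logic for generating hashCode is not that simple: it offers several strategies, and each one produces different results.

Here is the core method that generates hashCode in the OpenJDK source: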

static inline intptr_t get_next_hash(Thread * Self, oop obj) {
  intptr_t value = 0 ;
  if (hashCode == 0) {
     // This form uses an unguarded global Park-Miller RNG,
     // so it's possible for two threads to race and generate the same RNG.
     // On MP system we'll have lots of RW access to a global, so the
     // mechanism induces lots of coherency traffic.
     value = os::random() ;
  } else
  if (hashCode == 1) {
     // This variation has the property of being stable (idempotent)
     // between STW operations.  This can be useful in some of the 1-0
     // synchronization schemes.
     intptr_t addrBits = intptr_t(obj) >> 3 ;
     value = addrBits ^ (addrBits >> 5) ^ GVars.stwRandom ;
  } else
  if (hashCode == 2) {
     value = 1 ;            // for sensitivity testing
  } else
  if (hashCode == 3) {
     value = ++GVars.hcSequence ;
  } else
  if (hashCode == 4) {
     value = intptr_t(obj) ;
  } else {
     // Marsaglia's xor-shift scheme with thread-specific state
     // This is probably the best overall implementation -- we'll
     // likely make this the default in future releases.
     unsigned t = Self->_hashStateX ;
     t ^= (t << 11) ;
     Self->_hashStateX = Self->_hashStateY ;
     Self->_hashStateY = Self->_hashStateZ ;
     Self->_hashStateZ = Self->_hashStateW ;
     unsigned v = Self->_hashStateW ;
     v = (v ^ (v >> 19)) ^ (t ^ (t >> 8)) ;
     Self->_hashStateW = v ;
     value = v ;
  }

  value &= markOopDesc::hash_mask;
  if (value == 0) value = 0xBAD ;
  assert (value != markOopDesc::no_hash, "invariant") ;
  TEVENT (hashCode: GENERATE) ;
  return value;
}

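The source shows that the strategy is selected by a global variable named hashCode, which defaults to 5. That variable is defined in a separate header file (globals.hpp):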

product(intx, hashCode, 5,                                            
         "(Unstable) select hashCode generation algorithm" )

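The description is clear enough: "(Unstable) select hashCode generation algorithm". Because it is declared as a flag, it can be set with a JVM startup option. First, let's confirm the default value: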

java -XX:+PrintFlagsFinal -version | grep hashCode

intx hashCode                                  = 5                                   {product}
openjdk version "1.8.0_282"
OpenJDK Runtime Environment (AdoptOpenJDK)(build 1.8.0_282-b08)
OpenJDK 64-Bit Server VM (AdoptOpenJDK)(build 25.282-b08, mixed mode)

So we can select a different hashcode generation algorithm with a JVM startup option and test what each algorithm produces:

-XX:hashCode=N
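A small probe program makes it easy to compare the algorithms. This is only a sketch: the class name HashCodeProbe and the method sample() are invented here, and the values printed depend entirely on the JVM and the flag you pass.

```java
// Hypothetical helper for experimenting with -XX:hashCode=N.
class HashCodeProbe {
    // Returns the identity hash codes of n freshly allocated objects.
    static int[] sample(int n) {
        int[] hashes = new int[n];
        for (int i = 0; i < n; i++) {
            hashes[i] = System.identityHashCode(new Object());
        }
        return hashes;
    }

    public static void main(String[] args) {
        // Print each hash in hex, the same form that appears after "@" in toString().
        for (int h : sample(5)) {
            System.out.println(Integer.toHexString(h));
        }
    }
}
```

Run it as, for example, java -XX:hashCode=3 HashCodeProbe and compare the output across values of N. The article targets Java 8; other JDK versions may treat the flag differently (it may need to be unlocked first), which is an assumption to check against your own JVM.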

Now let's look at how each hashcode generation algorithm behaves.

Algorithm 0

if (hashCode == 0) {
     // This form uses an unguarded global Park-Miller RNG,
     // so it's possible for two threads to race and generate the same RNG.
     // On MP system we'll have lots of RW access to a global, so the
     // mechanism induces lots of coherency traffic.
     value = os::random();
  }

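This algorithm uses a Park-Miller random number generator. Note what the source comment says, though: the generator's state is an unguarded global, so two threads can race and end up with the same value, and on multiprocessor systems the heavy read-write access to that global generates a lot of cache-coherency traffic.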
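For readers unfamiliar with Park-Miller, the following sketch shows the classic "minimal standard" form of the generator in Java. It is not HotSpot's code: the class ParkMillerDemo is hypothetical, and the assumption is only that os::random() follows the same recurrence, next = (16807 * seed) mod (2^31 - 1).

```java
// Illustrative Park-Miller "minimal standard" generator; not the HotSpot implementation.
class ParkMillerDemo {
    static final long A = 16807L;      // multiplier, 7^5
    static final long M = 2147483647L; // modulus, 2^31 - 1

    // One step of the recurrence: next = (A * seed) mod M.
    // The product fits in a long because seed < 2^31 and A < 2^15.
    static int next(int seed) {
        return (int) ((A * seed) % M);
    }

    // Applies the step `steps` times, starting from `seed`.
    static int iterate(int seed, int steps) {
        int value = seed;
        for (int i = 0; i < steps; i++) {
            value = next(value);
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(next(1));           // 16807
        System.out.println(iterate(1, 10000)); // 1043618065, the well-known check value

        // Why an unguarded global seed is a problem: if two threads both read
        // the same seed before either writes back, they compute the same value.
        int sharedSeed = 1;
        int seenByThreadA = next(sharedSeed);
        int seenByThreadB = next(sharedSeed);
        System.out.println(seenByThreadA == seenByThreadB); // true
    }
}
```

The last three lines simulate the race sequentially; a real race needs two threads, but the arithmetic is the same.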

Algorithm 1

if (hashCode == 1) {
    // This variation has the property of being stable (idempotent)
    // between STW operations.  This can be useful in some of the 1-0
    // synchronization schemes.
    intptr_t addrBits = intptr_t(obj) >> 3 ;
    value = addrBits ^ (addrBits >> 5) ^ GVars.stwRandom ;
}

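This algorithm really is based on the object's memory address: it casts the object pointer to intptr_t, shifts it right by 3, and XORs the result with a further-shifted copy of itself and with GVars.stwRandom. The hash is therefore derived from the address rather than being the raw address itself.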
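The bit manipulation is easier to follow with concrete numbers. The sketch below replays it in Java on made-up addresses. The class AddressHashDemo is hypothetical, and the 31-bit HASH_MASK is an assumption about a 64-bit JVM rather than something stated in the article.

```java
// Replays the algorithm-1 arithmetic on hypothetical addresses.
class AddressHashDemo {
    // Assumed mask: 31 hash bits, standing in for markOopDesc::hash_mask.
    static final long HASH_MASK = 0x7FFFFFFFL;

    static long addressHash(long address, long stwRandom) {
        // Objects are typically 8-byte aligned, so the low 3 bits carry no
        // information; shifting them away keeps the hash from wasting bits.
        long addrBits = address >> 3;
        // Fold higher address bits into lower ones, then mix in the random value.
        long value = addrBits ^ (addrBits >> 5) ^ stwRandom;
        return value & HASH_MASK;
    }

    public static void main(String[] args) {
        System.out.println(addressHash(0x1000L, 0L)); // 528
        System.out.println(addressHash(0x1008L, 0L)); // 529, the next 8-byte slot
        System.out.println(addressHash(0x1000L, 5L)); // 533, same address, different stwRandom
    }
}
```

The third call shows why the result is not simply "the address": the same address hashes differently once the random component changes.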

Algorithm 2

if (hashCode == 2) {
    value = 1 ;            // for sensitivity testing
}

This one needs no explanation: it always returns 1, presumably for internal testing.

If you're curious, try enabling it with -XX:hashCode=2 and see whether every hashCode really becomes 1.

Algorithm 3

if (hashCode == 3) {
    value = ++GVars.hcSequence ;
}

This one is simple too: it is a counter. Every object's hashCode is drawn from the same incrementing global variable. Let's try it:

System.out.println(new Object());
System.out.println(new Object());
System.out.println(new Object());
System.out.println(new Object());
System.out.println(new Object());
System.out.println(new Object());

//output
java.lang.Object@144
java.lang.Object@145
java.lang.Object@146
java.lang.Object@147
java.lang.Object@148
java.lang.Object@149

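Sure enough, it increments. Interesting. The sequence does not start at 1, presumably because the JVM has already hashed other objects during startup.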

Algorithm 4

if (hashCode == 4) {
    value = intptr_t(obj) ;
}

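This is not much different from algorithm 1: both are based on the object's address. Algorithm 4 uses the address as-is (before masking), while algorithm 1 is a scrambled variant.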

Algorithm 5

The last one is also the default. It is used whenever the hashCode setting is not 0, 1, 2, 3, or 4:

else {
     // Marsaglia's xor-shift scheme with thread-specific state
     // This is probably the best overall implementation -- we'll
     // likely make this the default in future releases.
     unsigned t = Self->_hashStateX ;
     t ^= (t << 11) ;
     Self->_hashStateX = Self->_hashStateY ;
     Self->_hashStateY = Self->_hashStateZ ;
     Self->_hashStateZ = Self->_hashStateW ;
     unsigned v = Self->_hashStateW ;
     v = (v ^ (v >> 19)) ^ (t ^ (t >> 8)) ;
     Self->_hashStateW = v ;
     value = v ;
  }

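This one derives the hash from the current state values using XOR and shift operations (Marsaglia's xor-shift scheme). Because the state is thread-specific, it avoids the shared global that the random and counter algorithms rely on, so it should be cheaper than both, although the collision rate may be somewhat higher. But does a duplicate hashCode really matter?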
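A Java port makes the state rotation easier to trace. This is a sketch under stated assumptions: the class XorShiftHashDemo is hypothetical, the seeds 1, 2, 3, 4 are arbitrary (HotSpot seeds each thread differently), and the 31-bit HASH_MASK is assumed for a 64-bit JVM. Java's >>> stands in for the unsigned right shift in the C++ code.

```java
// Java port of the xor-shift step for illustration; seeds and mask are assumptions.
class XorShiftHashDemo {
    static final int HASH_MASK = 0x7FFFFFFF; // assumed 31 hash bits

    // In HotSpot this state lives on each Thread; here it is plain instance state.
    private int x, y, z, w;

    XorShiftHashDemo(int x, int y, int z, int w) {
        this.x = x;
        this.y = y;
        this.z = z;
        this.w = w;
    }

    int nextHash() {
        int t = x;
        t ^= (t << 11);
        // Rotate the state: each slot takes the value of the next one.
        x = y;
        y = z;
        z = w;
        int v = w;
        v = (v ^ (v >>> 19)) ^ (t ^ (t >>> 8)); // >>> because the C++ values are unsigned
        w = v;
        int value = v & HASH_MASK;
        // Mirrors the tail of get_next_hash: 0 is reserved, so substitute 0xBAD.
        return value == 0 ? 0xBAD : value;
    }

    public static void main(String[] args) {
        XorShiftHashDemo state = new XorShiftHashDemo(1, 2, 3, 4);
        System.out.println(state.nextHash()); // 2061
        System.out.println(state.nextHash()); // 6175
    }
}
```

No locks or atomic operations are involved, which is the practical advantage of keeping the state per thread.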

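The JVM never guaranteed that this value is unique in the first place; the separate chaining used in HashMap exists precisely to resolve hash collisions.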
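A quick way to see that collisions are harmless is to put two keys with the same hashCode into a HashMap. The sketch below uses the strings "Aa" and "BB", which are known to share a String hash; the class CollisionDemo is made up for this example.

```java
import java.util.HashMap;
import java.util.Map;

// Shows that HashMap keeps both entries even when their keys' hash codes collide.
class CollisionDemo {
    static Map<String, Integer> build() {
        Map<String, Integer> map = new HashMap<>();
        map.put("Aa", 1);
        map.put("BB", 2); // same hashCode as "Aa", but not equal to it
        return map;
    }

    public static void main(String[] args) {
        System.out.println("Aa".hashCode()); // 2112
        System.out.println("BB".hashCode()); // 2112

        Map<String, Integer> map = build();
        System.out.println(map.size());    // 2
        System.out.println(map.get("Aa")); // 1
        System.out.println(map.get("BB")); // 2
    }
}
```

Both keys land in the same bucket, and equals() tells them apart on lookup.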