Exploring the String Source Code

Date: 2019-11-06

Tags: string, source code exploration


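Motivation: it suddenly occurred to me to ask why, when the key of a HashMap is a string, putting the same key again always overwrites the existing entry. So I read the String source code and found: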

private final char value[];

private int hash; // Default to 0

The hashCode method:

public int hashCode() {
        int h = hash;
        if (h == 0 && value.length > 0) {
            char val[] = value;

            for (int i = 0; i < value.length; i++) {
                h = 31 * h + val[i];
            }
            hash = h;
        }
        return h;
    }
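
To make the loop concrete, here is a minimal sketch that recomputes the same polynomial by hand and compares it with the library result. The class name PolyHashDemo and the helper polyHash are made up for this illustration; they are not part of the JDK.

```java
class PolyHashDemo {
    // Hypothetical helper: computes s[0]*31^(n-1) + ... + s[n-1] with int overflow,
    // mirroring the loop in String.hashCode() (without the cached hash field).
    static int polyHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        // 'a' = 97, 'b' = 98, 'c' = 99, so (97 * 31 + 98) * 31 + 99 = 96354
        System.out.println(polyHash("abc"));
        System.out.println("abc".hashCode());
        // The empty string never enters the loop, so its hash stays 0.
        System.out.println(polyHash("") == "".hashCode());
    }
}
```

Because the hash depends only on the characters, any two String objects with the same content return the same value.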

The equals method:

public boolean equals(Object anObject) {
        if (this == anObject) {
            return true;
        }
        if (anObject instanceof String) {
            String anotherString = (String)anObject;
            int n = value.length;
            if (n == anotherString.value.length) {
                char v1[] = value;
                char v2[] = anotherString.value;
                int i = 0;
                while (n-- != 0) {
                    if (v1[i] != v2[i])
                        return false;
                    i++;
                }
                return true;
            }
        }
        return false;
    }
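
Taken together, these two methods explain the HashMap behaviour that prompted this post: HashMap finds the bucket with hashCode and then confirms the key with equals. The sketch below, using a made-up class StringKeyDemo, puts two distinct String objects with the same characters into one map.

```java
import java.util.HashMap;
import java.util.Map;

class StringKeyDemo {
    // Inserts two different String objects that hold the same characters.
    static Map<String, Integer> putTwice() {
        Map<String, Integer> map = new HashMap<>();
        String k1 = "key";
        String k2 = new String("key"); // a different object with the same content
        map.put(k1, 1);
        map.put(k2, 2); // same hashCode and equals() is true, so the value is replaced
        return map;
    }

    public static void main(String[] args) {
        Map<String, Integer> map = putTwice();
        System.out.println(map.size());     // one entry, not two
        System.out.println(map.get("key")); // the value from the second put
    }
}
```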

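Clearly, both hashCode and equals are computed from the chars in the char[] array, so HashMap treats two strings with the same content as the same key and overwrites the old value. But why does hashCode use

h = 31 * h + val[i];

Why was the number 31 chosen? That caught my interest.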

Below is an answer found on Zhihu:

The value 31 was chosen because it is an odd prime. If it were even and the multiplication overflowed, information would be lost, as multiplication by 2 is equivalent to shifting. The advantage of using a prime is less clear, but it is traditional. A nice property of 31 is that the multiplication can be replaced by a shift and a subtraction for better performance: 31 * i == (i << 5) - i. Modern VMs do this sort of optimization automatically.
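
Both claims in the quote can be checked with a short sketch. The class Shift31Demo and the helper hashWith (the same polynomial hash with a configurable multiplier) are assumptions made for this illustration. Using 32 as the even multiplier is also a choice made here: every step then shifts the running hash left by 5 bits, so in an 8-character string the first character is shifted by 35 bits and falls out of the 32-bit int entirely.

```java
class Shift31Demo {
    // True when 31 * i and (i << 5) - i agree; int overflow wraps the same way on both sides.
    static boolean shiftIdentityHolds(int i) {
        return 31 * i == (i << 5) - i;
    }

    // Hypothetical helper: the String.hashCode() loop with a configurable multiplier.
    static int hashWith(int multiplier, String s) {
        int h = 0;
        for (int k = 0; k < s.length(); k++) {
            h = multiplier * h + s.charAt(k);
        }
        return h;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 123456789, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int i : samples) {
            System.out.println(i + ": " + shiftIdentityHolds(i));
        }

        // The two strings differ only in their first character.
        String a = "Axxxxxxx";
        String b = "Bxxxxxxx";
        // Multiplier 32: the first character is multiplied by 2^35, which is 0 in int arithmetic.
        System.out.println(hashWith(32, a) == hashWith(32, b));
        // Multiplier 31 is odd, so the first character still influences the result.
        System.out.println(hashWith(31, a) == hashWith(31, b));
    }
}
```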

As Goodrich and Tamassia point out, if you take over 50,000 English words (formed as the union of the word lists provided in two variants of Unix), using the constants 31, 33, 37, 39, and 41 will produce less than 7 collisions in each case. Knowing this, it should come as no surprise that many Java implementations choose one of these constants. Coincidentally, I was in the middle of reading the section "polynomial hash codes" when I saw this question.
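
The kind of experiment described in the quote can be sketched as follows. This is not the Goodrich and Tamassia data set: as a stand-in, the code uses all 676 two-letter lowercase strings, and the names CollisionCountDemo, hashWith, countCollisions and twoLetterWords are invented for this example. To repeat the real experiment you would load an actual word list instead.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class CollisionCountDemo {
    // Hypothetical helper: polynomial hash with a configurable multiplier.
    static int hashWith(int multiplier, String s) {
        int h = 0;
        for (int k = 0; k < s.length(); k++) {
            h = multiplier * h + s.charAt(k);
        }
        return h;
    }

    // Counts collisions as (number of words) - (number of distinct hash values).
    static int countCollisions(int multiplier, List<String> words) {
        Set<Integer> seen = new HashSet<>();
        for (String w : words) {
            seen.add(hashWith(multiplier, w));
        }
        return words.size() - seen.size();
    }

    // Stand-in data set (an assumption of this sketch): "aa", "ab", ..., "zz".
    static List<String> twoLetterWords() {
        List<String> words = new ArrayList<>();
        for (char a = 'a'; a <= 'z'; a++) {
            for (char b = 'a'; b <= 'z'; b++) {
                words.add("" + a + b);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        List<String> words = twoLetterWords();
        for (int m : new int[] {2, 31, 33, 37, 39, 41}) {
            System.out.println("multiplier " + m + ": " + countCollisions(m, words) + " collisions");
        }
    }
}
```

On this toy input a small even multiplier such as 2 collides heavily, while 31 and the other constants from the quote produce no collisions, because each is larger than the spread of 'a' to 'z'.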

But why can a Java string be assigned directly, as in s = "abcd"? Is this the same as operator overloading in C++?

The answer is no:

At the language level, Java does not support operator overloading; that much is certain.

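The "=", "+" and "+=" operators on String look like operator overloading, but they are not: the Java compiler gives String operators special treatment. Assigning a literal with "=" is ordinary reference assignment, while "+" and "+=" on strings are concatenation built into the language.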

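For example:
String s = "a";
s += "b";
is translated by the compiler (javac up to Java 8; newer versions emit an invokedynamic call instead, with the same result) into the equivalent of:
String s = "a";
s = (new StringBuilder()).append(s).append("b").toString();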
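
A small sketch can confirm that the operator form and the explicit StringBuilder form give the same result, and that "+=" builds a new String rather than modifying the old one. ConcatDemo and its method names are illustrative only.

```java
class ConcatDemo {
    // Concatenation written with the += operator.
    static String withOperator() {
        String s = "a";
        s += "b";
        return s;
    }

    // The same concatenation written out with an explicit StringBuilder.
    static String withBuilder() {
        String s = "a";
        s = new StringBuilder().append(s).append("b").toString();
        return s;
    }

    public static void main(String[] args) {
        System.out.println(withOperator().equals(withBuilder()));

        // String is immutable: += rebinds the variable to a new object.
        String original = "a";
        String alias = original;
        alias += "b";
        System.out.println(original); // still the one-character string
        System.out.println(alias);
    }
}
```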

HashSet (hashCode is inherited from AbstractSet):

public int hashCode() {
        int h = 0;
        Iterator<E> i = iterator();
        while (i.hasNext()) {
            E obj = i.next();
            if (obj != null)
                h += obj.hashCode();
        }
        return h;
    }
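
Because the loop only adds up element hash codes, the result does not depend on iteration order. A minimal sketch, with SetHashDemo, setOf and sumOfElementHashes as made-up names:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class SetHashDemo {
    // Convenience helper for this example: builds a HashSet from the given strings.
    static Set<String> setOf(String... items) {
        return new HashSet<>(Arrays.asList(items));
    }

    // Re-implements the AbstractSet.hashCode() loop shown above.
    static int sumOfElementHashes(Set<?> set) {
        int sum = 0;
        for (Object o : set) {
            if (o != null) {
                sum += o.hashCode();
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        Set<String> s1 = setOf("a", "b", "c");
        Set<String> s2 = setOf("c", "b", "a");
        // "a", "b", "c" hash to 97, 98, 99, so both sets hash to 294.
        System.out.println(s1.hashCode());
        System.out.println(s2.hashCode());
        System.out.println(sumOfElementHashes(s1) == s1.hashCode());
    }
}
```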

Integer:

public int hashCode() {
    return hashCode(this.value);
  }

  public static int hashCode(int var0) {
    return var0;
  }

Double:

public int hashCode() {
    return hashCode(this.value);
  }

  public static int hashCode(double var0) {
    long var2 = doubleToLongBits(var0);
    return (int)(var2 ^ var2 >>> 32);
  }
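
The last two implementations can be tried out with a short sketch: an Integer hashes to its own value, and a Double XORs the upper and lower 32 bits of its IEEE 754 bit pattern. BoxedHashDemo and foldDouble are names invented for this example; Integer.hashCode(int) and Double.hashCode(double) are the static JDK methods available since Java 8.

```java
class BoxedHashDemo {
    // Mirrors Double.hashCode(double): fold the 64-bit pattern into 32 bits.
    static int foldDouble(double d) {
        long bits = Double.doubleToLongBits(d);
        return (int) (bits ^ (bits >>> 32));
    }

    public static void main(String[] args) {
        // An Integer's hash code is the int itself.
        System.out.println(Integer.hashCode(42));

        // 1.0 has the bit pattern 0x3FF0000000000000, which folds to 0x3FF00000.
        System.out.println(Double.hashCode(1.0) == foldDouble(1.0));
        System.out.println(Integer.toHexString(foldDouble(1.0)));

        // 0.0 and -0.0 differ only in the sign bit, so their hash codes differ.
        System.out.println(Double.hashCode(0.0));
        System.out.println(Double.hashCode(-0.0));
    }
}
```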