Common Hash Algorithms in Java and Ways to Resolve Hash Collisions

Date: 2019-12-06


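We'll go from the surface inward, one step at a time, so read on. A quick like is the best encouragement for the author! ^0^

What is a hash table

Quoted from Yan Weimin's 《數據結構（C語言版）》 (Data Structures, C Language Edition):

A hash table is a table built on the idea that a key can be mapped to a specific position in the table by some algorithm (the hash function). Its defining feature is therefore that an element's index in the underlying array can be computed directly from the key K with the function f(K).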
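To make the mapping f(K) concrete, here is a minimal sketch. The class name HashIndexSketch and the helper indexFor are hypothetical, and reducing hashCode() with a modulo is only one simple way to pick f; it is not taken from the textbook.

```java
class HashIndexSketch {
    // Hypothetical f(K): maps a key to a slot index in a table of the given capacity.
    static int indexFor(Object key, int capacity) {
        // floorMod keeps the result in [0, capacity) even when hashCode() is negative.
        return Math.floorMod(key.hashCode(), capacity);
    }

    public static void main(String[] args) {
        String[] table = new String[16];
        String key = "apple";
        int index = indexFor(key, table.length);
        table[index] = key; // the key decides where it is stored
        System.out.println(key + " -> slot " + index);
    }
}
```

A lookup repeats the same computation, so no scan of the array is needed to find the slot.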

Next, let's see how Java's Object class documents the hashCode() method. Naturally, this method goes hand in hand with equals(Object obj).

The equals and hashCode methods of the Object class (all source code in this article is based on JDK 8)

Official documentation for equals:

public boolean equals(Object obj)

Indicates whether some other object is "equal to" this one.

The equals method implements an equivalence relation on non-null object references:

· It is reflexive: for any non-null reference value x, x.equals(x) should return true.
· It is symmetric: for any non-null reference values x and y, x.equals(y) should return true if and only if y.equals(x) returns true.
· It is transitive: for any non-null reference values x, y, and z, if x.equals(y) returns true and y.equals(z) returns true, then x.equals(z) should return true.
· It is consistent: for any non-null reference values x and y, multiple invocations of x.equals(y) consistently return true or consistently return false, provided no information used in equals comparisons on the objects is modified.
· For any non-null reference value x, x.equals(null) should return false.

The equals method for class Object implements the most discriminating possible equivalence relation on objects; that is, for any non-null reference values x and y, this method returns true if and only if x and y refer to the same object (x == y has the value true).

Note that it is generally necessary to override the hashCode method whenever this method is overridden, so as to maintain the general contract for the hashCode method, which states that equal objects must have equal hash codes.

Parameters:
obj the reference object with which to compare.
Returns:
true if this object is the same as the obj argument; false otherwise.
See Also:
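hashCode(), java.util.HashMap

The official description states that equals is reflexive, symmetric, transitive, and consistent. It also reminds us that a class overriding equals generally needs to override hashCode as well, because the hashCode contract (covered below) requires that if equals returns true for two objects, their hash codes must be equal.

Official documentation for hashCode: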

public native int hashCode();

Returns a hash code value for the object. This method is supported for the benefit of hash tables such as those provided by java.util.HashMap.

The general contract of hashCode is:

· Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
· If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
· It is not required that if two objects are unequal according to the java.lang.Object.equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hash tables.
As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the Java™ programming language.)
Returns:
a hash code value for this object.
See Also:
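java.lang.Object.equals(java.lang.Object), java.lang.System.identityHashCode

As the documentation notes, the default implementation typically derives the value from the object's internal address, so distinct objects usually (though not necessarily) get distinct hash codes. The method is native, so the exact value depends on the JVM's internal design.

The documentation lays down three rules:

1. As long as no information used in equals comparisons is modified, repeated calls on the same object during one execution must consistently return the same integer.
2. If equals returns true for two objects, calling hashCode on each of them must produce the same integer.
3. Not mandatory: if equals returns false for two objects, calling hashCode on each of them should produce different integers.

Although the third rule is not mandatory, satisfying it can improve hash table performance.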
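The contract can be illustrated with a small value class. This is a sketch only: PointSketch is a hypothetical class made up for this example, and Objects.hash is just one convenient way to combine fields.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class PointSketch {
    private final int x;
    private final int y;

    PointSketch(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;                  // reflexive shortcut
        if (!(obj instanceof PointSketch)) return false; // also rejects null
        PointSketch other = (PointSketch) obj;
        return x == other.x && y == other.y;
    }

    @Override
    public int hashCode() {
        // Uses exactly the fields compared in equals, so equal points get equal hash codes.
        return Objects.hash(x, y);
    }

    public static void main(String[] args) {
        Set<PointSketch> set = new HashSet<>();
        set.add(new PointSketch(1, 2));
        // A different but equal object is found only because hashCode agrees with equals.
        System.out.println(set.contains(new PointSketch(1, 2))); // true
    }
}
```

If hashCode were left at the Object default while equals was overridden, the two equal points would usually land in different buckets and the lookup would fail.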

Next, let's analyze a few common implementations.

equals and hashCode in String

Source of hashCode:

public int hashCode() {
    int h = hash;
    if (h == 0 && value.length > 0) {
        char val[] = value;

        for (int i = 0; i < value.length; i++) {
            h = 31 * h + val[i];
        }
        hash = h;
    }
    return h;
}

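The function is simple. It treats the string as a polynomial with weight 31 whose coefficients are the characters' numeric values (UTF-16 code units, which coincide with ASCII values for ASCII characters), and it lets natural int overflow act as an implicit modulo. This achieves the goal: strings with the same content always return the same hash code. The multiplier 31 deserves some explanation. It was chosen because:

· 31 is an odd prime. With an even multiplier, information would be lost whenever the multiplication overflowed, because multiplying by 2 is equivalent to a shift. The advantage of using a prime is less clear-cut, but it is traditional.
· 31 can be optimized by the JVM: 31 * i == (i << 5) - i.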
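The shift identity and the polynomial can be checked with a short sketch. String31Sketch and polyHash are hypothetical names; the loop recomputes the same polynomial that the String.hashCode documentation specifies, written with the shift form instead of the multiplication.

```java
class String31Sketch {
    // Recomputes h = 31 * h + c for each char, using (h << 5) - h in place of 31 * h.
    static int polyHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = (h << 5) - h + s.charAt(i); // overflow wraps exactly as 31 * h would
        }
        return h;
    }

    public static void main(String[] args) {
        String s = "hello";
        System.out.println(polyHash(s) == s.hashCode()); // true
        int i = 123456789;
        System.out.println(31 * i == (i << 5) - i);      // true, even though both sides overflow
    }
}
```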

Source of equals:

public boolean equals(Object anObject) {
    if (this == anObject) {
        return true;
    }
    if (anObject instanceof String) {
        String anotherString = (String)anObject;
        int n = value.length;
        if (n == anotherString.value.length) {
            char v1[] = value;
            char v2[] = anotherString.value;
            int i = 0;
            while (n-- != 0) {
                if (v1[i] != v2[i])
                    return false;
                i++;
            }
            return true;
        }
    }
    return false;
}

This equals method starts with ==, which compares references: if both references point to the same object, the contents are necessarily the same. When the references differ, it first checks that the other object is a String, then compares the two lengths, and only then loops over the characters, comparing them one by one.
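A small sketch of the difference between the two comparisons. StringEqualsSketch and compare are hypothetical names used only here.

```java
class StringEqualsSketch {
    // Reports the reference comparison and the content comparison side by side.
    static String compare(String a, String b) {
        return "==: " + (a == b) + ", equals: " + a.equals(b);
    }

    public static void main(String[] args) {
        String literal = "hash";
        String copy = new String("hash"); // a distinct object with the same characters
        System.out.println(compare(literal, literal)); // ==: true, equals: true
        System.out.println(compare(literal, copy));    // ==: false, equals: true
        // Equal content also means equal hash codes, as the contract requires.
        System.out.println(literal.hashCode() == copy.hashCode()); // true
    }
}
```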

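equals and hashCode in Integer

Source of hashCode:

@Override
public int hashCode() {
    return Integer.hashCode(value);
}

public static int hashCode(int value) {
    return value;
}

Source of equals:

public boolean equals(Object obj) {
    if (obj instanceof Integer) {
        return value == ((Integer)obj).intValue();
    }
    return false;
}

So the hash code of an Integer is simply the int value it wraps, and equals also compares the wrapped int values. In other words, two Integer objects with the same numeric value produce equal hash codes.

Finally, primitive types such as int and char have no hashCode method of their own. When they need to be stored in a hash-based collection they are autoboxed, and for int and char the wrapper's hash code is simply the numeric value, as shown for Integer.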
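A short sketch of autoboxing in a hash-based collection. BoxedHashSketch and storeAndLookup are hypothetical names; Integer.hashCode(int) and Character.hashCode(char) are the static helpers available since Java 8.

```java
import java.util.HashMap;
import java.util.Map;

class BoxedHashSketch {
    // Stores a value under a primitive int key and reads it back.
    static String storeAndLookup(int key, String value) {
        Map<Integer, String> map = new HashMap<>();
        map.put(key, value); // the int key is autoboxed to Integer
        return map.get(key); // boxed again; found because equals/hashCode go by value
    }

    public static void main(String[] args) {
        System.out.println(Integer.hashCode(1000));   // 1000
        System.out.println(Character.hashCode('A'));  // 65
        System.out.println(storeAndLookup(1000, "found")); // found
    }
}
```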

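Hash collisions (hash conflicts)

While computing hash addresses, different keys can end up with the same hash address, that is, key1 ≠ key2 but f(key1) = f(key2). This situation is a hash collision. Keys key1 and key2 that share the same hash address are called synonyms.
A better hash function (for example, a more uniform one) reduces collisions, but in general the table length is finite while the set of possible keys is effectively unbounded, so collisions are unavoidable and have to be handled.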
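A collision is easy to reproduce with String.hashCode. In this sketch, CollisionSketch and collide are hypothetical names; the strings "Aa" and "BB" are a well-known colliding pair, since 65 * 31 + 97 and 66 * 31 + 66 both equal 2112.

```java
class CollisionSketch {
    // True when two different keys produce the same hash code (they are synonyms).
    static boolean collide(String key1, String key2) {
        return !key1.equals(key2) && key1.hashCode() == key2.hashCode();
    }

    public static void main(String[] args) {
        System.out.println("Aa".hashCode());     // 2112
        System.out.println("BB".hashCode());     // 2112
        System.out.println(collide("Aa", "BB")); // true
    }
}
```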

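Handling collisions

There are four ways to handle collisions:

· Open addressing
· Rehashing
· Separate chaining
· A public overflow area

Open addressing is further divided into:

· Linear probing
· Quadratic probing
· Pseudo-random probing

The principles behind each method are described below.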

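Open addressing

The basic principle of open addressing is that when a collision occurs, an empty slot is located according to some rule and the record is stored there. The formula is:

$H_i = (H(key) + d_i) \bmod m, \quad i = 1, 2, \ldots, k \ (k \le m - 1)$

Here $H_i$ is the computed address, $H(key)$ is the hash function, $m$ is the table length, and $d_i$ is the increment sequence. The three ways of choosing $d_i$ are exactly the three variants of open addressing listed above (linear, quadratic, and pseudo-random probing).

· Linear probing: $d_i = 1, 2, 3, \ldots, m - 1$, i.e. the slots after the home position are checked one by one.
· Quadratic probing: $d_i = 1^2, -1^2, 2^2, -2^2, \ldots, \pm k^2 \ (k \le m/2)$, i.e. slots are checked alternately forward and backward, at offsets equal to the squares of 1, 2, 3, and so on.
· Pseudo-random probing: $d_i$ is a pseudo-random sequence; as the name suggests, the displacement is generated pseudo-randomly.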
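The three probing variants can be sketched with one small table of int keys that probes H_i = (H(key) + d_i) mod m and takes the increment d_i as a function of the attempt number i. Everything here is an assumption made for illustration: the class OpenAddressingSketch, the choice H(key) = key mod m, the fixed-seed sequence for the pseudo-random variant, and the absence of deletion and resizing.

```java
import java.util.Random;
import java.util.function.IntUnaryOperator;

class OpenAddressingSketch {
    private final Integer[] slots;
    private final IntUnaryOperator increment; // d_i for attempt i >= 1

    OpenAddressingSketch(int m, IntUnaryOperator increment) {
        this.slots = new Integer[m];
        this.increment = increment;
    }

    // Linear probing: d_i = 1, 2, 3, ...
    static IntUnaryOperator linear() {
        return i -> i;
    }

    // Quadratic probing: d_i = 1, -1, 4, -4, 9, -9, ...
    static IntUnaryOperator quadratic() {
        return i -> {
            int k = (i + 1) / 2;
            return (i % 2 == 1) ? k * k : -k * k;
        };
    }

    // Pseudo-random probing: a fixed-seed sequence, so lookups replay the same probes (needs m >= 2).
    static IntUnaryOperator pseudoRandom(long seed, int m) {
        int[] d = new Random(seed).ints(m, 1, m).toArray();
        return i -> d[i];
    }

    // Slot probed on attempt i; attempt 0 is the home position H(key) = key mod m.
    private int probe(int key, int i) {
        int d = (i == 0) ? 0 : increment.applyAsInt(i);
        return Math.floorMod(Math.floorMod(key, slots.length) + d, slots.length);
    }

    // Returns the slot used, or -1 if no free slot was found within m - 1 probes.
    int insert(int key) {
        for (int i = 0; i < slots.length; i++) {
            int idx = probe(key, i);
            if (slots[idx] == null || slots[idx] == key) {
                slots[idx] = key;
                return idx;
            }
        }
        return -1;
    }

    boolean contains(int key) {
        for (int i = 0; i < slots.length; i++) {
            Integer stored = slots[probe(key, i)];
            if (stored == null) return false; // an empty slot ends the probe sequence
            if (stored == key) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        for (IntUnaryOperator d : new IntUnaryOperator[] { linear(), quadratic() }) {
            OpenAddressingSketch table = new OpenAddressingSketch(11, d);
            table.insert(17); // slot 6
            table.insert(60); // slot 5
            table.insert(29); // slot 7
            // 38 mod 11 = 5 is taken: linear probing ends at 8, quadratic probing at 4.
            System.out.println(table.insert(38));
        }
    }
}
```

With a table of length 11 holding 17, 60, and 29, inserting 38 shows the difference: linear probing walks 6, 7, 8, while quadratic probing tries 6 and then steps back to 4.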

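Rehashing

With rehashing, when a collision occurs a different hash function is applied, and this repeats until there is no more collision:

$H_i = RH_i(key), \quad i = 1, 2, \ldots, k$

where the $RH_i$ are different hash functions.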
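A minimal sketch of rehashing follows. The class RehashSketch and the three hash functions standing in for RH_1, RH_2, RH_3 are hypothetical and were chosen only to be different from each other; a real table would pick its functions with more care and decide what to do when all of them collide.

```java
import java.util.function.IntUnaryOperator;

class RehashSketch {
    private final Integer[] slots;
    private final IntUnaryOperator[] hashes; // RH_1, RH_2, RH_3 (illustrative choices)

    RehashSketch(int m) {
        slots = new Integer[m];
        hashes = new IntUnaryOperator[] {
            k -> Math.floorMod(k, m),
            k -> Math.floorMod(k / m, m),
            k -> Math.floorMod(k * 31 + 7, m)
        };
    }

    // Tries each hash function in turn; returns the slot used, or -1 if all of them collide.
    int insert(int key) {
        for (IntUnaryOperator rh : hashes) {
            int idx = rh.applyAsInt(key);
            if (slots[idx] == null || slots[idx] == key) {
                slots[idx] = key;
                return idx;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        RehashSketch table = new RehashSketch(11);
        System.out.println(table.insert(5));  // 5  (first function, no collision)
        System.out.println(table.insert(16)); // 1  (16 mod 11 = 5 collides, 16 / 11 = 1 is free)
        System.out.println(table.insert(23)); // 2  (23 mod 11 = 1 collides, 23 / 11 = 2 is free)
    }
}
```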

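Separate chaining

Separate chaining differs from the previous two methods: at the position where the collision occurs it stores a linked list, and all synonym records are kept in that list. Put figuratively, later values are simply stacked on top at the spot where the collision occurred. HashMap is an example.

[Figure: HashMap structure]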
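The idea can be sketched as an array of linked lists. ChainingSketch is a hypothetical, heavily simplified class for illustration (no resizing, no null keys); it is not how java.util.HashMap is actually written.

```java
import java.util.LinkedList;

class ChainingSketch<K, V> {
    private static final class Entry<K, V> {
        final K key;
        V value;

        Entry(K key, V value) {
            this.key = key;
            this.value = value;
        }
    }

    private final LinkedList<Entry<K, V>>[] buckets;

    @SuppressWarnings("unchecked")
    ChainingSketch(int capacity) {
        buckets = new LinkedList[capacity]; // one (lazily created) chain per slot
    }

    private int indexFor(K key) {
        return Math.floorMod(key.hashCode(), buckets.length);
    }

    void put(K key, V value) {
        int idx = indexFor(key);
        if (buckets[idx] == null) {
            buckets[idx] = new LinkedList<Entry<K, V>>();
        }
        for (Entry<K, V> e : buckets[idx]) {
            if (e.key.equals(key)) { // same key: overwrite instead of chaining
                e.value = value;
                return;
            }
        }
        buckets[idx].add(new Entry<K, V>(key, value)); // synonym: append to the chain
    }

    V get(K key) {
        LinkedList<Entry<K, V>> chain = buckets[indexFor(key)];
        if (chain == null) return null;
        for (Entry<K, V> e : chain) {
            if (e.key.equals(key)) return e.value;
        }
        return null;
    }

    public static void main(String[] args) {
        ChainingSketch<String, Integer> map = new ChainingSketch<>(16);
        map.put("Aa", 1); // "Aa" and "BB" share a hash code, so they share a bucket
        map.put("BB", 2);
        System.out.println(map.get("Aa")); // 1
        System.out.println(map.get("BB")); // 2
    }
}
```

Both colliding keys remain retrievable because the chain is searched with equals after the bucket is located with hashCode.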

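Public overflow area

The basic idea is as follows. Assume the range of the hash function is [0, m-1]. Let the vector HashTable[0...m-1] be the base table, with each component holding one record, and let another vector OverTable[0...v] be the overflow table. Any record whose key is a synonym of a key already in the base table is placed in the overflow table as soon as a collision occurs, regardless of the hash address produced by the hash function.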
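A minimal sketch of a base table plus a shared overflow table. OverflowAreaSketch is a hypothetical class; it assumes int keys, H(key) = key mod m, an unbounded list for the overflow table, and no deletion.

```java
import java.util.ArrayList;
import java.util.List;

class OverflowAreaSketch {
    private final Integer[] base;                             // HashTable[0...m-1]
    private final List<Integer> overflow = new ArrayList<>(); // OverTable, shared by all slots

    OverflowAreaSketch(int m) {
        base = new Integer[m];
    }

    // Returns true if the key sits in the base table, false if it went to the overflow table.
    boolean insert(int key) {
        int idx = Math.floorMod(key, base.length);
        if (base[idx] == null) {
            base[idx] = key;
            return true;
        }
        if (base[idx] == key) {
            return true; // already stored in the base table
        }
        if (!overflow.contains(key)) {
            overflow.add(key); // any collision, whatever the address, lands here
        }
        return false;
    }

    boolean contains(int key) {
        Integer inBase = base[Math.floorMod(key, base.length)];
        if (inBase == null) return false; // nothing ever hashed here, so no synonym overflowed
        return inBase == key || overflow.contains(key);
    }

    int overflowSize() {
        return overflow.size();
    }

    public static void main(String[] args) {
        OverflowAreaSketch table = new OverflowAreaSketch(11);
        System.out.println(table.insert(17));   // true: slot 6 was empty
        System.out.println(table.insert(28));   // false: 28 mod 11 = 6 is taken, goes to overflow
        System.out.println(table.contains(28)); // true
    }
}
```

Lookups for overflowed keys scan the overflow table sequentially, so this scheme works best when collisions are rare.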
