Why do hash maps in Java 8 use a binary tree instead of a linked list?

Q:    I recently learned that in Java 8 hash maps use a binary tree instead of a linked list, and that the hash code is used as the branching factor. I understand that under heavy collisions the lookup cost drops from O(n) to O(log n) with binary trees. My question is what good this really does, since the amortized time complexity is still O(1). Maybe if you force all the entries into the same bucket by providing the same hash code for all keys you can see a significant time difference, but no one in their right mind would do that.

A binary tree also uses more space than a singly linked list, as it stores both left and right nodes. Why increase the space complexity when there is absolutely no improvement in time complexity except for some spurious test cases?

A:    This is mostly a security-related change. While in normal situations it's rarely possible to have many collisions, if hash keys arrive from an untrusted source (e.g. HTTP header names received from the client), then it's possible, and not very hard, to craft the input specially so that the resulting keys all have the same hash code. If you then perform many lookups, you may experience a denial of service. It appears that there's quite a lot of code in the wild that is vulnerable to this kind of attack, so it was decided to fix this on the Java side.

For more information refer to JEP-180.
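
To make the "not very hard" part concrete, here is a minimal sketch of how colliding String keys can be generated. It relies on the well-known fact that "Aa" and "BB" have the same String.hashCode() (2112), so any two strings built from the same number of those blocks collide as well. The class and method names (CollidingKeys, generate) are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

class CollidingKeys {
    // "Aa" and "BB" share the hash code 2112, so every concatenation of the
    // same number of these two-character blocks shares one hash code too.
    static List<String> generate(int blocks) {
        List<String> keys = new ArrayList<>();
        keys.add("");
        for (int i = 0; i < blocks; i++) {
            List<String> next = new ArrayList<>(keys.size() * 2);
            for (String prefix : keys) {
                next.add(prefix + "Aa");
                next.add(prefix + "BB");
            }
            keys = next;
        }
        return keys;
    }

    public static void main(String[] args) {
        List<String> keys = generate(10); // 2^10 = 1024 distinct strings
        long distinctHashes = keys.stream().map(String::hashCode).distinct().count();
        // prints "1024 keys, 1 distinct hash code(s)"
        System.out.println(keys.size() + " keys, " + distinctHashes + " distinct hash code(s)");
    }
}
```

All of these keys end up in the same bucket of a HashMap, which is exactly the situation the question dismisses as something nobody would do on purpose.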

 

PS (see the original article):

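When the hash function was designed, the table length n was a power of two, and the bucket index is computed as follows (using the bitwise & operation rather than the % remainder):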

(n - 1) & hash

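The designers considered this approach prone to collisions. Why? Consider the case where n - 1 is 15 (binary 1111): only the low 4 bits of the hash actually take part in choosing the bucket, so collisions come easily.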
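
A small sketch of the index computation, assuming a table of length 16. The helper name indexFor is made up here; the two sample hashes are arbitrary values that differ only in their high bits.

```java
class BucketIndex {
    // Bucket index for a table whose length n is a power of two.
    static int indexFor(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int n = 16;
        int a = 0x00010005;
        int b = 0x7FFF0005;
        System.out.println(indexFor(a, n));       // 5
        System.out.println(indexFor(b, n));       // 5: same bucket, the high bits are ignored
        System.out.println(Math.floorMod(a, n));  // 5: the mask agrees with a non-negative remainder
        System.out.println(indexFor(-1, n));      // 15: a negative hash still gives a valid index
    }
}
```

The mask is cheaper than a remainder and never yields a negative index, but as the output shows, two hashes that differ only above bit 3 land in the same bucket.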

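So the designers chose a compromise that weighs speed, utility and quality: XOR the high 16 bits into the low 16 bits. They also explain that most hashCode distributions are already quite good, and that collisions, when they do occur, are handled with an O(log n) tree. A single XOR keeps the overhead low while avoiding the collisions that would otherwise arise because the high bits take no part in the index computation when the table is small.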
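
The XOR step can be sketched as below. The expression h ^ (h >>> 16) is, to my knowledge, the one used by HashMap.hash in OpenJDK 8; the class name HashSpread, the method name spread and the sample values are only for illustration.

```java
class HashSpread {
    // Fold the high 16 bits into the low 16 bits.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int n = 16;
        int a = 0x00010000;
        int b = 0x00020000;
        // Without spreading, both hashes have zero low bits and share bucket 0.
        System.out.println(((n - 1) & a) + " " + ((n - 1) & b));                 // 0 0
        // With spreading, the high bits influence the index.
        System.out.println(((n - 1) & spread(a)) + " " + ((n - 1) & spread(b))); // 1 2
    }
}
```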

What happens if collisions are still frequent? The authors note in a comment that they use trees to handle them ("we use trees to handle large sets of collisions in bins"), and JEP-180 describes the problem:

Improve the performance of java.util.HashMap under high hash-collision conditions by using balanced trees rather than linked lists to store map entries. Implement the same improvement in the LinkedHashMap class.

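As mentioned earlier, retrieving an element from a HashMap basically takes two steps:

  1. First hash the key's hashCode(), then determine the bucket index;
  2. If the key of the bucket's first node is not the one we want, search the chain using key.equals().

Before Java 8, collisions were resolved with a linked list, so when collisions occur a get costs O(1) + O(n) across the two steps. When collisions are heavy, n is large, and the O(n) term clearly hurts performance.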

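Java 8 therefore replaces the linked list with a red-black tree, which brings the cost down to O(1) + O(log n) and handles the large-n case reasonably well. The article on HashMap performance improvements in Java 8 ("Java 8:HashMap的性能提高") has benchmark results.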
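
A rough way to observe the difference yourself is to force every key into one bucket. The sketch below uses a hypothetical key type, BadKey, whose hashCode() is constant and which implements Comparable so that a tree bin can order its keys. The timing it prints depends on the JVM and machine, so no figures are claimed here.

```java
import java.util.HashMap;
import java.util.Map;

class CollidingKeyDemo {
    // Hypothetical key: every instance reports the same hash code, so all
    // entries collide. Comparable lets a tree bin order the keys.
    static final class BadKey implements Comparable<BadKey> {
        final int id;

        BadKey(int id) { this.id = id; }

        @Override public int hashCode() { return 42; }

        @Override public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).id == id;
        }

        @Override public int compareTo(BadKey other) {
            return Integer.compare(id, other.id);
        }
    }

    static Map<BadKey, Integer> build(int size) {
        Map<BadKey, Integer> map = new HashMap<>();
        for (int i = 0; i < size; i++) {
            map.put(new BadKey(i), i);
        }
        return map;
    }

    public static void main(String[] args) {
        int size = 20_000;
        Map<BadKey, Integer> map = build(size);
        long start = System.nanoTime();
        long sum = 0;
        for (int i = 0; i < size; i++) {
            sum += map.get(new BadKey(i)); // every lookup hits the same bucket
        }
        long micros = (System.nanoTime() - start) / 1_000;
        System.out.println("sum=" + sum + ", lookups took " + micros + " us");
    }
}
```

Running the same program on a Java 7 and a Java 8 runtime is one way to compare the linked-list and tree behaviour described above.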

 

JEP 180: Handle Frequent HashMap Collisions with Balanced Trees

Author Mike Duigou
Owner Brent Christian
Type Feature
Scope Implementation
Status Closed / Delivered
Release 8
Component core-libs
Discussion core dash libs dash dev at openjdk dot java dot net
Effort M
Duration M
Reviewed by Alan Bateman
Endorsed by Brian Goetz
Created 2013/02/08 20:00
Updated 2017/06/14 18:44
Issue 8046170

Summary

Improve the performance of java.util.HashMap under high hash-collision conditions by using balanced trees rather than linked lists to store map entries. Implement the same improvement in the LinkedHashMap class.

Motivation

Earlier work in this area in JDK 8, namely the alternative string-hashing implementation, improved collision performance for string-valued keys only, and it did so at the cost of adding a new (private) field to every String instance.

The changes proposed here will improve collision performance for any key type that implements Comparable. The alternative string-hashing mechanism, including the private hash32 field added to the String class, can then be removed.

Description

The principal idea is that once the number of items in a hash bucket grows beyond a certain threshold, that bucket will switch from using a linked list of entries to a balanced tree. In the case of high hash collisions, this will improve worst-case performance from O(n) to O(log n).
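
The switch from list to tree can be sketched with a toy bucket. This is not the JDK implementation: AdaptiveBucket is a made-up class, it uses java.util.TreeMap (a red-black tree) as the balanced tree, and the threshold of 8 is an illustrative value (OpenJDK 8 defines a TREEIFY_THRESHOLD constant of 8, but its exact conversion rules differ from this sketch).

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class AdaptiveBucket<K extends Comparable<K>, V> {
    static final int THRESHOLD = 8; // illustrative value for this sketch

    private List<Map.Entry<K, V>> list = new ArrayList<>();
    private TreeMap<K, V> tree; // null until the bucket is converted

    void put(K key, V value) {
        if (tree != null) {
            tree.put(key, value); // O(log n) once converted
            return;
        }
        for (Map.Entry<K, V> e : list) { // O(n) scan while still a list
            if (e.getKey().equals(key)) {
                e.setValue(value);
                return;
            }
        }
        list.add(new AbstractMap.SimpleEntry<>(key, value));
        if (list.size() > THRESHOLD) {
            // The bucket grew beyond the threshold: move entries into a tree.
            tree = new TreeMap<>();
            for (Map.Entry<K, V> e : list) {
                tree.put(e.getKey(), e.getValue());
            }
            list = null;
        }
    }

    V get(K key) {
        if (tree != null) {
            return tree.get(key);
        }
        for (Map.Entry<K, V> e : list) {
            if (e.getKey().equals(key)) {
                return e.getValue();
            }
        }
        return null;
    }

    boolean isTree() {
        return tree != null;
    }

    public static void main(String[] args) {
        AdaptiveBucket<Integer, String> bucket = new AdaptiveBucket<>();
        for (int i = 0; i < 9; i++) {
            bucket.put(i, "v" + i);
            System.out.println((i + 1) + " entries, tree=" + bucket.isTree());
        }
    }
}
```

Small buckets stay as cheap lists, and only a bucket that has already degenerated pays the extra space of tree nodes, which addresses the space concern raised in the question.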

This technique has already been implemented in the latest version of the java.util.concurrent.ConcurrentHashMap class, which is also slated for inclusion in JDK 8 as part of JEP 155. Portions of that code will be re-used to implement the same idea in the HashMap and LinkedHashMap classes. Only the implementations will be changed; no interfaces or specifications will be modified. Some user-visible behaviors, such as iteration order, will change within the bounds of their current specifications.

We will not implement this technique in the legacy Hashtable class. That class has been part of the platform since Java 1.0, and some legacy code that uses it is known to depend upon iteration order. Hashtable will be reverted to its state prior to the introduction of the alternative string-hashing implementation, and will maintain its historical iteration order.

We also will not implement this technique in WeakHashMap. An attempt was made, but the complexity of having to account for weak keys resulted in an unacceptable drop in microbenchmark performance. WeakHashMap will also be reverted to its prior state.

There is no need to implement this technique in the IdentityHashMap class. It uses System.identityHashCode() to generate hash codes, so collisions are generally rare.

Testing

  • Run Map tests from Doug Lea's JSR 166 CVS workspace (includes a couple microbenchmarks)
  • Run performance tests of standard workloads
  • Possibly develop new microbenchmarks

Risks and Assumptions

This change will introduce some overhead for the addition and management of the balanced trees; we expect that overhead to be negligible.

This change will likely result in a change to the iteration order of the HashMap class. The HashMap specification explicitly makes no guarantee about iteration order. The iteration order of the LinkedHashMap class will be maintained.
