This map usually acts as a binned (bucketed) hash table, but when bins get too large, they are transformed into bins of TreeNodes, each structured similarly to those in java.util.TreeMap. Most methods try to use normal bins, but relay to TreeNode methods when applicable (simply by checking whether a node is an instanceof TreeNode). Bins of TreeNodes may be traversed and used like any others, but additionally support faster lookup when overpopulated. However, since the vast majority of bins in normal use are not overpopulated, checks for the existence of tree bins may be delayed in the course of table methods.
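As a rough sketch of that dispatch (not the real java.util.HashMap source; Node, TreeNode, BinnedTable, and findInTree are simplified stand-ins for the actual internals), lookup inspects the head of the bin with instanceof and relays to the tree variant only when it is a TreeNode:

    import java.util.Objects;

    class Node<K, V> {
        final int hash;
        final K key;
        V value;
        Node<K, V> next;
        Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash; this.key = key; this.value = value; this.next = next;
        }
    }

    class TreeNode<K, V> extends Node<K, V> {
        TreeNode(int hash, K key, V value, Node<K, V> next) {
            super(hash, key, value, next);
        }
        Node<K, V> findInTree(int hash, Object key) {
            // The O(log n) red-black-tree search keyed on hash (with
            // compareTo on ties) would go here; elided in this sketch.
            return null;
        }
    }

    class BinnedTable<K, V> {
        Node<K, V>[] table;
        Node<K, V> find(int hash, Object key) {
            Node<K, V> first = table[(table.length - 1) & hash];
            if (first instanceof TreeNode)            // relay to the tree variant
                return ((TreeNode<K, V>) first).findInTree(hash, key);
            for (Node<K, V> e = first; e != null; e = e.next)  // plain list walk
                if (e.hash == hash && Objects.equals(e.key, key))
                    return e;
            return null;
        }
    }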
Tree bins (i.e., bins whose elements are all TreeNodes) are ordered primarily by hashCode, but in the case of ties, if two elements are of the same "class C implements Comparable<C>" type, then their compareTo method is used for ordering. (We conservatively check generic types via reflection to validate this -- see method comparableClassFor.) The added complexity of tree bins is worthwhile in providing worst-case O(log n) operations when keys either have distinct hashes or are orderable. Thus, performance degrades gracefully under accidental or malicious usages in which hashCode() methods return values that are poorly distributed, as well as those in which many keys share a hashCode, so long as they are also Comparable. (If neither of these applies, we may waste about a factor of two in time and space compared to taking no precautions. But the only known cases stem from poor user programming practices that are already so slow that this makes little difference.)
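That ordering rule can be sketched as follows. This is a simplified approximation, not the JDK code: the reflective check below only inspects directly implemented interfaces (with a fast path for String), whereas the real comparableClassFor handles more cases, and the fallback when keys are not mutually comparable is reduced to a comment:

    import java.lang.reflect.ParameterizedType;
    import java.lang.reflect.Type;

    final class TieBreak {
        // Returns k's class if k implements Comparable<C> for its own class C,
        // else null. Simplified: only direct interfaces are examined.
        static Class<?> comparableClassFor(Object k) {
            if (k instanceof Comparable) {
                Class<?> c = k.getClass();
                if (c == String.class)   // common fast path
                    return c;
                for (Type t : c.getGenericInterfaces()) {
                    if (t instanceof ParameterizedType) {
                        ParameterizedType p = (ParameterizedType) t;
                        if (p.getRawType() == Comparable.class
                                && p.getActualTypeArguments().length == 1
                                && p.getActualTypeArguments()[0] == c)
                            return c;
                    }
                }
            }
            return null;
        }

        @SuppressWarnings({"rawtypes", "unchecked"})
        static int order(int h1, Object k1, int h2, Object k2) {
            if (h1 != h2)
                return Integer.compare(h1, h2);         // primary: hash order
            Class<?> c = comparableClassFor(k1);
            if (c != null && c == comparableClassFor(k2))
                return ((Comparable) k1).compareTo(k2); // tie-break: compareTo
            return 0; // the real code applies a further arbitrary tiebreaker here
        }

        public static void main(String[] args) {
            // "Aa" and "BB" both hash to 2112, so ordering falls through to
            // String.compareTo and prints a negative number.
            System.out.println(order("Aa".hashCode(), "Aa",
                                     "BB".hashCode(), "BB"));
        }
    }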
Because TreeNodes are about twice the size of regular nodes, we use them only when bins contain enough nodes to warrant use (see TREEIFY_THRESHOLD). And when they become too small (due to removal or resizing) they are converted back to plain bins. In usages with well-distributed user hashCodes, tree bins are rarely used. Ideally, under random hashCodes, the frequency of nodes in bins follows a Poisson distribution (http://en.wikipedia.org/wiki/Poisson_distribution) with a parameter of about 0.5 on average for the default resizing threshold of 0.75, although with a large variance because of resizing granularity. Ignoring variance, the expected occurrences of list size k are (exp(-0.5) * pow(0.5, k) / factorial(k)).
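Plugging small k into that expression reproduces the drop-off that makes treeification rare; a minimal check:

    public class PoissonBins {
        public static void main(String[] args) {
            double lambda = 0.5;          // average bin load at the 0.75 threshold
            double p = Math.exp(-lambda); // the k = 0 term
            for (int k = 0; k <= 8; k++) {
                System.out.printf("%d: %.8f%n", k, p);
                p *= lambda / (k + 1);    // next term: multiply by lambda/(k+1)
            }
        }
    }

The first few values come out as 0: 0.60653066, 1: 0.30326533, 2: 0.07581633, falling below one in ten million by k = 8, which is why, under well-distributed hash codes, bins rarely reach the TREEIFY_THRESHOLD of 8 nodes.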