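To answer this question, we first need to understand why the conversion happens at all. That part is simple: the elements in a HashMap bucket are initially stored as a linked list, whose lookup cost is O(n), whereas a tree structure brings lookup down to O(log n). When the list is short, even a full traversal is fast, but as the list grows, lookups slow down noticeably, which is why the list is converted to a tree. As for why the threshold is 8, the source code is probably the most reliable place to look.
The threshold 8 is defined in HashMap, as shown below. The comment only says that 8 is the threshold at which a bin (a bin is a bucket, the place where elements whose hashes map to the same table index are stored) is converted from a list to a tree. It does not explain why the value is 8: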
/**
 * The bin count threshold for using a tree rather than list for a
 * bin.  Bins are converted to trees when adding an element to a
 * bin with at least this many nodes. The value must be greater
 * than 2 and should be at least 8 to mesh with assumptions in
 * tree removal about conversion back to plain bins upon shrinkage.
 */
static final int TREEIFY_THRESHOLD = 8;
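Reading on, HashMap contains a block of Implementation notes. I have excerpted a few important passages. The first, shown below, roughly says that when a bin grows too large it is transformed into a bin of TreeNodes, structured much like those in TreeMap, that is, a red-black tree: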
This map usually acts as a binned (bucketed) hash table, but when bins get too large, they are transformed into bins of TreeNodes, each structured similarly to those in java.util.TreeMap
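Reading further, TreeNodes take about twice the space of regular nodes, so a bin is converted to TreeNodes only when it holds enough nodes, and "enough" is determined by TREEIFY_THRESHOLD. When the number of nodes in the bin shrinks, it is converted back to a plain bin. Looking at the source, a list bin is converted to a red-black tree when an element is added to a bin that already holds at least 8 nodes (and only if the table has at least 64 slots; otherwise the table is resized instead), and a tree bin is converted back to a plain bin once it shrinks to 6 nodes.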
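The thresholds described above can be sketched as a small decision function. This is only an illustration based on my reading of the JDK 8 source, not the actual HashMap code: the class `BinPolicySketch`, the enum `Action`, and both method names are hypothetical, while the three constant values are the ones declared in java.util.HashMap (UNTREEIFY_THRESHOLD and MIN_TREEIFY_CAPACITY are not quoted in this article).

```java
// Hypothetical sketch of HashMap's bin conversion policy (JDK 8+).
// Only the constant values come from java.util.HashMap; everything else is illustrative.
class BinPolicySketch {
    static final int TREEIFY_THRESHOLD = 8;     // list -> tree threshold
    static final int UNTREEIFY_THRESHOLD = 6;   // tree -> list threshold
    static final int MIN_TREEIFY_CAPACITY = 64; // smallest table size that allows tree bins

    enum Action { KEEP_LIST, RESIZE_TABLE, TREEIFY }

    /** What happens after appending a node to a list bin that already held nodesBefore nodes. */
    static Action afterAppend(int nodesBefore, int tableCapacity) {
        if (nodesBefore < TREEIFY_THRESHOLD) {
            return Action.KEEP_LIST;
        }
        // A small table is resized instead of treeified, which usually spreads the bin out.
        return tableCapacity < MIN_TREEIFY_CAPACITY ? Action.RESIZE_TABLE : Action.TREEIFY;
    }

    /** When a resize splits a tree bin, a half this small is turned back into a list. */
    static boolean untreeifyOnSplit(int nodesInHalf) {
        return nodesInHalf <= UNTREEIFY_THRESHOLD;
    }

    public static void main(String[] args) {
        System.out.println(afterAppend(7, 64));  // bin held 7 nodes
        System.out.println(afterAppend(8, 16));  // bin held 8 nodes, small table
        System.out.println(afterAppend(8, 64));  // bin held 8 nodes, large enough table
        System.out.println(untreeifyOnSplit(6));
    }
}
```

The gap between 8 and 6 appears to act as hysteresis: a bin hovering around one size does not flip back and forth between list and tree on every insert and removal.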
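The space cost explains why bins do not start out as TreeNodes and are converted only after they reach a certain number of nodes. Put simply, it is a trade-off between space and time: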
Because TreeNodes are about twice the size of regular nodes, we use them only when bins contain enough nodes to warrant use (see TREEIFY_THRESHOLD). And when they become too small (due to removal or resizing) they are converted back to plain bins. In usages with well-distributed user hashCodes, tree bins are rarely used. Ideally, under random hashCodes, the frequency of nodes in bins follows a Poisson distribution (http://en.wikipedia.org/wiki/Poisson_distribution) with a parameter of about 0.5 on average for the default resizing threshold of 0.75, although with a large variance because of resizing granularity. Ignoring variance, the expected occurrences of list size k are (exp(-0.5)*pow(0.5, k)/factorial(k)). The first values are:

0: 0.60653066
1: 0.30326533
2: 0.07581633
3: 0.01263606
4: 0.00157952
5: 0.00015795
6: 0.00001316
7: 0.00000094
8: 0.00000006
more: less than 1 in ten million
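The passage also says that when hashCodes are well distributed, tree bins are rarely used: the entries spread evenly across the bins, so almost no bin's list ever reaches the threshold. With a poorly written hashCode, however, the distribution can be much worse, and since the JDK cannot stop users from implementing bad hash functions, uneven distributions do occur. In the ideal case of random hashCodes, the number of nodes per bin follows a Poisson distribution, and the table shows that the probability of a bin's list reaching 8 elements is 0.00000006, practically an impossible event. So 8 was not picked on a whim. It is backed by probability: with a reasonable hash function, the tree conversion almost never triggers, and it mainly serves as a safeguard against bad hash distributions. It is one example of how carefully reasoned the changes and optimizations made to Java over the years have been.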
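The figures in the Implementation notes can be checked by evaluating the quoted formula, exp(-0.5)*pow(0.5, k)/factorial(k), directly. The following is a minimal sketch; the class name `PoissonBins` and the method `probability` are my own names for illustration and do not appear in the JDK.

```java
// Minimal sketch: evaluates the Poisson formula quoted in the HashMap
// Implementation notes. Class and method names are hypothetical.
class PoissonBins {
    static final double LAMBDA = 0.5; // parameter stated in the notes for load factor 0.75

    /** Probability that a bin holds exactly k nodes, under the notes' idealized model. */
    static double probability(int k) {
        double factorial = 1.0;
        for (int i = 2; i <= k; i++) {
            factorial *= i;
        }
        return Math.exp(-LAMBDA) * Math.pow(LAMBDA, k) / factorial;
    }

    public static void main(String[] args) {
        // Prints one line per list size, in the same "k: value" layout as the notes.
        for (int k = 0; k <= 8; k++) {
            System.out.printf("%d: %.8f%n", k, probability(k));
        }
    }
}
```

For k = 8 the value is roughly 6 in 100 million, which is why a bin of that size is so unlikely when hashCodes behave randomly.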
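Aside
Searching for this question online, I found many copies of the following answer (presumably reposted from one another):
The average search length of a red-black tree is log(n) (base 2); for a length of 8 that is log(8) = 3. The average search length of a linked list is n/2; for a length of 8 that is 8/2 = 4, so converting to a tree is worthwhile. If the list length is 6 or less, 6/2 = 3 while log(6) ≈ 2.6. The tree would still be slightly faster, but the time needed to convert the list and build the tree is not negligible.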
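The arithmetic in that quoted answer is easy to reproduce. The sketch below only evaluates the answer's own simplified model (n/2 comparisons for a list, log2(n) for a tree); it is not a measurement of real HashMap performance, and the class and method names are hypothetical.

```java
// Sketch of the simplified cost model used in the quoted answer.
// It reproduces the arithmetic only; it does not benchmark HashMap.
class SearchLengthSketch {
    /** Average comparisons for a successful search in a linked list of n nodes (model: n/2). */
    static double listAverage(int n) {
        return n / 2.0;
    }

    /** Approximate search length in a balanced tree of n nodes (model: log base 2 of n). */
    static double treeAverage(int n) {
        return Math.log(n) / Math.log(2);
    }

    public static void main(String[] args) {
        for (int n : new int[] {6, 8}) {
            System.out.printf("n=%d list=%.2f tree=%.2f%n", n, listAverage(n), treeAverage(n));
        }
    }
}
```

Under this model the gap is about 0.4 comparisons at n = 6 and 1 comparison at n = 8. That is small either way, and the model leaves out the memory cost of TreeNodes that the source comments emphasize.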