Annotating the JDK series, JDK 1.6 containers (6): HashSet source code analysis & Map iterators

Date: 2019-12-09

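Today's protagonist is HashSet. What is a Set? It is, of course, another kind of Java container.

Does seeing the word Hash again bring a knowing smile? I won't repeat the concepts and principles of hashing here (if they are unfamiliar, go back and read the HashMap article first). One thing is worth repeating: a hash table is designed for fast insertion and lookup, and it is a classic trade of space for time (this was covered in the HashMap analysis). So what is HashSet about, and what problem does it exist to solve?

First, the characteristics of a Set: its elements have no guaranteed order, and its elements cannot repeat. Does that remind you of anything? No order, because of hashing; no duplicates, because the keys of a HashMap cannot repeat. Yes, you guessed right again. HashSet is built on the keys of a HashMap, and nearly every HashSet method simply calls a HashMap method. Building on HashMap gives it two selling points: 1. no duplicates, 2. fast lookup (contains).

Let's take a look: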

1. Definition

public class HashSet<E>
    extends AbstractSet<E>
    implements Set<E>, Cloneable, java.io.Serializable

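We can see that HashSet extends the abstract class AbstractSet and implements the Set, Cloneable and Serializable interfaces. AbstractSet is an abstract class that wraps up some basic set operations. Next, the definition of the Set interface: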

public interface Set<E> extends Collection<E> {
    // Query Operations
    int size();
    boolean isEmpty();
    boolean contains(Object o);
    Iterator<E> iterator();
    Object[] toArray();
    <T> T[] toArray(T[] a);
    // Modification Operations
    boolean add(E e);
    boolean remove(Object o);
    // Bulk Operations
    boolean containsAll(Collection<?> c);
    boolean addAll(Collection<? extends E> c);
    boolean retainAll(Collection<?> c);
    boolean removeAll(Collection<?> c);
    void clear();
    // Comparison and hashing
    boolean equals(Object o);
    int hashCode();
}

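Notice anything? Like java.util.List, the Set interface extends Collection. Unlike List, though, Set has no index-based methods such as get. So how do you read the values? Remember Iterator? Right, you use an iterator. (If that is unclear, go back to the Iterator article.)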
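Since a Set offers no get(int), reading its elements means walking an Iterator. The following is a minimal sketch of that idea; the class name SetIterationDemo and the helper sum are made up for illustration and are not part of the JDK.

```java
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

class SetIterationDemo {
    // Hypothetical helper: adds up the elements by walking the set's Iterator,
    // because a Set cannot be indexed like a List.
    static int sum(Set<Integer> set) {
        int total = 0;
        Iterator<Integer> it = set.iterator();
        while (it.hasNext()) {
            total += it.next();
        }
        return total;
    }

    public static void main(String[] args) {
        Set<Integer> set = new HashSet<Integer>();
        set.add(1);
        set.add(2);
        set.add(2); // duplicate, the set keeps a single 2
        set.add(3);
        System.out.println(set.size()); // 3
        System.out.println(sum(set));   // 6
    }
}
```

A for-each loop over the set does the same thing behind the scenes, since it also calls iterator().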

2. Underlying storage

    // The backing HashMap that stores the elements of the HashSet
    private transient HashMap<E,Object> map;

    // Dummy value to associate with an Object in the backing Map
    // Set only uses the keys of the HashMap, so a static constant Object is defined here to serve as every value
    private static final Object PRESENT = new Object();

That confirms what was said earlier: HashSet stores its data in a HashMap, and what it mainly uses is the HashMap's keys.

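Seeing private static final Object PRESENT = new Object(); you may have a question. A static constant Object serves as the value in the HashMap, but since the map's values carry no meaning, why not simply use null, for example private final Object PRESENT = null;? After all, new Object() occupies heap memory (the exact size depends on the JVM: about 8 bytes on a 32-bit HotSpot, typically 16 bytes on a 64-bit one), while a null reference allocates nothing on the heap. So why not null? My first thought was java.lang.NullPointerException, the nightmare of every Java programmer. It looks easy to fix when you see it, but sometimes it is not; Java claims to have no pointers, yet NullPointerException turns up everywhere. From that angle, spending a few bytes on one shared object to avoid writing if (xxx == null) { ... } else { ... } looks like a good deal. There is also a more concrete reason, visible in add and remove in section 4: add returns map.put(e, PRESENT) == null, and remove returns map.remove(o) == PRESENT. HashMap.put returns the previous value for the key, or null if there was none, so if the dummy value were null, put would return null whether or not the element was already present, and add could no longer report duplicates. And because PRESENT is static, the whole JVM pays for exactly one such object, no matter how many HashSets exist.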
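A small experiment can make the role of a non-null dummy value more tangible. The sketch below is not JDK code: the class PresentMarkerDemo and the helper addWithMarker are hypothetical, and they only imitate the add-on-top-of-put pattern with a plain HashMap. It relies on the documented behaviour that HashMap.put returns the previous value for the key, or null if there was none.

```java
import java.util.HashMap;
import java.util.Map;

class PresentMarkerDemo {
    // Shared non-null dummy value, in the spirit of HashSet's PRESENT field.
    static final Object PRESENT = new Object();

    // Hypothetical helper: reports "newly added" when put returns null.
    static boolean addWithMarker(Map<String, Object> map, String key, Object marker) {
        return map.put(key, marker) == null;
    }

    public static void main(String[] args) {
        Map<String, Object> good = new HashMap<String, Object>();
        System.out.println(addWithMarker(good, "a", PRESENT)); // true: first insertion
        System.out.println(addWithMarker(good, "a", PRESENT)); // false: duplicate detected

        Map<String, Object> bad = new HashMap<String, Object>();
        System.out.println(addWithMarker(bad, "a", null)); // true: first insertion
        System.out.println(addWithMarker(bad, "a", null)); // true again: the duplicate goes unnoticed
    }
}
```

With null as the marker, the previous value is always null, so the return value can no longer tell a first insertion from a repeated one.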

3. Constructors

    /**
     * Initializes map with HashMap's default capacity (16) and default load factor (0.75), constructing an empty HashSet
     */
    public HashSet() {
        map = new HashMap<E,Object>();
    }

    /**
     * Constructs a HashSet from the given Collection; not only a Set, any container that implements Collection will do
     */
    public HashSet(Collection<? extends E> c) {
        map = new HashMap<E,Object>(Math.max((int) (c.size()/.75f) + 1, 16));
        // Uses the Iterator provided by the Collection to add the elements of c to the HashSet one by one
        addAll(c);
    }

    /**
     * Initializes map with the given initial capacity and load factor, constructing a HashSet
     */
    public HashSet(int initialCapacity, float loadFactor) {
        map = new HashMap<E,Object>(initialCapacity, loadFactor);
    }

    /**
     * Initializes map with the given initial capacity and the default load factor (0.75), constructing a HashSet
     */
    public HashSet(int initialCapacity) {
        map = new HashMap<E,Object>(initialCapacity);
    }

    /**
     * A constructor that is not public (package-private, no access modifier). It builds a LinkedHashMap underneath;
     * dummy is only a marker parameter with no meaning of its own
     */
    HashSet(int initialCapacity, float loadFactor, boolean dummy) {
        map = new LinkedHashMap<E,Object>(initialCapacity, loadFactor);
    }

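The constructors make it plain that a HashSet is backed by a HashMap; once you understand HashMap there is not much to add. Only the last constructor is a little different: it builds a LinkedHashMap. It is not public and is actually provided for LinkedHashSet to use. The third parameter, dummy, has no meaning and only serves to distinguish this constructor from the others.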
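The capacity expression in the HashSet(Collection) constructor is easier to follow with numbers plugged in. The sketch below reuses that expression as it appears in the listing; the class InitialCapacityDemo and both helper names are hypothetical, and roundUpToPowerOfTwo is an assumption about how a JDK 1.6 HashMap rounds the requested capacity up to a power of two.

```java
class InitialCapacityDemo {
    // Same expression as in the HashSet(Collection) constructor listing.
    static int requestedCapacity(int collectionSize) {
        return Math.max((int) (collectionSize / .75f) + 1, 16);
    }

    // Assumption: imitates the power-of-two rounding done inside HashMap's constructor.
    static int roundUpToPowerOfTwo(int requested) {
        int capacity = 1;
        while (capacity < requested) {
            capacity <<= 1;
        }
        return capacity;
    }

    public static void main(String[] args) {
        int requested = requestedCapacity(100);     // (int) 133.33 + 1 = 134
        int table = roundUpToPowerOfTwo(requested); // 256
        int threshold = (int) (table * 0.75f);      // 192, comfortably above 100
        System.out.println(requested + " " + table + " " + threshold); // 134 256 192
    }
}
```

Dividing by the load factor up front means the table should be large enough to hold every element of c without resizing during addAll.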

4. Adding and removing

    /**
     * add is implemented with HashMap's put method
     */
    public boolean add(E e) {
        return map.put(e, PRESENT) == null;
    }

    /**
     * remove is implemented with HashMap's remove method
     */
    public boolean remove(Object o) {
        return map.remove(o) == PRESENT;
    }

    /**
     * Adds a whole collection to the HashSet; this method lives in AbstractCollection
     */
    public boolean addAll(Collection<? extends E> c) {
        boolean modified = false;
        // Get the Iterator of collection c
        Iterator<? extends E> e = c.iterator();
        // Walk the iterator
        while (e.hasNext()) {
            // Add each element of c to the HashSet
            if (add(e.next()))
                modified = true;
        }
        return modified;
    }

    /**
     * Removes every element contained in collection c; this method lives in AbstractSet
     */
    public boolean removeAll(Collection<?> c) {
        boolean modified = false;

        // Compare the size of this HashSet with the size of c, to reduce the number of iterations
        if (size() > c.size()) {
            // This HashSet is larger: iterate over c and remove its elements one by one
            for (Iterator<?> i = c.iterator(); i.hasNext(); )
                modified |= remove(i.next());
        } else {
            // Otherwise (c is at least as large): iterate over this HashSet and remove each element that c contains
            for (Iterator<?> i = iterator(); i.hasNext(); ) {
                if (c.contains(i.next())) {
                    i.remove();
                    modified = true;
                }
            }
        }
        return modified;
    }
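The boolean results of these methods are easy to overlook, so here is a short usage sketch built only on the public java.util API. The class AddRemoveDemo and its two helper methods are hypothetical names chosen for this example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class AddRemoveDemo {
    // Returns the results of add, add (duplicate), remove, remove (already gone).
    static boolean[] singleElementResults() {
        Set<String> set = new HashSet<String>();
        boolean firstAdd = set.add("a");        // element was absent
        boolean secondAdd = set.add("a");       // element already present
        boolean firstRemove = set.remove("a");  // element was present
        boolean secondRemove = set.remove("a"); // nothing left to remove
        return new boolean[] { firstAdd, secondAdd, firstRemove, secondRemove };
    }

    // Removes "y" and "q" from {x, y, z}; "q" is simply not found.
    static Set<String> removeAllExample() {
        Set<String> set = new HashSet<String>(Arrays.asList("x", "y", "z"));
        set.removeAll(Arrays.asList("y", "q"));
        return set;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(singleElementResults())); // [true, false, true, false]
        System.out.println(removeAllExample().size());               // 2
    }
}
```

In removeAllExample the set (3 elements) is larger than the argument (2 elements), so the first branch of removeAll runs and iterates over the argument.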

5. Containment checks

    /**
     * contains is implemented with HashMap's containsKey method
     */
    public boolean contains(Object o) {
        return map.containsKey(o);
    }

    /**
     * Checks whether this set contains every element of the given collection; this method lives in AbstractCollection
     */
    public boolean containsAll(Collection<?> c) {
        // Get the Iterator of collection c
        Iterator<?> e = c.iterator();
        // Walk the iterator; as soon as one element of c is not in this HashSet, return false
        while (e.hasNext())
            if (!contains(e.next()))
                return false;
        return true;
    }

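HashMap is built on a hash table, and the most important property of a hash-table container is fast insertion and lookup. HashSet's contains method delegates to HashMap's containsKey, so it is very fast (constant time on average, assuming a reasonable hashCode). In my view this method is one of HashSet's core selling points.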
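For completeness, a brief usage sketch of the two containment methods, using only the public java.util API. The class name ContainsDemo and the sample colours are invented for the example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class ContainsDemo {
    // Builds a small sample set to query.
    static Set<String> sample() {
        return new HashSet<String>(Arrays.asList("red", "green", "blue"));
    }

    public static void main(String[] args) {
        Set<String> colours = sample();
        System.out.println(colours.contains("green"));                         // true
        System.out.println(colours.contains("pink"));                          // false
        System.out.println(colours.containsAll(Arrays.asList("red", "blue"))); // true
        System.out.println(colours.containsAll(Arrays.asList("red", "pink"))); // false: stops at "pink"
    }
}
```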

6. Size checks

    /**
     * Returns the number of elements in this set (its cardinality).
     *
     * @return the number of elements in this set (its cardinality)
     */
    public int size() {
        return map.size();
    }

    /**
     * Returns <tt>true</tt> if this set contains no elements.
     *
     * @return <tt>true</tt> if this set contains no elements
     */
    public boolean isEmpty() {
        return map.isEmpty();
    }

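All of the code above is simple because it is basically built on HashMap. Once you understand HashMap, HashSet really is a piece of cake.

So that wraps up HashSet... wait, there is one more thing: iterators. The HashMap and LinkedHashMap articles both mentioned that their iterator implementations rely on the Set interface, so let's start with HashSet's iterator.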

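7. Iterators

7.1 The HashMap iterator

In "The Iterator design pattern" we analyzed the roles involved in implementing an Iterator and wrote a simple one ourselves. We also saw that Collection extends the Iterable interface, which requires implementations to provide an iterator() method returning an Iterator. Since HashSet is a grandchild of Collection, it should implement an iterator() method that returns an Iterator too, right? Let's go and see.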

    /**
     * Returns an iterator over the elements in this set.  The elements
     * are returned in no particular order.
     *
     * @return an Iterator over the elements in this set
     * @see ConcurrentModificationException
     */
    public Iterator<E> iterator() {
        return map.keySet().iterator();
    }

Whoa, what is going on? HashSet's iterator() method is implemented through HashMap as well. Let's see what HashMap's keySet() method is up to.

    public Set<K> keySet() {
        Set<K> ks = keySet;
        return (ks != null ? ks : (keySet = new KeySet()));
    }

HashMap's keySet() method turns out to return a Set, and the concrete implementation is something called KeySet. So what is KeySet?

    private final class KeySet extends AbstractSet<K> {
        public Iterator<K> iterator() {
            return newKeyIterator();
        }
        public int size() {
            return size;
        }
        public boolean contains(Object o) {
            return containsKey(o);
        }
        public boolean remove(Object o) {
            return HashMap.this.removeEntryForKey(o) != null;
        }
        public void clear() {
            HashMap.this.clear();
        }
    }

Oh, KeySet is an inner class of HashMap that extends AbstractSet. And KeySet's iterator() method returns the result of calling newKeyIterator(). Round and round we go; I am getting dizzy.

    Iterator<K> newKeyIterator() {
        return new KeyIterator();
    }

newKeyIterator() in turn returns a new KeyIterator object. What on earth are you doing?

    private final class KeyIterator extends HashIterator<K> {
        public K next() {
            return nextEntry().getKey();
        }
    }

Fine, I have nothing left to say. Let's keep reading.

    private abstract class HashIterator<E> implements Iterator<E> {
        // The next node to be returned
        Entry<K,V> next;        // next entry to return
        int expectedModCount;   // For fast-fail
        int index;              // current slot
        // The current node (the one most recently returned)
        Entry<K,V> current;     // current entry

        HashIterator() {
            expectedModCount = modCount;
            if (size > 0) { // advance to first entry
                Entry[] t = table;
                // Initialize next: point it at the first non-null node in HashMap's underlying array
                while (index < t.length && (next = t[index++]) == null)
                    ;
            }
        }

        public final boolean hasNext() {
            return next != null;
        }

        final Entry<K,V> nextEntry() {
            if (modCount != expectedModCount)
                throw new ConcurrentModificationException();
            // Take a node from one of the linked lists in HashMap's underlying array
            Entry<K,V> e = next;
            if (e == null)
                throw new NoSuchElementException();

            // Point next at the following node and check whether it is null
            if ((next = e.next) == null) {
                Entry[] t = table;
                // If it is null, scan forward through the array until a non-null node is found
                while (index < t.length && (next = t[index++]) == null)
                    ;
            }
            current = e;
            // Return the current node
            return e;
        }

        public void remove() {
            if (current == null)
                throw new IllegalStateException();
            if (modCount != expectedModCount)
                throw new ConcurrentModificationException();
            Object k = current.key;
            current = null;
            HashMap.this.removeEntryForKey(k);
            expectedModCount = modCount;
        }

    }

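At last we have found the HashIterator class (also an inner class of HashMap). Exhausting... Focus on the nextEntry() method. The idea is this: first take the first non-null node in HashMap's underlying array; each call to the iterator's next() then follows that node's next link, and when the chain runs out (next is null), the iterator moves on to the next non-null slot in the array and continues from there. In other words, the loop starts at the first index of the array and walks the entire hash table.

As for why HashMap makes something as simple as an Iterator so complicated, all I can say is: don't ask me, I don't know either...

Of course, a map is a container of key-value pairs. Besides iterating over keys with keySet(), you can iterate over values with values() (why doesn't value iteration return a Set? Because values may repeat), and over whole key-value pairs with entrySet(). They all work on the same principle as the code above, so I won't go through them here.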
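The traversal order described above can be reproduced in isolation. The following is a simplified sketch, not the JDK implementation: Node is a hypothetical stand-in for HashMap's Entry, the table is filled by hand instead of by hashing, and the fail-fast modCount checks and remove() support are left out.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

class BucketWalkDemo {
    // Hypothetical stand-in for HashMap.Entry: a key plus a link to the next node in the same bucket.
    static final class Node {
        final String key;
        final Node next;
        Node(String key, Node next) { this.key = key; this.next = next; }
    }

    // Walks every bucket in index order, following each chain to its end.
    static final class BucketIterator implements Iterator<String> {
        private final Node[] table;
        private int index; // next slot to inspect
        private Node next; // next node to return, or null when exhausted

        BucketIterator(Node[] table) {
            this.table = table;
            advanceToNonEmptySlot();
        }

        // Scans forward until a non-empty slot is found or the array ends.
        private void advanceToNonEmptySlot() {
            next = null;
            while (index < table.length && (next = table[index++]) == null) { }
        }

        public boolean hasNext() { return next != null; }

        public String next() {
            Node e = next;
            if (e == null) throw new NoSuchElementException();
            // Follow the chain; when it ends, jump to the next occupied slot.
            if ((next = e.next) == null) advanceToNonEmptySlot();
            return e.key;
        }

        public void remove() { throw new UnsupportedOperationException(); }
    }

    // Collects the keys in the order the iterator produces them.
    static List<String> walk(Node[] table) {
        List<String> out = new ArrayList<String>();
        for (Iterator<String> it = new BucketIterator(table); it.hasNext(); ) out.add(it.next());
        return out;
    }

    public static void main(String[] args) {
        Node[] table = new Node[4];
        table[1] = new Node("a", new Node("b", null)); // a chain of two nodes in slot 1
        table[3] = new Node("c", null);                // a single node in slot 3
        System.out.println(walk(table)); // [a, b, c]
    }
}
```

Slots 0 and 2 are empty and still have to be inspected, which is the cost that an iterator over a hash table pays for sparse tables.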
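As a usage sketch of the three views, the example below iterates values() and entrySet() and removes entries through the keySet() iterator. It uses only the public java.util API; the class MapViewsDemo and its helper methods are hypothetical names for illustration.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

class MapViewsDemo {
    // Iterates the values() view; duplicates are allowed, hence a Collection rather than a Set.
    static int sumValues(Map<String, Integer> map) {
        int total = 0;
        for (int v : map.values()) {
            total += v;
        }
        return total;
    }

    // Removes keys through the keySet() iterator; the view is backed by the map,
    // so the corresponding entries disappear from the map itself.
    static int removeKeysLongerThan(Map<String, Integer> map, int maxLength) {
        int removed = 0;
        for (Iterator<String> it = map.keySet().iterator(); it.hasNext(); ) {
            if (it.next().length() > maxLength) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<String, Integer>();
        map.put("a", 1);
        map.put("bb", 2);
        map.put("ccc", 2); // same value as "bb"
        for (Map.Entry<String, Integer> e : map.entrySet()) {
            System.out.println(e.getKey() + "=" + e.getValue()); // order is unspecified
        }
        System.out.println(sumValues(map));               // 5
        System.out.println(removeKeysLongerThan(map, 1)); // 2
        System.out.println(map.size());                   // 1
    }
}
```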

7.2 The LinkedHashMap iterator

Having read HashMap's Iterator implementation, let's see how LinkedHashMap does it (no tracing from the top this time; straight to the core code).

    private abstract class LinkedHashIterator<T> implements Iterator<T> {
        // header.after is the first node of LinkedHashMap's doubly linked list, because the header node holds no data
        Entry<K,V> nextEntry    = header.after;
        // The node returned most recently
        Entry<K,V> lastReturned = null;

        /**
         * The modCount value that the iterator believes that the backing
         * List should have.  If this expectation is violated, the iterator
         * has detected concurrent modification.
         */
        int expectedModCount = modCount;

        public boolean hasNext() {
            return nextEntry != header;
        }

        public void remove() {
            if (lastReturned == null)
                throw new IllegalStateException();
            if (modCount != expectedModCount)
                throw new ConcurrentModificationException();

            LinkedHashMap.this.remove(lastReturned.key);
            lastReturned = null;
            expectedModCount = modCount;
        }

        Entry<K,V> nextEntry() {
            if (modCount != expectedModCount)
                throw new ConcurrentModificationException();
            if (nextEntry == header)
                throw new NoSuchElementException();

            // Record the node about to be returned in lastReturned
            // and in the temporary variable e (because nextEntry is about to move on)
            Entry<K,V> e = lastReturned = nextEntry;
            // Point nextEntry at the following node
            nextEntry = e.after;
            // Return the current node
            return e;
        }
    }

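As you can see, LinkedHashMap's iterator no longer walks the whole hash table. It only walks the circular doubly linked list that the map maintains itself, so it never has to check whether a slot in the array is empty. Its cost depends on the number of entries, whereas HashMap's iteration cost also depends on the capacity of the table. That is why iterating a LinkedHashMap is usually more efficient than iterating a HashMap. Since I said usually, when is it not? When the HashMap holds few elements, they are evenly distributed, and the table has no empty slots.

The Map iterator source is not that easy to read (mostly because of all the calls and inner classes, which make the core code hard to find), but once you find the core code the logic is easy to follow, provided you already understand the underlying storage structures of HashMap and LinkedHashMap.

Er, this article really was supposed to be about HashSet, not Map. Does this count as going off topic...

That's all for HashSet!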
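One visible consequence of iterating the linked list instead of the table is that elements come back in insertion order. The sketch below shows this with LinkedHashSet, which the constructor section identified as the user of the LinkedHashMap-backed constructor; the class InsertionOrderDemo and the helper iterationOrder are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class InsertionOrderDemo {
    // Adds the items to the given set and returns them in the set's iteration order.
    static List<String> iterationOrder(Set<String> set, String... items) {
        for (String item : items) {
            set.add(item);
        }
        return new ArrayList<String>(set);
    }

    public static void main(String[] args) {
        // Linked list traversal: insertion order, and re-adding "apple" does not move it.
        System.out.println(iterationOrder(new LinkedHashSet<String>(), "banana", "apple", "cherry", "apple"));
        // prints [banana, apple, cherry]

        // Hash table traversal: bucket order, which is unspecified and may differ between JDKs.
        System.out.println(iterationOrder(new HashSet<String>(), "banana", "apple", "cherry", "apple"));
    }
}
```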

See also:

Annotating the JDK series, JDK 1.6 containers (4): HashMap source code analysis

Annotating the JDK series, JDK 1.6 containers (3): The Iterator design pattern