Teaching My Sister Java: Collections, Full of Promise

Date: 2019-12-08


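00. How the Story Began

"Second Brother, how was the response to the last article, 'Generics'?" Third Sister cares a lot about the "Teaching My Sister Java" column she proposed.

"Someone commented, 'Second Brother, you've been writing code so long you've started imagining things.'"

"Heh, that line is dripping with sarcasm." Third Sister started to look a little upset.

"But someone else commented, 'Please write more articles in this series. I spent half a month on the generics chapter of Thinking in Java without understanding it, and after reading this article it finally all clicked. Sending a heart.'"

"Second Brother, couldn't you have started with the good news? Honestly. I want to send this kind reader a heart too." With that, Third Sister made a heart sign right in front of me. I glanced at her and saw that her earlier gloom had vanished.

"So, are you going to keep writing?" I saw the earnest look in Third Sister's eyes.

"Yes, I think it's time to write about collections."

"Then let me keep asking the questions, and you keep answering them." Third Sister was already eager to get started.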

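01. Second Brother, what is a collection?

Third Sister, let me explain it to you slowly.

JDK 1.2 introduced the collections framework, a set of data structures for holding groups of elements. Unlike an array, these data structures grow their storage dynamically as elements are added. Some collection classes allow duplicate elements and others do not; some allow null elements and others do not.

By inheritance hierarchy, collections fall into two broad categories: those that implement the Collection interface (see Figure 1) and those that implement the Map interface (see Figure 2).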

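A walk through Figure 1:

1) Collection is the root interface of the collection hierarchy.

2) Implementations of the Set interface do not allow duplicate elements, for example HashSet and LinkedHashSet.

3) Implementations of the List interface allow duplicate elements and let you access the element at a given position by index, for example LinkedList and ArrayList.

4) Implementations of the Queue interface hold elements waiting to be processed, adding and removing them at the ends of the queue, for example PriorityQueue, which always hands out its smallest element first.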
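To make the three behaviors above concrete, here is a minimal sketch. The class name CollectionKindsDemo and its helper methods are made up for illustration; only the java.util classes are real.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Queue;
import java.util.Set;

// Hypothetical demo class, not part of the JDK.
class CollectionKindsDemo {

    // A Set keeps only one copy of each element.
    static int distinctCount(List<String> items) {
        Set<String> set = new HashSet<>(items);
        return set.size();
    }

    // A List keeps duplicates and supports access by index.
    static String elementAt(List<String> items, int index) {
        List<String> list = new ArrayList<>(items);
        return list.get(index);
    }

    // A PriorityQueue hands out its smallest element first (natural ordering).
    // Assumes the input is non-empty.
    static int smallest(List<Integer> numbers) {
        Queue<Integer> queue = new PriorityQueue<>(numbers);
        return queue.poll();
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("a", "b", "a");
        System.out.println(distinctCount(names));             // 2
        System.out.println(elementAt(names, 2));              // a
        System.out.println(smallest(Arrays.asList(5, 1, 3))); // 1
    }
}
```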

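A walk through Figure 2:

1) HashMap is the most commonly used Map. It retrieves a value directly from its key and stores entries according to the key's hashCode, so lookups are fast. HashMap allows at most one entry with a null key (putting another one overwrites it), but allows any number of entries with null values.

2) TreeMap keeps its entries sorted by key (null keys are not allowed under the default natural ordering). The default order is ascending, and you can supply your own comparator. When you traverse a TreeMap with an Iterator, the entries come out in sorted order.

3) Hashtable allows neither null keys nor null values. It is synchronized, meaning only one thread can write to a Hashtable at any given moment. Synchronization costs some performance, so writes to a Hashtable take longer.

4) LinkedHashMap preserves insertion order: when you traverse it with an Iterator, the entries that come out first are the ones that were inserted first. Both null keys and null values are allowed.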
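The ordering and null-key rules above are easy to check with a small experiment. This is only a sketch: MapKindsDemo, keyOrder, and rejectsNullKey are names invented for this example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Hashtable;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical demo class, not part of the JDK.
class MapKindsDemo {

    // Puts the same three keys into the given map and returns its key iteration order.
    static List<String> keyOrder(Map<String, Integer> map) {
        map.put("banana", 2);
        map.put("cherry", 3);
        map.put("apple", 1);
        return new ArrayList<>(map.keySet());
    }

    // Returns true if the map refuses a null key with a NullPointerException.
    static boolean rejectsNullKey(Map<String, Integer> map) {
        try {
            map.put(null, 0);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(keyOrder(new TreeMap<>()));       // [apple, banana, cherry]
        System.out.println(keyOrder(new LinkedHashMap<>())); // [banana, cherry, apple]
        System.out.println(rejectsNullKey(new HashMap<>()));   // false
        System.out.println(rejectsNullKey(new TreeMap<>()));   // true
        System.out.println(rejectsNullKey(new Hashtable<>())); // true
    }
}
```

HashMap is left out of the ordering check on purpose, since its iteration order is not something to rely on.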

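With collections to help, programmers no longer need to implement low-level algorithms such as sorting and searching themselves. Beyond that, array-based collection classes such as ArrayList perform better under frequent reads, while linked-list-based collection classes such as LinkedList can be more efficient when elements are frequently added or removed. All a programmer has to do is pick the right collection class for the business need. As for performance tuning, well, you can message Second Brother on WeChat.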

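02. Second Brother, what is the difference between LinkedList and ArrayList?

Third Sister, dozing off right after asking a question? Keep listening while I explain it slowly.

LinkedList is really a doubly linked list. Let's look at the source.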

public class LinkedList<E> {
    transient int size = 0;

    /**
     * Pointer to first node.
     * Invariant: (first == null && last == null) ||
     *            (first.prev == null && first.item != null)
     */
    transient Node<E> first;

    /**
     * Pointer to last node.
     * Invariant: (first == null && last == null) ||
     *            (last.next == null && last.item != null)
     */
    transient Node<E> last;

    private static class Node<E> {
        E item;
        Node<E> next;
        Node<E> prev;

        Node(Node<E> prev, E element, Node<E> next) {
            this.item = element;
            this.next = next;
            this.prev = prev;
        }
    }
}

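1) LinkedList has a very important inner class, Node. Node is the data structure behind each node: item is the node's value, prev is the previous node, and next is the next node, which is exactly why the list is "doubly" linked. first is the first node of the LinkedList and last is the last node.

2) size is the number of nodes in the LinkedList. Adding an element increases size by 1; removing an element decreases it by 1.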

ArrayList is really a dynamic array. Let's look at the source.

public class ArrayList<E> {
    /**
     * The array buffer into which the elements of the ArrayList are stored.
     * The capacity of the ArrayList is the length of this array buffer. Any
     * empty ArrayList with elementData == DEFAULTCAPACITY_EMPTY_ELEMENTDATA
     * will be expanded to DEFAULT_CAPACITY when the first element is added.
     */
    transient Object[] elementData; // non-private to simplify nested class access

    /**
     * The size of the ArrayList (the number of elements it contains).
     *
     * @serial
     */
    private int size;
}

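1) elementData is an Object array that holds the elements added to the ArrayList. When an ArrayList is created with the no-argument constructor, elementData starts out as an empty array and is expanded to the default capacity of 10 when the first element is added. When the capacity is not enough to hold all the elements, the ArrayList resets it: new capacity = old capacity + (old capacity >> 1) (see the code below).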

private void grow(int minCapacity) {
    // overflow-conscious code
    int oldCapacity = elementData.length;
    int newCapacity = oldCapacity + (oldCapacity >> 1);
    elementData = Arrays.copyOf(elementData, newCapacity);
}

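Not comfortable with the >> operator yet? Shifting an int right by one bit halves it, rounding down. A quick test confirms the result: an original capacity of 10 gives a new capacity of 15, and an original capacity of 20 gives 30.

2) size is the number of elements in the ArrayList. Adding an element increases size by 1; removing an element decreases it by 1.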
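If you want to see the growth arithmetic on its own, a tiny sketch is enough. ArrayGrowthDemo and nextCapacity are illustrative names; the formula is the one from the grow() excerpt above.

```java
// Hypothetical demo class, not part of the JDK.
class ArrayGrowthDemo {

    // Same arithmetic as the grow() excerpt: old capacity plus half of it, rounded down.
    static int nextCapacity(int oldCapacity) {
        return oldCapacity + (oldCapacity >> 1);
    }

    public static void main(String[] args) {
        int capacity = 10;
        // Prints 10, 15, 22, 33: each step adds half of the current capacity.
        for (int i = 0; i < 4; i++) {
            System.out.println(capacity);
            capacity = nextCapacity(capacity);
        }
    }
}
```

Note how 15 grows to 22 rather than 22.5: the shift discards the fractional half.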

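Because LinkedList and ArrayList are implemented differently underneath (one is a doubly linked list, the other a dynamic array), the differences between them are easy to see.

Key point 1: When adding (add(E e)), inserting (add(int index, E element)), or removing (remove(int index)) elements, LinkedList avoids the array copying that ArrayList has to do, so it can come out ahead on these operations.

Why is that? Let's look at the relevant ArrayList source first.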

// When the array is full, ensureCapacityInternal() grows it, copying the elements (ultimately via System.arraycopy())
public boolean add(E e) {
    ensureCapacityInternal(size + 1);  // Increments modCount!!
    elementData[size++] = e;
    return true;
}

public void add(int index, E element) {
    System.arraycopy(elementData, index, elementData, index + 1,
                     size - index);
    elementData[index] = element;
    size++;
}

public E remove(int index) {
    E oldValue = elementData(index);

    int numMoved = size - index - 1;
    if (numMoved > 0)
        System.arraycopy(elementData, index+1, elementData, index,
                         numMoved);
    elementData[--size] = null; // clear to let GC do its work

    return oldValue;
}

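Looking at the ArrayList source, you can see that adding, inserting, or removing an element calls System.arraycopy(Object src, int srcPos, Object dest, int destPos, int length), either directly (to shift elements) or indirectly (when the array grows). The method has to copy every element after the affected position, so its cost rises with the size of the list.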
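Here is a minimal sketch of what that shifting looks like on a plain int array. ArrayInsertDemo and insertAt are hypothetical names, and the sketch assumes the array already has a spare slot, so it does not handle growth.

```java
import java.util.Arrays;

// Hypothetical demo class, not part of the JDK.
class ArrayInsertDemo {

    // Inserts value at index among the first `size` slots of data.
    // Assumes data.length > size, i.e. at least one spare slot exists.
    static void insertAt(int[] data, int size, int index, int value) {
        // Move elements [index, size) one position to the right.
        System.arraycopy(data, index, data, index + 1, size - index);
        data[index] = value;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 4, 5, 0}; // four elements in use, one spare slot
        insertAt(data, 4, 2, 3);
        System.out.println(Arrays.toString(data)); // [1, 2, 3, 4, 5]
    }
}
```

Inserting at index 0 moves every element, while inserting at the end moves none, which is why the position matters so much for an array-backed list.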

Now let's look at the relevant LinkedList source.

/**
 * Links e as last element.
 */
void linkLast(E e) {
    final Node<E> l = last;
    final Node<E> newNode = new Node<>(l, e, null);
    last = newNode;
    if (l == null)
        first = newNode;
    else
        l.next = newNode;
}
/**
 * Unlinks non-null node x.
 */
E unlink(Node<E> x) {
    final E element = x.item;
    final Node<E> next = x.next;
    final Node<E> prev = x.prev;

    if (prev == null) {
        first = next;
    } else {
        prev.next = next;
        x.prev = null;
    }

    if (next == null) {
        last = prev;
    } else {
        next.prev = prev;
        x.next = null;
    }

    x.item = null;
    return element;
}

LinkedList has no capacity to grow and never needs to copy existing elements; it only has to update the references held by the neighboring nodes.
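To watch the reference rewiring in isolation, here is a stripped-down doubly linked list. TinyLinkedList is a made-up class for illustration, far simpler than java.util.LinkedList, and it only supports appending at the tail and removing from the head.

```java
import java.util.NoSuchElementException;

// Hypothetical teaching class, not the JDK's LinkedList.
class TinyLinkedList<E> {

    private static final class Node<E> {
        E item;
        Node<E> prev;
        Node<E> next;

        Node(Node<E> prev, E item, Node<E> next) {
            this.prev = prev;
            this.item = item;
            this.next = next;
        }
    }

    private Node<E> first;
    private Node<E> last;
    private int size;

    // Appends by linking a new node after the current last node.
    void addLast(E e) {
        Node<E> l = last;
        Node<E> newNode = new Node<>(l, e, null);
        last = newNode;
        if (l == null) {
            first = newNode; // the list was empty
        } else {
            l.next = newNode;
        }
        size++;
    }

    // Removes the head by moving `first` forward; no element is copied.
    E removeFirst() {
        if (first == null) {
            throw new NoSuchElementException();
        }
        Node<E> f = first;
        E element = f.item;
        first = f.next;
        if (first == null) {
            last = null; // the list is now empty
        } else {
            first.prev = null;
        }
        size--;
        return element;
    }

    int size() {
        return size;
    }

    public static void main(String[] args) {
        TinyLinkedList<String> list = new TinyLinkedList<>();
        list.addLast("a");
        list.addLast("b");
        System.out.println(list.removeFirst()); // a
        System.out.println(list.size());        // 1
    }
}
```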

Key point 2: LinkedList is slower than ArrayList at looking up an element by index.

Why is that? Let's look at the relevant LinkedList source first.

/**
 * Returns the (non-null) Node at the specified element index.
 */
Node<E> node(int index) {
    // assert isElementIndex(index);

    if (index < (size >> 1)) {
        Node<E> x = first;
        for (int i = 0; i < index; i++)
            x = x.next;
        return x;
    } else {
        Node<E> x = last;
        for (int i = size - 1; i > index; i--)
            x = x.prev;
        return x;
    }
}

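Looking at the LinkedList source, you can see that to locate an index, LinkedList first checks which half of the list it falls in, then runs a for loop from the front or from the back, stepping through the nodes one by one.

Now let's look at the relevant ArrayList source.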

@SuppressWarnings("unchecked")
E elementData(int index) {
    return (E) elementData[index];
}

ArrayList reads the element straight out of the array at the given index, with no for loop needed. That is clearly faster!
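To put a number on the difference, the sketch below counts how many links the node(int index) loop shown earlier would follow for a given index. LookupCostDemo and linkedHops are invented names; the formula is read directly off that loop, while an array lookup needs a single access whatever the index.

```java
// Hypothetical demo class, not part of the JDK.
class LookupCostDemo {

    // Links followed by node(index): it walks from whichever end is nearer.
    static int linkedHops(int size, int index) {
        return index < (size >> 1) ? index : size - 1 - index;
    }

    public static void main(String[] args) {
        int size = 1000;
        System.out.println(linkedHops(size, 0));   // 0, already at first
        System.out.println(linkedHops(size, 499)); // 499, walking forward from first
        System.out.println(linkedHops(size, 500)); // 499, walking backward from last
        System.out.println(linkedHops(size, 999)); // 0, already at last
    }
}
```

The worst case sits in the middle of the list, at roughly size / 2 hops.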

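03. Second Brother, what is the difference between HashMap and TreeMap?

Third Sister, your questions are getting more and more artful, aren't they? Keep listening while I explain it slowly.

HashMap stores key-value pairs and places each pair according to the hash code of its key (a hash, also known in Chinese as 散列, "scatter"). Let's look at the source.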

public class HashMap<K,V> {
    transient Node<K,V>[] table;
    int threshold;
    final float loadFactor;
    static class Node<K,V> implements Map.Entry<K,V> {
        final int hash;
        final K key;
        V value;
        Node<K,V> next;
    }
    public HashMap(int initialCapacity, float loadFactor) {
        this.loadFactor = loadFactor;
        this.threshold = tableSizeFor(initialCapacity);
    }
}

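1) table is an array of Node, and each Node is an element of a singly linked list (it has only a next reference). The key-value pairs of a HashMap are stored in the table array.

2) loadFactor is the famous load factor. The default is 0.75, which is said to be a trade-off between time and space costs.

3) initialCapacity is the initial capacity, 16 by default.

4) threshold is the HashMap's resize threshold, used to decide whether the HashMap needs to grow. threshold = capacity * load factor; once the number of entries stored in the HashMap exceeds threshold, the capacity of the HashMap is doubled.

Initial capacity and load factor have a considerable impact on HashMap performance. Capacity is the number of buckets (see the figure below) in the HashMap, and the initial capacity is simply the capacity at creation time. The load factor is a measure of how full the HashMap is allowed to get before its capacity is automatically increased.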
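The threshold arithmetic is easy to try out by hand. In the sketch below, HashMapSizingDemo, threshold, and roundUpToPowerOfTwo are names made up for this example; the second helper only imitates the power-of-two rounding that HashMap applies to a requested capacity and is not the JDK's own tableSizeFor code.

```java
// Hypothetical demo class, not part of the JDK.
class HashMapSizingDemo {

    // Resize threshold for a table with the given capacity and load factor.
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    // Rounds a small positive requested capacity up to the next power of two.
    // Overflow for very large values is deliberately not handled in this sketch.
    static int roundUpToPowerOfTwo(int requested) {
        int highest = Integer.highestOneBit(requested);
        return highest == requested ? requested : highest << 1;
    }

    public static void main(String[] args) {
        System.out.println(threshold(16, 0.75f));    // 12
        System.out.println(threshold(32, 0.75f));    // 24
        System.out.println(roundUpToPowerOfTwo(10)); // 16
    }
}
```

So with the defaults, the thirteenth entry pushes the map past its threshold of 12, the table doubles to 32 buckets, and the threshold becomes 24.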

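TreeMap stores key-value pairs in sorted order and is based on a red-black tree. You can specify how keys are ordered when you create it; if you don't, keys are sorted by their natural ordering. Let's look at the source.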

public class TreeMap<K,V> {
    private final Comparator<? super K> comparator;
    private transient Entry<K,V> root;
    private static final boolean RED   = false;
    private static final boolean BLACK = true;
    static final class Entry<K,V> implements Map.Entry<K,V> {
        K key;
        V value;
        Entry<K,V> left;
        Entry<K,V> right;
        Entry<K,V> parent;
        boolean color = BLACK;
    }
}

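1) root is the root node of the red-black tree. It is of type Entry (entries are ordered by key), which contains key, value, left (the left child), right (the right child), parent (the parent node), and color.

2) comparator defines the ordering of the red-black tree. It is of the Comparator interface type, which has a compare method taking two parameters, T o1 and T o2 (T is generic notation), the two objects to compare. The method returns an int: a positive integer if o1 is greater than o2, 0 if o1 equals o2, and a negative integer if o1 is less than o2.

To sum up: HashMap suits inserting, deleting, and locating elements in a Map; TreeMap suits traversing keys in natural or custom order.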
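A short sketch shows the comparator at work. TreeMapOrderDemo and sortedKeys are illustrative names; passing null as the comparator makes TreeMap fall back to the keys' natural ordering.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical demo class, not part of the JDK.
class TreeMapOrderDemo {

    // Builds a TreeMap with the given comparator and returns its keys in iteration order.
    // A null comparator means natural ordering (String.compareTo here).
    static List<String> sortedKeys(Comparator<String> comparator) {
        Map<String, Integer> map = new TreeMap<>(comparator);
        map.put("banana", 2);
        map.put("apple", 1);
        map.put("cherry", 3);
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        System.out.println(sortedKeys(null));                      // [apple, banana, cherry]
        System.out.println(sortedKeys(Comparator.reverseOrder())); // [cherry, banana, apple]
    }
}
```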

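04. Second Brother, tell me about binary search too!

Third Sister, no problem at all, leave it to me. But before I start, could you make me a cup of coffee?

Normally, to find an element in an array we have to traverse the whole array. But if the array is sorted, we can use binary search.

How binary search works:

Step 1: compare the element at the middle position of the array with the object being searched for. If they are equal, the search succeeds; otherwise go to step 2.

Step 2: use the middle position to split the array into a front subset and a back subset.

Step 3: compare the object being searched for with the middle element. If the former is greater, search the back subset in the same way; otherwise search the front subset in the same way.

Each round halves the search range, which greatly reduces the number of comparisons.

The binarySearch() method of the Collections class implements binary search and can be used directly. The list must be sorted first; otherwise the result is undefined. When the key is not found, the method returns a negative number, -(insertion point) - 1. The source of the internal helper that does the work is as follows.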

private static <T>
int indexedBinarySearch(List<? extends Comparable<? super T>> list, T key) {
    int low = 0;
    int high = list.size()-1;

    while (low <= high) {
        int mid = (low + high) >>> 1;
        Comparable<? super T> midVal = list.get(mid);
        int cmp = midVal.compareTo(key);

        if (cmp < 0)
            low = mid + 1;
        else if (cmp > 0)
            high = mid - 1;
        else
            return mid; // key found
    }
    return -(low + 1);  // key not found
}

Let's test it.

List<String> list1 = new ArrayList<>();
list1.add("沉");
list1.add("默");
list1.add("王");
list1.add("二");

Collections.sort(list1); // must sort first
System.out.println(Collections.binarySearch(list1, "王")); // 2
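It is also worth seeing what comes back when the key is missing. The sketch below uses a made-up class, BinarySearchMissDemo, with a hypothetical helper insertionPoint that turns the negative return value of Collections.binarySearch back into the position where the key would have to be inserted.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical demo class, not part of the JDK.
class BinarySearchMissDemo {

    // Index of key if present, otherwise the index at which it would be inserted.
    // Assumes `sorted` is already in ascending order.
    static int insertionPoint(List<Integer> sorted, int key) {
        int result = Collections.binarySearch(sorted, key);
        return result >= 0 ? result : -(result + 1);
    }

    public static void main(String[] args) {
        List<Integer> sorted = Arrays.asList(10, 20, 30, 40);
        System.out.println(Collections.binarySearch(sorted, 30)); // 2
        System.out.println(Collections.binarySearch(sorted, 25)); // -3
        System.out.println(insertionPoint(sorted, 25));           // 2
    }
}
```

The extra -1 in the encoding is what lets a miss at position 0 be told apart from a hit at index 0.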