Learning the Iterator Implementations in the Java Collection Framework

Date: 2019-11-06


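While continuing to read the JDK source code, I compared HashMap and ConcurrentHashMap and noticed one detail: their Iterator implementations differ. HashMap and ConcurrentHashMap actually differ in many more ways, and this is a common interview question. One article I think covers it well is "Advanced Java (6): Core Java Multithreading Techniques Seen Through the Evolution of ConcurrentHashMap".
An Iterator is a design pattern. In the Java Collection Framework it usually serves as a view of a container. It typically supports removal but not addition, and it offers a uniform set of interface methods. Most Iterator implementations in the Java Collection Framework are fail-fast, whereas the concurrent container data structures do not have this restriction.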
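Here is a minimal sketch of the "remove but not add" point. The class and method names are hypothetical, and List.of assumes Java 9+. Iterator declares remove() as an optional operation. An ArrayList iterator can therefore remove elements, while the fixed-size list returned by Arrays.asList rejects removal with UnsupportedOperationException. There is no add() to try at all.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical demo class: Iterator has remove() (an optional operation) but no add().
class OptionalRemoveDemo {
    // Returns true if the list's iterator allowed removing the first element.
    static boolean canRemoveFirst(List<String> list) {
        Iterator<String> it = list.iterator();
        it.next();          // remove() is only legal after next()
        try {
            it.remove();
            return true;
        } catch (UnsupportedOperationException e) {
            return false;   // this iterator does not support removal
        }
    }

    public static void main(String[] args) {
        List<String> growable = new ArrayList<>(List.of("a", "b", "c"));
        System.out.println(canRemoveFirst(growable));                  // true
        System.out.println(growable);                                  // [b, c]
        System.out.println(canRemoveFirst(Arrays.asList("a", "b")));   // false
    }
}
```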

Non-concurrent data structures

Common usage

1) Iterating over a list of strings with an Iterator

List<String> lists = Arrays.asList("a","b","c");
Iterator<String> iterator = lists.iterator();
while (iterator.hasNext()) {
    String val = iterator.next();
    System.out.println(val);
}

This is the expanded form of the for-each syntax:

for(String val: lists){
    System.out.println(val);
}
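The sketch below checks that the for-each loop really goes through iterator(), hasNext() and next(). It uses a hypothetical Range class (not part of the JDK) that implements Iterable and sums the same range both ways. Any class implementing java.lang.Iterable can appear on the right-hand side of a for-each loop.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical Iterable over the integers [from, to).
class Range implements Iterable<Integer> {
    private final int from;
    private final int to;

    Range(int from, int to) { this.from = from; this.to = to; }

    @Override
    public Iterator<Integer> iterator() {
        return new Iterator<Integer>() {
            private int cursor = from; // next value to return

            @Override
            public boolean hasNext() { return cursor < to; }

            @Override
            public Integer next() {
                if (!hasNext()) throw new NoSuchElementException();
                return cursor++;
            }
        };
    }

    // Sum using the for-each syntax...
    static int sumForEach(Range r) {
        int sum = 0;
        for (int v : r) sum += v;
        return sum;
    }

    // ...and using the explicit Iterator loop it expands to.
    static int sumExplicit(Range r) {
        int sum = 0;
        Iterator<Integer> it = r.iterator();
        while (it.hasNext()) sum += it.next();
        return sum;
    }

    public static void main(String[] args) {
        Range r = new Range(1, 5);
        System.out.println(sumForEach(r) + " " + sumExplicit(r)); // 10 10
    }
}
```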

2) Iterating over a LinkedList with an Iterator

LinkedList<String> linkedList = new LinkedList<>(lists);
iterator = linkedList.iterator();
while (iterator.hasNext()) {
    String val = iterator.next();
    System.out.println(val);
}

3) Iterating over a HashMap with an Iterator

Map<String,Integer> hmap = new HashMap<>(3);
hmap.put("a",1);
hmap.put("b",2);
hmap.put("c",3);

Iterator<Map.Entry<String,Integer>> mapIterator = hmap.entrySet().iterator();
while (mapIterator.hasNext()) {
    Map.Entry<String,Integer> entry = mapIterator.next();
    System.out.println(entry.getKey() + ":" + entry.getValue());
}

Iterator implementations in non-concurrent data structures

1) The Iterator in ArrayList

A list is ordered, and since the Iterator is a view of the List, it presents the elements in the same order.
How ArrayList obtains its Iterator:

/**
* Returns an iterator over the elements in this list in proper sequence.
*
* <p>The returned iterator is <a href="#fail-fast"><i>fail-fast</i></a>.
*
* @return an iterator over the elements in this list in proper sequence
*/
public Iterator<E> iterator() {
    return new Itr();
}

Source of Itr:

/**
* An optimized version of AbstractList.Itr
*/
private class Itr implements Iterator<E> {
    int cursor;       // index of next element to return
    int lastRet = -1; // index of last element returned; -1 if no such
    int expectedModCount = modCount;

    public boolean hasNext() {
        return cursor != size;
    }

    @SuppressWarnings("unchecked")
    public E next() {
        checkForComodification();
        int i = cursor;
        if (i >= size)
            throw new NoSuchElementException();
        Object[] elementData = ArrayList.this.elementData;
        if (i >= elementData.length)
            throw new ConcurrentModificationException();
        cursor = i + 1;
        return (E) elementData[lastRet = i];
    }

    public void remove() {
        if (lastRet < 0)
            throw new IllegalStateException();
        checkForComodification();

        try {
            ArrayList.this.remove(lastRet);
            cursor = lastRet;
            lastRet = -1;
            expectedModCount = modCount;
        } catch (IndexOutOfBoundsException ex) {
            throw new ConcurrentModificationException();
        }
    }

    final void checkForComodification() {
        if (modCount != expectedModCount)
            throw new ConcurrentModificationException();
    }
}

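Itr is an inner class of ArrayList, so it can use the enclosing class's member fields. In effect, Itr reflects ArrayList's internal state through fields such as size, modCount and elementData. It moves a cursor forward step by step, and as long as the cursor has not reached size, there are still elements to visit.
Note that once iterator() has been called (which executes new Itr()), Itr can be thought of as having taken a snapshot of the ArrayList. The snapshot is not strict; it relies on comparing modCount. At construction, Itr copies the current modCount into its own field expectedModCount.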
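The modCount/expectedModCount "snapshot" can be reproduced in a few lines. This is a minimal sketch built on a hypothetical TinyList class, not the JDK source. It keeps only add and a read-only iterator so that the fail-fast check stands out.

```java
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical, simplified list that mimics ArrayList's modCount-based fail-fast check.
class TinyList<E> implements Iterable<E> {
    private Object[] data = new Object[4];
    private int size;
    private int modCount; // incremented on every structural modification

    void add(E e) {
        if (size == data.length) data = Arrays.copyOf(data, size * 2);
        data[size++] = e;
        modCount++;
    }

    @Override
    public Iterator<E> iterator() {
        return new Iterator<E>() {
            private int cursor;                          // index of the next element
            private int expectedModCount = modCount;     // "snapshot" taken at creation

            @Override
            public boolean hasNext() { return cursor != size; }

            @Override
            @SuppressWarnings("unchecked")
            public E next() {
                if (modCount != expectedModCount)        // list changed behind our back
                    throw new ConcurrentModificationException();
                if (cursor >= size) throw new NoSuchElementException();
                return (E) data[cursor++];
            }
        };
    }

    public static void main(String[] args) {
        TinyList<String> list = new TinyList<>();
        list.add("a");
        list.add("b");
        Iterator<String> it = list.iterator();
        System.out.println(it.next());   // a
        list.add("c");                   // structural change not made through the iterator
        try {
            it.next();
        } catch (ConcurrentModificationException e) {
            System.out.println("fail-fast");
        }
    }
}
```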

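First, the Iterator interface has no method such as add, so elements cannot be added to the container through an Iterator.
Second, suppose the container is structurally modified by anything other than this iterator: another thread, or the same thread calling list.add or list.remove directly. Then ArrayList.this.modCount changes. When Itr next executes next or remove, it detects modCount != expectedModCount and throws ConcurrentModificationException. This is the fail-fast behavior.
Third, when Itr's own remove method is called, it calls ArrayList.this.remove and then fixes up the cursor, lastRet and expectedModCount: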

ArrayList.this.remove(lastRet);
cursor = lastRet;
lastRet = -1;
expectedModCount = modCount;
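In practice, these three points mean that elements should be deleted during iteration through Iterator.remove, which resynchronizes cursor and expectedModCount. Calling List.remove directly breaks the iteration. Here is a short sketch; RemoveDuringIteration and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

class RemoveDuringIteration {
    // Correct: Iterator.remove() keeps cursor and expectedModCount in sync.
    static List<Integer> removeEvens(List<Integer> input) {
        List<Integer> list = new ArrayList<>(input);
        Iterator<Integer> it = list.iterator();
        while (it.hasNext()) {
            if (it.next() % 2 == 0) it.remove();
        }
        return list;
    }

    // Incorrect: List.remove(Object) bumps modCount behind the for-each loop's hidden iterator.
    static boolean listRemoveThrows(List<Integer> input) {
        List<Integer> list = new ArrayList<>(input);
        try {
            for (Integer v : list) {
                if (v % 2 == 0) list.remove(v);
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(removeEvens(List.of(1, 2, 3, 4, 5, 6)));   // [1, 3, 5]
        System.out.println(listRemoveThrows(List.of(1, 2, 3, 4)));    // true
    }
}
```

Fail-fast detection is only best effort. With the Itr shown above, removing the second-to-last element in a for-each loop (for example 2 from [1, 2, 3]) leaves cursor == size. hasNext() then returns false and the loop ends silently, without an exception.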

2) The Iterator in LinkedList

LinkedList's Iterator has a few things in common with ArrayList's.
First, LinkedList's iterator entry point is actually defined in the abstract class AbstractSequentialList:

/**
* Returns an iterator over the elements in this list (in proper
* sequence).<p>
*
* This implementation merely returns a list iterator over the list.
*
* @return an iterator over the elements in this list (in proper sequence)
*/
public Iterator<E> iterator() {
    return listIterator();
}

/**
* Returns a list iterator over the elements in this list (in proper
* sequence).
*
* @param  index index of first element to be returned from the list
*         iterator (by a call to the <code>next</code> method)
* @return a list iterator over the elements in this list (in proper
*         sequence)
* @throws IndexOutOfBoundsException {@inheritDoc}
*/
public abstract ListIterator<E> listIterator(int index);

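listIterator() ends up calling listIterator(0), which LinkedList implements by returning a LinkedList$ListItr. ListIterator is an interface, and ListItr implements it: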

private class ListItr implements ListIterator<E> {
    private Node<E> lastReturned = null;
    private Node<E> next;
    private int nextIndex;
    private int expectedModCount = modCount;

    ListItr(int index) {
        // assert isPositionIndex(index);
        next = (index == size) ? null : node(index);
        nextIndex = index;
    }

    public boolean hasNext() {
        return nextIndex < size;
    }

    public E next() {
        checkForComodification();
        if (!hasNext())
            throw new NoSuchElementException();

        lastReturned = next;
        next = next.next;
        nextIndex++;
        return lastReturned.item;
    }

    public boolean hasPrevious() {
        return nextIndex > 0;
    }

    public E previous() {
        checkForComodification();
        if (!hasPrevious())
            throw new NoSuchElementException();

        lastReturned = next = (next == null) ? last : next.prev;
        nextIndex--;
        return lastReturned.item;
    }

    public int nextIndex() {
        return nextIndex;
    }

    public int previousIndex() {
        return nextIndex - 1;
    }

    public void remove() {
        checkForComodification();
        if (lastReturned == null)
            throw new IllegalStateException();

        Node<E> lastNext = lastReturned.next;
        unlink(lastReturned);
        if (next == lastReturned)
            next = lastNext;
        else
            nextIndex--;
        lastReturned = null;
        expectedModCount++;
    }

    public void set(E e) {
        if (lastReturned == null)
            throw new IllegalStateException();
        checkForComodification();
        lastReturned.item = e;
    }

    public void add(E e) {
        checkForComodification();
        lastReturned = null;
        if (next == null)
            linkLast(e);
        else
            linkBefore(e, next);
        nextIndex++;
        expectedModCount++;
    }

    final void checkForComodification() {
        if (modCount != expectedModCount)
            throw new ConcurrentModificationException();
    }
}

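LinkedList's Iterator is somewhat more complex than ArrayList's. ListItr additionally supports add, set and backward traversal with previous. These are reachable through the ListIterator returned by listIterator(); iterator() returns the same kind of object, typed as a plain Iterator.
The cursor-style traversal is similar, and the comparison logic based on size and expectedModCount is still there. The difference is that traversal follows the next pointers between nodes instead of advancing an array index.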
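The extra ListItr operations are easiest to see through listIterator(); ListIteratorDemo is a hypothetical name. add inserts before the cursor, so the following next() does not return the new element. set replaces the element last returned by next(). previous() walks the prev pointers backwards, starting from listIterator(size).

```java
import java.util.LinkedList;
import java.util.List;
import java.util.ListIterator;

// Hypothetical demo of the extra operations that LinkedList's ListItr supports.
class ListIteratorDemo {
    // One forward pass: insert "x" after every "b" and upper-case every "c".
    static List<String> edit(List<String> input) {
        LinkedList<String> list = new LinkedList<>(input);
        ListIterator<String> it = list.listIterator();
        while (it.hasNext()) {
            String s = it.next();
            if (s.equals("b")) {
                it.add("x");   // linked before the cursor, so next() will not return it
            } else if (s.equals("c")) {
                it.set("C");   // replaces the element last returned by next()
            }
        }
        return list;
    }

    // Backward traversal via previous(), starting after the last element.
    static String backwards(List<String> input) {
        LinkedList<String> list = new LinkedList<>(input);
        ListIterator<String> it = list.listIterator(list.size());
        StringBuilder sb = new StringBuilder();
        while (it.hasPrevious()) sb.append(it.previous());
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(edit(List.of("a", "b", "c")));      // [a, b, x, C]
        System.out.println(backwards(List.of("a", "b", "c"))); // cba
    }
}
```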

3) The Iterator in HashMap

HashMap has several views: keySet, values and entrySet. Here we analyze the entrySet view, using the pre-JDK 8 HashMap source below. The other two views work in much the same way.
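keySet, values and entrySet are views rather than copies. Changes made through them write through to the map, and later changes to the map show up in a view obtained earlier. Here is a small sketch; MapViewsDemo is a hypothetical name.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical demo: HashMap's views are live and write through to the map.
class MapViewsDemo {
    static Map<String, Integer> demo() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);

        Set<String> keys = map.keySet();           // a live view, not a copy
        keys.remove("a");                          // removes the mapping a=1 from the map
        map.values().removeIf(v -> v == 2);        // removes b=2 through the values view
        for (Map.Entry<String, Integer> e : map.entrySet()) {
            e.setValue(e.getValue() * 10);         // writes through: c=3 becomes c=30
        }
        map.put("d", 4);                           // also visible through 'keys'
        return map;
    }

    public static void main(String[] args) {
        Map<String, Integer> map = demo();
        System.out.println(map.get("c"));          // 30
        System.out.println(map.size());            // 2
    }
}
```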

private final class EntrySet extends AbstractSet<Map.Entry<K,V>> {
    public Iterator<Map.Entry<K,V>> iterator() {
        return newEntryIterator();
    }
    public boolean contains(Object o) {
        if (!(o instanceof Map.Entry))
            return false;
        Map.Entry<K,V> e = (Map.Entry<K,V>) o;
        Entry<K,V> candidate = getEntry(e.getKey());
        return candidate != null && candidate.equals(e);
    }
    public boolean remove(Object o) {
        return removeMapping(o) != null;
    }
    public int size() {
        return size;
    }
    public void clear() {
        HashMap.this.clear();
    }
}

EntrySet's iterator method calls newEntryIterator, which constructs an EntryIterator instance.
EntryIterator source:

private final class EntryIterator extends HashIterator<Map.Entry<K,V>> {
    public Map.Entry<K,V> next() {
        return nextEntry();
    }
}

EntryIterator extends HashIterator and reuses most of its methods; it only supplies the next method.
HashIterator source:

private abstract class HashIterator<E> implements Iterator<E> {
    Entry<K,V> next;        // next entry to return
    int expectedModCount;   // For fast-fail
    int index;              // current slot
    Entry<K,V> current;     // current entry

    HashIterator() {
        expectedModCount = modCount;
        if (size > 0) { // advance to first entry
            Entry[] t = table;
            while (index < t.length && (next = t[index++]) == null)
                ;
        }
    }

    public final boolean hasNext() {
        return next != null;
    }

    final Entry<K,V> nextEntry() {
        if (modCount != expectedModCount)
            throw new ConcurrentModificationException();
        Entry<K,V> e = next;
        if (e == null)
            throw new NoSuchElementException();

        if ((next = e.next) == null) {
            Entry[] t = table;
            while (index < t.length && (next = t[index++]) == null)
                ;
        }
        current = e;
        return e;
    }

    public void remove() {
        if (current == null)
            throw new IllegalStateException();
        if (modCount != expectedModCount)
            throw new ConcurrentModificationException();
        Object k = current.key;
        current = null;
        HashMap.this.removeEntryForKey(k);
        expectedModCount = modCount;
    }
}

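HashMap's structure is not sequential, so Iterator.next cannot reach the next element directly through a next pointer or an index. To get around this, HashIterator performs an advance step ahead of time, both in the constructor and in nextEntry: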

// in the constructor
if (size > 0) { // advance to first entry
    Entry[] t = table;
    while (index < t.length && (next = t[index++]) == null)
        ;
}
// in nextEntry
if ((next = e.next) == null) {
    Entry[] t = table;
    while (index < t.length && (next = t[index++]) == null)
        ;
}

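The constructor scans HashMap's table array ahead of time for the first slot whose head node is not null.
The expression (next = t[index++]) == null is slightly misleading. Setting aside an empty HashMap, the loop stops when next != null. At that point next = t[index-1], so index has already moved one step past that slot.

In nextEntry, if e.next is null, the linked list in the current table slot has been fully traversed. Iteration must then jump to the next slot whose head node is not null. Since index is already one step ahead, it simply continues by executing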

next = t[index++]

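If t[index] is not null, the head node of the next slot to traverse has been found, and index has already been incremented past it. Otherwise the while loop keeps skipping empty slots.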
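The advance logic can be sketched with a toy chained hash table. The sketch also shows how a subclass such as EntryIterator only supplies next() on top of HashIterator. MiniHashTable, HashIter, KeyIter and ValueIter are hypothetical names. The table has a fixed size of 8, no resizing and no duplicate-key handling. New nodes are inserted at the head of a chain, as in the pre-JDK 8 HashMap.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical toy hash table used only to illustrate the advance loop.
class MiniHashTable<K, V> {
    static final class Node<K, V> {
        final K key;
        final V value;
        final Node<K, V> next;
        Node(K key, V value, Node<K, V> next) { this.key = key; this.value = value; this.next = next; }
    }

    @SuppressWarnings("unchecked")
    private final Node<K, V>[] table = (Node<K, V>[]) new Node[8];

    void put(K key, V value) { // no resize, no duplicate-key check: illustration only
        int i = (key.hashCode() & 0x7fffffff) % table.length;
        table[i] = new Node<>(key, value, table[i]); // head insertion
    }

    Iterator<K> keys() { return new KeyIter(); }
    Iterator<V> values() { return new ValueIter(); }

    // Shared traversal state; subclasses only decide what next() returns.
    private abstract class HashIter<T> implements Iterator<T> {
        Node<K, V> next; // next node to return, or null when exhausted
        int index;       // one past the slot that 'next' came from

        HashIter() { advance(); } // advance to the first non-empty slot

        // Skips empty slots. When it stops on a non-empty slot, next == table[index - 1].
        private void advance() {
            while (index < table.length && (next = table[index++]) == null)
                ;
        }

        @Override
        public boolean hasNext() { return next != null; }

        Node<K, V> nextNode() {
            Node<K, V> e = next;
            if (e == null) throw new NoSuchElementException();
            if ((next = e.next) == null) advance(); // chain finished: resume scanning at table[index]
            return e;
        }
    }

    private final class KeyIter extends HashIter<K> {
        @Override
        public K next() { return nextNode().key; }
    }

    private final class ValueIter extends HashIter<V> {
        @Override
        public V next() { return nextNode().value; }
    }

    public static void main(String[] args) {
        MiniHashTable<String, Integer> t = new MiniHashTable<>();
        for (int i = 0; i < 10; i++) t.put("k" + i, i); // 10 keys in 8 slots: some chains form
        Iterator<Integer> it = t.values();
        int sum = 0;
        while (it.hasNext()) sum += it.next();
        System.out.println(sum); // 45
    }
}
```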

Concurrent data structures

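Taking ConcurrentHashMap as an example, let's look at the implementation of ConcurrentHashMap$HashIterator. This is the segment-based ConcurrentHashMap of JDK 7; JDK 8 rewrote the class without segments.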

abstract class HashIterator {
    int nextSegmentIndex;
    int nextTableIndex;
    HashEntry<K,V>[] currentTable;
    HashEntry<K, V> nextEntry;
    HashEntry<K, V> lastReturned;

    HashIterator() {
        nextSegmentIndex = segments.length - 1;
        nextTableIndex = -1;
        advance();
    }

    /**
    * Set nextEntry to first node of next non-empty table
    * (in backwards order, to simplify checks).
    */
    final void advance() {
        for (;;) {
            if (nextTableIndex >= 0) {
                if ((nextEntry = entryAt(currentTable,
                                            nextTableIndex--)) != null)
                    break;
            }
            else if (nextSegmentIndex >= 0) {
                Segment<K,V> seg = segmentAt(segments, nextSegmentIndex--);
                if (seg != null && (currentTable = seg.table) != null)
                    nextTableIndex = currentTable.length - 1;
            }
            else
                break;
        }
    }

    final HashEntry<K,V> nextEntry() {
        HashEntry<K,V> e = nextEntry;
        if (e == null)
            throw new NoSuchElementException();
        lastReturned = e; // cannot assign until after null check
        if ((nextEntry = e.next) == null)
            advance();
        return e;
    }

    public final boolean hasNext() { return nextEntry != null; }
    public final boolean hasMoreElements() { return nextEntry != null; }

    public final void remove() {
        if (lastReturned == null)
            throw new IllegalStateException();
        ConcurrentHashMap.this.remove(lastReturned.key);
        lastReturned = null;
    }
}

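This is where ConcurrentHashMap's segmentation comes in. The constructor starts at the last element of the segments array (nextSegmentIndex = segments.length - 1) and then calls advance, which also works from back to front. It first finds a non-null segment, then a non-null element in that segment's table array. Both steps "advance" from the end toward the start.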
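The back-to-front, two-level order can be reproduced without any concurrency. BackwardTwoLevelIterator is a hypothetical class. Each "segment" is just a String[] table (or null), and each slot holds a single value instead of a HashEntry chain. Because there are no chains, next() must clear the current value and advance after every element.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical two-level structure: segments[s] is a table (or null), table[t] is a value (or null).
class BackwardTwoLevelIterator {
    private final String[][] segments;
    private int nextSegmentIndex;
    private int nextTableIndex;
    private String[] currentTable;
    private String nextValue;

    BackwardTwoLevelIterator(String[][] segments) {
        this.segments = segments;
        nextSegmentIndex = segments.length - 1; // start from the last segment
        nextTableIndex = -1;                    // no table loaded yet
        advance();
    }

    // Same shape as advance() above: scan the current table backwards; when it is
    // exhausted, load the previous non-null segment's table and keep going.
    private void advance() {
        for (;;) {
            if (nextTableIndex >= 0) {
                if ((nextValue = currentTable[nextTableIndex--]) != null)
                    break;
            } else if (nextSegmentIndex >= 0) {
                String[] table = segments[nextSegmentIndex--];
                if (table != null) {
                    currentTable = table;
                    nextTableIndex = table.length - 1;
                }
            } else {
                break;
            }
        }
    }

    boolean hasNext() { return nextValue != null; }

    String next() {
        String v = nextValue;
        if (v == null) throw new NoSuchElementException();
        nextValue = null; // plays the role of "nextEntry = e.next == null"
        advance();
        return v;
    }

    static List<String> drain(String[][] segments) {
        List<String> out = new ArrayList<>();
        BackwardTwoLevelIterator it = new BackwardTwoLevelIterator(segments);
        while (it.hasNext()) out.add(it.next());
        return out;
    }

    public static void main(String[] args) {
        String[][] segs = { {"a", null, "b"}, null, {}, {null, "c"} };
        System.out.println(drain(segs)); // [c, b, a]
    }
}
```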

Unlike HashMap's, ConcurrentHashMap's Iterator is not fail-fast (its Javadoc calls it weakly consistent); it never checks modCount. Also note how it reads nextEntry. advance calls the following two methods:

/**
* Gets the jth element of given segment array (if nonnull) with
* volatile element access semantics via Unsafe. (The null check
* can trigger harmlessly only during deserialization.) Note:
* because each element of segments array is set only once (using
* fully ordered writes), some performance-sensitive methods rely
* on this method only as a recheck upon null reads.
*/
@SuppressWarnings("unchecked")
static final <K,V> Segment<K,V> segmentAt(Segment<K,V>[] ss, int j) {
    long u = (j << SSHIFT) + SBASE;
    return ss == null ? null :
        (Segment<K,V>) UNSAFE.getObjectVolatile(ss, u);
}
/**
* Gets the ith element of given table (if nonnull) with volatile
* read semantics. Note: This is manually integrated into a few
* performance-sensitive methods to reduce call overhead.
*/
@SuppressWarnings("unchecked")
static final <K,V> HashEntry<K,V> entryAt(HashEntry<K,V>[] tab, int i) {
    return (tab == null) ? null :
        (HashEntry<K,V>) UNSAFE.getObjectVolatile
        (tab, ((long)i << TSHIFT) + TBASE);
}

Both call UNSAFE.getObjectVolatile. The array elements are therefore read with volatile access semantics instead of under a lock, which performs better than locking.
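The difference in behavior is easy to observe. The sketch below removes keys through Map.remove while iterating over keySet(); WeaklyConsistentDemo is a hypothetical name. A HashMap iterator detects the modCount change and throws, while a ConcurrentHashMap iterator simply carries on. On JDK 8 and later, ConcurrentHashMap no longer uses segments, but its Javadoc still describes its iterators as weakly consistent rather than fail-fast.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class WeaklyConsistentDemo {
    // Fills the map with 0..99, then removes every key via Map.remove while a
    // keySet() iterator is active. Returns true if no ConcurrentModificationException occurred.
    static boolean removeAllWhileIterating(Map<Integer, Integer> map) {
        for (int i = 0; i < 100; i++) map.put(i, i);
        try {
            for (Integer k : map.keySet()) {
                map.remove(k);          // not Iterator.remove: modifies the map directly
            }
            return true;
        } catch (ConcurrentModificationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(removeAllWhileIterating(new HashMap<>())); // false: fail-fast
        Map<Integer, Integer> chm = new ConcurrentHashMap<>();
        System.out.println(removeAllWhileIterating(chm));             // true: weakly consistent
        System.out.println(chm.isEmpty());                            // true
    }
}
```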

Extra

An Iterator example in JavaScript

This example comes from the MDN documentation and is quite concise:

function makeIterator(array){
    var nextIndex = 0;

    return {
       next: function(){
           return nextIndex < array.length ?
               {value: array[nextIndex++], done: false} :
               {done: true};
       }
    };
}
var it = makeIterator(['yo', 'ya']);
console.log(it.next().value); // 'yo'
console.log(it.next().value); // 'ya'
console.log(it.next().done);  // true

One could add a hasNext property to the object returned by makeIterator:

return {
    next: ...,
    hasNext: function() {
        return nextIndex < array.length;
    }
}

JavaScript implements the Iterator with a closure, which is similar to Java's approach of using inner classes.
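For comparison, here is a Java counterpart of makeIterator with the hasNext addition, written as an anonymous inner class (MakeIterator is a hypothetical name). One difference from the closure is that Java inner classes may only capture effectively final local variables. The mutable cursor nextIndex therefore has to live as a field of the inner class rather than as a captured local.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

class MakeIterator {
    // Java counterpart of the JavaScript makeIterator: the anonymous class captures
    // the effectively final 'array' and keeps its own mutable cursor as a field.
    static <T> Iterator<T> makeIterator(T[] array) {
        return new Iterator<T>() {
            private int nextIndex = 0;

            @Override
            public boolean hasNext() { return nextIndex < array.length; }

            @Override
            public T next() {
                if (!hasNext()) throw new NoSuchElementException();
                return array[nextIndex++];
            }
        };
    }

    public static void main(String[] args) {
        Iterator<String> it = makeIterator(new String[] {"yo", "ya"});
        System.out.println(it.next());    // yo
        System.out.println(it.next());    // ya
        System.out.println(it.hasNext()); // false
    }
}
```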

Summary

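The main purpose of an Iterator is still to expose all elements of the underlying data structure through a uniform way of traversing them. Each data structure adapts it to its own semantics. LinkedList supports add, HashMap and ConcurrentHashMap both use advance handling, and ConcurrentHashMap skips the modCount check in favor of volatile access.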