How a Binary Search Tree Is Implemented

Date: 2019-12-04


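Introduction

A binary search tree (BST) is an important data structure that combines the flexibility of linked-list insertion with the efficiency of searching a sorted array. It is also the foundation for red-black trees and AVL trees, which come later, so this article first looks at how a binary search tree is implemented.

Definition of a binary search tree

The defining property of a binary search tree is this: every node holds a Comparable key and an associated value, and the node's key is greater than every key in its left subtree and smaller than every key in its right subtree.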

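The figure shows a typical binary search tree. Taking node E as an example, all nodes in its left subtree (A and C) are smaller than E, and all nodes in its right subtree (R and H) are larger than E.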
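The ordering property can be checked mechanically. The following is a minimal sketch, assuming String keys; the names BstInvariantSketch, TinyNode, and isBst are hypothetical and exist only for this illustration. It passes lower and upper bounds down the tree, because comparing each node only with its direct children is not enough.

```java
// Sketch only: TinyNode and isBst are illustrative names, not part of the article's BST class.
class BstInvariantSketch {
    static class TinyNode {
        String key;
        TinyNode left, right;

        TinyNode(String key, TinyNode left, TinyNode right) {
            this.key = key;
            this.left = left;
            this.right = right;
        }
    }

    // True if every key in the subtree rooted at x lies strictly between lo and hi.
    // A null bound means "no limit on that side".
    static boolean isBst(TinyNode x, String lo, String hi) {
        if (x == null) return true;
        if (lo != null && x.key.compareTo(lo) <= 0) return false;
        if (hi != null && x.key.compareTo(hi) >= 0) return false;
        return isBst(x.left, lo, x.key) && isBst(x.right, x.key, hi);
    }

    static TinyNode leaf(String key) {
        return new TinyNode(key, null, null);
    }

    // E with left subtree A -> C and right subtree R -> H (assumed shape of the example).
    static TinyNode validTree() {
        TinyNode a = new TinyNode("A", null, leaf("C"));
        TinyNode r = new TinyNode("R", leaf("H"), null);
        return new TinyNode("E", a, r);
    }

    // Same shape, but B sits under R: B < R holds locally, yet B is not greater than E.
    static TinyNode brokenTree() {
        TinyNode a = new TinyNode("A", null, leaf("C"));
        TinyNode r = new TinyNode("R", leaf("B"), null);
        return new TinyNode("E", a, r);
    }

    public static void main(String[] args) {
        System.out.println(isBst(validTree(), null, null));   // true
        System.out.println(isBst(brokenTree(), null, null));  // false
    }
}
```

The broken tree shows why the bounds are needed: B is a valid left child of R when viewed in isolation, but it violates the constraint inherited from E.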

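Before implementing the operations, we first define the tree itself. Java does not support pointer operations, so we use an inner class Node to represent the nodes of the tree. Each Node object holds a key-value pair (key and val), two links (left and right), and a node counter (size) that records the number of nodes in the subtree rooted at that node. We also implement three basic methods up front: size(), isEmpty(), and contains(). Respectively, they return the number of nodes in the tree, report whether the tree is empty, and report whether the tree contains a node with the given key. Note that contains() relies on get(), which is implemented in the search section.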

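import java.util.NoSuchElementException; // thrown by deleteMin(), deleteMax(), floor(), and ceiling()

public class BST<Key extends Comparable<Key>, Value> {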
    private Node root;             // root of BST

    private class Node {
        private Key key;           // sorted by key
        private Value val;         // associated data
        private Node left, right;  // left and right subtrees
        private int size;          // number of nodes in subtree

        public Node(Key key, Value val, int size) {
            this.key = key;
            this.val = val;
            this.size = size;
        }
    }

    // Returns the number of key-value pairs in this symbol table.
    public int size() {
        return size(root);
    }

    // Returns number of key-value pairs in BST rooted at x.
    private int size(Node x) {
        if(x == null) {
            return 0;
        } else {
            return x.size;
        }
    }

    // Returns true if this symbol table is empty.
    public boolean isEmpty() {
        return size() == 0;
    }

    // Returns true if this symbol table contains key and false otherwise.
    public boolean contains(Key key) {
        if(key == null) {
            throw new IllegalArgumentException("argument to contains() is null");
        } else {
            return get(key) != null;
        }
    }
}

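Implementing search and insert

Search

We start with how to find the value associated with a given key. A search either succeeds or fails; here we walk through the successful case. As noted above, the key property of a binary search tree is that all nodes in the left subtree are smaller than the root and all nodes in the right subtree are larger. Using this property, we start at the root and descend the tree. At each node, one of three cases applies:

If the search key is smaller than the root's key, the key, if it exists, must be in the left subtree, because every node there is smaller than the root. We continue recursively in the left subtree.
If the search key is larger than the root's key, the key, if it exists, must be in the right subtree, because every node there is larger than the root. We continue recursively in the right subtree.
If the search key equals the root's key, we return the root's val.

With recursion this is very easy to implement. If the search reaches a null link, the key is not in the tree and get() returns null. The code is as follows: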

/**
 * Returns the value associated with the given key.
 *
 * @param  key the key
 * @return the value associated with the given key if the key is in the symbol table
 *         and null if the key is not in the symbol table
 * @throws IllegalArgumentException if key is null
 */
public Value get(Key key) {
    if(key == null) {
        throw new IllegalArgumentException("argument to get() is null");
    } else {
        return get(root, key);
    }
}

private Value get(Node x, Key key) {
    if(x == null) {
        return null;
    } else {
        int cmp = key.compareTo(x.key);
        if(cmp < 0) {
            return get(x.left, key);
        } else if(cmp > 0) {
            return get(x.right, key);
        } else {
            return x.val;
        }
    }
}
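Because each step of the search moves to exactly one child, the recursion can also be written as a loop. The sketch below is a self-contained illustration under stated assumptions: the class name IterativeGetSketch, its simplified put() without size bookkeeping, and the sample() helper are hypothetical and only serve to make the example runnable.

```java
// Sketch only: a loop-based get() on a stripped-down tree (no size field).
class IterativeGetSketch<Key extends Comparable<Key>, Value> {
    private Node root;

    private class Node {
        Key key;
        Value val;
        Node left, right;

        Node(Key key, Value val) {
            this.key = key;
            this.val = val;
        }
    }

    // Plain recursive insertion, just enough to build a tree for the demo.
    void put(Key key, Value val) {
        root = put(root, key, val);
    }

    private Node put(Node x, Key key, Value val) {
        if (x == null) return new Node(key, val);
        int cmp = key.compareTo(x.key);
        if (cmp < 0) x.left = put(x.left, key, val);
        else if (cmp > 0) x.right = put(x.right, key, val);
        else x.val = val;
        return x;
    }

    // Same three cases as the recursive version, expressed as a loop.
    Value get(Key key) {
        if (key == null) {
            throw new IllegalArgumentException("argument to get() is null");
        }
        Node x = root;
        while (x != null) {
            int cmp = key.compareTo(x.key);
            if (cmp < 0) x = x.left;        // continue in the left subtree
            else if (cmp > 0) x = x.right;  // continue in the right subtree
            else return x.val;              // search hit
        }
        return null;                        // fell off the tree: search miss
    }

    // Builds a small tree whose values are the insertion positions of the keys.
    static IterativeGetSketch<String, Integer> sample() {
        IterativeGetSketch<String, Integer> st = new IterativeGetSketch<>();
        String[] keys = {"S", "E", "X", "A", "R", "C", "H"};
        for (int i = 0; i < keys.length; i++) {
            st.put(keys[i], i);
        }
        return st;
    }

    public static void main(String[] args) {
        IterativeGetSketch<String, Integer> st = sample();
        System.out.println(st.get("H"));  // 6
        System.out.println(st.get("Z"));  // null
    }
}
```

The loop avoids one stack frame per level, which can matter for very unbalanced trees; the recursive form is kept in the article because it mirrors the insertion code.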

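Insert

Once the search operation is clear, insertion is easy to understand. We first locate the position for the new node, using the same idea as in search, and then attach the new node there. One extra step is needed: updating size. Because a node has been added, the size of its parent, its grandparent, and every other ancestor up to the root must grow by one.

Insertion can be implemented in several ways, but the recursive version is probably the clearest. The code below is very similar to get(); the only addition is the statement x.size = size(x.left) + size(x.right) + 1;, which updates each node's size on the way back up. If the key already exists, only its value is overwritten and no size changes.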

/**
 * Inserts the specified key-value pair into the symbol table, overwriting the old
 * value with the new value if the symbol table already contains the specified key.
 * Deletes the specified key (and its associated value) from this symbol table
 * if the specified value is null.
 *
 * @param  key the key
 * @param  val the value
 * @throws IllegalArgumentException if key is null
 */
public void put(Key key, Value val) {
    if(key == null) {
        throw new IllegalArgumentException("first argument to put() is null");
    }
    if(val == null) {
        delete(key);
        return;
    }
    root = put(root, key, val);
    // assert check(); // Check integrity of BST data structure.
}

private Node put(Node x, Key key, Value val) {
    if(x == null) {
        return new Node(key, val, 1);
    } else {
        int cmp = key.compareTo(x.key);
        if(cmp < 0) {
            x.left = put(x.left, key, val);
        } else if(cmp > 0) {
            x.right = put(x.right, key, val);
        } else {
            x.val = val;
        }
        // reset links and increment counts on the way up
        x.size = size(x.left) + size(x.right) + 1;
        return x;
    }
}
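To see the size bookkeeping at work, here is a small self-contained sketch. It assumes String keys and int values, and the names PutSizeSketch and build are hypothetical. The insertion order S, E, X, A, R, C, H is also an assumption, chosen so that E ends up with left subtree A, C and right subtree R, H.

```java
// Sketch only: demonstrates how put() keeps subtree sizes consistent.
class PutSizeSketch {
    static class Node {
        String key;
        int val;
        Node left, right;
        int size = 1;  // a freshly created node is a subtree of one node

        Node(String key, int val) {
            this.key = key;
            this.val = val;
        }
    }

    static int size(Node x) {
        return x == null ? 0 : x.size;
    }

    static Node put(Node x, String key, int val) {
        if (x == null) return new Node(key, val);
        int cmp = key.compareTo(x.key);
        if (cmp < 0) x.left = put(x.left, key, val);
        else if (cmp > 0) x.right = put(x.right, key, val);
        else x.val = val;  // existing key: overwrite the value only
        // Recompute the size on the way back up the search path.
        x.size = size(x.left) + size(x.right) + 1;
        return x;
    }

    // Inserts the keys in the given order; each value is the key's position.
    static Node build(String... keys) {
        Node root = null;
        for (int i = 0; i < keys.length; i++) {
            root = put(root, keys[i], i);
        }
        return root;
    }

    public static void main(String[] args) {
        Node root = build("S", "E", "X", "A", "R", "C", "H");
        System.out.println(size(root));       // 7
        System.out.println(size(root.left));  // 5 (E, A, C, R, H)
        root = put(root, "H", 42);            // key already present
        System.out.println(size(root));       // 7
    }
}
```

Recomputing size from the children, instead of incrementing it, is what makes the duplicate-key case safe: the sum stays the same when no node was added.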

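Implementing select and rank

Implementing select

The get() operation above looks up the value associated with a given key. Another advantage of a binary search tree is that it keeps its keys in order, so we can also efficiently find the n-th smallest key.

First consider a question: how do we know how many nodes in a binary search tree are smaller than a given node? By the BST property, all nodes in the left subtree are smaller than the root, so for the root we only need the size of its left subtree. Using the figure, let us walk through select for the 4th smallest node.

Descending the tree, we reach node E in figure 2. E's left subtree has 2 nodes, so E is the 3rd smallest node in the tree, and the node we want must be in E's right subtree. Since we want the 4th smallest node and E is the 3rd smallest, the target is the node in E's right subtree that has exactly 0 smaller nodes within that subtree, that is, the smallest node of the right subtree, H.

The implementation of select follows. It uses the number of nodes in the left subtree to determine the rank of the current node. Note that k is zero-based in the code, so the 4th smallest key is select(3).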

/**
 * Return the kth smallest key in the symbol table.
 *
 * @param  k the order statistic
 * @return the kth smallest key in the symbol table
 * @throws IllegalArgumentException unless k is between 0 and n-1
 */
public Key select(int k) {
    if (k < 0 || k >= size()) {
        throw new IllegalArgumentException("called select() with invalid argument: " + k);
    } else {
        Node x = select(root, k);
        return x.key;
    }
}

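// Returns the node holding the key of rank k in the subtree rooted at x.
private Node select(Node x, int k) {
    if(x == null) {
        return null;
    } else {
        int t = size(x.left);
        if(t > k) {
            return select(x.left, k);
        } else if(t < k) {
            // skip the t keys in the left subtree and the key at x itself
            return select(x.right, k - t - 1);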
        } else {
            return x;
        }
    }
}
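The walkthrough for the 4th smallest key can be reproduced with a small self-contained sketch. It assumes String keys and the insertion order S, E, X, A, R, C, H; the names SelectSketch and build are hypothetical. The detail to watch is the rank passed into the right subtree: the t keys of the left subtree and the current key are skipped, so the remaining rank is k - t - 1.

```java
// Sketch only: zero-based select() on a tree with subtree sizes.
class SelectSketch {
    static class Node {
        String key;
        Node left, right;
        int size = 1;

        Node(String key) {
            this.key = key;
        }
    }

    static int size(Node x) {
        return x == null ? 0 : x.size;
    }

    static Node put(Node x, String key) {
        if (x == null) return new Node(key);
        int cmp = key.compareTo(x.key);
        if (cmp < 0) x.left = put(x.left, key);
        else if (cmp > 0) x.right = put(x.right, key);
        x.size = size(x.left) + size(x.right) + 1;
        return x;
    }

    static Node build(String... keys) {
        Node root = null;
        for (String key : keys) {
            root = put(root, key);
        }
        return root;
    }

    // Returns the node of rank k (zero-based) in the subtree rooted at x, or null.
    static Node select(Node x, int k) {
        if (x == null) return null;
        int t = size(x.left);  // number of keys smaller than x.key in this subtree
        if (t > k) return select(x.left, k);
        else if (t < k) return select(x.right, k - t - 1);
        else return x;
    }

    public static void main(String[] args) {
        // Sorted order of the keys: A C E H R S X
        Node root = build("S", "E", "X", "A", "R", "C", "H");
        System.out.println(select(root, 3).key);  // H, the 4th smallest key
        System.out.println(select(root, 0).key);  // A, the smallest key
        System.out.println(select(root, 6).key);  // X, the largest key
    }
}
```

For select(root, 3) the search goes S (left subtree has 5 keys), then E (left subtree has 2 keys, so the rank becomes 3 - 2 - 1 = 0 in the right subtree), then R, and finally stops at H.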

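Implementing rank

rank is the inverse of select: it returns the rank of a given key, that is, the number of keys in the tree that are strictly smaller than it. The idea is the same as above, so it is not explained again. The code is as follows: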

/**
 * Return the number of keys in the symbol table strictly less than key.
 *
 * @param  key the key
 * @return the number of keys in the symbol table strictly less than key
 * @throws IllegalArgumentException if key is null
 */
public int rank(Key key) {
    if (key == null) {
        throw new IllegalArgumentException("argument to rank() is null");
    } else {
        return rank(key, root);
    }
}

// Returns the number of keys in the subtree rooted at x that are strictly less than key.
private int rank(Key key, Node x) {
    if(x == null) {
        return 0;
    } else {
        int cmp = key.compareTo(x.key);
        if(cmp < 0) {
            return rank(key, x.left);
        } else if(cmp > 0) {
            return 1 + size(x.left) + rank(key, x.right);
        } else {
            return size(x.left);
        }
    }
}
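A quick way to convince yourself that rank and select are inverses is to check that rank(select(k)) equals k for every valid k. The sketch below does this on a small tree. It is a self-contained illustration assuming String keys; the names RankSelectSketch, build, and roundTrips are hypothetical.

```java
// Sketch only: rank() and select() side by side, with a round-trip check.
class RankSelectSketch {
    static class Node {
        String key;
        Node left, right;
        int size = 1;

        Node(String key) {
            this.key = key;
        }
    }

    static int size(Node x) {
        return x == null ? 0 : x.size;
    }

    static Node put(Node x, String key) {
        if (x == null) return new Node(key);
        int cmp = key.compareTo(x.key);
        if (cmp < 0) x.left = put(x.left, key);
        else if (cmp > 0) x.right = put(x.right, key);
        x.size = size(x.left) + size(x.right) + 1;
        return x;
    }

    static Node build(String... keys) {
        Node root = null;
        for (String key : keys) {
            root = put(root, key);
        }
        return root;
    }

    // Key of rank k (zero-based) in the subtree rooted at x, or null if k is out of range.
    static String select(Node x, int k) {
        if (x == null) return null;
        int t = size(x.left);
        if (t > k) return select(x.left, k);
        if (t < k) return select(x.right, k - t - 1);
        return x.key;
    }

    // Number of keys in the subtree rooted at x that are strictly less than key.
    static int rank(String key, Node x) {
        if (x == null) return 0;
        int cmp = key.compareTo(x.key);
        if (cmp < 0) return rank(key, x.left);
        if (cmp > 0) return 1 + size(x.left) + rank(key, x.right);
        return size(x.left);
    }

    // True if rank(select(k)) == k for every k from 0 to size - 1.
    static boolean roundTrips(Node root) {
        for (int k = 0; k < size(root); k++) {
            if (rank(select(root, k), root) != k) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        Node root = build("S", "E", "X", "A", "R", "C", "H");
        System.out.println(rank("H", root));   // 3 (A, C, E are smaller)
        System.out.println(rank("G", root));   // 3, even though G is not in the tree
        System.out.println(roundTrips(root));  // true
    }
}
```

Note that rank is defined for keys that are not in the tree as well: it simply counts the smaller keys.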

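Deletion

Deletion is the hardest operation to implement in a binary search tree. Before tackling it, we first look at how to delete the smallest node.

To implement deleteMin(), we first need to find the smallest node, which is clearly the leftmost node in the tree, A. The interesting part is how to remove it. In the figure, the two nodes A and C in E's left subtree are both smaller than E, so we only need to change E's left link from A to C, which is A's right child. A then becomes unreachable and is reclaimed by the garbage collector. The last step is to update the node sizes.

The implementation is as follows: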

/**
 * Removes the smallest key and associated value from the symbol table.
 *
 * @throws NoSuchElementException if the symbol table is empty
 */
public void deleteMin() {
    if (isEmpty()) {
        throw new NoSuchElementException("Symbol table underflow");
    } else {
        root = deleteMin(root);
        // assert check(); // Check integrity of BST data structure.
    }
}

private Node deleteMin(Node x) {
    if(x.left == null) {
        return x.right;
    } else {
        x.left = deleteMin(x.left);
        x.size = size(x.left) + size(x.right) + 1;
        return x;
    }
}

Deleting the largest node works the same way, mirrored, so it is not explained again:

/**
 * Removes the largest key and associated value from the symbol table.
 *
 * @throws NoSuchElementException if the symbol table is empty
 */
public void deleteMax() {
    if (isEmpty()) {
        throw new NoSuchElementException("Symbol table underflow");
    } else {
        root = deleteMax(root);
        // assert check(); // Check integrity of BST data structure.
    }
}

private Node deleteMax(Node x) {
    if (x.right == null) {
        return x.left;
    } else {
        x.right = deleteMax(x.right);
        x.size = size(x.left) + size(x.right) + 1;
        return x;
    }
}

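Next, using the figure, we walk through the full delete operation step by step. As before, we first locate the node to delete, E. We then find the smallest node in E's right subtree, here H, and replace E with H. Why can H take E's place directly? Because H is larger than every node in E's left subtree and smaller than every other node in E's right subtree, so the replacement does not violate the BST property.

The code is below. Once execution reaches the code after // find key, there are three cases:

The node's right link is null: return the left link to replace the deleted node.
The node's left link is null: return the right link to replace the deleted node.
Neither link is null: this is the situation shown in the figure above.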

/**
 * Removes the specified key and its associated value from this symbol table
 * (if the key is in this symbol table).
 *
 * @param  key the key
 * @throws IllegalArgumentException if key is null
 */
public void delete(Key key) {
    if (key == null) {
        throw new IllegalArgumentException("argument to delete() is null");
    } else {
        root = delete(root, key);
        // assert check(); // Check integrity of BST data structure.
    }
}

private Node delete(Node x, Key key) {
    if(x == null) {
        return null;
    } else {
        int cmp = key.compareTo(x.key);
        if(cmp < 0) {
            x.left = delete(x.left, key);
        } else if(cmp > 0) {
            x.right = delete(x.right, key);
        } else {
            // find key
            if(x.right == null) {
                return x.left;
            } else if(x.left == null) {
                return x.right;
            } else {
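                // Two children: replace x with its successor, the minimum of the right subtree.
                // min(Node) is a helper not shown in this article; it returns the leftmost node.
                Node t = x;
                x = min(t.right);
                x.right = deleteMin(t.right);
                x.left = t.left;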
            }
        }
        // update links and node count after recursive calls
        x.size = size(x.left) + size(x.right) + 1;
        return x;
    }
}
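The delete code calls min(), which the article does not list. The sketch below shows one plausible version of that helper (an assumption, written as "follow left links until there are none") together with the whole deletion on a small tree. It is self-contained and uses String keys; the names DeleteSketch, build, and inOrder are hypothetical.

```java
// Sketch only: deletion with an assumed min() helper.
class DeleteSketch {
    static class Node {
        String key;
        Node left, right;
        int size = 1;

        Node(String key) {
            this.key = key;
        }
    }

    static int size(Node x) {
        return x == null ? 0 : x.size;
    }

    static Node put(Node x, String key) {
        if (x == null) return new Node(key);
        int cmp = key.compareTo(x.key);
        if (cmp < 0) x.left = put(x.left, key);
        else if (cmp > 0) x.right = put(x.right, key);
        x.size = size(x.left) + size(x.right) + 1;
        return x;
    }

    static Node build(String... keys) {
        Node root = null;
        for (String key : keys) {
            root = put(root, key);
        }
        return root;
    }

    // Assumed helper: the leftmost node of a non-empty subtree holds its smallest key.
    static Node min(Node x) {
        return x.left == null ? x : min(x.left);
    }

    static Node deleteMin(Node x) {
        if (x.left == null) return x.right;
        x.left = deleteMin(x.left);
        x.size = size(x.left) + size(x.right) + 1;
        return x;
    }

    static Node delete(Node x, String key) {
        if (x == null) return null;
        int cmp = key.compareTo(x.key);
        if (cmp < 0) x.left = delete(x.left, key);
        else if (cmp > 0) x.right = delete(x.right, key);
        else {
            if (x.right == null) return x.left;
            if (x.left == null) return x.right;
            Node t = x;
            x = min(t.right);              // successor takes the place of the deleted node
            x.right = deleteMin(t.right);  // detach the successor from the old right subtree
            x.left = t.left;
        }
        x.size = size(x.left) + size(x.right) + 1;
        return x;
    }

    // Concatenates the keys in sorted order.
    static String inOrder(Node x) {
        if (x == null) return "";
        return inOrder(x.left) + x.key + inOrder(x.right);
    }

    public static void main(String[] args) {
        Node root = build("S", "E", "X", "A", "R", "C", "H");
        root = delete(root, "E");
        System.out.println(inOrder(root));  // ACHRSX
        System.out.println(root.left.key);  // H has taken E's place
        System.out.println(size(root));     // 6
    }
}
```

The order of the two assignments after min() matters: x.right must be set from deleteMin(t.right) before x.left is assigned, while t still refers to the node being removed.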

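Implementing floor and ceiling

Implementing floor

floor() rounds down: it returns the largest key that is less than or equal to the given key. Its execution flow is as follows:

If the given key equals the root's key, the root itself is the answer.
If the given key is smaller than the root's key, the largest node less than or equal to key must be in the left subtree.
If the given key is larger than the root's key, things are a little more involved and there are two cases: (1) if the right subtree contains a node less than or equal to key, the largest such node is in the right subtree; (2) otherwise the root itself is the largest node less than or equal to key.

The implementation is as follows: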

/**
 * Returns the largest key in the symbol table less than or equal to key.
 *
 * @param  key the key
 * @return the largest key in the symbol table less than or equal to key,
 *         or null if there is no such key
 * @throws NoSuchElementException if the symbol table is empty
 * @throws IllegalArgumentException if key is null
 */
public Key floor(Key key) {
    if (key == null) {
        throw new IllegalArgumentException("argument to floor() is null");
    }
    if (isEmpty()) {
        throw new NoSuchElementException("called floor() with empty symbol table");
    }
    Node x = floor(root, key);
    if (x == null) {
        return null;
    } else {
        return x.key;
    }
}


private Node floor(Node x, Key key) {
    if (x == null) {
        return null;
    } else {
        int cmp = key.compareTo(x.key);
        if(cmp == 0) {
            return x;
        } else if(cmp < 0) {
            return floor(x.left, key);
        } else {
            Node t = floor(x.right, key);
            if(t != null) {
                return t;
            } else {
                return x;
            }
        }
    }
}

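Implementing ceiling

ceiling() is the opposite of floor(): it rounds up, returning the smallest key that is greater than or equal to key. The approach is the same; just swap left and right, and less-than and greater-than, in the description above: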

/**
 * Returns the smallest key in the symbol table greater than or equal to {@code key}.
 *
 * @param  key the key
 * @return the smallest key in the symbol table greater than or equal to {@code key},
 *         or {@code null} if there is no such key
 * @throws NoSuchElementException if the symbol table is empty
 * @throws IllegalArgumentException if {@code key} is {@code null}
 */
public Key ceiling(Key key) {
    if(key == null) {
        throw new IllegalArgumentException("argument to ceiling() is null");
    }
    if(isEmpty()) {
        throw new NoSuchElementException("called ceiling() with empty symbol table");
    }
    Node x = ceiling(root, key);
    if(x == null) {
        return null;
    } else {
        return x.key;
    }
}

private Node ceiling(Node x, Key key) {
    if(x == null) {
        return null;
    } else {
        int cmp = key.compareTo(x.key);
        if(cmp == 0) {
            return x;
        } else if(cmp < 0) {
            Node t = ceiling(x.left, key);
            if (t != null) {
                return t;
            } else {
                return x;
            }
        } else {
            return ceiling(x.right, key);
        }
    }
}
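To round off, here is a small usage sketch for floor and ceiling with a key that is not in the tree. It is self-contained and assumes String keys; the names FloorCeilingSketch, build, and keyOf are hypothetical, and the tree carries no size field because neither operation needs one.

```java
// Sketch only: floor() and ceiling() on a small tree of String keys.
class FloorCeilingSketch {
    static class Node {
        String key;
        Node left, right;

        Node(String key) {
            this.key = key;
        }
    }

    static Node put(Node x, String key) {
        if (x == null) return new Node(key);
        int cmp = key.compareTo(x.key);
        if (cmp < 0) x.left = put(x.left, key);
        else if (cmp > 0) x.right = put(x.right, key);
        return x;
    }

    static Node build(String... keys) {
        Node root = null;
        for (String key : keys) {
            root = put(root, key);
        }
        return root;
    }

    // Node with the largest key <= key in the subtree rooted at x, or null.
    static Node floor(Node x, String key) {
        if (x == null) return null;
        int cmp = key.compareTo(x.key);
        if (cmp == 0) return x;
        if (cmp < 0) return floor(x.left, key);
        Node t = floor(x.right, key);  // a better candidate may exist on the right
        return t != null ? t : x;
    }

    // Node with the smallest key >= key in the subtree rooted at x, or null.
    static Node ceiling(Node x, String key) {
        if (x == null) return null;
        int cmp = key.compareTo(x.key);
        if (cmp == 0) return x;
        if (cmp > 0) return ceiling(x.right, key);
        Node t = ceiling(x.left, key);  // a better candidate may exist on the left
        return t != null ? t : x;
    }

    // Convenience for printing: the key of a node, or null for a missing node.
    static String keyOf(Node x) {
        return x == null ? null : x.key;
    }

    public static void main(String[] args) {
        // Sorted order of the keys: A C E H R S X
        Node root = build("S", "E", "X", "A", "R", "C", "H");
        System.out.println(keyOf(floor(root, "G")));    // E
        System.out.println(keyOf(ceiling(root, "G")));  // H
        System.out.println(keyOf(floor(root, "H")));    // H (exact match)
        System.out.println(keyOf(ceiling(root, "Z")));  // null (no key >= Z)
    }
}
```

For a key that is present, floor and ceiling both return the key itself; for an absent key they return its two neighbours in sorted order, or null when no neighbour exists on that side.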