JavaScript數據結構——字典和散列表的實現

時間 2019-11-05

標籤 javascript 數據結構字典列表實現欄目 JavaScript 简体版

原文原文鏈接

　　在前一篇文章中，咱們介紹瞭如何在JavaScript中實現集合。字典和集合的主要區別就在於，集合中數據是以[值，值]的形式保存的，咱們只關心值自己；而在字典和散列表中數據是以[鍵，值]的形式保存的，鍵不能重複，咱們不只關心鍵，也關心鍵所對應的值。html

　　咱們也能夠把字典稱之爲映射表。因爲字典和集合很類似，咱們能夠在前一篇文章中的集合類Set的基礎上來實現咱們的字典類Dictionary。與Set類類似，ES6的原生Map類已經實現了字典的所有功能，稍後咱們會介紹它的用法。算法

　　下面是咱們的Dictionary字典類的實現代碼：編程

class Dictionary {
    constructor () {
        this.items = {};
    }

    set (key, value) { // 向字典中添加或修改元素
        this.items[key] = value;
    }

    get (key) { // 經過鍵值查找字典中的值
        return this.items[key];
    }

    delete (key) { // 經過使用鍵值來從字典中刪除對應的元素
        if (this.has(key)) {
            delete this.items[key];
            return true;
        }
        return false;
    }

    has (key) { // 判斷給定的鍵值是否存在於字典中
        return this.items.hasOwnProperty(key);
    }

    clear() { // 清空字典內容
        this.items = {};
    }

    size () { // 返回字典中全部元素的數量
        return Object.keys(this.items).length;
    }

    keys () { // 返回字典中全部的鍵值
        return Object.keys(this.items);
    }

    values () { // 返回字典中全部的值
        return Object.values(this.items);
    }

    getItems () { // 返回字典中的全部元素
        return this.items;
    }
}

　　與Set類很類似，只是把其中value的部分替換成了key。咱們來看看一些測試用例：數組

let Dictionary = require('./dictionary');

let dictionary = new Dictionary();
dictionary.set('Gandalf', 'gandalf@email.com');
dictionary.set('John', 'john@email.com');
dictionary.set('Tyrion', 'tyrion@email.com');
console.log(dictionary.has('Gandalf')); // true
console.log(dictionary.size()); // 3
console.log(dictionary.keys()); // [ 'Gandalf', 'John', 'Tyrion' ]
console.log(dictionary.values()); // [ 'gandalf@email.com', 'john@email.com', 'tyrion@email.com' ]
console.log(dictionary.get('Tyrion')); // tyrion@email.com

dictionary.delete('John');
console.log(dictionary.keys()); // [ 'Gandalf', 'Tyrion' ]
console.log(dictionary.values()); // [ 'gandalf@email.com', 'tyrion@email.com' ]
console.log(dictionary.getItems()); // { Gandalf: 'gandalf@email.com', Tyrion: 'tyrion@email.com' }

　　相應地，下面是使用ES6的原生Map類的測試結果：數據結構

let dictionary = new Map();
dictionary.set('Gandalf', 'gandalf@email.com');
dictionary.set('John', 'john@email.com');
dictionary.set('Tyrion', 'tyrion@email.com');
console.log(dictionary.has('Gandalf')); // true
console.log(dictionary.size); // 3
console.log(dictionary.keys()); // [Map Iterator] { 'Gandalf', 'John', 'Tyrion' }
console.log(dictionary.values()); // [Map Iterator] { 'gandalf@email.com', 'john@email.com', 'tyrion@email.com' }
console.log(dictionary.get('Tyrion')); // tyrion@email.com

dictionary.delete('John');
console.log(dictionary.keys()); // [Map Iterator] { 'Gandalf', 'Tyrion' }
console.log(dictionary.values()); // [Map Iterator] { 'gandalf@email.com', 'tyrion@email.com' }
console.log(dictionary.entries()); // [Map Iterator] { [ Gandalf: 'gandalf@email.com' ], [ Tyrion: 'tyrion@email.com' ] }

　　和前面咱們自定義的Dictionary類稍微有一點不一樣，values()方法和keys()方法返回的不是一個數組，而是Iterator迭代器。另外一個就是這裏的size是一個屬性而不是方法，而後就是Map類沒有getItems()方法，取而代之的是entries()方法，它返回的也是一個Iterator。有關Map類的詳細詳細介紹能夠查看這裏。app

　　在ES6中，除了原生的Set和Map類外，還有它們的弱化版本，分別是WeakSet和WeakMap，咱們在《JavaScript數據結構——棧的實現與應用》一文中已經見過WeakMap的使用了。Map和Set與它們各自的弱化版本之間的主要區別是：編程語言

WeakSet或WeakMap類沒有entries、keys和values等迭代器方法，只能經過get和set方法訪問和設置其中的值。這也是爲何咱們在《JavaScript數據結構——棧的實現與應用》一文中要使用WeakMap類來定義類的私有屬性的緣由。
只能用對應做爲鍵值，或者說其中的內容只能是對象，而不能是數字、字符串、布爾值等基本數據類型。

　　弱化的Map和Set類主要是爲了提供JavaScript代碼的性能。函數

散列表

　　散列表（或者叫哈希表），是一種改進的dictionary，它將key經過一個固定的算法（散列函數或哈希函數）得出一個數字，而後將dictionary中key所對應的value存放到這個數字所對應的數組下標所包含的存儲空間中。在原始的dictionary中，若是要查找某個key所對應的value，咱們須要遍歷整個字典。爲了提升查詢的效率，咱們將key對應的value保存到數組裏，只要key不變，使用相同的散列函數計算出來的數字就是固定的，因而就能夠很快地在數組中找到你想要查找的value。下面是散列表的數據結構示意圖：性能

　　下面是咱們散列函數loseloseHashCode()的實現代碼：測試

loseloseHashCode (key) {
    let hash = 0;
    for (let i = 0; i < key.length; i++) {
        hash += key.charCodeAt(i);
    }
    return hash % 37;
}

　　這個散列函數的實現很簡單，咱們將傳入的key中的每個字符使用charCodeAt()函數（有關該函數的詳細內容能夠查看這裏）將其轉換成ASCII碼，而後將這些ASCII碼相加，最後用37求餘，獲得一個數字，這個數字就是這個key所對應的hash值。接下來要作的就是將value存放到hash值所對應的數組的存儲空間內。下面是咱們的HashTable類的主要實現代碼：

class HashTable {
    constructor () {
        this.table = [];
    }

    loseloseHashCode (key) { // 散列函數
        let hash = 0;
        for (let i = 0; i < key.length; i++) {
            hash += key.charCodeAt(i);
        }
        return hash % 37;
    }

    put (key, value) { // 將鍵值對存放到哈希表中
        let position = this.loseloseHashCode(key);
        console.log(`${position} - ${key}`);
        this.table[position] = value;
    }

    get (key) { // 經過key查找哈希表中的值
        return this.table[this.loseloseHashCode(key)];
    }

    remove (key) { // 經過key從哈希表中刪除對應的值
        this.table[this.loseloseHashCode(key)] = undefined;
    }

    isEmpty () { // 判斷哈希表是否爲空
        return this.size() === 0;
    }

    size () { // 返回哈希表的長度
        let count = 0;
        this.table.forEach(item => {
            if (item !== undefined) count++;
        });
        return count;
    }

    clear () { // 清空哈希表
        this.table = [];
    }
}

　　測試一下上面的這些方法：

let HashTable = require('./hashtable');

let hash = new HashTable();
hash.put('Gandalf', 'gandalf@email.com'); // 19 - Gandalf
hash.put('John', 'john@email.com'); // 29 - John
hash.put('Tyrion', 'tyrion@email.com'); // 16 - Tyrion

console.log(hash.isEmpty()); // false
console.log(hash.size()); // 3
console.log(hash.get('Gandalf')); // gandalf@email.com
console.log(hash.get('Loiane')); // undefined

hash.remove('Gandalf');
console.log(hash.get('Gandalf')); // undefined
hash.clear();
console.log(hash.size()); // 0
console.log(hash.isEmpty()); // true

　　爲了方便查看hash值和value的對應關係，咱們在put()方法中加入了一行console.log()，用來打印key的hash值和value之間的對應關係。能夠看到，測試的結果和前面咱們給出的示意圖是一致的。

　　散列集合的實現和散列表相似，只不過在散列集合中再也不使用鍵值對，而是隻有值沒有鍵。這個咱們在前面介紹集合和字典的時候已經講過了，這裏再也不贅述。

　　細心的同窗可能已經發現了，這裏咱們提供的散列函數可能過於簡單，以至於咱們沒法保證經過散列函數計算出來的hash值必定是惟一的。換句話說，傳入不一樣的key值，咱們有可能會獲得相同的hash值。嘗試一下下面這些keys：

let hash = new HashTable();
hash.put('Gandalf', 'gandalf@email.com');
hash.put('John', 'john@email.com');
hash.put('Tyrion', 'tyrion@email.com');
hash.put('Aaron', 'aaron@email.com');
hash.put('Donnie', 'donnie@email.com');
hash.put('Ana', 'ana@email.com');
hash.put('Jamie', 'jamie@email.com');
hash.put('Sue', 'sue@email.com');
hash.put('Mindy', 'mindy@email.com');
hash.put('Paul', 'paul@email.com');
hash.put('Nathan', 'nathan@email.com');

　　從結果中能夠看到，儘管有些keys不一樣，可是經過咱們提供的散列函數竟然獲得了相同的hash值，這顯然違背了咱們的設計原則。在哈希表中，這個叫作散列衝突，爲了獲得一個可靠的哈希表，咱們必須儘量地避免散列衝突。那如何避免這種衝突呢？這裏介紹兩種解決衝突的方法：分離連接和線性探查。

分離連接

　　所謂分離連接，就是將本來存儲在哈希表中的值改爲鏈表，這樣在哈希表的同一個位置上，就能夠存儲多個不一樣的值。鏈表中的每個元素，同時存儲了key和value。示意圖以下：

　　這樣，當不一樣的key經過散列函數計算出相同的hash值時，咱們只須要找到數組中對應的位置，而後往其中的鏈表添加新的節點便可，從而有效地避免了散列衝突。爲了實現這種數據結構，咱們須要定義一個新的輔助類ValuePair，它的內容以下：

let ValuePair = function (key, value) {
  this.key = key;
  this.value = value;

  this.toString = function () { // 提供toString()方法以方便咱們測試
      return `[${this.key} - ${this.value}]`;
  }
};

　　ValuePair類具備兩個屬性，key和value，用來保存咱們要存入到散列表中的元素的鍵值對。toString()方法在這裏不是必須的，該方法是爲了後面咱們方便測試。

　　新的採用了分離連接的HashTableSeparateChaining類能夠繼承自前面的HashTable類，完整的代碼以下：

class HashTableSeparateChaining extends HashTable {
    constructor () {
        super();
    }

    put (key, value) {
        let position = this.loseloseHashCode(key);

        if (this.table[position] === undefined) {
            this.table[position] = new LinkedList(); // 單向鏈表，須要引入LinkedList類
        }
        this.table[position].append(new ValuePair(key, value));
    }

    get (key) {
        let position = this.loseloseHashCode(key);

        if (this.table[position] !== undefined) {
            let current = this.table[position].getHead();
            while (current) {
                if (current.element.key === key) return current.element.value;
                current = current.next;
            }
        }
        return undefined;
    }

    remove (key) {
        let position = this.loseloseHashCode(key);
        let hash = this.table[position];

        if (hash !== undefined) {
            let current = hash.getHead();
            while (current) {
                if (current.element.key === key) {
                    hash.remove(current.element);
                    if (hash.isEmpty()) this.table[position] = undefined;
                    return true;
                }
                current = current.next;
            }
        }

        return false;
    }

    size () {
        let count = 0;
        this.table.forEach(item => {
            if (item !== undefined) count += item.size();
        });
        return count;
    }

    toString() {
        let objString = "";
        for (let i = 0; i < this.table.length; i++) {
            let ci = this.table[i];
            if (ci === undefined) continue;

            objString += `${i}: `;
            let current = ci.getHead();
            while (current) {
                objString += current.element.toString();
                current = current.next;
                if (current) objString += ', ';
            }
            objString += '\r\n';
        }
        return objString;
    }
}

　　其中的LinkedList類爲單向鏈表，具體內容能夠查看《JavaScript數據結構——鏈表的實現與應用》。注意，如今hash數組中的每個元素都是一個單向鏈表，單向鏈表的全部操做咱們能夠藉助於LinkedList類來完成。咱們重寫了size()方法，由於如今要計算的是數組中全部鏈表的長度總和。

　　下面是HashTableSeparateChaining類的測試用例及結果：

let hash = new HashTableSeparateChaining();

hash.put('Gandalf', 'gandalf@email.com');
hash.put('John', 'john@email.com');
hash.put('Tyrion', 'tyrion@email.com');
hash.put('Aaron', 'aaron@email.com');
hash.put('Donnie', 'donnie@email.com');
hash.put('Ana', 'ana@email.com');
hash.put('Jamie', 'jamie@email.com');
hash.put('Sue', 'sue@email.com');
hash.put('Mindy', 'mindy@email.com');
hash.put('Paul', 'paul@email.com');
hash.put('Nathan', 'nathan@email.com');

console.log(hash.toString());
console.log(`size: ${hash.size()}`);
console.log(hash.get('John'));

console.log(hash.remove('Ana'));
console.log(hash.remove('John'));
console.log(hash.toString());

　　能夠看到，結果和上面示意圖上給出的是一致的，size()、remove()和get()方法的執行結果也符合預期。

線性探查

　　避免散列衝突的另外一種方法是線性探查。當向哈希數組中添加某一個新元素時，若是該位置上已經有數據了，就繼續嘗試下一個位置，直到對應的位置上沒有數據時，就在該位置上添加數據。咱們將上面的例子改爲線性探查的方式，存儲結果以下圖所示：

　　如今咱們不須要單向鏈表LinkedList類了，可是ValuePair類仍然是須要的。一樣的，咱們的HashTableLinearProbing類繼承自HashTable類，完整的代碼以下：

class HashTableLinearProbing extends HashTable {
    constructor () {
        super();
    }

    put (key, value) {
        let position = this.loseloseHashCode(key);

        if (this.table[position] === undefined) {
            this.table[position] = new ValuePair(key, value);
        }
        else {
            let index = position + 1;
            while (this.table[index] !== undefined) {
                index ++;
            }
            this.table[index] = new ValuePair(key, value);
        }
    }

    get (key) {
        let position = this.loseloseHashCode(key);

        if (this.table[position] !== undefined) {
            if (this.table[position].key === key) return this.table[position].value;
            let index = position + 1;
            while (this.table[index] !== undefined && this.table[index].key === key) {
                index ++;
            }
            return this.table[index].value;
        }
        return undefined;
    }

    remove (key) {
        let position = this.loseloseHashCode(key);

        if (this.table[position] !== undefined) {
            if (this.table[position].key === key) {
                this.table[position] = undefined;
                return true;
            }
            let index = position + 1;
            while (this.table[index] !== undefined && this.table[index].key !== key) {
                index ++;
            }
            this.table[index] = undefined;
            return true;
        }
        return false;
    }

    toString() {
        let objString = "";
        for (let i = 0; i < this.table.length; i++) {
            let ci = this.table[i];
            if (ci === undefined) continue;

            objString += `${i}: ${ci}\r\n`;
        }
        return objString;
    }
}

　　使用上面和HashTableSeparateChaining類相同的測試用例，咱們來看看測試結果：

　　能夠和HashTableSeparateChaining類的測試結果比較一下，多出來的位置六、1四、1七、33，正是HashTableSeparateChaining類中每個鏈表的剩餘部分。get()和remove()方法也能正常工做，咱們不須要重寫size()方法，和基類HashTable中同樣，hash數組中每個位置只保存了一個元素。另外一個要注意的地方是，因爲JavaScript中定義數組時不須要提早給出數組的長度，所以咱們能夠很容易地利用JavaScript語言的這一特性來實現線性探查。在某些編程語言中，數組的定義是必須明確給出長度的，這時咱們就須要從新考慮咱們的HashLinearProbing類的實現了。

　　loseloseHashCode()散列函數並非一個表現良好的散列函數，正如你所看到的，它會很輕易地產生散列衝突。一個表現良好的散列函數必須可以儘量低地減小散列衝突，並提升性能。咱們能夠在網上找一些不一樣的散列函數的實現方法，下面是一個比loseloseHashCode()更好的散列函數djb2HashCode()：

djb2HashCode (key) {
    let hash = 5381;
    for (let i = 0; i < key.length; i++) {
        hash = hash * 33 + key.charCodeAt(i);
    }
    return hash % 1013;
}

　　咱們用相同的測試用例來測試dj2HashCode()，下面是測試結果：

　　此次沒有衝突！然而這並非最好的散列函數，但它是社區最推崇的散列函數之一。

　　下一章咱們將介紹如何用JavaScript來實現樹。