Jiuzhang Algorithm Notes 8. Hash Tables and Heaps (Hash & Heap)

Date: 2019-11-11

Tags: Jiuzhang algorithm notes, hashing, hash, heap. Category: Java


Outline (cs3k.com)

Overview of data structures
Hash tables (Hash): a. principles b. applications
Heaps (Heap): a. principles b. application: Priority Queue c. alternative: TreeMap

Two kinds of data-structure problems


1. Design a data structure.

2. Implement an algorithm that uses one or more data structures.

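What is a data structure?

It can be thought of as a collection together with a set of operations on that collection.

LINEAR DATA STRUCTURE, usually implemented with arrays:

-Queue
-Stack
-Hash

TREE DATA STRUCTURE, usually implemented with pointers: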

-Tree

QUEUE : BFS

O(1) Push Pop Top

STACK : DFS

O(1) Push Pop Top

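An aside

Algorithms should be pictured concretely, and so should data structures.

A stack is like a big box: you put books in one at a time, and you have to take them from the top.

A queue is a line of people: you join at the back and leave from the front.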

Implementing a Queue

Which underlying data structure should we use to implement a Queue?

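1. A linked-list implementation is very intuitive.
2. Circular arrays and dynamic arrays:

2.1 Circular array: with ten slots holding 1, 2, 3, 4 ... 10, once 1 is no longer needed we delete it and add 11, and 11 takes over the slot that 1 used. Every slot can be reused cyclically.

2.2 A dynamic array is vector in C++ or ArrayList in Java:

allocate 100 slots; when they are all used,

allocate 2*100 slots, copy the 100 existing elements over, then free the old 100.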
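A minimal sketch combining both ideas, assuming a queue of ints: the head index wraps around with `% capacity` so freed slots are reused (2.1), and when every slot is full the array doubles and the elements are copied over in queue order (2.2). `CircularArrayQueue` and its method names are illustrative, not from the original notes.

```java
import java.util.NoSuchElementException;

// Illustrative queue: a circular array that doubles like a dynamic array when full.
class CircularArrayQueue {
    private int[] data = new int[4]; // small initial capacity so resizing is exercised
    private int head = 0;            // index of the front element
    private int size = 0;            // number of stored elements

    void push(int x) {
        if (size == data.length) {
            grow();
        }
        // slot right after the last element, wrapping around to reuse freed slots
        data[(head + size) % data.length] = x;
        size++;
    }

    int pop() {
        int x = top();
        head = (head + 1) % data.length; // the freed slot can be reused later
        size--;
        return x;
    }

    int top() {
        if (size == 0) {
            throw new NoSuchElementException("queue is empty");
        }
        return data[head];
    }

    int size() {
        return size;
    }

    // Dynamic-array step: allocate twice the space and copy elements in queue order.
    private void grow() {
        int[] bigger = new int[data.length * 2];
        for (int i = 0; i < size; i++) {
            bigger[i] = data[(head + i) % data.length];
        }
        data = bigger;
        head = 0;
    }
}
```

With capacity 4, pushing 1, 2, 3, popping 1 and pushing 4 and 5 stores 5 in the slot 1 vacated; pushing 6 then triggers the doubling.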

Hash


Time complexity

O(key_size) Insert / O(key_size) Find / O(key_size) Delete

For example, if the key is an integer, it is four bytes,

so the actual cost of insert, find and delete is O(4), i.e. constant.

hash table VS hash map VS hash set

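A hash set has only keys and no values; use it for deduplication.
A hash table (Java's Hashtable) is thread-safe: multiple threads can use the same hash table at the same time without problems.
A hash map (HashMap) is not thread-safe: several threads working on one hash map at once can corrupt it. PS: because locking and unlocking are slow, a hash table performs somewhat worse.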

hash function/ hash code

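Goal: for any key, produce a fixed (deterministic) but pattern-free integer between 0 and capacity-1.

Intuition: a hash map can be seen as one big array; the hash function computes the index into this array, and the key/value pair is stored there.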

Well-known hash algorithms


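MD5, SHA-1 and SHA-2 are too complex; they are meant for cryptography. For hash tables:

char -> an integer in 0 to 255
The simplest approach is modulo: read the key as a base-31 number and take it modulo the capacity; 31 is an empirically chosen base.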

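• Take the modulus after each multiplication, to avoid overflow.

• On overflow, Java int arithmetic silently wraps around (in C++ this is well-defined only for unsigned types).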

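Hash functions are usually defined over strings, i.e. sequences of chars, because any other kind of data can be converted to chars (bytes):

for example, an int is 4 bytes, i.e. 4 chars;

a double is 8 bytes, i.e. 8 chars;

a class made of {2 ints plus 1 double} is equivalent to a string of 8+8 bytes.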

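A common scheme multiplies the running value by 33, adds the next byte, and takes the modulus, which amounts to reading the key as a base-33 number modulo the capacity. (Java's own String.hashCode() uses the same idea with base 31: s[0]*31^(n-1) + ... + s[n-1], computed with wrapping int arithmetic.)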
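A sketch of the "multiply by 33, add the next char, reduce as you go" scheme, assuming the key is a char array and `hashSize` is the table capacity. `SimpleHash`, `magic33` and `javaStyle` are illustrative names. Using `long` keeps `h * 33 + c` from overflowing before the modulus is applied; `javaStyle` mirrors the formula documented for `String.hashCode()`.

```java
// Illustrative string hash functions.
class SimpleHash {
    // Read the key as a base-33 number, reducing modulo hashSize after every step
    // so the intermediate value stays below roughly 33 * hashSize.
    static int magic33(char[] key, int hashSize) {
        long h = 0;
        for (char c : key) {
            h = (h * 33 + c) % hashSize;
        }
        return (int) h;
    }

    // Same polynomial idea with base 31 and no modulus: int overflow simply wraps.
    static int javaStyle(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }
}
```

For example, `magic33("abcd".toCharArray(), 100)` equals (97·33³ + 98·33² + 99·33 + 100) mod 100 = 3595978 mod 100 = 78.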

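The design requirement for a hash function: the more chaotic and patternless, the better.

But a regular sequence such as 101, 201, 301, 401, ... can be a disaster: with key % 100, for example, all of them land in slot 1.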

Handling hash collisions


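Open hash table: on a collision, keep a linked list (separate chaining). It is like queuing for a toilet: you pick a stall and wait in line behind it; to look something up, scan that line to see whether it is there.
Closed hash table: on a collision, take the next slot (open addressing). It is like grabbing toilet stalls: if a stall is taken, try the next one; if you take mine, I take someone else's.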

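With closed hashing, note that after a key is deleted its slot must be marked as available, not as empty. For example:

insert 7, 3 and 12 into a table whose hash function is key % 5.

7 goes to index 2 and 3 goes to index 3. For 12 the computed index is 2, which is already taken, so we move one slot forward to 3; that is taken too, so 12 ends up at index 4.

When 3 is deleted, index 3 must not simply be marked empty; it should be marked available. Then a lookup for 12 starts at index 2, does not find it, moves on to 3, sees "available", knows the slot was once occupied, and keeps searching forward until it finds 12 at index 4. Had index 3 been marked empty, the search would stop there and wrongly report that 12 is absent.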
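A minimal sketch of the 7, 3, 12 example, assuming non-negative int keys, the hash function `key % capacity`, linear probing, and a table that never fills up completely. `ProbingHashSet`, `EMPTY` and `AVAILABLE` are illustrative names; `AVAILABLE` plays the role of the "available" mark described above.

```java
import java.util.Arrays;

// Illustrative closed-hashing set of non-negative ints with linear probing.
class ProbingHashSet {
    private static final int EMPTY = -1;     // never used: a lookup may stop here
    private static final int AVAILABLE = -2; // deleted: a lookup must keep going
    private final int[] slots;

    ProbingHashSet(int capacity) {
        slots = new int[capacity];
        Arrays.fill(slots, EMPTY);
    }

    // Returns the index where key is stored (assumes the table is not full).
    int insert(int key) {
        int existing = find(key);
        if (existing != -1) {
            return existing; // already present
        }
        int i = key % slots.length;
        while (slots[i] != EMPTY && slots[i] != AVAILABLE) {
            i = (i + 1) % slots.length; // taken: try the next slot
        }
        slots[i] = key; // empty or available slots can both be reused
        return i;
    }

    // Returns the index of key, or -1 if absent.
    int find(int key) {
        int i = key % slots.length;
        for (int probes = 0; probes < slots.length && slots[i] != EMPTY; probes++) {
            if (slots[i] == key) {
                return i;
            }
            i = (i + 1) % slots.length; // AVAILABLE or another key: keep probing
        }
        return -1;
    }

    void delete(int key) {
        int i = find(key);
        if (i != -1) {
            slots[i] = AVAILABLE; // not EMPTY, or keys further along become unreachable
        }
    }
}
```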

rehashing

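Both open hashing and closed hashing need rehashing.

Large table vs small table:

a large table wastes space;

a small table makes lookups slow;

so we trade one off against the other.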

Rehashing

The size of the hash table is not determined at the very beginning. If the total size of keys is too large (e.g. size >= capacity / 10), we should double the size of the hash table and rehash every key.

public class Solution {
    /**
     * @param hashTable: A list of The first node of linked list
     * @return: A list of The first node of linked list which have twice size
     */
    public ListNode[] rehashing(ListNode[] hashTable) {
        // ListNode (int val; ListNode next; ListNode(int val)) is provided by the judge
        if (hashTable.length <= 0) {
            return hashTable;
        }
        int newcapacity = 2 * hashTable.length;
        ListNode[] newTable = new ListNode[newcapacity];
        for (int i = 0; i < hashTable.length; i++) {
            while (hashTable[i] != null) {
                // (x % m + m) % m keeps the index non-negative for negative keys
                int newindex = (hashTable[i].val % newcapacity + newcapacity) % newcapacity;
                if (newTable[newindex] == null) {
                    newTable[newindex] = new ListNode(hashTable[i].val);
                } else {
                    // append to the end of the chain to preserve the original order
                    ListNode dummy = newTable[newindex];
                    while (dummy.next != null) {
                        dummy = dummy.next;
                    }
                    dummy.next = new ListNode(hashTable[i].val);
                }
                hashTable[i] = hashTable[i].next;
            }
        }
        return newTable;
    }
}

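Rule of thumb used here: once more than 10% of the table is occupied, rehash.

size is the number of keys actually stored. If they occupy more than one tenth of the capacity, the collision rate is too high, and we need to allocate a larger array and move the old contents into it, much like a dynamic array. But because the hash function usually changes along with the capacity, this is costly, so don't do it casually. For example:

suppose we have the four numbers [4,1,2,3], each placed according to key % 4.

To grow the table to eight slots, we allocate eight slots and move 1, 2, 3, 4 over, but this time each position is recomputed with key % 8 rather than copied directly, which adds a fair amount of computation.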

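Rebuilding a hash table

Because a hash table only grows and never shrinks, a workload that keeps adding and then deleting keys should occasionally destroy the table and build a fresh one.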

LRU cache and LFU cache


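The principle of a cache: keep hot entries in fast storage (memory) and cold ones in slow storage (disk). Whether an entry is hot is judged by a policy such as:

LRU (least recently used): timestamps; when slots run out, evict the entry that was used longest ago. (There is also LFU, least frequently used, which you are not required to know.)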

Suppose an LRU cache has only three slots and the recent usage order, from oldest to newest, is

2->1->3

Now 2 is used again, so the order becomes

1->3->2

Then 5 arrives; the cache is full, so the oldest entry, 1, is evicted:

3->2->5

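In an LRU cache, entries must be reordered and evicted, so we need a linked list; and given a key, we need to find its node in the list quickly.

So the implementation is linked list + hash map.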
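Java's standard library already combines the two: `LinkedHashMap` is a hash map threaded with a doubly linked list, and with `accessOrder = true` it keeps entries ordered from least to most recently accessed. A sketch of an LRU cache on top of it, assuming eviction happens as soon as the size exceeds `capacity`; the class name `LinkedHashMapLRU` is illustrative, and this is an alternative to writing the list by hand.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative LRU cache built on LinkedHashMap's access-ordered mode.
class LinkedHashMapLRU<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LinkedHashMapLRU(int capacity) {
        // accessOrder = true: get/put move the entry to the most-recent end
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    // Called by put(); returning true evicts the eldest (least recently used) entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;
    }
}
```

Replaying the example with capacity 3: putting 2, 1, 3 gives key order [2, 1, 3]; get(2) gives [1, 3, 2]; putting 5 evicts 1 and gives [3, 2, 5].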

LRU Cache

Design and implement a data structure for Least Recently Used (LRU) cache. It should support the following operations: get and set.

get(key) – Get the value (will always be positive) of the key if the key exists in the cache, otherwise return -1.

set(key, value) – Set or insert the value if the key is not already present. When the cache reaches its capacity, it should invalidate the least recently used item before inserting a new item.

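Here both set and get count as a visit.

Implementation with a doubly linked list:

when 2 is moved to the back, its hash map entry does not change, but its neighbors 1 and 3 are affected; with a doubly linked list the node can unlink itself from them in O(1).

Implementation with a singly linked list:

the hash map stores, for each key, the node before its own node (prev).

To move a node to the tail, set prev.next = prev.next.next and append the node at the tail; the hash map entry of the node that used to follow it must then be updated to point to prev.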

import java.util.HashMap;

public class LRUCache {
    // Doubly linked list node; head and tail are sentinels
    private class Node {
        Node prev;
        Node next;
        int key;
        int value;

        public Node(int key, int value) {
            this.key = key;
            this.value = value;
            this.prev = null;
            this.next = null;
        }
    }

    private int capacity;
    private HashMap<Integer, Node> hs = new HashMap<Integer, Node>();
    private Node head = new Node(-1, -1); // least recently used side
    private Node tail = new Node(-1, -1); // most recently used side

    public LRUCache(int capacity) {
        this.capacity = capacity;
        tail.prev = head;
        head.next = tail;
    }

    public int get(int key) {
        if (!hs.containsKey(key)) {
            return -1;
        }
        // remove current
        Node current = hs.get(key);
        current.prev.next = current.next;
        current.next.prev = current.prev;
        // move current to tail
        move_to_tail(current);
        return hs.get(key).value;
    }

    public void set(int key, int value) {
        // get() already moves an existing key to the tail
        if (get(key) != -1) {
            hs.get(key).value = value;
            return;
        }
        if (hs.size() == capacity) {
            // evict the least recently used node, right after head
            hs.remove(head.next.key);
            head.next = head.next.next;
            head.next.prev = head;
        }
        Node insert = new Node(key, value);
        hs.put(key, insert);
        move_to_tail(insert);
    }

    private void move_to_tail(Node current) {
        current.prev = tail.prev;
        tail.prev = current;
        current.prev.next = current;
        current.next = tail;
    }
}

heap

O(log N) Add / O(log N) Remove / O(1) Min or Max

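Used for problems about maxima and minima.

A priority queue is a stripped-down heap: it is called a queue but is really a heap that implements only part of a heap's functionality. The element with the highest priority is dequeued first.

It can only add an element and remove the top element efficiently; removing an arbitrary element is O(n).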
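A small illustration with Java's `java.util.PriorityQueue`: `offer` and `poll` are O(log n) and `peek` is O(1), while `remove(Object)` has to search the backing array, which the class documentation lists as linear time. `PriorityQueueDemo` is an illustrative name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

class PriorityQueueDemo {
    // Returns the elements in the order poll() hands them out after removing `victim`.
    static List<Integer> drainAfterRemoving(int[] values, int victim) {
        PriorityQueue<Integer> pq = new PriorityQueue<>(); // min heap by natural order
        for (int v : values) {
            pq.offer(v);                    // O(log n) sift up
        }
        pq.remove(Integer.valueOf(victim)); // O(n): linear scan to locate the element
        List<Integer> out = new ArrayList<>();
        while (!pq.isEmpty()) {
            out.add(pq.poll());             // O(log n): smallest element first
        }
        return out;
    }
}
```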

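Notes on time complexity

The standard algorithm for longest palindromic substring is Manacher's algorithm, O(n); there are also O(n log n) approaches, but in an interview an O(n^2) solution is enough.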

Ugly Number

An ugly number is a positive number whose only prime factors are 2, 3 and 5.

Design an algorithm to find the nth ugly number. The first 10 ugly numbers are 1, 2, 3, 4, 5, 6, 8, 9, 10, 12…

import java.util.*;

// version 1: O(n) scan
class Solution {
    /**
     * @param n an integer
     * @return the nth ugly number as description.
     */
    public int nthUglyNumber(int n) {
        List<Integer> uglys = new ArrayList<Integer>();
        uglys.add(1);

        int p2 = 0, p3 = 0, p5 = 0;
        // p2, p3 & p5 share the same queue: uglys

        for (int i = 1; i < n; i++) {
            int lastNumber = uglys.get(i - 1);
            // advance each pointer to the first number whose multiple exceeds lastNumber
            while (uglys.get(p2) * 2 <= lastNumber) p2++;
            while (uglys.get(p3) * 3 <= lastNumber) p3++;
            while (uglys.get(p5) * 5 <= lastNumber) p5++;

            uglys.add(Math.min(
                Math.min(uglys.get(p2) * 2, uglys.get(p3) * 3),
                uglys.get(p5) * 5
            ));
        }

        return uglys.get(n - 1);
    }
};

// version 2: O(nlogn) HashSet + Heap
class Solution {
    /**
     * @param n an integer
     * @return the nth ugly number as description.
     */
    public int nthUglyNumber(int n) {
        Queue<Long> Q = new PriorityQueue<Long>();
        HashSet<Long> inQ = new HashSet<Long>();
        Long[] primes = new Long[3];
        primes[0] = Long.valueOf(2);
        primes[1] = Long.valueOf(3);
        primes[2] = Long.valueOf(5);
        for (int i = 0; i < 3; i++) {
            Q.add(primes[i]);
            inQ.add(primes[i]);
        }
        Long number = Long.valueOf(1);
        for (int i = 1; i < n; i++) {
            number = Q.poll(); // the (i+1)-th smallest ugly number
            for (int j = 0; j < 3; j++) {
                // the set prevents pushing the same product twice
                if (!inQ.contains(primes[j] * number)) {
                    Q.add(number * primes[j]);
                    inQ.add(number * primes[j]);
                }
            }
        }
        return number.intValue();
    }
};

Top k Largest Numbers II

Implement a data structure, provide two interfaces:

add(number). Add a new number in the data structure.

topk(). Return the top k largest numbers in this data structure. k is given when we create the data structure.

A new number only has to beat the weakest of the current top k, so use a min heap, whose top is that weakest number.

add O(logk)

topk O(klogk)

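But to find just the kth largest number, use quick select: O(n) on average.

When should you use quick select, and when a heap?

A heap costs O(N log k) and fits when you need to know the top k at any moment: live, streaming data.

Quick select costs O(N) on average and finds the kth element once: offline, static data, a one-shot query.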
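A sketch of quick select for the kth largest number, assuming 1 <= k <= nums.length. It partitions in descending order around a random pivot (Lomuto scheme) and continues only on the side that contains position k-1, which is O(n) on average and O(n²) in the worst case. `QuickSelect.kthLargest` is an illustrative name.

```java
import java.util.Random;

class QuickSelect {
    private static final Random RNG = new Random(42);

    // Returns the kth largest element (k is 1-based). Reorders nums in place.
    static int kthLargest(int[] nums, int k) {
        int lo = 0, hi = nums.length - 1, target = k - 1;
        while (true) {
            int p = partition(nums, lo, hi);
            if (p == target) {
                return nums[p];
            } else if (p < target) {
                lo = p + 1; // answer lies in the right part
            } else {
                hi = p - 1; // answer lies in the left part
            }
        }
    }

    // Lomuto partition, descending: afterwards nums[lo..p-1] > nums[p] >= nums[p+1..hi].
    private static int partition(int[] nums, int lo, int hi) {
        swap(nums, lo + RNG.nextInt(hi - lo + 1), hi); // random pivot moved to the end
        int pivot = nums[hi];
        int store = lo;
        for (int i = lo; i < hi; i++) {
            if (nums[i] > pivot) {
                swap(nums, i, store++);
            }
        }
        swap(nums, store, hi);
        return store;
    }

    private static void swap(int[] a, int i, int j) {
        int t = a[i];
        a[i] = a[j];
        a[j] = t;
    }
}
```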

import java.util.*;

public class Solution {
    private int maxSize;
    private Queue<Integer> minheap;

    public Solution(int k) {
        minheap = new PriorityQueue<Integer>();
        maxSize = k;
    }

    public void add(int num) {
        if (minheap.size() < maxSize) {
            minheap.offer(num);
            return;
        }
        // only a number larger than the weakest of the top k gets in
        if (num > minheap.peek()) {
            minheap.poll();
            minheap.offer(num);
        }
    }

    public List<Integer> topk() {
        Iterator<Integer> it = minheap.iterator();
        List<Integer> result = new ArrayList<Integer>();
        while (it.hasNext()) {
            result.add(it.next());
        }
        Collections.sort(result, Collections.reverseOrder());
        return result;
    }
};

Merge k Sorted Lists


Merge k sorted linked lists and return it as one sorted list.

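A similar problem is external sorting:

I have only 1 GB of memory but need to sort a 4 GB array.

Split it into four 1 GB chunks, sort each one, then merge them.

1. k-way merge with a heap

The classic implementation uses a heap, O(N log k): whichever head is smallest is dequeued first. Repeatedly finding the minimum of k numbers is exactly what a heap is for.

2. Implementation with a priority queue

The key point is the priority queue's comparator:

for ascending order, return the first argument a minus the second argument b;

for descending order, return the second argument minus the first. (Subtraction can overflow for extreme int values; Integer.compare(a, b) and Integer.compare(b, a) express the same orders safely.)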
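A quick check of the rule in Java, where `PriorityQueue.poll()` always returns the least element according to the comparator: "first minus second" (written with `Integer.compare` to avoid overflow) yields ascending output, and swapping the arguments yields descending output. `ComparatorDemo` is illustrative.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class ComparatorDemo {
    // "first minus second": negative when a < b, so smaller values come out first
    static final Comparator<Integer> ASCENDING = (a, b) -> Integer.compare(a, b);
    // "second minus first": larger values come out first
    static final Comparator<Integer> DESCENDING = (a, b) -> Integer.compare(b, a);

    // Pushes all values, then polls them out in priority order.
    static List<Integer> drain(int[] values, Comparator<Integer> cmp) {
        PriorityQueue<Integer> pq = new PriorityQueue<>(cmp);
        for (int v : values) {
            pq.offer(v);
        }
        List<Integer> out = new ArrayList<>();
        while (!pq.isEmpty()) {
            out.add(pq.poll());
        }
        return out;
    }
}
```

In Java a negative result means "a comes first", and the queue is a min heap under that order; C++'s `std::priority_queue` follows a different convention.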

Why does first-minus-second give ascending order? Let's start with the definition (of the C++ priority_queue):

Syntax:

In their implementation in the C++ Standard Template Library, priority queues take three template parameters:
template < class T, class Container = vector<T>,
           class Compare = less<typename Container::value_type> > class priority_queue;
Where the template parameters have the following meanings:
T: Type of the elements.
Container: Type of the underlying container object used to store and access the elements.
Compare: Comparison class: A class such that the expression comp(a,b), where comp is an object of this class and a and b are elements of the container, returns true if a is to be placed earlier than b in a strict weak ordering operation. This can either be a class implementing a function call operator or a pointer to a function. This defaults to less<T>, which returns the same as applying the less-than operator (a<b).
The priority_queue object uses this expression when an element is inserted or removed from it (using push or pop, respectively) to grant that the element popped is always the greater in the priority queue.

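Going by that definition, when the comparator is true (a - b > 0), a has higher priority and is dequeued first, so a comes out before b. Isn't that descending order? It looks backwards... If anyone knows why, please let me know.

Found the answer; see http://www.cnblogs.com/cielosun/p/5654595.html. Reposted below:

First, the class is declared in the header <queue> and belongs to namespace std, so keep that in mind when using it.

There are two common ways to declare the queue: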

std::priority_queue<T> pq;
std::priority_queue<T, std::vector<T>, cmp> pq;

The first form is more common. Below is the corresponding declaration from the STL, followed by an explanation.

template<class _Ty,
    class _Container = vector<_Ty>,
    class _Pr = less<typename _Container::value_type> >
    class priority_queue

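As you can see, the template has three parameters with defaults. The first is the element type the queue handles. The second is more unusual: it is the container that holds the elements. In fact a priority queue is this container plus the standard heap algorithms (std::make_heap, std::push_heap, std::pop_heap). The container defaults to vector; it can also be deque, which is more capable but slower than vector. Once wrapped in a priority queue that extra capability goes unused, so vector is the usual choice. The third parameter is the important one: a comparison structure, defaulting to less, which by default uses the < operator of the type given by the first parameter.

Now comes the catch: even though less is used, the greater elements come out first! The parameter is contrary: cmp describes a less-than order, and the queue always pops the element that is largest under that order, i.e. an element x for which cmp(x, y) is false for every other element y. Whatever the reason for this design, you simply have to accept it. The third parameter can in fact be replaced with greater, like this:

std::priority_queue<T, std::vector<T>, std::greater<T>> pq;

For custom classes, people usually just overload < in the direction they need rather than bother with this parameter; the greater option is mostly used with int when you want the smallest value first.

From this analysis we also see that to use a custom class in a priority queue, we need to overload the < operator.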

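class Student
{
public:
    int id;
    char name[20];
    bool gender;
    // Deliberately reversed: the queue pops the largest element under <,
    // so defining < as id > a.id makes the smallest id come out first
    bool operator < (const Student &a) const
    {
        return id > a.id;
    }
};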

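Take this example: we want the student with the smaller id to come out first, so we have to counterintuitively overload < so that it actually means "greater than".

What if we are not using a custom class but still want a non-default ordering? In Dijkstra's algorithm, for example, we don't want to order vertices by their index, ascending or descending; we want to order them by their distance from the source. How do we do that? Careful readers of my Dijkstra article will have spotted the approach already: a custom comparison structure. The priority queue uses a less-than structure by default; above we defined a new less-than for our own class to fit the queue, but we can equally define our own comparison structure. Here is how to define and use one, with the code from that Dijkstra article: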

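int cost[MAX_V][MAX_V];
int d[MAX_V], V, s;
// Custom comparison structure for the priority queue
struct cmp
{
    bool operator()(const int &a, const int &b) const
    {
        // The queue pops the element that is largest under cmp, so reversing
        // the comparison makes the vertex with the smallest d[] come out first
        return d[a] > d[b];
    }
};
void Dijkstra()
{
    // Assumes d[] is pre-filled with a large value and cost[][] holds a large value for missing edges.
    // Caveat: d[] may change while a vertex sits in the heap, which can break the heap order;
    // pushing (distance, vertex) pairs avoids this.
    std::priority_queue<int, std::vector<int>, cmp> pq;
    pq.push(s);
    d[s] = 0;
    while (!pq.empty())
    {
        int tmp = pq.top();
        pq.pop();
        for (int i = 0; i < V; ++i)
        {
            if (d[i] > d[tmp] + cost[tmp][i])
            {
                d[i] = d[tmp] + cost[tmp][i];
                pq.push(i);
            }
        }
    }
}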

End of the repost from http://www.cnblogs.com/cielosun/p/5654595.html.

Also recommended: http://www.cnblogs.com/cielosun/p/6958802.html, a collection of C++ operations on stack, queue and priority_queue.

import java.util.*;

public class Solution {
    // Order nodes by value; Integer.compare avoids the overflow that left.val - right.val can hit
    private Comparator<ListNode> ListNodeComparator = new Comparator<ListNode>() {
        public int compare(ListNode left, ListNode right) {
            return Integer.compare(left.val, right.val);
        }
    };

    public ListNode mergeKLists(List<ListNode> lists) {
        if (lists == null || lists.size() == 0) {
            return null;
        }
        Queue<ListNode> heap = new PriorityQueue<ListNode>(lists.size(), ListNodeComparator);
        // seed the heap with the head of every non-empty list
        for (int i = 0; i < lists.size(); i++) {
            if (lists.get(i) != null) {
                heap.add(lists.get(i));
            }
        }
        ListNode dummy = new ListNode(0);
        ListNode tail = dummy;
        while (!heap.isEmpty()) {
            // take the smallest head, then push its successor
            ListNode head = heap.poll();
            tail.next = head;
            tail = head;
            if (head.next != null) {
                heap.add(head.next);
            }
        }
        return dummy.next;
    }
}

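3. Divide and conquer

With k lists, split them in half:

one half is lists 1 to k/2, the other half k/2+1 to k.

Each level of merging takes O(N) time and there are log k levels, so the total is O(N log k).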

import java.util.List;

public class Solution {
    /**
     * @param lists: a list of ListNode
     * @return: The head of one sorted list.
     */
    public ListNode mergeKLists(List<ListNode> lists) {
        if (lists == null || lists.size() == 0) {
            return null;
        }
        return mergeHelper(lists, 0, lists.size() - 1);
    }

    // Merge lists[start..end] by splitting the range in half
    private ListNode mergeHelper(List<ListNode> lists, int start, int end) {
        if (start == end) {
            return lists.get(start);
        }
        int mid = start + (end - start) / 2;
        ListNode left = mergeHelper(lists, start, mid);
        ListNode right = mergeHelper(lists, mid + 1, end);
        return mergeTwoLists(left, right);
    }

    // Standard two-way merge of sorted lists
    private ListNode mergeTwoLists(ListNode list1, ListNode list2) {
        ListNode dummy = new ListNode(0);
        ListNode tail = dummy;
        while (list1 != null && list2 != null) {
            if (list1.val < list2.val) {
                tail.next = list1;
                tail = list1;
                list1 = list1.next;
            } else {
                tail.next = list2;
                tail = list2;
                list2 = list2.next;
            }
        }
        if (list1 != null) {
            tail.next = list1;
        } else {
            tail.next = list2;
        }
        return dummy.next;
    }
}

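4. Pairwise merging

Essentially the same as the previous method, done bottom-up: each round merges the lists in pairs until only one remains.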

import java.util.ArrayList;
import java.util.List;

public class Solution {
    /**
     * @param lists: a list of ListNode
     * @return: The head of one sorted list.
     */
    public ListNode mergeKLists(List<ListNode> lists) {
        if (lists == null || lists.size() == 0) {
            return null;
        }
        // each round halves the number of lists
        while (lists.size() > 1) {
            List<ListNode> new_lists = new ArrayList<ListNode>();
            for (int i = 0; i + 1 < lists.size(); i += 2) {
                ListNode merged_list = merge(lists.get(i), lists.get(i + 1));
                new_lists.add(merged_list);
            }
            // an odd list out is carried over to the next round
            if (lists.size() % 2 == 1) {
                new_lists.add(lists.get(lists.size() - 1));
            }
            lists = new_lists;
        }
        return lists.get(0);
    }

    private ListNode merge(ListNode a, ListNode b) {
        ListNode dummy = new ListNode(0);
        ListNode tail = dummy;
        while (a != null && b != null) {
            if (a.val < b.val) {
                tail.next = a;
                a = a.next;
            } else {
                tail.next = b;
                b = b.next;
            }
            tail = tail.next;
        }
        if (a != null) {
            tail.next = a;
        } else {
            tail.next = b;
        }
        return dummy.next;
    }
}

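A heap is a complete binary tree with the following two properties:

1. Structure property: if the tree has depth n, then to be a complete binary tree its first n-1 levels must be full, and level n must be filled from left to right. In other words, the tree is built strictly top to bottom, then left to right.

2. Value property: an ordering between each parent and its children.

In a min heap, every parent is less than or equal to all of its children.

In a max heap, every parent is greater than or equal to all of its children.

Note that siblings have no ordering between them: the left child may be larger or smaller than the right child.

The insertion and deletion walkthroughs below come from another blog; they will be removed on request.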

——————————————————————

Insert (add): O(log n) time


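Insertion can break the heap property above, so an operation called sift up is needed to restore it.

Put the new node in the first available position, i.e. the leftmost empty slot on the bottom level.
Then keep comparing it with its parent. Sift up: if the new node is smaller than its parent, swap the two; after the swap, compare it with its new parent, and so on, until the new node is no smaller than its parent or becomes the root.

Figure: inserting node 2.

pop(): O(log n) time

Swap the root with the rightmost node of the bottom level, then delete that node (now in the bottom-right position).
The new root is repeatedly compared with the smaller of its children and swapped with it, until this last node is no larger than either child or becomes a leaf.

Only the root can be deleted this way.

Sift down: repeatedly compare the node with its children; if it is larger than the smaller of its two children, swap it with that child. Stop when the node is no larger than either child or becomes a leaf.

Figure: deleting root node 1.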

——————————————————————

Insertion and deletion just keep swapping upward or downward, at most log N times, so both run in O(log n).
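A minimal array-based min heap following the steps above, assuming int values and 1-based indexing (parent k/2, children 2k and 2k+1); slot 0 is left unused and the size is kept in a field. `MinHeap` is an illustrative name, not a library class.

```java
import java.util.Arrays;
import java.util.NoSuchElementException;

// Illustrative 1-based array min heap: heap[1] is the root.
class MinHeap {
    private int[] heap = new int[8];
    private int size = 0;

    void add(int x) {
        if (size + 1 == heap.length) {
            heap = Arrays.copyOf(heap, heap.length * 2);
        }
        heap[++size] = x; // leftmost free slot of the bottom level
        siftUp(size);
    }

    int peek() {
        if (size == 0) {
            throw new NoSuchElementException("heap is empty");
        }
        return heap[1];
    }

    int pop() {
        int min = peek();
        heap[1] = heap[size--]; // move the bottom-right node to the root and drop its old slot
        siftDown(1);
        return min;
    }

    int size() {
        return size;
    }

    // Swap with the parent while smaller than it.
    private void siftUp(int k) {
        while (k > 1 && heap[k] < heap[k / 2]) {
            swap(k, k / 2);
            k /= 2;
        }
    }

    // Swap with the smaller child while larger than it.
    private void siftDown(int k) {
        while (2 * k <= size) {
            int child = 2 * k;
            if (child + 1 <= size && heap[child + 1] < heap[child]) {
                child++;
            }
            if (heap[k] <= heap[child]) {
                break;
            }
            swap(k, child);
            k = child;
        }
    }

    private void swap(int i, int j) {
        int t = heap[i];
        heap[i] = heap[j];
        heap[j] = t;
    }
}
```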

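delete() / remove() of an arbitrary node: O(log n) time

Because a priority queue is a stripped-down heap, deleting an arbitrary node from it takes O(n): all it can do is scan through once to find the node, then delete it.

In a full heap, deleting an arbitrary node takes O(log n):

Use a hash map to find the node, and swap it with the rightmost node of the bottom level.
Delete the bottom-right node, then compare the node that was swapped in with its parent: if it is smaller than its parent, sift up; if it is >= its parent, sift down.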

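In addition:

To find a node's position, keep a hash map <key: the node's value, value: the node's position in the heap>. This requires that the heap contain no duplicate values.
For a given number of nodes the heap's shape is fixed, so an array can store it. Slot 0 of the array stores the heap's size. For example, an array of size 10 holding a 5-node heap:

    Index  0  1  2  3  4  5  6  7  8  9
    Value  5  1  2  3  4  5

For a node at index k:

parent index: k/2
left child index: 2k
right child index: 2k+1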
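A sketch of deleting an arbitrary value, assuming distinct int values (as required above) and 1-based indexing with the size kept in a field rather than in slot 0. A `HashMap` from value to array index is updated on every swap; `remove` swaps the target with the bottom-right node, deletes that slot, then sifts the moved node up or down. `IndexedMinHeap` is an illustrative name, and the fixed capacity of 63 elements is a simplification.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative min heap of distinct ints supporting O(log n) removal of any value.
class IndexedMinHeap {
    private final int[] heap = new int[64]; // fixed capacity for brevity (1-based)
    private int size = 0;
    private final Map<Integer, Integer> pos = new HashMap<>(); // value -> index in heap

    void add(int x) {
        heap[++size] = x;
        pos.put(x, size);
        siftUp(size);
    }

    int peek() {
        return heap[1];
    }

    boolean remove(int x) {
        Integer k = pos.get(x);
        if (k == null) {
            return false;
        }
        swap(k, size);   // swap with the bottom-right node
        pos.remove(x);
        size--;          // delete the bottom-right slot
        if (k <= size) { // the moved node may be out of order in either direction
            if (k > 1 && heap[k] < heap[k / 2]) {
                siftUp(k);
            } else {
                siftDown(k);
            }
        }
        return true;
    }

    int size() {
        return size;
    }

    private void siftUp(int k) {
        while (k > 1 && heap[k] < heap[k / 2]) {
            swap(k, k / 2);
            k /= 2;
        }
    }

    private void siftDown(int k) {
        while (2 * k <= size) {
            int c = 2 * k;
            if (c + 1 <= size && heap[c + 1] < heap[c]) c++;
            if (heap[k] <= heap[c]) break;
            swap(k, c);
            k = c;
        }
    }

    // Swap two slots and keep the value -> index map in sync.
    private void swap(int i, int j) {
        int t = heap[i];
        heap[i] = heap[j];
        heap[j] = t;
        pos.put(heap[i], i);
        pos.put(heap[j], j);
    }
}
```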

Tree Map

Java's TreeMap is a red-black tree, i.e. a balanced binary search tree; all operations are O(log n).

Minimum: keep going left, O(log n).

Maximum: keep going right, O(log n).
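A short illustration with `java.util.TreeMap`, whose `firstKey` and `lastKey` return the leftmost and rightmost keys. Unlike a priority queue, it can also delete an arbitrary key in O(log n). Because a map holds each key once, the sketch keeps a count per value so duplicates work; `CountingTreeMultiset` is an illustrative name.

```java
import java.util.TreeMap;

// Illustrative multiset on top of TreeMap: value -> number of copies.
class CountingTreeMultiset {
    private final TreeMap<Integer, Integer> counts = new TreeMap<>();

    void add(int x) {
        counts.merge(x, 1, Integer::sum); // O(log n)
    }

    // Removes one copy of x if present; O(log n) for any x, not just the minimum.
    void remove(int x) {
        Integer c = counts.get(x);
        if (c == null) return;
        if (c == 1) counts.remove(x);
        else counts.put(x, c - 1);
    }

    int min() {
        return counts.firstKey(); // leftmost key
    }

    int max() {
        return counts.lastKey(); // rightmost key
    }
}
```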

Priority Queue

Well suited to the data stream median problem:

Data Stream Median

Numbers keep coming; return the median of the numbers every time a new number is added.

Clarification

What’s the definition of Median?

Median is the number in the middle of a sorted array. If there are n numbers in a sorted array A, the median is A[(n – 1) / 2]. For example, if A=[1,2,3], median is 2. If A=[1,19], median is 1.

Example

For numbers coming list: [1, 2, 3, 4, 5], return [1, 1, 2, 2, 3].

For numbers coming list: [4, 5, 1, 3, 2, 6, 0], return [4, 4, 4, 3, 3, 3, 3].

For numbers coming list: [2, 20, 100], return [2, 2, 20].

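Use two heaps: a max heap holding the smaller half of the numbers and a min heap holding the larger half. Keep their sizes equal (the max heap may hold one more). The top of the max heap is then the median.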
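A sketch of the two-heap approach under the problem's definition (median = A[(n - 1) / 2]): each number goes to the max heap if it is no larger than that heap's top, otherwise to the min heap, and one element is moved across whenever the size rule breaks. `StreamMedian` is an illustrative name.

```java
import java.util.Collections;
import java.util.PriorityQueue;

class StreamMedian {
    private final PriorityQueue<Integer> lower = new PriorityQueue<>(Collections.reverseOrder()); // max heap
    private final PriorityQueue<Integer> upper = new PriorityQueue<>(); // min heap

    // Adds x and returns the median A[(n - 1) / 2] of all numbers so far.
    int add(int x) {
        if (lower.isEmpty() || x <= lower.peek()) {
            lower.offer(x);
        } else {
            upper.offer(x);
        }
        // Rebalance: lower.size() must equal upper.size() or upper.size() + 1
        if (lower.size() > upper.size() + 1) {
            upper.offer(lower.poll());
        } else if (upper.size() > lower.size()) {
            lower.offer(upper.poll());
        }
        return lower.peek();
    }

    static int[] medians(int[] nums) {
        StreamMedian sm = new StreamMedian();
        int[] out = new int[nums.length];
        for (int i = 0; i < nums.length; i++) {
            out[i] = sm.add(nums[i]);
        }
        return out;
    }
}
```

Running `medians` on the three example inputs reproduces the expected outputs listed in the problem.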

Min Stack


Implement a stack with min() function, which will return the smallest number in the stack.

It should support push, pop and min operation all in O(1) cost.

Notice

min operation will never be called if there is no number in the stack.


Implement Queue by Two Stacks

As the title described, you should only use two stacks to implement a queue’s actions.

The queue should support push(element), pop() and top() where pop is pop the first(a.k.a front) element in the queue.

Both pop and top methods should return the value of first element.

Example

push(1)

pop() // return 1

push(2)

push(3)

top() // return 2

pop() // return 2

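Prepare two stacks, stack1 and stack2.

Elements pushed into stack1 come out in reverse; pouring them into stack2 reverses them again, which restores arrival order. So push goes into stack1 and pop comes from stack2. stack2 is refilled from stack1 only when it is empty; otherwise newer elements would end up above older ones.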
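A sketch of the two-stack queue, using `java.util.ArrayDeque` as the stack (its `push`, `pop` and `peek` work on the head). stack2 is refilled only when empty, so each element is moved at most once and every operation is amortized O(1). `TwoStackQueue` is an illustrative name; it assumes pop and top are never called on an empty queue.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class TwoStackQueue {
    private final Deque<Integer> stack1 = new ArrayDeque<>(); // receives pushes
    private final Deque<Integer> stack2 = new ArrayDeque<>(); // holds elements in queue order

    void push(int x) {
        stack1.push(x);
    }

    int pop() {
        pour();
        return stack2.pop();
    }

    int top() {
        pour();
        return stack2.peek();
    }

    // Reverse stack1 into stack2, but only once stack2 has run out.
    private void pour() {
        if (stack2.isEmpty()) {
            while (!stack1.isEmpty()) {
                stack2.push(stack1.pop());
            }
        }
    }
}
```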

Largest Rectangle in Histogram


Given n non-negative integers representing the histogram’s bar height where the width of each bar is 1, find the area of largest rectangle in the histogram.

Above is a histogram where width of each bar is 1, given height = [2,1,5,6,2,3].

The largest rectangle is shown in the shaded area, which has area = 10 unit.

Example