[Algorithm] Beating the Binary Search algorithm – Interpolation Search, Galloping Search

Date: 2019-11-18

This post adds Python implementations (Sep 2019).

Sequential search

Use this when the items are stored in random (unsorted) order.

# Python

def sequentialSearch(alist, item):
    pos = 0
    found = False

    # Walk the list from the front until the item turns up or the list ends.
    while pos < len(alist) and not found:
        if alist[pos] == item:
            found = True
        else:
            pos = pos + 1

    return found

Binary search

The code can be written recursively or iteratively.

# Python

def binarySearch(alist, item):
    if len(alist) == 0:
        return False
    else:
        midpoint = len(alist)//2
        if alist[midpoint] == item:
            return True
        else:
            # Keep adjusting the "midpoint": recurse into the half that can hold the item.
            if item < alist[midpoint]:
                return binarySearch(alist[:midpoint], item)
            else:
                return binarySearch(alist[midpoint+1:], item)

testlist = [0, 1, 2, 8, 13, 17, 19, 32, 42,]
print(binarySearch(testlist, 3))
print(binarySearch(testlist, 13))




def binarySearch2(alist, item):
    first = 0
    last = len(alist)-1
    found = False

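    # Use <= so that a range holding a single element is still examined.
    while first <= last and not found:
        midpoint = (first + last) // 2
        if alist[midpoint] == item:
            found = True
        else:
            # Keep adjusting the "midpoint": shrink the range to the half that can hold the item.
            if item < alist[midpoint]: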
                last = midpoint - 1
            else:
                first = midpoint + 1

    return found

testlist = [0, 1, 2, 8, 13, 17, 19, 32, 42,]
print(binarySearch2(testlist, 3))
print(binarySearch2(testlist, 13))

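In search of a challenger

From: http://blog.jobbole.com/73517/

Binary search is one of the simplest yet most effective algorithms for searching a sorted array. The question is: can a more sophisticated algorithm do better?

In some cases hashing the entire data set is not feasible, or the search has to return both the position and the data itself. A hash table can then no longer deliver O(1) running time. For a sorted array, however, a divide-and-conquer approach can usually achieve a worst-case running time of O(log(n)).

Before drawing conclusions, it is worth noting that an algorithm can be "beaten" in several respects: the space it needs, the running time it needs, and its access requirements on the underlying data structure. Next we run a runtime comparison experiment that creates several different random arrays, each holding between 10,000 and 81,920,000 elements, all of them 4-byte integers.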

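Not cache-friendly

Every step of binary search halves the search space, which is what guarantees its running time. Finding a particular element in an array is guaranteed to finish in O(log(n)) time, and faster still if the target happens to be the middle element. In other words, locating an element in an array of 81,920,000 elements takes 27 iterations or fewer.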
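The figure of 27 iterations can be checked with a few lines of Python. This is only a sketch: the helper name worst_case_probes is made up for illustration, and it relies on the fact that a standard three-way binary search over n sorted items needs at most floor(log2(n)) + 1 probes.

```python
def worst_case_probes(n):
    """Worst-case number of probes for a standard binary search on n sorted items.

    This equals floor(log2(n)) + 1, which is exactly the bit length of n.
    """
    return n.bit_length()


# The smallest and largest array sizes used in the experiment.
for size in (10_000, 81_920_000):
    print(size, worst_case_probes(size))
```

For the two sizes above this prints 14 and 27 respectively.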

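Because binary search jumps around the array, it is not cache-friendly. Some fine-tuned binary search implementations therefore switch to a linear search once the search space drops below a certain size (64 elements or fewer). That cut-off is highly architecture-dependent, however, so most frameworks do not implement this optimization.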
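A minimal sketch of such a hybrid is shown below. The function name hybrid_search and the default threshold of 64 are assumptions for illustration; a good cut-off would have to be measured on the target machine.

```python
def hybrid_search(sorted_list, item, threshold=64):
    """Binary search that finishes with a linear scan once the range is small.

    Returns the index of item, or -1 if it is absent.
    The threshold value is an illustrative assumption, not a tuned constant.
    """
    first, last = 0, len(sorted_list) - 1

    # Halve the range while it is still larger than the threshold.
    while last - first + 1 > threshold:
        midpoint = (first + last) // 2
        if sorted_list[midpoint] < item:
            first = midpoint + 1
        elif sorted_list[midpoint] > item:
            last = midpoint - 1
        else:
            return midpoint

    # Scan the small remaining range sequentially (contiguous memory access).
    for pos in range(first, last + 1):
        if sorted_list[pos] == item:
            return pos
        if sorted_list[pos] > item:
            break  # Passed the place where item would be.
    return -1
```

In pure Python the cache effect is hard to observe; the sketch only shows the control flow that a lower-level implementation would use.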

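Galloping search

Also called "fast search": http://www.cnblogs.com/jesse123/p/6026029.html

Step 1

If for some reason the length of the array is unknown, galloping search can identify the initial search range. The algorithm starts at the first element and keeps doubling the upper bound of the search range until that bound is greater than the search key.

Step 2

After that, depending on the implementation, it either:

- runs a standard binary search over that range, which guarantees O(log(n)) running time, or
- starts another round of galloping search, which brings the running time closer to O(n).

Galloping search is very effective when the element we are looking for is close to the beginning of the array.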
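The two steps can be sketched in Python as follows, using the "gallop, then binary search" variant. The name gallop_search is hypothetical, and the sketch assumes the list length is available (it is only used to stop the doubling at the end of the list); the second step uses bisect_left from the standard library.

```python
from bisect import bisect_left


def gallop_search(sorted_list, item):
    """Galloping search followed by a binary search on the bracketed range.

    Returns the index of item, or -1 if it is absent.
    """
    n = len(sorted_list)
    if n == 0:
        return -1

    # Step 1: keep doubling the upper bound until it reaches or passes the key.
    bound = 1
    while bound < n and sorted_list[bound] < item:
        bound *= 2

    # Step 2: the key, if present, lies between bound // 2 and bound (inclusive).
    low = bound // 2
    high = min(bound + 1, n)  # Exclusive upper limit for bisect_left.
    pos = bisect_left(sorted_list, item, low, high)
    if pos < high and sorted_list[pos] == item:
        return pos
    return -1


testlist = [0, 1, 2, 8, 13, 17, 19, 32, 42]
print(gallop_search(testlist, 3))
print(gallop_search(testlist, 13))
```

For a key at index i, the doubling phase takes about log2(i) steps, which is why keys near the front of the array are found quickly.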

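Sampling search

Sampling search is somewhat similar to binary search, but it first takes a few samples from the array before settling on the main search region. Finally, once the range is small enough, it uses a standard binary search to pin down the exact position of the element. The theory is interesting, but in practice it performs poorly.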
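The text does not spell out how the samples are chosen, so the following is only one possible reading: probe evenly spaced positions, keep the slice between the two samples that bracket the key, and repeat until the slice is small. The name sampling_search and the defaults (20 samples, binary search below 2,000 elements) are assumptions for illustration.

```python
from bisect import bisect_left


def sampling_search(sorted_list, item, samples=20, small=2000):
    """Narrow the range with evenly spaced samples, then finish with binary search.

    Returns the index of item, or -1 if it is absent.
    Sample count and cut-off are illustrative assumptions.
    """
    low, high = 0, len(sorted_list)  # Half-open range [low, high).

    while high - low > small:
        step = max(1, (high - low) // samples)
        new_low, new_high = low, high
        # Probe evenly spaced positions until a sample is >= item.
        for pos in range(low + step, high, step):
            if sorted_list[pos] < item:
                new_low = pos + 1  # The key must lie to the right of this sample.
            else:
                new_high = pos + 1  # The key lies at or before this sample.
                break
        low, high = new_low, new_high

    # The range is small enough: locate the exact position with binary search.
    pos = bisect_left(sorted_list, item, low, high)
    if pos < high and sorted_list[pos] == item:
        return pos
    return -1
```

Each round costs up to `samples` comparisons, which is consistent with the high comparison counts reported for sampling search.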

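Interpolation search

Optimization 1: pick the next split point proportionally

Interpolation search is an optimization of binary search: the midpoint formula mid = (low + high) / 2 is replaced by mid = low + (high - low) * (key - a[low]) / (a[high] - a[low]).

Interpolation search chooses its probe by comparing the search key with the smallest and largest keys in the table; its core is the interpolation factor (key - a[low]) / (a[high] - a[low]).

An interpolation search can also fall back to sequential search at the end.

Of the algorithms tested, interpolation search is arguably the "smartest". It resembles the way people use a phone book: it tries to guess where the element is by assuming that the elements are uniformly distributed in the array.

First it samples the beginning and the end of the search space, and then it guesses the position of the element. The algorithm repeats this step until the element is found.

- If the guesses are accurate, the number of comparisons is around O(log(log(n))) and the running time is around O(log(n));
- if the guesses are wrong, the running time degrades to O(n).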
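A minimal Python sketch of the formula above, assuming a sorted list of integers. The name interpolation_search is illustrative; the guard against a[high] == a[low] is an addition needed to avoid dividing by zero.

```python
def interpolation_search(a, key):
    """Interpolation search on a sorted list of numbers.

    Returns the index of key, or -1 if it is absent.
    """
    low, high = 0, len(a) - 1

    # The key can only be present while it lies between the end values.
    while low <= high and a[low] <= key <= a[high]:
        if a[high] == a[low]:
            # All remaining values are equal, and the loop condition
            # already guarantees they equal key.
            return low

        # Guess the position proportionally, assuming uniform distribution.
        mid = low + (high - low) * (key - a[low]) // (a[high] - a[low])

        if a[mid] < key:
            low = mid + 1
        elif a[mid] > key:
            high = mid - 1
        else:
            return mid
    return -1


testlist = [0, 1, 2, 8, 13, 17, 19, 32, 42]
print(interpolation_search(testlist, 3))
print(interpolation_search(testlist, 13))
```

Integer division keeps the guess an index; on skewed data such as this small test list the guesses creep forward one slot at a time, which illustrates the O(n) worst case.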

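Optimization 2: Interpolation + Seq

An improved version of interpolation search starts a sequential search as soon as the guessed position can be presumed close to the final position.

Compared with binary search, every iteration of interpolation search is computationally expensive. Using a sequential search for the last step avoids the costly position-guessing calculation and easily finds the final position within a small region (about 10 elements).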
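One way to sketch this combination is to interpolate while the candidate range is large and to scan sequentially once about 10 elements remain. The name interpolation_seq_search and the window parameter are assumptions for illustration, and using the range size as the "close enough" test is a simplification.

```python
def interpolation_seq_search(a, key, window=10):
    """Interpolation search that finishes with a sequential scan.

    Returns the index of key, or -1 if it is absent.
    window is the range size at which the scan takes over (assumed value).
    """
    low, high = 0, len(a) - 1

    while low <= high and a[low] <= key <= a[high]:
        if high - low + 1 <= window or a[high] == a[low]:
            # Small (or constant) range: no more guessing, just scan.
            for pos in range(low, high + 1):
                if a[pos] >= key:
                    return pos if a[pos] == key else -1
            return -1

        # Large range: guess the position proportionally.
        mid = low + (high - low) * (key - a[low]) // (a[high] - a[low])
        if a[mid] < key:
            low = mid + 1
        elif a[mid] > key:
            high = mid - 1
        else:
            return mid
    return -1
```

The scan touches at most `window` adjacent elements, so it avoids both the multiplication and division of another guess and any further jumps through memory.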

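The big question surrounding interpolation search is whether O(log(log(n))) comparisons translate into O(log(log(n))) running time. They do not, because there is a trade-off between storage access time and the CPU time spent computing the next guess.

Advantage on large data sets

If the data set is large and storage access time is significant, as on a physical hard disk, interpolation search easily beats binary search. However, the experiments show that when access time is short, as with RAM, interpolation search may bring no benefit at all.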

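Experimental results

Statistics per search

All source code in the experiments is written in Java; each experiment is run 10 times on the same array; the arrays are randomly generated integer arrays held in memory.

In the interpolation search, a sampling search is applied first, taking 20 samples from the search space to determine the next search range. If the presumed range contains 10 or fewer elements, the algorithm switches to a linear search. In addition, if the search range contains fewer than 2,000 elements, it falls back to a standard binary search.

For reference, Java's default Arrays.binarySearch is also included in the experiment, so that its running time can be compared with the custom algorithms.

Despite our high expectations for interpolation search, its actual running time did not beat Java's default binary search. If storage access time is long, combining some kind of hash tree with a B+ tree may be a better choice.

It is worth noting, however, that for uniformly distributed arrays the combination of interpolation search and sequential search always beats binary search in the number of comparisons. The platform's binary search is already very efficient, though, so in many cases there may be no need to replace it with a more complex algorithm.

Average running time per search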

Size	Arrays.binarySearch	Interpolation + Seq	Interpolation	Sampling	Binary	Gallop	Gallop + Binary
10,000	1.50E-04 ms	1.60E-04 ms	2.50E-04 ms	3.20E-04 ms	5.00E-05 ms	1.50E-04 ms	1.00E-04 ms
20,000	5.00E-05 ms	5.50E-05 ms	1.05E-04 ms	2.35E-04 ms	7.00E-05 ms	1.15E-04 ms	6.50E-05 ms
40,000	4.75E-05 ms	5.00E-05 ms	9.00E-05 ms	1.30E-04 ms	5.25E-05 ms	1.33E-04 ms	8.75E-05 ms
80,000	4.88E-05 ms	5.88E-05 ms	9.88E-05 ms	1.95E-04 ms	6.38E-05 ms	1.53E-04 ms	9.00E-05 ms
160,000	5.25E-05 ms	5.94E-05 ms	1.01E-04 ms	2.53E-04 ms	6.56E-05 ms	1.81E-04 ms	9.38E-05 ms
320,000	5.16E-05 ms	6.13E-05 ms	1.22E-04 ms	2.19E-04 ms	6.31E-05 ms	2.45E-04 ms	1.04E-04 ms
640,000	5.30E-05 ms	6.06E-05 ms	9.61E-05 ms	2.12E-04 ms	7.27E-05 ms	2.31E-04 ms	1.16E-04 ms
1,280,000	5.39E-05 ms	6.06E-05 ms	9.72E-05 ms	2.59E-04 ms	7.52E-05 ms	2.72E-04 ms	1.18E-04 ms
2,560,000	5.53E-05 ms	6.40E-05 ms	1.11E-04 ms	2.57E-04 ms	7.37E-05 ms	2.75E-04 ms	1.05E-04 ms
5,120,000	5.53E-05 ms	6.30E-05 ms	1.26E-04 ms	2.69E-04 ms	7.66E-05 ms	3.32E-04 ms	1.18E-04 ms
10,240,000	5.66E-05 ms	6.59E-05 ms	1.22E-04 ms	2.92E-04 ms	8.07E-05 ms	4.27E-04 ms	1.42E-04 ms
20,480,000	5.95E-05 ms	6.54E-05 ms	1.18E-04 ms	3.50E-04 ms	8.31E-05 ms	4.88E-04 ms	1.49E-04 ms
40,960,000	5.87E-05 ms	6.58E-05 ms	1.15E-04 ms	3.76E-04 ms	8.59E-05 ms	5.72E-04 ms	1.75E-04 ms
81,920,000	6.75E-05 ms	6.83E-05 ms	1.04E-04 ms	3.86E-04 ms	8.66E-05 ms	6.89E-04 ms	2.15E-04 ms

Average number of comparisons per search

Size	Arrays.binarySearch	Interpolation + Seq	Interpolation	Sampling	Binary	Gallop	Gallop + Binary
10,000	?	10.6	17.6	19.0	12.2	58.2	13.2
20,000	?	11.3	20.7	19.0	13.2	66.3	14.2
40,000	?	11.0	16.9	20.9	14.2	74.9	15.2
80,000	?	12.1	19.9	38.0	15.2	84.0	16.2
160,000	?	11.7	18.3	38.0	16.2	93.6	17.2
320,000	?	12.4	25.3	38.2	17.2	103.8	18.2
640,000	?	12.4	19.0	41.6	18.2	114.4	19.2
1,280,000	?	12.5	20.2	57.0	19.2	125.5	20.2
2,560,000	?	12.8	22.7	57.0	20.2	137.1	21.2
5,120,000	?	12.7	26.5	57.5	21.2	149.2	22.2
10,240,000	?	13.2	25.2	62.1	22.2	161.8	23.2
20,480,000	?	13.4	23.4	76.0	23.2	175.0	24.2
40,960,000	?	13.4	21.9	76.1	24.2	188.6	25.2
81,920,000	?	14.0	19.7	77.0	25.2	202.7	26.2

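Source code: the complete source code of the search algorithms is linked from the original article. Note that the code is not production-grade; for example, some cases may have too many or too few range checks.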
