Grokking Algorithms: Reading Notes

Introduction to Algorithms

An algorithm is an effective method that can be expressed within a finite amount of space and time and in a well-defined formal language for calculating a function.

  1. Binary search:

Binary search code for the number-guessing game:

def binary_search(list, item):
  # low and high keep track of which part of the list you'll search in.
  low = 0
  high = len(list) - 1

  # While you haven't narrowed it down to one element ...
  while low <= high:
    # ... check the middle element
    mid = (low + high) // 2
    guess = list[mid]
    # Found the item.
    if guess == item:
      return mid
    # The guess was too high.
    if guess > item:
      high = mid - 1
    # The guess was too low.
    else:
      low = mid + 1

  # Item doesn't exist
  return None

my_list = [1, 3, 5, 7, 9]

print(binary_search(my_list, 3)) # => 1

# 'None' means nil in Python. We use it to indicate that the item wasn't found.
print(binary_search(my_list, -1)) # => None
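  2. Some common Big O running times

    • O(log n): logarithmic time, e.g., binary search;
    • O(n): linear time, e.g., simple search;
    • O(n * log n): e.g., quicksort (average case);
    • O(n^2): slower sorting algorithms, e.g., selection sort;
    • O(n!): e.g., a brute-force solution to the traveling salesperson problem.
  3. Some takeaways:

    • Algorithm speed isn't measured in seconds but in how fast the number of operations grows;
    • When we talk about the speed of an algorithm, we mean how its running time increases as the input size increases;
    • The running time of an algorithm is expressed in Big O notation.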
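
To make these growth rates concrete, here is a small sketch that counts operations instead of measuring time. The function name `operation_counts` and the chosen input sizes are assumptions made for illustration; the counts are rough (logarithms are rounded up), not exact step counts of any particular implementation.

```python
import math

def operation_counts(n):
    """Return the approximate number of operations for an input of size n."""
    log_n = math.ceil(math.log2(n))
    return {
        "O(log n)": log_n,
        "O(n)": n,
        "O(n log n)": n * log_n,
        "O(n^2)": n ** 2,
    }

# Growing the input 64 times (16 -> 1024) adds only 6 steps for O(log n),
# but multiplies the O(n^2) count by 4096.
for n in (16, 1024):
    print(n, operation_counts(n))
```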

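Selection Sort

How memory works: to store data in memory, you ask the computer for storage space and it gives you an address. To store multiple items, there are two basic options: arrays and linked lists.

  • Running times of common array and linked-list operations:
              Arrays   Linked lists
  Reading     O(1)     O(n)
  Insertion   O(n)     O(1)
  Deletion    O(n)     O(1)

Note: deletion takes O(1) only if you can access the element to be deleted immediately. It is common to keep track of the first and last items of a linked list, so deleting those takes O(1).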
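
The trade-off in the table can be sketched with a minimal singly linked list. The `Node` class and the helpers `insert_after` and `read_at` are hypothetical names introduced only for this illustration; Python's built-in `list` plays the role of the array.

```python
class Node:
    """One element of a singly linked list (illustrative helper)."""
    def __init__(self, value, next_node=None):
        self.value = value
        self.next_node = next_node

def insert_after(node, value):
    """O(1): only references change, no elements are shifted."""
    node.next_node = Node(value, node.next_node)

def read_at(head, index):
    """O(n): follow the links one by one, starting from the head."""
    node = head
    for _ in range(index):
        node = node.next_node
    return node.value

head = Node("a", Node("c"))
insert_after(head, "b")   # the list is now a -> b -> c
print(read_at(head, 1))   # b

array = ["a", "c"]
array.insert(1, "b")      # O(n): later elements shift one slot to the right
print(array[1])           # O(1) read by index
```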

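  • Random access vs. sequential access: linked lists support only sequential access, while arrays support both. Because arrays support random access, reads are fast, which is why arrays are used in more situations.
  • Selection sort implementation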
# Finds the smallest value in an array
def findSmallest(arr):
  # Stores the smallest value
  smallest = arr[0]
  # Stores the index of the smallest value
  smallest_index = 0
  for i in range(1, len(arr)):
    if arr[i] < smallest:
      smallest = arr[i]
      smallest_index = i
  return smallest_index

# Sort array
def selectionSort(arr):
  newArr = []
  for i in range(len(arr)):
      # Finds the smallest element in the array and adds it to the new array
      smallest = findSmallest(arr)
      newArr.append(arr.pop(smallest))
  return newArr

print(selectionSort([5, 3, 6, 2, 10])) #[2, 3, 5, 6, 10]
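
One thing to be aware of: `selectionSort` above empties the list passed to it, because `arr.pop` removes each element as it is moved. As an alternative, here is a sketch of an in-place variant that swaps elements instead of building a second list. The name `selection_sort_in_place` is an assumption for illustration; the running time is still O(n^2).

```python
def selection_sort_in_place(arr):
    """Sort arr in place, smallest to largest, and return it."""
    for i in range(len(arr) - 1):
        # Find the index of the smallest value in the unsorted tail arr[i:].
        smallest_index = i
        for j in range(i + 1, len(arr)):
            if arr[j] < arr[smallest_index]:
                smallest_index = j
        # Move it to the front of the unsorted tail.
        arr[i], arr[smallest_index] = arr[smallest_index], arr[i]
    return arr

numbers = [5, 3, 6, 2, 10]
print(selection_sort_in_place(numbers))  # [2, 3, 5, 6, 10]
```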

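Recursion

With a loop, the program may perform better; with recursion, it may be easier to understand.

  • Parts of a recursive function: the base case (the function does not call itself) and the recursive case (the function calls itself).
  • Call stack and recursive call stack: the call stack determines the order in which the computer uses memory for function calls; every function call goes onto the call stack.
  • Stack: the computer internally uses a stack called the call stack.
  • A stack has two operations: push and pop.
  • The call stack can grow very tall and take up a lot of memory; you can rewrite the code to use a loop, or use tail recursion in languages that optimize it.
  • Example code: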
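# recursive count
def count(list):
  # Base case: an empty list has no items.
  if list == []:
    return 0
  # Recursive case: one item plus the count of the rest.
  return 1 + count(list[1:])

# recursive max (note: this name shadows Python's built-in max)
def max(list):
  # Base cases: an empty list has no maximum; a single item is its own maximum.
  if len(list) == 0:
    return None
  if len(list) == 1:
    return list[0]
  sub_max = max(list[1:])
  return list[0] if list[0] > sub_max else sub_max

# factorial
def fact(x):
  # Base case: 0! and 1! are both 1.
  if x <= 1:
    return 1
  else:
    return x * fact(x-1)

print(fact(5)) # => 120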
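
To illustrate the loop-versus-recursion trade-off, here is a sketch of the factorial written as a loop. The name `fact_loop` is an assumption for illustration. Because it never calls itself, it uses a single stack frame no matter how large `x` is, whereas the recursive version pushes one frame per call and, in CPython, fails with a `RecursionError` once the recursion limit (typically 1000) is exceeded.

```python
def fact_loop(x):
    """Iterative factorial: the call stack does not grow with x."""
    result = 1
    for i in range(2, x + 1):
        result *= i
    return result

print(fact_loop(5))  # 120

# A depth that would exceed the usual recursion limit is fine for the loop.
big = fact_loop(5000)
```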

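Quicksort

D&C (divide and conquer): a recursive technique for solving problems; quicksort is a good example.

  • Quicksort code (on average the call stack has O(log n) levels and each level takes O(n) time, so the algorithm runs in O(n * log n)):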
def quicksort(array):
  if len(array) < 2:
    # base case, arrays with 0 or 1 element are already "sorted"
    return array
  else:
    # recursive case
    pivot = array[0]
    # sub-array of all the elements less than or equal to the pivot
    less = [i for i in array[1:] if i <= pivot]
    # sub-array of all the elements greater than the pivot
    greater = [i for i in array[1:] if i > pivot]
    return quicksort(less) + [pivot] + quicksort(greater)

print(quicksort([10, 5, 2, 3])) # => [2, 3, 5, 10]
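  • Recap of Big O notation (estimates based on 10 operations per second, meant only as a rough guide).

  • Average (best) case vs. worst case: quicksort's performance depends heavily on the pivot you choose, which is what produces its best and worst cases. In the best and average cases it runs in O(n * log n); in the worst case, such as always picking the first element of an already sorted array, it runs in O(n^2).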
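
The effect of the pivot can be observed by measuring how deep the recursion goes. The sketch below is not the quicksort from these notes; it is an instrumented variant (the name `quicksort_depth` and the `choose_pivot_index` parameter are assumptions for illustration) that only reports the deepest level reached.

```python
import random

def quicksort_depth(array, choose_pivot_index, depth=1):
    """Return the deepest recursion level reached while partitioning array."""
    if len(array) < 2:
        return depth
    pivot_index = choose_pivot_index(array)
    pivot = array[pivot_index]
    rest = array[:pivot_index] + array[pivot_index + 1:]
    less = [x for x in rest if x <= pivot]
    greater = [x for x in rest if x > pivot]
    left = quicksort_depth(less, choose_pivot_index, depth + 1)
    right = quicksort_depth(greater, choose_pivot_index, depth + 1)
    return left if left > right else right

already_sorted = list(range(200))
# First element as pivot: every partition is maximally lopsided.
first = quicksort_depth(already_sorted, lambda a: 0)
# Random pivot: partitions are usually far more balanced.
rand = quicksort_depth(already_sorted, lambda a: random.randrange(len(a)))
print(first, rand)
```

With the first element as pivot on sorted input, the depth equals the length of the array (200 here), which is the O(n) stack height behind the O(n^2) worst case. The random-pivot depth varies from run to run but is usually much smaller.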

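Hash Tables

  • Hash function: maps an input to a number. Requirements: it consistently maps the same input to the same number, and ideally it maps different inputs to different numbers.
  • Use cases for hash tables: modeling relationships from one thing to another, filtering out duplicates, caching data, and so on.
  • The hash function matters: an ideal hash function distributes keys evenly across the slots of the hash table.
  • Usage example: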
voted = {}
def check_voter(name):
  if voted.get(name):
    print("kick them out!")
  else:
    voted[name] = True
    print("let them vote!")

check_voter("tom")
check_voter("mike")
check_voter("mike")
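
To show what "same input, same number" means and why mapping different inputs to different numbers is only an ideal, here is a deliberately naive hash function. `simple_hash` and the table size of 10 are assumptions for illustration; this is not how Python's `dict` hashes keys.

```python
def simple_hash(key, table_size):
    """Toy hash function: sum of character codes, reduced to a slot index."""
    return sum(ord(ch) for ch in key) % table_size

# The same input always lands in the same slot.
print(simple_hash("tom", 10), simple_hash("tom", 10))  # 6 6

# Different inputs can still collide: "ab" and "ba" have the same character sum.
print(simple_hash("ab", 10), simple_hash("ba", 10))    # 5 5
```

A better hash function spreads keys more evenly, which keeps collisions rare.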

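Breadth-First Search

  • Breadth-first search: an algorithm for the shortest-path problem on unweighted graphs. It runs in O(V+E), where V is the number of vertices and E is the number of edges.
  • Graph: a graph is made up of nodes and edges. A node can be directly connected to many other nodes, which are called its neighbors.
  • Queue: a first-in, first-out (FIFO) data structure, whereas a stack is a last-in, first-out (LIFO) data structure.
  • Example: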
from collections import deque


def person_is_seller(name):
    return name[-1] == 'm'


graph = {}
graph["you"] = ["alice", "bob", "claire"]
graph["bob"] = ["anuj", "peggy"]
graph["alice"] = ["peggy"]
graph["claire"] = ["thom", "jonny"]
graph["anuj"] = []
graph["peggy"] = []
graph["thom"] = []
graph["jonny"] = []


def search(name):
    search_queue = deque()
    search_queue += graph[name]
    # This array is how you keep track of which people you've searched before.
    searched = []
    while search_queue:
        person = search_queue.popleft()
        # Only search this person if you haven't already searched them.
        if person not in searched:
            if person_is_seller(person):
                print(person + " is a mango seller!")
                return True
            else:
                search_queue += graph[person]
                # Marks this person as searched
                searched.append(person)
    return False


search("you")
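
The search above answers whether a seller exists. As a sketch of the shortest-path use mentioned earlier, the variant below also records each node's parent so the path with the fewest edges can be rebuilt. The function name `shortest_path` and the `parents` dictionary are assumptions for illustration; the graph data repeats the example above.

```python
from collections import deque

def shortest_path(graph, start, goal):
    """Return the path with the fewest edges from start to goal, or None."""
    parents = {start: None}   # doubles as the set of nodes already seen
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk back through the parents, then reverse.
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbor in graph[node]:
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None

graph = {
    "you": ["alice", "bob", "claire"],
    "bob": ["anuj", "peggy"],
    "alice": ["peggy"],
    "claire": ["thom", "jonny"],
    "anuj": [], "peggy": [], "thom": [], "jonny": [],
}
print(shortest_path(graph, "you", "thom"))  # ['you', 'claire', 'thom']
```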

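Dijkstra's Algorithm

Breadth-first search finds the path with the fewest segments, but that isn't necessarily the fastest path. Dijkstra's algorithm solves the problem of finding the fastest path.

  • Four steps:

    1. Find the cheapest node, i.e., the node you can reach in the least amount of time;
    2. For each neighbor of that node, check whether there is a cheaper path to it through this node; if so, update its cost;
    3. Repeat until you have done this for every node in the graph;
    4. Calculate the final path.
  • Weights: Dijkstra's algorithm works on graphs in which every edge has an associated number, called its weight.

  • (Un)weighted graph: a graph with (without) weights. Use breadth-first search to compute the shortest path in an unweighted graph, and Dijkstra's algorithm in a weighted graph.

  • Example: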

# the graph
graph = {}
graph["start"] = {}
graph["start"]["a"] = 6
graph["start"]["b"] = 2

graph["a"] = {}
graph["a"]["fin"] = 1

graph["b"] = {}
graph["b"]["a"] = 3
graph["b"]["fin"] = 5

graph["fin"] = {}

# the costs table
infinity = float("inf")
costs = {}
costs["a"] = 6
costs["b"] = 2
costs["fin"] = infinity

# the parents table
parents = {}
parents["a"] = "start"
parents["b"] = "start"
parents["fin"] = None

processed = []


def find_lowest_cost_node(costs):
    lowest_cost = float("inf")
    lowest_cost_node = None
    # Go through each node.
    for node in costs:
        cost = costs[node]
        # If it's the lowest cost so far and hasn't been processed yet...
        if cost < lowest_cost and node not in processed:
            # ... set it as the new lowest-cost node.
            lowest_cost = cost
            lowest_cost_node = node
    return lowest_cost_node


# Find the lowest-cost node that you haven't processed yet.
node = find_lowest_cost_node(costs)
# If you've processed all the nodes, this while loop is done.
while node is not None:
    cost = costs[node]
    # Go through all the neighbors of this node.
    neighbors = graph[node]
    for n in neighbors.keys():
        new_cost = cost + neighbors[n]
        # If it's cheaper to get to this neighbor by going through this node...
        if costs[n] > new_cost:
            # ... update the cost for this node.
            costs[n] = new_cost
            # This node becomes the new parent for this neighbor.
            parents[n] = node
    # Mark the node as processed.
    processed.append(node)
    # Find the next node to process, and loop.
    node = find_lowest_cost_node(costs)

print("Cost from the start to each node:")
print(costs)
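
The code above covers steps 1 to 3 and prints the costs, but it stops short of step 4, calculating the final path. Here is one way that step could look. The helper `build_path` is an assumption for illustration, and the `parents` values are written out by hand as the loop above leaves them for this example graph.

```python
def build_path(parents, end):
    """Walk the parents table backwards from end, then reverse the result."""
    path = [end]
    # "start" has no entry in the parents table, so the walk stops there.
    while path[-1] in parents:
        path.append(parents[path[-1]])
    return path[::-1]

# Parents table after the algorithm has run on the example graph.
parents = {"a": "b", "b": "start", "fin": "a"}
print(build_path(parents, "fin"))  # ['start', 'b', 'a', 'fin']
```

The path start -> b -> a -> fin costs 2 + 3 + 1 = 6, which matches the final cost recorded for "fin".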

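Greedy Algorithms

  • Greedy algorithms: at each step, pick the locally optimal solution, in the hope of ending up with the globally optimal solution. They are simple to write, but for some problems they only get close to the optimum.
  • Approximation algorithms: when calculating the exact solution would take too long, you can use an approximation algorithm. It is judged on two things: how fast it is and how close its answer is to the optimal solution.
  • NP-complete problems: problems with no known fast algorithm. To solve the set-covering problem exactly, you have to calculate every possible set.
  • Signs that a problem may be NP-complete (for example, the traveling salesperson problem):
    • the algorithm runs quickly with a handful of items but slows down dramatically as the number of items grows;
    • it involves all combinations;
    • it cannot be broken down into smaller subproblems, so you have to consider every possible case;
    • it involves permutations, combinations, or set covering and is hard to solve.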
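
As a sketch of a greedy approximation for set covering: repeatedly pick the candidate that covers the most items still uncovered. The function name `greedy_set_cover` and the station and state data are made up for illustration, and the result is an approximation, not a guaranteed minimum.

```python
def greedy_set_cover(needed, candidates):
    """Return candidate names chosen greedily until needed is covered."""
    needed = set(needed)
    chosen = []
    while needed:
        best_name, best_covered = None, set()
        for name, items in candidates.items():
            covered = needed & items
            if len(covered) > len(best_covered):
                best_name, best_covered = name, covered
        if best_name is None:
            break  # the remaining items cannot be covered by any candidate
        needed -= best_covered
        chosen.append(best_name)
    return chosen

stations = {
    "kone": {"id", "nv", "ut"},
    "ktwo": {"wa", "id", "mt"},
    "kthree": {"or", "nv", "ca"},
    "kfour": {"nv", "ut"},
    "kfive": {"ca", "az"},
}
states = {"mt", "wa", "or", "id", "nv", "ut", "ca", "az"}
print(greedy_set_cover(states, stations))  # ['kone', 'ktwo', 'kthree', 'kfive']
```

Ties are broken by dictionary order here, so a different ordering of `stations` may produce a different but equally valid cover.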

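Dynamic Programming

How it works: solve the subproblems first, then work up to the larger problem.

  • Dynamic programming takeaways:
    • it finds the optimal solution under a constraint;
    • consider it when the problem can be broken into discrete subproblems that don't depend on each other;
    • every solution involves a grid;
    • the values in the cells are usually what you're trying to optimize;
    • each cell is a subproblem, so thinking about how to divide the problem into subproblems helps you figure out the grid's axes.
  • The Feynman algorithm:
    • write down the problem;
    • think real hard;
    • write down the solution.
  • Applications:
    • biologists use the longest common subsequence to find similarities in DNA strands;
    • implementing diff algorithms such as git diff;
    • string similarity (edit distance);
    • word wrap in applications such as Microsoft Word.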
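
The grid idea can be sketched with the longest common subsequence mentioned above. Each cell holds the answer to a subproblem (two prefixes), and the bottom-right cell holds the answer to the whole problem. The function name `lcs_length` and the sample words are assumptions for illustration.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of strings a and b."""
    # grid[i][j] is the answer for the prefixes a[:i] and b[:j].
    grid = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Letters match: extend the subsequence found without them.
                grid[i][j] = grid[i - 1][j - 1] + 1
            else:
                # No match: carry over the better of the two neighbors.
                up, left = grid[i - 1][j], grid[i][j - 1]
                grid[i][j] = up if up > left else left
    return grid[len(a)][len(b)]

print(lcs_length("fish", "fosh"))  # 3
print(lcs_length("fort", "fosh"))  # 2
```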

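K-Nearest Neighbors

  • Cosine similarity: a distance formula that compares the angle between two vectors rather than the straight-line distance between them;
  • Classification: categorization into a group;
  • Regression: predicting a response, such as a number;
  • Feature extraction: converting an item into a list of numbers that can be compared.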
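
Putting these terms together, here is a minimal sketch of classification with k-nearest neighbors, using cosine similarity to rank neighbors. The function names, the feature vectors (imagined ratings for three film genres), and the labels are all assumptions for illustration.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between vectors u and v (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

def knn_classify(query, labeled_points, k):
    """Return the most common label among the k most similar points."""
    ranked = sorted(labeled_points,
                    key=lambda item: cosine_similarity(query, item[0]),
                    reverse=True)
    votes = {}
    for _, label in ranked[:k]:
        votes[label] = votes.get(label, 0) + 1
    return sorted(votes, key=votes.get, reverse=True)[0]

# Feature extraction: each user becomes (comedy, action, drama) ratings.
users = [
    ((5, 1, 1), "comedy fan"),
    ((4, 2, 1), "comedy fan"),
    ((1, 5, 4), "action fan"),
    ((2, 4, 5), "action fan"),
    ((1, 4, 4), "action fan"),
]
print(knn_classify((5, 2, 1), users, k=3))  # comedy fan
```

For regression, the same ranking could be reused, averaging a numeric value over the k neighbors instead of counting votes.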

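Where to Go Next

  • Trees: binary search trees, B-trees, red-black trees;
  • Inverted indexes: how search engines work (build a hash table from page contents, where the keys are words and the values are the pages that contain each word);
  • The Fourier transform;
  • Parallel algorithms;
  • MapReduce;
  • Bloom filters;
  • SHA algorithms;
  • Locality-sensitive hashing;
  • Diffie-Hellman key exchange;
  • Linear programming.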
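
The inverted index described in the list above can be sketched in a few lines: a hash table whose keys are words and whose values are the pages containing them. The function name `build_inverted_index` and the page names and contents are made up for illustration; a real search engine does far more than this.

```python
def build_inverted_index(pages):
    """Map each word to the set of page names that contain it."""
    index = {}
    for name, text in pages.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(name)
    return index

pages = {
    "a.com": "hi there",
    "b.com": "hi adit",
    "c.com": "there we go",
}
index = build_inverted_index(pages)
print(sorted(index["hi"]))     # ['a.com', 'b.com']
print(sorted(index["there"]))  # ['a.com', 'c.com']
```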