如何對你的Python代碼進行基準測試

時間 2019-12-04

原文原文鏈接

啥叫作基準測試(benchmark)代碼?其實主要就是測試(benchmarking)和分析(profiling)你的代碼執行多快，而且找到代碼瓶頸(bottlenecks)在哪裏。執行該動做的主要的目的就是優化(optimization)。也許由於業務須要你而且讓你的代碼執行更快一些。當面臨這種狀況時，你就須要找出你的代碼是在哪些部分慢的。python

本文覆蓋如何使用不一樣工具進行代碼測試。數據庫

timeit

Python再帶一個模塊timeit。你可使用它來對小代碼進行計時。timeit模塊使用針對平臺的時間函數，你能夠獲得很是準確的一個時間結果。dom

timeit模塊有個命令行接口，也能夠經過導入的方式。咱們將從命令行中使用開始。打開終端執行以下命令：函數

$ python3 -m timeit -s "[ord(x) for x in 'abcdfghi']" 100000000 loops, best of 3: 0.00907 usec per loop $ python3 -m timeit -s "[chr(int(x)) for x in '123456789']" 100000000 loops, best of 3: 0.00897 usec per loop

發生啥啦？好吧，當你執行Python命令而且傳遞-m參數時，你是在告訴解釋器去找該模塊而且做爲主要程序執行。 -s告知timeit模塊執行一次代碼。而後timeit就行執行代碼三次，而且返回該三次的平均結果。該例子太過減小，它們之間的區別可能不太明顯。工具

你的機器上運行的結果可能跟個人會有一些誤差。oop

咱們再寫個簡單函數，看看是否能夠從命令行測試學習

# simple_func.py def my_function(): try: 1 / 0 except ZeroDivisionError: pass

整個函數的功能就是拋出一個異常而後直接忽略。很爛的例子。要經過命令使用timeit來測試，咱們須要在命令空間中導入它，確保工做目錄在當前文件目錄下，而後執行：測試

$ python3 -m timeit "import simple_func; simple_func.my_function()" 1000000 loops, best of 3: 0.753 usec per loop

這裏咱們就導入該模塊而且直接my_function()函數。注意咱們使用了;來分來導入和執行代碼。如今能夠開始學習如何在Python腳本代碼內部使用timeit了。優化

導入timeit測試

在代碼中使用timeit模塊也是至關容易的。咱們仍是使用以前很爛的例子：ui

def my_function(): try: 1 / 0 except ZeroDivisionError: pass if __name__ == "__main__": import timeit setup = "from __main__ import my_function" print(timeit.timeit("my_function()", setup=setup))

咱們執行經過執行該腳原本驗證。咱們導入了timeit模塊，而後調用timeit函數，參數的setup字符串將導入要測試函數到timeit的做用域中。如今咱們實現本身的裝飾器測試函數。

使用裝飾器

實現本身的裝飾器計時函數是一件頗有趣的事情，雖然不必定有timeit精確。

import random import time def timerfunc(func): """  A timer decorator  """ def function_timer(*args, **kwargs): """  A nested function for timing other functions  """ start = time.time() value = func(*args, **kwargs) end = time.time() runtime = end - start msg = "The runtime for {func} took {time} seconds to complete" print(msg.format(func=func.__name__, time=runtime)) return value return function_timer @timerfunc def long_runner(): for x in range(5): sleep_time = random.choice(range(1, 5)) time.sleep(sleep_time) if __name__ == "__main__": long_runner()

該例子中，咱們導入了Python標準庫的random和time模塊。而後咱們建立了裝飾器函數。你將注意到它接收了一個函數以及包含了另外一函數。嵌套的函數在調用實際函數時將抓取時間，而後等待函數返回並抓取結束時間。如今咱們知道函數所花費的時間，咱們將它打印出來。固然該裝飾器函數也須要將實際函數執行返回的結果返回回來。

接下來的函數就是包裝了該計時裝飾器函數。你能夠看到這裏使用了隨機休眠，該操做只是用來模擬耗時長程序。也許你用來執行鏈接數據庫（或者較大查詢），跑線程或其餘事情。

每次跑這段代碼，結果都會有些不一樣。本身試試！

建立計時上下文管理器

一些碼農可能更喜歡使用上下文的方式來測試小段代碼。咱們來構建看看：

import random import time class MyTimer: def __init__(self): self.start = time.time() def __enter__(self): return self def __exit__(self, exc_type, exc_val, exc_tb): end = time.time() runtime = end - self.start msg = 'The function took {time} seconds to complete' print(msg.format(time=runtime)) def long_runner(): for x in range(5): sleep_time = random.choice(range(1, 5)) time.sleep(sleep_time) if __name__ == "__main__": with MyTimer(): long_runner()

本例中，咱們使用__init__方法來開始咱們計時，__enter__方法不作任何事情，僅僅返回它本身。__exit__方法中抓取告終束時間，計算了總執行時間並打印。

而後咱們就使用上下文管理器來測試咱們執行的代碼。

cProfile

Python內建了分析器。profile和cProfile模塊。profile模塊是純Python實現，在測試過程當中可能會增長不少額外的開支。因此推薦仍是使用更快的cProfile。

咱們不會過細的講解該模塊，咱們來看看一些例子。

>>> import cProfile >>> cProfile.run("[x for x in range(1500)]") 4 function calls in 0.000 seconds Ordered by: standard name ncalls tottime percall cumtime percall filename:lineno(function) 1 0.000 0.000 0.000 0.000 <string>:1(<listcomp>) 1 0.000 0.000 0.000 0.000 <string>:1(<module>) 1 0.000 0.000 0.000 0.000 {built-in method exec} 1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}

咱們來了解一下內容。第一行顯示了有4個函數調用。下一行告訴結果排序方式。根據文檔，standard name指的是最右邊的列。這裏咱們有好多列數據：

ncalls: 調用了多少次
tottime: 函數所花費的總時間
percall: ncalls / tottime
cumtime: 涵蓋了該函數下的全部子函數（甚至地櫃）函數所話費的時間
第二個percall: cumtime / 原始釣魚那個
filename: fileno (function) 每一個函數響應信息

你也能夠像timeit模塊同樣在命令行執行cProfile。主要的區別就是須要傳遞Python腳本而不是代碼做爲參數。

python3 -m cProfile test.py

嘗試執行下該命令，看看執行結果。

line_profiler

有個第三方提供的項目叫line_profiler，它用來分析每行代碼執行時間。它也有一個叫作kenprof的腳本使用line_profiler來分析Python應用和腳本。只需使用pip來安裝該包。

 pip install line_profiler

在使用line_profiler以前咱們須要一些代碼來分析。不過我向先告訴你命令行中如何調用line_profiler.實際上須要執行kernprof腳原本調用line_profiler。

 kernprof -l silly_functions.py

執行完後將會打印出：Wrote profile results to silly_functions.py.lpro。該文件不能直接查看，是一個二進制文件。當咱們執行kernprof時，實際上是在__builtins__命名空間中注入LineProfiler實例。該實例稱爲profile，用來做爲裝飾器函數。讓咱們來寫實際腳本:

import time @profile def fast_function(): print("I'm fast function!") @profile def slow_function(): time.sleep(2) print("I'm slow function!") if __name__ == "__main__": fast_function() slow_function()

如今咱們擁有使用了profile裝飾的函數。若是你直接執行該腳本將會報NameError錯誤，由於並無定義profile。因此記得再實際使用時移除@profile裝飾器函數。

咱們來學習若是查看分析器的結果。有兩種辦法，第一使用line_profiler模塊讀結果文件：

$ python -m line_profiler silly_functions.py.lprof

還有種是在執行kernprof時帶上-v參數：

$ kernprof -l -v silly_functions.py

不管哪一種辦法你都將獲得以下結果：

I'm fast function! I'm slow function! Wrote profile results to silly_functions.py.lprof Timer unit: 1e-06 s Total time: 4e-05 s File: silly_functions.py Function: fast_function at line 3 Line # Hits Time Per Hit % Time Line Contents ============================================================== 3 @profile 4 def fast_function(): 5 1 40 40.0 100.0 print("I'm fast function!") Total time: 2.00227 s File: silly_functions.py Function: slow_function at line 7 Line # Hits Time Per Hit % Time Line Contents ============================================================== 7 @profile 8 def slow_function(): 9 1 2002114 2002114.0 100.0 time.sleep(2) 10 1 157 157.0 0.0 print("I'm slow function!")

咱們能夠看到結果中顯示了每行代碼的執行時間相關信息。有6列內容。咱們來看看每列表明的意思。

Line#: 分析的代碼行數
Hits: 執行次數
Time: 該行代碼執行時間
Per Hit: 每次執行時間
% Time: 函數中所佔時間比率
Line Contents: 執行的代碼

若是你用IPython，你可能想知道IPython的魔術方法%lprun能夠分析制定函數，甚至是具體代碼。

memory_profiler

另外一個用來分析的第三方包叫作memory_profiler.它用來分析每行代碼在進程中所消耗的內存。咱們也使用pip來安裝它。

pip install memory_profiler

裝好後，咱們也須要代碼來評測。memory_prifiler執行方式和line_profiler很是類似，它將在__builtins__注入profiler裝飾器來測試函數。

@profile
def mem_func():
    lots_of_numbers = list(range(1500))
    x = ['letters'] * (5 ** 10)
    del lots_of_numbers
    return None

if __name__ == "__main__":
    mem_func()

本例中咱們建立了1500整形list。而後咱們又建立了9765625(5的10次方)字符串李彪。最後咱們刪除了第一個list並返回。memory_profiler沒有提供另一個腳原本讓你執行該程序，而是直接經過-m參數來調用模塊。

$ python -m memory_profiler memo_prof.py 
Filename: memo_prof.py

Line # Mem usage Increment Line Contents ================================================ 1 13.133 MiB 0.000 MiB @profile 2 def mem_func(): 3 13.387 MiB 0.254 MiB lots_of_numbers = list(range(1500)) 4 87.824 MiB 74.438 MiB x = ['letters'] * (5 ** 10) 5 87.938 MiB 0.113 MiB del lots_of_numbers 6 87.938 MiB 0.000 MiB return None

結果的提供的信息就很是直觀了。包含代碼行數和改行執行完後的內存使用，接下來是內存增長量，最後是執行的代碼。

memory_profiler也提供mprof用來建立完整的內存使用報表，而不是每行結果。

$ mprof run memo_prof.py
mprof: Sampling memory every 0.1s
running as a Python program...

mprof也提供來圖示來查看應用內存消耗，須要查看圖表前咱們須要先安裝matplotlib

$ pip install matplotlib

而後執行:

$ mprof plot

咱們獲得以下圖：

profilehooks

最後個要介紹的第三方包是profilehooks.它擁有一系列包裝器來分期函數。安裝：

pip install profilehooks

安裝完後，咱們修改上次例子替換成profilehooks:

from profilehooks import profile @profile def mem_func(): lots_of_numbers = list(range(1500)) x = ['letters'] * (5 ** 10) del lots_of_numbers return None if __name__ == "__main__": mem_func()

僅僅要作的是導入profilehooks而且包裝到要測試的函數中。執行後：

*** PROFILER RESULTS *** mem_func (profhooks.py:3) function called 1 times 2 function calls in 0.036 seconds Ordered by: cumulative time, internal time, call count ncalls tottime percall cumtime percall filename:lineno(function) 1 0.036 0.036 0.036 0.036 profhooks.py:3(mem_func) 1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects} 0 0.000 0.000 profile:0(profiler)

輸出的結果和cProfile是同樣的，你能夠看看我以前對這些列的描述。profilehooks提供了兩個裝飾器。咱們來看看第一個timecall，它提供了函數執行時間。

from profilehooks import timecall @timecall def mem_func(): lots_of_numbers = list(range(1500)) x = ['letters'] * (5 ** 10) del lots_of_numbers return None if __name__ == "__main__": mem_func()

執行後你將看到以下輸出：

  mem_func (profhooks2.py:3):
    0.056 seconds

整個包裝其輸出整個函數執行時間，和timeit是同樣的。

另外一個profilehooks提供的裝飾器是coverage。它用來打印函數每行代碼涵蓋數。不過我看來用處到不是很大，你能夠本身試試：

*** COVERAGE RESULTS *** mem_func (profhooks3.py:3) function called 1 times @coverage def mem_func(): 1: lots_of_numbers = list(range(1500)) 1: x = ['letters'] * (5 ** 10) 1: del lots_of_numbers 1: return None

最後要告訴你的是，你也能夠經過-m參數來從命令行中執行profilehooks.

python -m profilehooks mymodule.py

profilehooks是一個新包，不過我到以爲應該會頗有前途。

總結

本文涵蓋了須要信息。分享瞭如何使用Python內建模塊,timeit和cProfile來測試和分期你的代碼，我還學會了如何本身包裝計時包裝器和上下文管理器。而後咱們還學習了一些第三方包：line_profiler, memory_profiler和profilehooks。就目前，你已經學會如何分析你的代碼，查找代碼瓶頸。