We combined pair coding with separate coding. The initial design discussion, feature assignment, and GitHub repository setup were done together. After my partner had set up the framework, we divided the functions to implement between us and coded separately, managing the source through GitHub. Whenever a question came up about the framework, a key function, or a target feature, we discussed it and solved it with pair coding.
(My only complaint about my partner: he isn't quite funny enough.)
We used the coverage package for regression testing:
coverage run coverage_test.py
coverage report
The results were as follows:
Name Stmts Miss Cover
--------------------------------------
coverage_test.py 36 0 100%
modes.py 94 0 100%
utils.py 68 0 100%
--------------------------------------
TOTAL 198 0 100%
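For reference, a minimal sketch of what a coverage_test.py like the one above might contain: a plain assert-based regression script that coverage run can execute directly. The get_phrases stand-in below and its test inputs are assumptions for illustration; the real tests would import the function from utils.py.

```python
# Hypothetical sketch of coverage_test.py (assert-based, runnable by
# "coverage run coverage_test.py").  get_phrases is a stand-in with the
# behaviour described later in this post.

def get_phrases(pre_list, n):
    """Return every run of n consecutive words joined by spaces."""
    return [" ".join(pre_list[j:j + n]) for j in range(len(pre_list) + 1 - n)]

# Regression cases: once recorded, reruns must keep producing the same output.
assert get_phrases(["how", "are", "you"], 2) == ["how are", "are you"]
assert get_phrases(["hello"], 2) == []
print("all regression cases passed")
```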
We used Python's cProfile for performance analysis. Based on profiling the first draft, we made two rounds of optimization. Below is my partner's analysis and work:
Before optimization:
Tue Oct 30 20:14:19 2018 profile.stats
697390 function calls (690360 primitive calls) in 0.650 seconds
Ordered by: internal time
List reduced from 2079 to 10 due to restriction <10>
ncalls tottime percall cumtime percall filename:lineno(function)
22391 0.141 0.000 0.141 0.000 C:\Users\v-yizzha\Desktop\WordFrequency\modes.py:102(<listcomp>)
1375 0.061 0.000 0.061 0.000 {built-in method nt.stat}
22391 0.060 0.000 0.074 0.000 C:\Users\v-yizzha\Desktop\WordFrequency\utils.py:14(get_phrases)
1 0.045 0.045 0.382 0.382 C:\Users\v-yizzha\Desktop\WordFrequency\modes.py:83(mode_p)
27395 0.039 0.000 0.039 0.000 {method 'split' of 're.Pattern' objects}
306 0.023 0.000 0.023 0.000 {built-in method marshal.loads}
12/11 0.020 0.002 0.023 0.002 {built-in method _imp.create_dynamic}
306 0.017 0.000 0.027 0.000 <frozen importlib._bootstrap_external>:914(get_data)
27798 0.011 0.000 0.062 0.000 C:\Users\v-yizzha\AppData\Local\Continuum\anaconda3\envs\nltk\lib\re.py:271(_compile)
1067/1064 0.010 0.000 0.039 0.000 {built-in method builtins.__build_class__}
The biggest cost was the list comprehension in modes.py. My partner found that he had stored the stop words in a list rather than a set, which made every membership lookup slow:
pre_list = [word for word in pre_list if word not in stop_words]
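The fix in isolation: membership tests on a list cost O(len(list)) per lookup, while a set costs O(1) on average, so the comprehension above speeds up just by changing the container type. The words below are illustrative.

```python
# Before the fix stop_words was a list; switching to a set makes each
# "word not in stop_words" check a constant-time hash lookup.
stop_words = {"the", "a", "an", "of", "and"}
pre_list = ["the", "quick", "fox", "and", "a", "dog"]
pre_list = [word for word in pre_list if word not in stop_words]
print(pre_list)  # → ['quick', 'fox', 'dog']
```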
After the change, the result was:
Tue Oct 30 20:23:31 2018 profile.stats
697516 function calls (690485 primitive calls) in 0.510 seconds
Ordered by: internal time
List reduced from 2094 to 10 due to restriction <10>
ncalls tottime percall cumtime percall filename:lineno(function)
1379 0.060 0.000 0.060 0.000 {built-in method nt.stat}
22391 0.058 0.000 0.072 0.000 C:\Users\v-yizzha\Desktop\WordFrequency\utils.py:14(get_phrases)
1 0.040 0.040 0.234 0.234 C:\Users\v-yizzha\Desktop\WordFrequency\modes.py:83(mode_p)
27395 0.037 0.000 0.037 0.000 {method 'split' of 're.Pattern' objects}
304 0.023 0.000 0.023 0.000 {built-in method marshal.loads}
12/11 0.018 0.002 0.020 0.002 {built-in method _imp.create_dynamic}
308 0.018 0.000 0.028 0.000 <frozen importlib._bootstrap_external>:914(get_data)
22391 0.011 0.000 0.011 0.000 C:\Users\v-yizzha\Desktop\WordFrequency\modes.py:102(<listcomp>)
1067/1064 0.010 0.000 0.039 0.000 {built-in method builtins.__build_class__}
27798 0.010 0.000 0.058 0.000 C:\Users\v-yizzha\AppData\Local\Continuum\anaconda3\envs\nltk\lib\re.py:271(_compile)
The listcomp time dropped by 0.13 s; the change was very effective!
Next come my own tests and changes.
Before the change:
Thu Nov 1 18:20:35 2018 proflie.status
1714748 function calls (1701302 primitive calls) in 1.118 seconds
Ordered by: internal time
List reduced from 3945 to 10 due to restriction <10>
ncalls tottime percall cumtime percall filename:lineno(function)
22391 0.179 0.000 0.238 0.000 C:\Users\v-qiyao\Documents\WordFrequency\utils.py:14(get_phrases)
3163 0.111 0.000 0.111 0.000 {built-in method nt.stat}
100/78 0.059 0.001 0.085 0.001 {built-in method _imp.create_dynamic}
741 0.052 0.000 0.052 0.000 {built-in method marshal.loads}
1 0.041 0.041 0.455 0.455 C:\Users\v-qiyao\Documents\WordFrequency\modes.py:83(mode_p)
27395 0.040 0.000 0.040 0.000 {method 'split' of '_sre.SRE_Pattern' objects}
105354 0.035 0.000 0.035 0.000 C:\Users\v-qiyao\AppData\Local\Continuum\anaconda3\lib\site-packages\nltk\probability.py:127(__setitem__)
743 0.035 0.000 0.054 0.000 <frozen importlib._bootstrap_external>:830(get_data)
992/1 0.032 0.000 1.119 1.119 {built-in method builtins.exec}
1 0.030 0.030 0.065 0.065 {built-in method _collections._count_elements}
The results showed that the most time was spent in the get_phrases helper, whose job is to extract phrases from a sentence. Analyzing the original source:
while len(pre_list) >= n:
    target_phrase = []
    for i in range(n):
        if not_word(pre_list[i]):
            for j in range(i + 1):
                pre_list.pop(0)
            break
        else:
            target_phrase.append(pre_list[i])
    if len(target_phrase) == n:
        target_str = target_phrase[0]
        for i in range(n - 1):
            target_str += " " + target_phrase[i + 1]
        result.append(target_str)
        pre_list.pop(0)
return result
This version builds an extra temporary sequence for every phrase and performs many unnecessary pop operations, so I optimized it as follows:
for j in range(len(pre_list) + 1 - n):
    target_phrase = ""
    for i in range(n):
        if not_word(pre_list[i + j]):
            j += i
            break
        elif target_phrase == "":
            target_phrase += pre_list[i + j]
        else:
            target_phrase += (' ' + pre_list[i + j])
        if i == n - 1:
            result.append(target_phrase)
The results were as follows:
Thu Nov 1 18:22:38 2018 proflie.status
1187845 function calls (1174399 primitive calls) in 0.972 seconds
Ordered by: internal time
List reduced from 3945 to 10 due to restriction <10>
ncalls tottime percall cumtime percall filename:lineno(function)
3163 0.109 0.000 0.109 0.000 {built-in method nt.stat}
22391 0.095 0.000 0.118 0.000 C:\Users\v-qiyao\Documents\WordFrequency\utils.py:14(get_phrases)
100/78 0.055 0.001 0.081 0.001 {built-in method _imp.create_dynamic}
741 0.052 0.000 0.052 0.000 {built-in method marshal.loads}
1 0.040 0.040 0.336 0.336 C:\Users\v-qiyao\Documents\WordFrequency\modes.py:83(mode_p)
27395 0.039 0.000 0.039 0.000 {method 'split' of '_sre.SRE_Pattern' objects}
105544 0.036 0.000 0.036 0.000 C:\Users\v-qiyao\AppData\Local\Continuum\anaconda3\lib\site-packages\nltk\probability.py:127(__setitem__)
743 0.034 0.000 0.053 0.000 <frozen importlib._bootstrap_external>:830(get_data)
1 0.033 0.033 0.068 0.068 {built-in method _collections._count_elements}
992/1 0.030 0.000 0.973 0.973 {built-in method builtins.exec}
The get_phrases runtime dropped by 0.08 s, a significant improvement.
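Putting the optimized loop into a runnable, self-contained form: the sliding window scans the token list once instead of popping from the front. The not_word helper here is an assumption (the real one lives in utils.py; this sketch treats any non-alphabetic token as a phrase boundary), and the no-op j += i line is omitted since break alone ends the inner loop.

```python
def not_word(token):
    # Assumed stand-in: any token with a non-alphabetic character
    # (punctuation, numbers) breaks a phrase.
    return not token.isalpha()

def get_phrases(pre_list, n):
    """Collect every run of n consecutive real words as a space-joined phrase."""
    result = []
    for j in range(len(pre_list) + 1 - n):
        target_phrase = ""
        for i in range(n):
            if not_word(pre_list[i + j]):
                break                      # a non-word token ends this window
            elif target_phrase == "":
                target_phrase = pre_list[i + j]
            else:
                target_phrase += ' ' + pre_list[i + j]
            if i == n - 1:                 # all n words collected
                result.append(target_phrase)
    return result

print(get_phrases(["how", "are", "you", ",", "fine"], 2))
# → ['how are', 'are you']
```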
From the output, under -n 10 -p 2 -v verbs.txt the runtime has shrunk to 0.27 s. We use the nltk library for the list-to-dict conversion and the sort, and cProfile shows most of the remaining time in built-in functions. Testing with large files, the runtime grows roughly as O(n log n). Earlier versions were slowed down by repeated file operations, but that was fixed immediately by restructuring the code logic and was never saved as a commit.
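The list → dict → sort step described above can be sketched with collections.Counter (nltk's FreqDist exposes the same most_common interface); the words below are illustrative, and the -n 10 flag corresponds to asking for the top 10 entries. Sorting the tallies is what dominates, which matches the observed O(n log n) growth.

```python
from collections import Counter

words = ["spring", "fall", "spring", "winter", "spring", "fall"]
freq = Counter(words)          # list -> dict of counts, O(n)
top = freq.most_common(2)      # sort by count, O(n log n)
print(top)  # → [('spring', 3), ('fall', 2)]
```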