We combined pair coding with separate coding. The initial design discussion, feature assignment, and GitHub repository setup were done together. After my partner had set up the framework, we divided the functions to implement between us and coded separately, managing the source through GitHub. Whenever a question came up about the framework, a key function, or a target feature, we discussed it and solved it with pair coding.
(My only complaint about my partner: he isn't quite funny enough.)
We used the coverage package for regression testing:
coverage run coverage_test.py
coverage report
The results were as follows:
Name Stmts Miss Cover
--------------------------------------
coverage_test.py 36 0 100%
modes.py 94 0 100%
utils.py 68 0 100%
--------------------------------------
TOTAL 198 0 100%
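For reference, a minimal sketch of what a coverage_test.py like the one above might contain: a plain assert-based regression script that coverage run can execute directly. The get_phrases stand-in below and its test inputs are assumptions for illustration; the real tests would import the function from utils.py.

```python
# Hypothetical sketch of coverage_test.py (assert-based, runnable by
# "coverage run coverage_test.py").  get_phrases is a stand-in with the
# behaviour described later in this post.

def get_phrases(pre_list, n):
    """Return every run of n consecutive words joined by spaces."""
    return [" ".join(pre_list[j:j + n]) for j in range(len(pre_list) + 1 - n)]

# Regression cases: once recorded, reruns must keep producing the same output.
assert get_phrases(["how", "are", "you"], 2) == ["how are", "are you"]
assert get_phrases(["hello"], 2) == []
print("all regression cases passed")
```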
We used Python's cProfile for performance analysis. Based on profiling the first draft, we made two rounds of optimization. Below is my partner's analysis and work:
Before optimization:
Tue Oct 30 20:14:19 2018 profile.stats
697390 function calls (690360 primitive calls) in 0.650 seconds
Ordered by: internal time
List reduced from 2079 to 10 due to restriction <10>
ncalls tottime percall cumtime percall filename:lineno(function)
22391 0.141 0.000 0.141 0.000 C:\Users\v-yizzha\Desktop\WordFrequency\modes.py:102(<listcomp>)
1375 0.061 0.000 0.061 0.000 {built-in method nt.stat}
22391 0.060 0.000 0.074 0.000 C:\Users\v-yizzha\Desktop\WordFrequency\utils.py:14(get_phrases)
1 0.045 0.045 0.382 0.382 C:\Users\v-yizzha\Desktop\WordFrequency\modes.py:83(mode_p)
27395 0.039 0.000 0.039 0.000 {method 'split' of 're.Pattern' objects}
306 0.023 0.000 0.023 0.000 {built-in method marshal.loads}
12/11 0.020 0.002 0.023 0.002 {built-in method _imp.create_dynamic}
306 0.017 0.000 0.027 0.000 <frozen importlib._bootstrap_external>:914(get_data)
27798 0.011 0.000 0.062 0.000 C:\Users\v-yizzha\AppData\Local\Continuum\anaconda3\envs\nltk\lib\re.py:271(_compile)
1067/1064 0.010 0.000 0.039 0.000 {built-in method builtins.__build_class__}
The biggest cost was the list comprehension in modes.py. My partner found that he had stored the stop words in a list rather than a set, which made every membership lookup slow:
pre_list = [word for word in pre_list if word not in stop_words]
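The fix in isolation: membership tests on a list cost O(len(list)) per lookup, while a set costs O(1) on average, so the comprehension above speeds up just by changing the container type. The words below are illustrative.

```python
# Before the fix stop_words was a list; switching to a set makes each
# "word not in stop_words" check a constant-time hash lookup.
stop_words = {"the", "a", "an", "of", "and"}
pre_list = ["the", "quick", "fox", "and", "a", "dog"]
pre_list = [word for word in pre_list if word not in stop_words]
print(pre_list)  # → ['quick', 'fox', 'dog']
```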
After the change, the result was:
Tue Oct 30 20:23:31 2018 profile.stats
697516 function calls (690485 primitive calls) in 0.510 seconds
Ordered by: internal time
List reduced from 2094 to 10 due to restriction <10>
ncalls tottime percall cumtime percall filename:lineno(function)
1379 0.060 0.000 0.060 0.000 {built-in method nt.stat}
22391 0.058 0.000 0.072 0.000 C:\Users\v-yizzha\Desktop\WordFrequency\utils.py:14(get_phrases)
1 0.040 0.040 0.234 0.234 C:\Users\v-yizzha\Desktop\WordFrequency\modes.py:83(mode_p)
27395 0.037 0.000 0.037 0.000 {method 'split' of 're.Pattern' objects}
304 0.023 0.000 0.023 0.000 {built-in method marshal.loads}
12/11 0.018 0.002 0.020 0.002 {built-in method _imp.create_dynamic}
308 0.018 0.000 0.028 0.000 <frozen importlib._bootstrap_external>:914(get_data)
22391 0.011 0.000 0.011 0.000 C:\Users\v-yizzha\Desktop\WordFrequency\modes.py:102(<listcomp>)
1067/1064 0.010 0.000 0.039 0.000 {built-in method builtins.__build_class__}
27798 0.010 0.000 0.058 0.000 C:\Users\v-yizzha\AppData\Local\Continuum\anaconda3\envs\nltk\lib\re.py:271(_compile)
The listcomp time dropped by 0.13 s; the change was very effective!
Next come my own tests and changes.
Before the change:
Thu Nov 1 18:20:35 2018 proflie.status
1714748 function calls (1701302 primitive calls) in 1.118 seconds
Ordered by: internal time
List reduced from 3945 to 10 due to restriction <10>
ncalls tottime percall cumtime percall filename:lineno(function)
22391 0.179 0.000 0.238 0.000 C:\Users\v-qiyao\Documents\WordFrequency\utils.py:14(get_phrases)
3163 0.111 0.000 0.111 0.000 {built-in method nt.stat}
100/78 0.059 0.001 0.085 0.001 {built-in method _imp.create_dynamic}
741 0.052 0.000 0.052 0.000 {built-in method marshal.loads}
1 0.041 0.041 0.455 0.455 C:\Users\v-qiyao\Documents\WordFrequency\modes.py:83(mode_p)
27395 0.040 0.000 0.040 0.000 {method 'split' of '_sre.SRE_Pattern' objects}
105354 0.035 0.000 0.035 0.000 C:\Users\v-qiyao\AppData\Local\Continuum\anaconda3\lib\site-packages\nltk\probability.py:127(__setitem__)
743 0.035 0.000 0.054 0.000 <frozen importlib._bootstrap_external>:830(get_data)
992/1 0.032 0.000 1.119 1.119 {built-in method builtins.exec}
1 0.030 0.030 0.065 0.065 {built-in method _collections._count_elements}
The results showed that the most time was spent in the get_phrases helper, whose job is to extract phrases from a sentence. Analyzing the original source:
while len(pre_list) >= n:
    target_phrase = []
    for i in range(n):
        if not_word(pre_list[i]):
            for j in range(i + 1):
                pre_list.pop(0)
            break
        else:
            target_phrase.append(pre_list[i])
    if len(target_phrase) == n:
        target_str = target_phrase[0]
        for i in range(n - 1):
            target_str += " " + target_phrase[i + 1]
        result.append(target_str)
        pre_list.pop(0)
return result
This version builds an extra temporary sequence for every phrase and performs many unnecessary pop operations, so I optimized it as follows:
for j in range(len(pre_list) + 1 - n):
    target_phrase = ""
    for i in range(n):
        if not_word(pre_list[i + j]):
            j += i
            break
        elif target_phrase == "":
            target_phrase += pre_list[i + j]
        else:
            target_phrase += (' ' + pre_list[i + j])
        if i == n - 1:
            result.append(target_phrase)
The results were as follows:
Thu Nov 1 18:22:38 2018 proflie.status
1187845 function calls (1174399 primitive calls) in 0.972 seconds
Ordered by: internal time
List reduced from 3945 to 10 due to restriction <10>
ncalls tottime percall cumtime percall filename:lineno(function)
3163 0.109 0.000 0.109 0.000 {built-in method nt.stat}
22391 0.095 0.000 0.118 0.000 C:\Users\v-qiyao\Documents\WordFrequency\utils.py:14(get_phrases)
100/78 0.055 0.001 0.081 0.001 {built-in method _imp.create_dynamic}
741 0.052 0.000 0.052 0.000 {built-in method marshal.loads}
1 0.040 0.040 0.336 0.336 C:\Users\v-qiyao\Documents\WordFrequency\modes.py:83(mode_p)
27395 0.039 0.000 0.039 0.000 {method 'split' of '_sre.SRE_Pattern' objects}
105544 0.036 0.000 0.036 0.000 C:\Users\v-qiyao\AppData\Local\Continuum\anaconda3\lib\site-packages\nltk\probability.py:127(__setitem__)
743 0.034 0.000 0.053 0.000 <frozen importlib._bootstrap_external>:830(get_data)
1 0.033 0.033 0.068 0.068 {built-in method _collections._count_elements}
992/1 0.030 0.000 0.973 0.973 {built-in method builtins.exec}
The get_phrases runtime dropped by 0.08 s, a significant improvement.
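Putting the optimized loop into a runnable, self-contained form: the sliding window scans the token list once instead of popping from the front. The not_word helper here is an assumption (the real one lives in utils.py; this sketch treats any non-alphabetic token as a phrase boundary), and the no-op j += i line is omitted since break alone ends the inner loop.

```python
def not_word(token):
    # Assumed stand-in: any token with a non-alphabetic character
    # (punctuation, numbers) breaks a phrase.
    return not token.isalpha()

def get_phrases(pre_list, n):
    """Collect every run of n consecutive real words as a space-joined phrase."""
    result = []
    for j in range(len(pre_list) + 1 - n):
        target_phrase = ""
        for i in range(n):
            if not_word(pre_list[i + j]):
                break                      # a non-word token ends this window
            elif target_phrase == "":
                target_phrase = pre_list[i + j]
            else:
                target_phrase += ' ' + pre_list[i + j]
            if i == n - 1:                 # all n words collected
                result.append(target_phrase)
    return result

print(get_phrases(["how", "are", "you", ",", "fine"], 2))
# → ['how are', 'are you']
```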
From the output, under -n 10 -p 2 -v verbs.txt the runtime has shrunk to 0.27 s. We use the nltk library for the list-to-dict conversion and the sort, and cProfile shows most of the remaining time in built-in functions. Testing with large files, the runtime grows roughly as O(n log n). Earlier versions were slowed down by repeated file operations, but that was fixed immediately by restructuring the code logic and was never saved as a commit.
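The list → dict → sort step described above can be sketched with collections.Counter (nltk's FreqDist exposes the same most_common interface); the words below are illustrative, and the -n 10 flag corresponds to asking for the top 10 entries. Sorting the tallies is what dominates, which matches the observed O(n log n) growth.

```python
from collections import Counter

words = ["spring", "fall", "spring", "winter", "spring", "fall"]
freq = Counter(words)          # list -> dict of counts, O(n)
top = freq.most_common(2)      # sort by count, O(n log n)
print(top)  # → [('spring', 3), ('fall', 2)]
```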