Third After-Class Assignment

Date: 2019-12-11


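1. Give your basic information at the beginning of the blog post. Suggested format:
  Student ID: 2017*****7147
  Name: 何曉航
  Gitee project repository: https://gitee.com/hxhdemayun/word_frequency_count/blob/master/%E4%BD%9C%E4%B8%9A03
2. Program analysis: briefly describe the four functions in the program. Each code segment must be included together with its explanation.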
First, declare the source encoding and import the punctuation constant from the string module.

# -*- coding: UTF-8 -*-
from string import punctuation
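To make the role of punctuation concrete, here is a minimal sketch of what the constant contains and how it behaves when passed to str.strip. The sample token is made up for illustration and does not come from the assignment.

```python
from string import punctuation

# punctuation is a plain string constant holding the ASCII punctuation characters
print(punctuation)

# str.strip removes any of the given characters from both ends only,
# so the apostrophe inside the word is kept (the sample token is hypothetical)
token = '"don\'t!"'
cleaned = token.strip(punctuation + ' ')
print(cleaned)
```

Because strip only works on the ends of a string, punctuation inside a word (apostrophes, hyphens) survives the cleaning step.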

1. File-reading function: opens the file, reads its contents into a buffer, and closes the file.

def process_file(dst):  # read the file into a buffer
    try:  # open the file
        txt = open(dst, "r")
    except IOError as s:
        print(s)
        return None
    try:  # read the file into the buffer
        bvffer = txt.read()
    except Exception:
        print("Read File Error!")
        txt.close()
        return None
    txt.close()
    return bvffer
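The open, read, and close steps can also be written with a with block, which closes the file even when reading fails. The following is only a sketch of that alternative: read_text is a hypothetical name, not a function from the assignment, and the temporary file exists purely to make the example runnable.

```python
import os
import tempfile


def read_text(path):
    """Hypothetical variant of process_file that uses a with block."""
    try:
        # the with block closes the file automatically, even on errors
        with open(path, "r") as txt:
            return txt.read()
    except IOError as err:
        print(err)
        return None


# usage: write a small temporary file, then read it back
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("hello world")
content = read_text(tmp.name)
os.remove(tmp.name)

# the file is gone now, so the error is printed and None is returned
missing = read_text(tmp.name)
```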

2. Data processing: splits the text into words, strips punctuation from both ends of each word, and counts the words in a dictionary.

def process_buffer(bvffer):
    if bvffer:
        word_freq = {}
        # Process the buffer: count the frequency of each word
        # and store it in the dictionary word_freq
        for item in bvffer.strip().split():
            word = item.strip(punctuation + ' ')
            if word in word_freq:
                word_freq[word] += 1
            else:
                word_freq[word] = 1
        return word_freq
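The same counting idea can be sketched with collections.Counter from the standard library. This is an illustrative variant rather than the assignment's code: count_words is a hypothetical name, the sample sentence is made up, and the sketch skips tokens that become empty after stripping, whereas process_buffer counts the empty string as a word. Like the original, it is case-sensitive.

```python
from collections import Counter
from string import punctuation


def count_words(text):
    """Hypothetical variant of process_buffer built on collections.Counter."""
    words = (item.strip(punctuation + ' ') for item in text.split())
    # skip tokens that were pure punctuation and are empty after stripping
    return Counter(word for word in words if word)


sample = "the cat -- the dog, and the bird."
counts = count_words(sample)
print(counts)
```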

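3. Output of the Top 10 results: sorts the dictionary entries by frequency in descending order and prints the 10 most frequent words.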

def output_result(word_freq):
    if word_freq:
        sorted_word_freq = sorted(word_freq.items(), key=lambda v: v[1], reverse=True)
        for item in sorted_word_freq[:10]:  # print the Top 10 words
            print(item)
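To illustrate the sorting step on a small scale, the sketch below applies the same sorted call to a made-up dictionary and compares it with Counter.most_common, a standard-library shortcut for the same task. The frequencies are invented for the example and are not results from the novel.

```python
from collections import Counter

# made-up frequencies, for illustration only
word_freq = {"the": 5, "and": 3, "cat": 1, "dog": 2}

# same ordering as in output_result, limited to the top 2 entries
top_sorted = sorted(word_freq.items(), key=lambda v: v[1], reverse=True)[:2]

# Counter.most_common(n) returns the n highest counts as (word, count) pairs
top_counter = Counter(word_freq).most_common(2)

print(top_sorted)
print(top_counter)
```

Both calls yield a list of (word, count) tuples, which is why output_result prints each item as a tuple.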

4. Main block: imports the argparse library to parse the command-line arguments, then calls the functions in order.

if __name__ == "__main__":
    import argparse
    parser = argparse.ArgumentParser()
    parser.add_argument('dst')
    args = parser.parse_args()
    dst = args.dst
    bvffer = process_file(dst)
    word_freq = process_buffer(bvffer)
    output_result(word_freq)
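The argument parsing can be tried in isolation by handing parse_args an explicit list instead of letting it read the real command line. This is a small sketch: the description and help strings are additions for illustration and are not part of the assignment's code.

```python
import argparse

# description and help texts are illustrative additions
parser = argparse.ArgumentParser(description="Count word frequencies in a text file")
parser.add_argument('dst', help="path of the text file to analyse")

# parse_args accepts an explicit list, which is handy for testing;
# called without arguments it reads sys.argv[1:] instead
args = parser.parse_args(['Gone_with_the_wind.txt'])
print(args.dst)
```

Since dst is a positional argument, running the script without a file name makes argparse print a usage message and exit.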

Run the code by entering python word_freq.py Gone_with_the_wind.txt on the command line.
The program prints the Top 10 words by frequency together with their counts.