Counting Word Frequencies in a Text

Date: 2019-11-18

Method 1: plain Python (dictionary)

# Convert the text into a dictionary and count the words
with open('art.txt', 'r') as f:
    # Read the file, strip commas, periods and semicolons, then split on whitespace
    words = f.read().replace(',', '').replace('.', '').replace(';', '').split()

counts = {}
for word in words:  # build a dictionary: the word is the key, its number of occurrences the value
    if word in counts:
        counts[word] += 1
    else:
        counts[word] = 1

ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)  # sort by count, highest first
for word, count in ranked[:5]:  # print the top 5
    print('Word %s appears %d times' % (word, count))
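A variant of method 1, sketched as two small functions that take a string instead of reading art.txt, so they are easy to test. It replaces the if/else with dict.get and uses heapq.nlargest, which returns the same result as sorting and slicing but does not need to sort the whole dictionary when only the top few words are wanted. The names count_words and top_n are hypothetical, chosen only for this example.

```python
import heapq


def count_words(text: str) -> dict[str, int]:
    """Build a word -> count dictionary, stripping the same punctuation as method 1."""
    for mark in ",.;":
        text = text.replace(mark, "")
    counts: dict[str, int] = {}
    for word in text.split():
        # dict.get returns 0 for a word not seen yet, replacing the if/else branch
        counts[word] = counts.get(word, 0) + 1
    return counts


def top_n(counts: dict[str, int], n: int = 5) -> list[tuple[str, int]]:
    """Return the n most frequent (word, count) pairs without sorting every entry."""
    return heapq.nlargest(n, counts.items(), key=lambda item: item[1])


if __name__ == "__main__":
    sample = "the cat saw the dog; the dog saw a bird."
    for word, count in top_n(count_words(sample), 3):
        print("Word %s appears %d times" % (word, count))
```

The Python documentation describes heapq.nlargest(n, iterable, key) as equivalent to sorted(iterable, key=key, reverse=True)[:n], so the ranking matches method 1, including the order of words with equal counts.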

Method 2: a version found on a blog (collections.Counter)

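# Convert the text into a list of words and count it with Counter
from collections import Counter

with open('art.txt', 'r') as f:
    # Read the file, strip commas, periods and semicolons, then split on whitespace
    words = f.read().replace(',', '').replace('.', '').replace(';', '').split()

counts = Counter(words)         # count how often each word occurs
top5 = counts.most_common(5)    # the 5 most frequent words, highest count first
for word, count in top5:
    print('Word %s appears %d times' % (word, count))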
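Both methods strip only commas, periods and semicolons, so "The" and "the" are counted as different words and a token such as "dog!" keeps its punctuation. The following is a minimal sketch that tokenizes with a regular expression instead. It assumes that a word is a run of ASCII letters with an optional internal apostrophe and that case should be ignored; both are assumptions, not part of the original. The names top_words and WORD_RE are hypothetical.

```python
import re
from collections import Counter

# Assumed definition of a word: ASCII letters, optionally with one internal apostrophe (it's, don't)
WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")


def top_words(text: str, n: int = 5) -> list[tuple[str, int]]:
    """Return the n most common words, ignoring case and all punctuation."""
    words = WORD_RE.findall(text.lower())
    return Counter(words).most_common(n)


if __name__ == "__main__":
    sample = 'The dog barked! "the dog," said the cat; it\'s late.'
    for word, count in top_words(sample, 3):
        print("Word %s appears %d times" % (word, count))
```

Lowercasing before matching merges "The" and "the" into one entry. The pattern could be widened, for example to non-ASCII letters, if the input text needs it.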

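Output, sorted by count (both methods print the same list; words with equal counts stay in the order they first appear in the text):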

Word the appears 6 times
Word of appears 5 times
Word in appears 3 times
Word to appears 3 times
Word something appears 3 times