Suppose we have the following passage of text:
As I was waiting, a man came out of a side room, and at a glance I was sure he must be Long John. His left leg was cut off close by the hip, and under the left shoulder he carried a crutch, which he managed with wonderful dexterity, hopping about upon it like a bird. He was very tall and strong, with a face as big as a ham—plain and pale, but intelligent and smiling. Indeed, he seemed in the most cheerful spirits, whistling as he moved about among the tables, with a merry word or a slap on the shoulder for the more favoured of his guests.
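I want to see which words in it occur most often and which occur least often.
I need to do two things:
1. First, tokenize the text; here we simply split on punctuation and whitespace.
2. Then count how often each word occurs.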
import re
from collections import Counter


def count_words(text):
    """Count how many times each word occurs in text."""
    # Convert to lower case so that "He" and "he" are counted together
    text_lower = text.lower()
    # Split on runs of non-word characters (punctuation and whitespace)
    tokens = re.split(r'\W+', text_lower)
    counts = Counter(tokens)
    return counts


def test_run():
    with open("text.txt", "r") as f:
        text = f.read()
    counts = count_words(text)
    # Sort the (word, count) pairs by count, highest first
    sorted_counts = sorted(counts.items(), key=lambda pair: pair[1], reverse=True)

    print("10 most common words:\nWord\tCount")
    for word, count in sorted_counts[:10]:
        print("{}\t{}".format(word, count))

    print("\n10 least common words:\nWord\tCount")
    for word, count in sorted_counts[-10:]:
        print("{}\t{}".format(word, count))


if __name__ == '__main__':
    test_run()
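As an aside, the sorting step in test_run could also be written with Counter.most_common. The following is only a minimal sketch: the token list is a hypothetical sample rather than the contents of text.txt, and the variable names top_two and bottom_three are made up for illustration.

```python
from collections import Counter

# Hypothetical sample tokens, used here instead of reading text.txt.
tokens = ["a", "man", "came", "out", "of", "a", "side", "room",
          "and", "a", "glance", "and"]
counts = Counter(tokens)

# most_common(n) returns the n highest-count (word, count) pairs.
top_two = counts.most_common(2)

# most_common() with no argument returns every pair, highest count first,
# so slicing the tail gives the least frequent words.
bottom_three = counts.most_common()[-3:]

print(top_two)
print(bottom_three)
```

This avoids the explicit sorted call and lambda, at the cost of hiding how the ordering is produced.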
Running test_run on the passage above prints the following:
10 most common words:
Word Count
a 9
he 6
the 6
and 5
as 4
was 4
with 3
i 2
of 2
his 2
10 least common words:
Word Count
merry 1
word 1
or 1
slap 1
on 1
for 1
more 1
favoured 1
guests 1
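 1

Process finished with exit code 0
The final row of the list has a count of 1 but no word: it is the empty string that re.split returns because the text ends with punctuation.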
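re.split can leave an empty string at the start or end of its result, and Counter then tallies it like any other word. One possible way to keep it out of the counts is to match the words directly with re.findall instead of splitting. This is a sketch under assumptions, not part of the original script: the helper name count_words_no_empty and the sample sentence are hypothetical.

```python
import re
from collections import Counter


def count_words_no_empty(text):
    """Count words by matching runs of word characters instead of splitting."""
    # findall returns only the matched words, so no empty strings appear
    tokens = re.findall(r"\w+", text.lower())
    return Counter(tokens)


# Hypothetical sample sentence that ends with punctuation
sample = "A merry word, or a slap on the shoulder."
split_tokens = re.split(r"\W+", sample.lower())
counts = count_words_no_empty(sample)

print(split_tokens[-1] == "")  # True: the split leaves a trailing empty string
print("" in counts)            # False: the findall version does not count one
print(counts["a"])
```

Filtering the result of re.split with a list comprehension would work as well; findall is used here only because it needs no extra step.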