Punctuation-Based Tokenization and Word Frequency Counting

Suppose we have the following passage of text:

As I was waiting, a man came out of a side room, and at a glance I was sure he must be Long John. His left leg was cut off close by the hip, and under the left shoulder he carried a crutch, which he managed with wonderful dexterity, hopping about upon it like a bird. He was very tall and strong, with a face as big as a ham—plain and pale, but intelligent and smiling. Indeed, he seemed in the most cheerful spirits, whistling as he moved about among the tables, with a merry word or a slap on the shoulder for the more favoured of his guests.

 

I want to see which words in it occur most frequently and which occur least frequently.

 

I need to do two things:

1. First, tokenize the text, splitting on punctuation and whitespace.

2. Then, count how often each word occurs.

 

import re
from collections import Counter


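def count_words(text):
    """Count how many times each word occurs in text."""
    # Convert to lower case so that "He" and "he" are counted together
    text_lower = text.lower()
    # Split on runs of non-word characters (punctuation and whitespace).
    # Note: text that starts or ends with punctuation yields an empty token.
    tokens = re.split(r'\W+', text_lower)
    counts = Counter(tokens)
    return counts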


def test_run():
    with open("text.txt", "r") as f:
        text = f.read()
        counts = count_words(text)
        sorted_counts = sorted(counts.items(), key=lambda pair: pair[1], reverse=True)

        print("10 most common words:\nWord\tCount")
        for word, count in sorted_counts[:10]:
            print("{}\t{}".format(word, count))

        print("\n10 least common words:\nWord\tCount")
        for word, count in sorted_counts[-10:]:
            print("{}\t{}".format(word, count))


if __name__ == '__main__':
    test_run()

The output is as follows:

10 most common words:
Word Count
a 9
he 6
the 6
and 5
as 4
was 4
with 3
i 2
of 2
his 2

10 least common words:
Word Count
merry 1
word 1
or 1
slap 1
on 1
for 1
more 1
favoured 1
guests 1
(empty string) 1

Process finished with exit code 0
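
The passage ends with a period, and `re.split` returns an empty string whenever the text begins or ends with a separator, so the counts above include one empty token. One way to avoid this is to match the words themselves rather than split on what lies between them. The following is only a minimal sketch: the helper name `count_words_findall` and the sample sentence are chosen here for illustration and are not part of the original script. It also uses `Counter.most_common` in place of the manual sort.

```python
import re
from collections import Counter


def count_words_findall(text):
    """Count lowercase word tokens without ever producing an empty token."""
    # \w+ matches runs of letters, digits and underscores, so punctuation
    # and whitespace are skipped instead of being used as split points.
    tokens = re.findall(r"\w+", text.lower())
    return Counter(tokens)


# Illustrative sample: one sentence taken from the passage above.
sample = "He was very tall and strong, with a face as big as a ham."
counts = count_words_findall(sample)

# most_common(n) returns the n (word, count) pairs with the highest counts.
for word, count in counts.most_common(3):
    print("{}\t{}".format(word, count))
```

Matching with `\w+` keeps the same notion of a word as splitting on `\W+`, so the counts for real words are unchanged; only the empty token disappears.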
