[Bash] LeetCode 192. Word Frequency

★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★
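➤WeChat official account: 山青詠芝 (shanqingyongzhi)
➤cnblogs address: 山青詠芝 (https://www.cnblogs.com/strengthen/)
➤GitHub address: https://github.com/strengthen/LeetCode
➤Original article: http://www.javashuo.com/article/p-bhjmwdak-md.html
➤If the link is not 山青詠芝's cnblogs address, this is probably a scraped copy of the author's article.
➤The original article has been revised and updated! Reading it at the original address is strongly recommended. Support the author! Support original work!
★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★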

Write a bash script to calculate the frequency of each word in a text file words.txt.

For simplicity's sake, you may assume:

  • words.txt contains only lowercase characters and space ' ' characters.
  • Each word must consist of lowercase characters only.
  • Words are separated by one or more whitespace characters.

Example:

Assume that words.txt has the following content:

the day is sunny the the
the sunny is is

Your script should output the following, sorted by descending frequency:

the 4
is 3
sunny 2
day 1
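
To try a solution locally, the example above can be reproduced with a small harness. This is only a sketch: the helper names `make_sample` and `expected_output` are hypothetical and not part of the problem.

```shell
# Hypothetical helper: write the example input from the problem statement
# to words.txt inside the directory given as the first argument.
make_sample() {
    printf '%s\n' 'the day is sunny the the' 'the sunny is is' > "$1/words.txt"
}

# Hypothetical helper: print the expected output from the problem statement,
# one "word count" pair per line.
expected_output() {
    printf '%s\n' 'the 4' 'is 3' 'sunny 2' 'day 1'
}
```

After calling `make_sample` on a scratch directory, a candidate script can be run there and its output compared with what `expected_output` prints, for example with `diff`.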

Note:

  • Don't worry about handling ties; it is guaranteed that each word's frequency count is unique.
  • Could you write it in one line using Unix pipes?
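
The judge never exercises ties, but outside that guarantee the order of words with equal counts depends on the sort implementation. A minimal sketch of a deterministic variant follows, assuming ties should be broken alphabetically. That rule and the function name `word_freq_stable` are assumptions, not part of the problem.

```shell
# Sketch: count words, then sort by count (numeric, descending) and,
# for equal counts, by word (ascending). The tie-break rule is an assumption.
word_freq_stable() {
    tr -s ' ' '\n' < "$1" |
        awk 'NF { count[$1]++ } END { for (w in count) print w, count[w] }' |
        sort -k2,2nr -k1,1
}
```

The second key `-k1,1` is only consulted when two lines have the same count, so the result matches the plain descending sort whenever counts are unique.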

4ms

# Read from the file words.txt and output the word frequency list to stdout.
# sort -nr compares the counts numerically; a plain sort -r would compare them as text.
cat words.txt | tr -s ' ' '\n' | sort | uniq -c | sort -nr | awk '{ print $2, $1 }'
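
For readers who want to see what each stage of this pipeline contributes, here is a sketch with one stage per line. The wrapper function `word_freq` and its file-path argument are illustrative assumptions. The sketch reads the file by redirection instead of `cat` and sorts the counts numerically.

```shell
# Sketch: the pipeline, one stage per line. word_freq is a hypothetical
# wrapper that takes the input file path as its first argument.
word_freq() {
    tr -s ' ' '\n' < "$1" |     # turn spaces into newlines, squeezing repeats: one word per line
        sort |                   # group identical words so uniq can count them
        uniq -c |                # prefix each distinct word with its number of occurrences
        sort -nr |               # order by that count, numerically, largest first
        awk '{ print $2, $1 }'   # swap the columns to "word count"
}
```

Calling `word_freq words.txt` on the example input should print the four lines shown in the problem statement.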

8ms

# Read from the file words.txt and output the word frequency list to stdout.
awk '{
    for (i = 1; i <= NF; ++i) ++s[$i];
} END {
    for (i in s) print i, s[i];
}' words.txt | sort -nr -k 2

16ms

# Read from the file words.txt and output the word frequency list to stdout.

# try 1
# Note: \n in the replacement is a newline in GNU sed; other sed implementations may differ.
sed 's/ \{1,\}/\n/g' words.txt | sed '/^$/d' | sort | uniq -c | sort -nr | awk '{print $2,$1}'