[Python Cookbook] Finding the most frequent element in a sequence

Date: 2020-06-02


Problem

The Python Cookbook poses the following problem: given a sequence, find the element that occurs most often in it.
For example:

words = [
    'look', 'into', 'my', 'eyes', 'look', 'into', 'my', 'eyes',
    'the', 'eyes', 'the', 'eyes', 'the', 'eyes', 'not', 'around',
    'the', 'eyes', "don't", 'look', 'around', 'the', 'eyes',
    'look', 'into', 'my', 'eyes', "you're", 'under'
]

Which element occurs most often in words?

A first look

1. The Counter class from the collections module
The first thing that comes to mind is the Counter class from the collections module. The documentation is worth reading closely: https://docs.python.org/3.6/l...

from collections import Counter

words = [
    'look', 'into', 'my', 'eyes', 'look', 'into', 'my', 'eyes',
    'the', 'eyes', 'the', 'eyes', 'the', 'eyes', 'not', 'around',
    'the', 'eyes', "don't", 'look', 'around', 'the', 'eyes',
    'look', 'into', 'my', 'eyes', "you're", 'under'
]

# Count how many times each word occurs
counter_words = Counter(words)
print(counter_words)

# most_common(1) returns a one-element list holding the top (element, count) pair
most_counter = counter_words.most_common(1)
print(most_counter)

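About most_common([n]): it returns a list of the n most common elements together with their counts, ordered from most common to least common. If n is omitted or None, all elements are returned.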
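As a small illustration of most_common, the sketch below calls it with n = 1, with n = 2, and with no argument. The short sample list is made up for this example and is not the words list from the article.

```python
from collections import Counter

# A made-up sample: 'eyes' x3, 'the' x2, 'look' x1
sample = ['eyes', 'the', 'eyes', 'look', 'the', 'eyes']
counts = Counter(sample)

top_one = counts.most_common(1)      # only the most frequent element
top_two = counts.most_common(2)      # the two most frequent elements
everything = counts.most_common()    # all elements, most frequent first

print(top_one)
print(top_two)
print(everything)
```

Note that the result is always a list of (element, count) tuples, so the single most frequent element itself is counts.most_common(1)[0][0].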

2. Using the uniqueness of dict keys and the sorted() function

words = [
    'look', 'into', 'my', 'eyes', 'look', 'into', 'my', 'eyes',
    'the', 'eyes', 'the', 'eyes', 'the', 'eyes', 'not', 'around',
    'the', 'eyes', "don't", 'look', 'around', 'the', 'eyes',
    'look', 'into', 'my', 'eyes', "you're", 'under'
]

dict_num = {}
for item in words:
    # dict keys are unique, so each distinct word is counted only once
    if item not in dict_num:
        dict_num[item] = words.count(item)
# print(dict_num)

# Sort the (word, count) pairs by count, descending, and take the first one
most_counter = sorted(dict_num.items(), key=lambda x: x[1], reverse=True)[0]
print(most_counter)

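The sorted function:
Reference: https://docs.python.org/3.6/l...

iterable: any iterable;
key: a one-argument function called on each element to produce the value used for comparison; the default is None, meaning the elements are compared directly;
reverse: the sort order. reverse=True sorts in descending order and reverse=False in ascending order; the default is False.
Return value: a new sorted list, whatever the type of iterable.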
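The effect of key and reverse can be seen in a short sketch. The (word, count) pairs below are made up for illustration.

```python
# Made-up (word, count) pairs, deliberately out of order
pairs = [('my', 3), ('eyes', 8), ('the', 5), ('look', 4)]

# key picks the count (index 1) as the comparison value
ascending = sorted(pairs, key=lambda pair: pair[1])
descending = sorted(pairs, key=lambda pair: pair[1], reverse=True)

# sorted() accepts any iterable but always returns a new list
from_tuple = sorted(('b', 'c', 'a'))

print(ascending)
print(descending)
print(from_tuple)
```

The input is left unchanged: pairs keeps its original order after both calls.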

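Here we use the anonymous function key=lambda x: x[1],
which is equivalent to:

def key(x):
    return x[1]

We sort in descending order by the number of times each element occurs, so the first item of the result is the element that occurs most often.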
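Because only the first item of the sorted result is used, the same answer can also be obtained with the built-in max() and the same key function, which avoids sorting every pair. This is an alternative to the sorted() approach rather than the article's method. The counts below are a subset of the counts from the words example.

```python
# A subset of the word counts from the example list
dict_num = {'look': 4, 'into': 3, 'my': 3, 'eyes': 8, 'the': 5}

# Approach from the text: sort all pairs by count, take the first
by_sort = sorted(dict_num.items(), key=lambda x: x[1], reverse=True)[0]

# Alternative: scan once for the pair with the largest count
by_max = max(dict_num.items(), key=lambda x: x[1])

print(by_sort)
print(by_max)
```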

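Going further

The sequence given here is simple and has only a few elements. Sometimes, though, a list may contain millions or tens of millions of elements. In that case, would the different solutions differ greatly in efficiency?
To check, let's generate a list of random numbers with one million elements.
We use the numpy package here, which has to be installed before use. Download page for numpy: https://pypi.python.org/pypi/.... Our environment is centos7, so we download numpy-1.14.2.zip, unpack it, and run python setup.py install.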

import numpy as np

def generate_data(num=1000000):
    # Draw num random integers from [0, num // 10); // keeps the bound an int
    return np.random.randint(num // 10, size=num)

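np.random.randint(low[, high, size]) returns random integers from the half-open interval [low, high). When high is omitted, as in generate_data, the integers are drawn from [0, low).
For details, see https://pypi.python.org/pypi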

OK, the data is generated. Now let's measure how long each of the two methods takes. For the timing, time.time() is enough.
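The timing pattern can be sketched as a small helper. This is only an illustration under a few assumptions: timed and most_common_one are hypothetical names, time.perf_counter() is swapped in for time.time() because it is intended for measuring intervals, and the data is a small list built with the standard random module so the sketch runs without numpy.

```python
import random
import time
from collections import Counter


def timed(func, *args):
    """Run func(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args)
    elapsed = time.perf_counter() - start
    return result, elapsed


def most_common_one(values):
    """Return the (element, count) pair of the most frequent element."""
    return Counter(values).most_common(1)[0]


# Small made-up data set: 10,000 integers drawn from [0, 100)
random.seed(0)
data = [random.randrange(100) for _ in range(10000)]

result, elapsed = timed(most_common_one, data)
print("most_common_one took: %.6fs" % elapsed)
```

The elapsed value is in seconds and will vary from run to run and from machine to machine.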

#!/usr/bin/python
# coding=utf-8
#
# File: most_elements.py
# Author: ralap
# Date: 2018-4-5
# Description: find most elements in list
#
from collections import Counter
import time

import numpy as np


def generate_data(num=1000000):
    # Draw num random integers from [0, num // 10)
    return np.random.randint(num // 10, size=num)


def collect(test_list):
    counter_words = Counter(test_list)
    # print(counter_words)  # printing every count would dominate the timing
    most_counter = counter_words.most_common(1)
    print(most_counter)


def list_to_dict(test_list):
    dict_num = {}
    for item in test_list:
        if item not in dict_num:
            dict_num[item] = test_list.count(item)
    most_counter = sorted(dict_num.items(), key=lambda x: x[1], reverse=True)[0]
    print(most_counter)


if __name__ == "__main__":
    list_value = list(generate_data())

    t1 = time.time()
    collect(list_value)
    t2 = time.time()
    # time.time() differences are in seconds, not milliseconds
    print("collect took: %ss" % (t2 - t1))

    t1 = t2
    list_to_dict(list_value)
    t2 = time.time()
    print("list_to_dict took: %ss" % (t2 - t1))
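One thing to note about list_to_dict: test_list.count(item) rescans the whole list once for every distinct value, so its cost grows with the number of distinct values times the length of the list. A single-pass variant is sketched below for comparison. count_single_pass is a hypothetical name that is not part of the original script, and the tiny input is made up.

```python
def count_single_pass(test_list):
    """Count every element in one pass, then return the top (element, count)."""
    dict_num = {}
    for item in test_list:
        # Increment the running count instead of calling list.count()
        dict_num[item] = dict_num.get(item, 0) + 1
    return sorted(dict_num.items(), key=lambda x: x[1], reverse=True)[0]


# Made-up input: 3 occurs three times, 1 twice, 2 once
sample = [3, 1, 3, 2, 3, 1]
most = count_single_pass(sample)
print(most)
```

This does essentially the same counting work that Counter does internally, so it should scale with the length of the list rather than with the product above.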