Deduplicating and sorting in Python with set() and sorted()

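Preface

While reading the data-preparation code for training a chatbot's neural network model, I found that the training material has to be processed (converted to tensors). The words are stemmed first, and the stems are then deduplicated and sorted: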

words = sorted(list(set(words)))
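As a small illustration of what this one-liner does, the sketch below runs it on a made-up word list (the sample data is an assumption, not taken from the chatbot project). It also shows that the intermediate list() call is not strictly needed, since sorted() accepts any iterable, including a set.

```python
# Hypothetical sample of stemmed words containing duplicates.
words = ["hello", "bye", "thank", "hello", "help", "bye"]

# Original form: set() drops duplicates, list() copies, sorted() orders.
via_list = sorted(list(set(words)))

# sorted() accepts any iterable, so the set can be passed in directly.
direct = sorted(set(words))

print(via_list)  # ['bye', 'hello', 'help', 'thank']
print(direct)    # ['bye', 'hello', 'help', 'thank']
```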

Here are some notes on these three functions:

1. set()

Syntax: set([iterable])

Parameter: iterable (optional), a sequence (string, tuple, etc.), a collection (list, set, dictionary, etc.), or an iterator object to be converted into a set

Return value: a set

Purpose: deduplication. A set is by nature an unordered collection with no duplicate elements, so converting to a set removes the duplicates.

# empty set
print(set())

# from string
print(set('google'))

# from tuple
print(set(('a', 'e', 'i', 'o', 'u')))

# from list
print(set(['g', 'o', 'o', 'g', 'l', 'e']))

# from range
print(set(range(5)))

Output (the order of the elements inside a set is arbitrary and may differ between runs):

set()
{'o', 'l', 'e', 'g'}
{'a', 'o', 'e', 'u', 'i'}
{'e', 'g', 'l', 'o'}
{0, 1, 2, 3, 4}
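Because a set is unordered, deduplicating with set() discards the original order of the elements. The sketch below contrasts it with one common order-preserving alternative, dict.fromkeys(), which relies on dictionaries keeping insertion order (guaranteed from Python 3.7). The variable names are only illustrative.

```python
letters = ['g', 'o', 'o', 'g', 'l', 'e']

# set() removes duplicates but makes no promise about element order.
unique_unordered = set(letters)

# dict keys are unique and keep insertion order (Python 3.7+),
# so this keeps the first occurrence of each element, in order.
unique_in_order = list(dict.fromkeys(letters))

print(sorted(unique_unordered))  # sorted to get a predictable order: ['e', 'g', 'l', 'o']
print(unique_in_order)           # ['g', 'o', 'l', 'e']
```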

2. sorted()

Syntax: sorted(iterable[, key][, reverse])

Parameters:

iterable: the iterable to sort, either a sequence (string, tuple, list), a collection (set, dictionary, frozen set), or any iterator

reverse (optional): if true, the sorted list is reversed (sorted in descending order)

key (optional): a function that serves as a key for the sort comparison

Return value: a new sorted list

Example 1: sorting

# vowels list
pyList = ['e', 'a', 'u', 'o', 'i']
print(sorted(pyList))

# string
pyString = 'Python'
print(sorted(pyString))

# vowels tuple
pyTuple = ('e', 'a', 'u', 'o', 'i')
print(sorted(pyTuple))

Output:

['a', 'e', 'i', 'o', 'u']
['P', 'h', 'n', 'o', 't', 'y']
['a', 'e', 'i', 'o', 'u']

Example 2: sorting in reverse order

# set
pySet = {'e', 'a', 'u', 'o', 'i'}
print(sorted(pySet, reverse=True))

# dictionary (iterating a dict yields its keys, so the keys are sorted)
pyDict = {'e': 1, 'a': 2, 'u': 3, 'o': 4, 'i': 5}
print(sorted(pyDict, reverse=True))

# frozen set
pyFSet = frozenset(('e', 'a', 'u', 'o', 'i'))
print(sorted(pyFSet, reverse=True))

Output:

['u', 'o', 'i', 'e', 'a']
['u', 'o', 'i', 'e', 'a']
['u', 'o', 'i', 'e', 'a']
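The dictionary case above sorts only the keys, because iterating a dict yields its keys. If the goal is to order by the values instead, one possible approach is to pass a key function, as in this sketch (it reuses the pyDict from the example; the other variable names are illustrative).

```python
pyDict = {'e': 1, 'a': 2, 'u': 3, 'o': 4, 'i': 5}

# Iterating a dict yields its keys, so this sorts the keys alphabetically.
keys_sorted = sorted(pyDict)

# key=pyDict.get ranks each key by its value instead of by the key itself.
keys_by_value = sorted(pyDict, key=pyDict.get, reverse=True)

# Sorting items() by the second tuple element keeps key and value together.
items_by_value = sorted(pyDict.items(), key=lambda kv: kv[1])

print(keys_sorted)     # ['a', 'e', 'i', 'o', 'u']
print(keys_by_value)   # ['i', 'o', 'u', 'a', 'e']
print(items_by_value)  # [('e', 1), ('a', 2), ('u', 3), ('o', 4), ('i', 5)]
```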

Example 3: sorting with the key parameter

# take second element for sort
def takeSecond(elem):
    return elem[1]

# random list
random = [(2, 2), (3, 4), (4, 1), (1, 3)]

# sort list with key
sortedList = sorted(random, key=takeSecond)

# print list
print('Sorted list:', sortedList)

Output:

Sorted list: [(4, 1), (2, 2), (1, 3), (3, 4)]
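A named helper such as takeSecond is not the only way to supply a key. As a sketch, the same ordering can be obtained with a lambda or with operator.itemgetter from the standard library; the list is renamed to pairs here only to avoid shadowing the random module.

```python
from operator import itemgetter

pairs = [(2, 2), (3, 4), (4, 1), (1, 3)]

# itemgetter(1) builds a function that returns elem[1].
by_second = sorted(pairs, key=itemgetter(1))

# An inline lambda does the same job.
by_second_lambda = sorted(pairs, key=lambda elem: elem[1])

print(by_second)         # [(4, 1), (2, 2), (1, 3), (3, 4)]
print(by_second_lambda)  # [(4, 1), (2, 2), (1, 3), (3, 4)]
```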

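It is worth noting the difference between sort() and sorted():

sort is a method of list (list.sort()), while sorted can sort any iterable (sorted(iterable)).
list.sort() sorts the existing list in place and returns None, whereas the built-in function sorted returns a new list and leaves the original untouched.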
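The difference can be checked directly. The following is a minimal sketch with made-up data:

```python
numbers = [3, 1, 2]

# sorted() returns a new list and leaves the original unchanged.
new_list = sorted(numbers)
snapshot_after_sorted = list(numbers)

# list.sort() reorders the list in place and returns None.
return_value = numbers.sort()

# sorted() also works on non-list iterables such as tuples,
# which have no sort() method of their own.
from_tuple = sorted((3, 1, 2))

print(new_list)               # [1, 2, 3]
print(snapshot_after_sorted)  # [3, 1, 2]
print(return_value)           # None
print(numbers)                # [1, 2, 3]
print(from_tuple)             # [1, 2, 3]
```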

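While learning about these functions, I came across a fellow blogger's post about campus-recruitment interview questions. One of the questions is quoted below:

Original article: http://www.cnblogs.com/klchang/p/4752441.html

Use Python to count how often each word appears in an English article, return the 10 most frequent words together with their counts, and answer the following questions. (Punctuation may be ignored.)

The answer is as follows: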

import re

def findTopFreqWords(filename, num=1):
    'Find Top Frequent Words:'
    with open(filename, 'r') as fp:
        text = fp.read()

    # split on runs of digits and non-word characters
    lst = re.split(r'[0-9\W]+', text)

    # create words set, no repeat
    words = set(lst)
    d = {}
    for word in words:
        d[word] = lst.count(word)
    # drop the empty string left by leading or trailing separators, if present
    d.pop('', None)

    # sort by (count, word) in descending order
    result = sorted(d.items(), key=lambda kv: (kv[1], kv[0]), reverse=True)
    return result[:num]

def test():
    topWords = findTopFreqWords('test.txt', 10)
    print(topWords)

if __name__ == '__main__':
    test()
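The answer above calls lst.count() once per distinct word, so it rescans the whole token list each time. As an alternative sketch (not the quoted author's method), collections.Counter from the standard library counts all tokens in a single pass. The function name top_words and the sample string are hypothetical, and the function takes the text directly rather than a filename to keep the example self-contained.

```python
import re
from collections import Counter

def top_words(text, num=10):
    """Return the num most common words in text with their counts."""
    # Same tokenisation as the answer above: split on digits and
    # non-word characters, then discard empty strings.
    tokens = [t for t in re.split(r'[0-9\W]+', text) if t]
    return Counter(tokens).most_common(num)

sample = "the cat and the dog and the bird"
print(top_words(sample, 2))  # [('the', 3), ('and', 2)]
```

One difference in behaviour: most_common() keeps words with equal counts in first-seen order, whereas the answer above breaks ties by the word itself in descending order.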

The test.txt file used contains the following:

3.1 Accessing Text from the Web and from Disk Electronic Books A small sample of texts from Project Gutenberg appears in the NLTK corpus collection. However, you may be interested in analyzing other texts from Project Gutenberg. You can browse the catalog of 25,000 free online books at http://www.gutenberg.org/catalog/, and obtain a URL to an ASCII text file. Although 90% of the texts in Project Gutenberg are in English, it includes material in over 50 other languages, including Catalan, Chinese, Dutch, Finnish, French, German, Italian,