Try caching the stopwords object, as shown below. Constructing it each time you call the function seems to be the bottleneck.
```python
from nltk.corpus import stopwords

cachedStopWords = stopwords.words("english")

def testFuncOld():
    text = 'hello bye the the hi'
    text = ' '.join([word for word in text.split()
                     if word not in stopwords.words("english")])

def testFuncNew():
    text = 'hello bye the the hi'
    text = ' '.join([word for word in text.split()
                     if word not in cachedStopWords])

if __name__ == "__main__":
    for i in range(10000):
        testFuncOld()
        testFuncNew()
```
I ran this through the profiler with `python -m cProfile -s cumulative test.py`. The relevant lines are posted below.
```
ncalls  cumtime  filename:lineno(function)
 10000    7.723  words.py:7(testFuncOld)
 10000    0.140  words.py:11(testFuncNew)
```
So, caching the stopwords instance gives a ~55x speedup (7.723 s vs. 0.140 s).
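A further micro-optimisation worth trying: `stopwords.words("english")` returns a list, so each `word not in cachedStopWords` check is a linear scan. Converting the cached list to a `set` makes each membership test O(1) on average. The sketch below uses a small hardcoded stopword set standing in for the NLTK list, so it runs without NLTK installed; with NLTK you would write `cachedStopWords = set(stopwords.words("english"))`.

```python
# Hypothetical stand-in for set(stopwords.words("english")),
# so the example runs without downloading the NLTK corpus.
STOPWORDS = {"the", "a", "an", "and", "or", "in", "of", "to", "is"}

def remove_stopwords(text):
    # Set membership test is O(1) on average, vs. O(n) for a list.
    return " ".join(w for w in text.split() if w not in STOPWORDS)

print(remove_stopwords("hello bye the the hi"))
```

For a ~180-word stopword list the set lookup saves roughly two orders of magnitude per word checked, which compounds on long texts.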