Python: removing stopwords

Try caching the stopwords object, as shown below. Constructing it each time you call the function seems to be the bottleneck.

```python
from nltk.corpus import stopwords

# Build the stopword list once, at module load time
cachedStopWords = stopwords.words("english")

def testFuncOld():
    # Rebuilds the stopword list on every call
    text = 'hello bye the the hi'
    text = ' '.join([word for word in text.split()
                     if word not in stopwords.words("english")])

def testFuncNew():
    # Reuses the cached list
    text = 'hello bye the the hi'
    text = ' '.join([word for word in text.split()
                     if word not in cachedStopWords])

if __name__ == "__main__":
    for i in range(10000):  # xrange in Python 2
        testFuncOld()
        testFuncNew()
```

I ran this through the profiler: `python -m cProfile -s cumulative test.py`. The relevant lines are posted below.

```
ncalls  cumtime  filename:lineno(function)
 10000    7.723  words.py:7(testFuncOld)
 10000    0.140  words.py:11(testFuncNew)
```

So, caching the stopwords instance gives a roughly 55x speedup (7.723 s vs. 0.140 s cumulative).
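The same effect can be reproduced without NLTK in a self-contained sketch (the word list and the `load_stopwords` helper below are hypothetical stand-ins for `stopwords.words("english")`). It also stores the cached words in a `set`, since set membership tests are O(1) while scanning a list is O(n):

```python
import timeit

# Hypothetical stand-in for stopwords.words("english"); rebuilding a
# list like this on every call is the cost the un-cached version pays.
def load_stopwords():
    return ["the", "a", "an", "and", "or", "is", "to"] * 20

# Cache once, and as a set for O(1) membership tests
cached = set(load_stopwords())

def remove_uncached(text):
    return ' '.join(w for w in text.split() if w not in load_stopwords())

def remove_cached(text):
    return ' '.join(w for w in text.split() if w not in cached)

text = 'hello bye the the hi'
print(remove_cached(text))  # -> 'hello bye hi'

# Time both variants over 10000 calls, as in the cProfile run above
slow = timeit.timeit(lambda: remove_uncached(text), number=10000)
fast = timeit.timeit(lambda: remove_cached(text), number=10000)
print(slow > fast)
```

The exact speedup factor will vary with the size of the stopword list, but the cached version should win consistently.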
