NLTK vs SKLearn vs Gensim vs TextBlob vs spaCy

Generally, 
  • NLTK is used primarily for general NLP tasks (tokenization, POS tagging, parsing, etc.)
  • Sklearn is used primarily for machine learning (classification, clustering, etc.)
  • Gensim is used primarily for topic modeling and document similarity.
Having said that, NLTK provides a nice wrapper for Sklearn's classifiers - 
nltk.classify package
Combining Scikit-Learn and NLTK
Python NLP - NLTK and scikit-learn
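To make the wrapper concrete, here is a minimal sketch that trains a scikit-learn Naive Bayes model through NLTK's SklearnClassifier. The training sentences, the labels, and the bag_of_words helper are hypothetical and exist only for this example; whitespace splitting stands in for a real tokenizer.

```python
from nltk.classify.scikitlearn import SklearnClassifier
from sklearn.naive_bayes import MultinomialNB


def bag_of_words(sentence):
    """Hypothetical helper: turn a sentence into an NLTK-style feature dict."""
    return {word: 1 for word in sentence.lower().split()}


# Toy labeled data (made up for illustration).
train_data = [
    (bag_of_words("great fun movie"), "pos"),
    (bag_of_words("love this great film"), "pos"),
    (bag_of_words("boring awful movie"), "neg"),
    (bag_of_words("awful slow film"), "neg"),
]

# Wrap any scikit-learn estimator; NLTK handles vectorizing the feature dicts.
classifier = SklearnClassifier(MultinomialNB()).train(train_data)

# Classify a new featureset with the usual NLTK classifier interface.
prediction = classifier.classify(bag_of_words("great love"))
```

Because the wrapper exposes NLTK's classifier interface, the same feature dicts should also work with NLTK's own classifiers, which makes it easy to swap models.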

And, to confuse you further, there also exists TextBlob: Simplified Text Processing

and spaCy (spaCy.io | Build Tomorrow's Language Technologies),
which aims to provide industry-ready NLP modules as an alternative to NLTK,
including a single fast algorithm for each of tokenization, POS tagging, and parsing, plus word vectors for similarity calculation.

I suggest that you mix and match, according to your needs.
