NLTK vs SKLearn vs Gensim vs TextBlob vs spaCy

Date: 2019-11-10

Tags: nltk sklearn gensim textblob spacy


Generally,

NLTK is used primarily for general NLP tasks (tokenization, POS tagging, parsing, etc.)
Sklearn is used primarily for machine learning (classification, clustering, etc.)
Gensim is used primarily for topic modeling and document similarity.

Having said that, NLTK provides a nice wrapper for Sklearn's classifiers. See:
nltk.classify package
Combining Scikit-Learn and NTLK
Python NLP - NLTK and scikit-learn
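
As a rough illustration of that wrapper, here is a minimal sketch using SklearnClassifier from nltk.classify.scikitlearn around a scikit-learn MultinomialNB. The four-sentence training set, the labels and the bag_of_words helper are made up for this example, and whitespace splitting stands in for a real tokenizer.

```python
from nltk.classify.scikitlearn import SklearnClassifier
from sklearn.naive_bayes import MultinomialNB


def bag_of_words(text):
    """Hypothetical helper: map a text to an NLTK-style featureset {token: count}.

    Whitespace splitting is a simplification, not NLTK's tokenizer.
    """
    features = {}
    for token in text.lower().split():
        features[token] = features.get(token, 0) + 1
    return features


# Toy training data (assumed for illustration): (featureset, label) pairs,
# which is the input format NLTK classifiers expect.
train = [
    (bag_of_words("great fun film"), "pos"),
    (bag_of_words("great acting great story"), "pos"),
    (bag_of_words("boring slow film"), "neg"),
    (bag_of_words("boring story awful acting"), "neg"),
]

# Wrap the scikit-learn estimator so it exposes NLTK's classifier interface.
classifier = SklearnClassifier(MultinomialNB()).train(train)

positive_guess = classifier.classify(bag_of_words("great story"))
negative_guess = classifier.classify(bag_of_words("boring slow"))
print(positive_guess, negative_guess)
```

The wrapped object can be used wherever NLTK expects a classifier, while the underlying estimator can be swapped for another scikit-learn model.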

And, to confuse you further, there are also TextBlob (Simplified Text Processing) and spaCy (spaCy.io), the latter aiming to provide industry-ready NLP modules as an alternative to NLTK. spaCy includes a single fast algorithm for each of tokenization, POS tagging and parsing, plus word vectors for similarity calculation.
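
To give a feel for the spaCy side, here is a small sketch of tokenization only. It assumes spaCy is installed and uses spacy.blank("en"), which builds an English pipeline containing just the tokenizer, so no model download is needed. POS tagging, parsing and word vectors require a trained pipeline and are not shown. The sample sentence is made up.

```python
import spacy

# A blank English pipeline: tokenizer only, no tagger, parser or vectors.
nlp = spacy.blank("en")

# Example sentence chosen purely for illustration.
doc = nlp("spaCy splits text into tokens.")

# Each Token keeps its text; punctuation becomes its own token.
tokens = [token.text for token in doc]
print(tokens)
```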

I suggest that you mix and match, according to your needs.
