What this post covers:
sklearn.feature_extraction
Purpose: feature extraction (vectorization) of dictionary data
# Data
[{'city': '北京', 'temperature': 100},
 {'city': '上海', 'temperature': 60},
 {'city': '深圳', 'temperature': 30}]
# Code
from sklearn.feature_extraction import DictVectorizer

def dict_demo():
    data = [{'city': '北京', 'temperature': 100},
            {'city': '上海', 'temperature': 60},
            {'city': '深圳', 'temperature': 30}]
    # 1. Instantiate a transformer
    transfer = DictVectorizer(sparse=False)
    # 2. Call fit_transform
    data_new = transfer.fit_transform(data)
    print("data_new:\n", data_new)
    # Print the feature names
    # (get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out())
    print("Feature names:\n", transfer.get_feature_names_out())
    return None
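Note: DictVectorizer's sparse parameter defaults to True, which returns a SciPy sparse matrix; with sparse=False it returns an ordinary NumPy array.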
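To make the effect of the sparse parameter concrete, here is a minimal sketch that runs the same data through both settings. It assumes scikit-learn and SciPy are installed; the variable names sparse_result and dense_result are only illustrative.

```python
from scipy.sparse import issparse
from sklearn.feature_extraction import DictVectorizer

data = [
    {'city': '北京', 'temperature': 100},
    {'city': '上海', 'temperature': 60},
    {'city': '深圳', 'temperature': 30},
]

# Default (sparse=True): the result is a SciPy sparse matrix
sparse_result = DictVectorizer().fit_transform(data)

# sparse=False: the result is a plain NumPy array
dense_result = DictVectorizer(sparse=False).fit_transform(data)

print(issparse(sparse_result))   # whether the default output is sparse
print(type(dense_result))        # type of the dense output
print(dense_result)              # one-hot city columns plus the temperature column
```

Both results hold the same values; only the storage format differs. The sparse form saves memory when one-hot encoding produces many zero entries.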
Purpose: feature extraction (vectorization) of text data
sklearn.feature_extraction.text.CountVectorizer(stop_words=[])
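CountVectorizer.fit_transform(X) X: an iterable of text documents (strings). Returns: a sparse matrix
CountVectorizer.inverse_transform(X) X: an array or sparse matrix. Returns: the data mapped back to its pre-transformation form (the terms with nonzero counts in each document)
CountVectorizer.get_feature_names_out() Returns: the array of words (replaces get_feature_names(), which was removed in scikit-learn 1.2)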
sklearn.feature_extraction.text.TfidfVectorizer
# Data
["life is short,i like like python",
 "life is too long,i dislike python"]
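# Code
from sklearn.feature_extraction.text import CountVectorizer

def count_demo():
    data = ["life is short,i like like python",
            "life is too long,i dislike python"]
    # 1. Instantiate a transformer
    transfer = CountVectorizer()
    # 2. Call fit_transform (returns a sparse matrix)
    data_new = transfer.fit_transform(data)
    # toarray() converts the sparse matrix to a dense array for printing
    print("data_new:\n", data_new.toarray())
    # get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out()
    print("Feature names:\n", transfer.get_feature_names_out())
    return None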
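Note that the code calls toarray(). Try removing the call and running it again: the sparse matrix is then printed as the positions and values of its nonzero entries instead of as a full array.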
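The experiment suggested above can be sketched like this, printing the result both ways. The exact text layout of the sparse printout depends on the installed SciPy version, so no expected output is shown.

```python
from scipy.sparse import issparse
from sklearn.feature_extraction.text import CountVectorizer

data = ["life is short,i like like python",
        "life is too long,i dislike python"]

data_new = CountVectorizer().fit_transform(data)

print(issparse(data_new))    # fit_transform returns a sparse matrix
print(data_new)              # without toarray(): only the nonzero entries are listed
print(data_new.toarray())    # with toarray(): the full 2-D array, zeros included
```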