The Road to Machine Learning: DictVectorizer, a dictionary feature extractor in Python

Learning to use the API with Python 3

DictVectorizer takes samples stored as dictionaries, extracts their features, and converts them into vector form.
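To make the idea concrete, here is a minimal pure-Python sketch of the kind of conversion involved. It is an illustration only, not scikit-learn's implementation: the helper name `vectorize_dicts` is hypothetical, and it assumes that every string value is a categorical feature and every other value is numeric.

```python
def vectorize_dicts(samples):
    """Hypothetical helper: one-hot encode string values, keep numbers as-is."""
    # Collect one column name per (key, string value) pair or per numeric key.
    names = set()
    for sample in samples:
        for key, value in sample.items():
            names.add(f"{key}={value}" if isinstance(value, str) else key)
    names = sorted(names)
    index = {name: i for i, name in enumerate(names)}

    # Fill one row per sample; columns that do not apply stay at 0.0.
    rows = []
    for sample in samples:
        row = [0.0] * len(names)
        for key, value in sample.items():
            if isinstance(value, str):
                row[index[f"{key}={value}"]] = 1.0
            else:
                row[index[key]] = float(value)
        rows.append(row)
    return names, rows


names, rows = vectorize_dicts([
    {"city": "Dubai", "temperature": 33.0},
    {"city": "London", "temperature": 12.0},
])
print(names)  # ['city=Dubai', 'city=London', 'temperature']
print(rows)   # [[1.0, 0.0, 33.0], [0.0, 1.0, 12.0]]
```

Each distinct city becomes its own 0/1 column, while the temperature keeps its numeric value in a single column.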

Source code (git): https://github.com/linyi0604/MachineLearning

Code:

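from sklearn.feature_extraction import DictVectorizer

'''
Dictionary feature extractor:
    Extracts features from dictionary data and vectorizes them.
    Categorical features become 0/1 binary columns named after the original feature name.
    Numeric features are left unchanged.
'''

# Define a list of dictionaries representing several data samples
measurements = [
    {"city": "Dubai", "temperature": 33.0},
    {"city": "London", "temperature": 12.0},
    {"city": "San Francisco", "temperature": 18.0},
]

# Initialize the dictionary feature extractor
vec = DictVectorizer()
# fit_transform returns a sparse matrix; toarray() converts it to a dense array
data = vec.fit_transform(measurements).toarray()
# Inspect the extracted feature values
print(data)
'''
[[ 1.  0.  0. 33.]
 [ 0.  1.  0. 12.]
 [ 0.  0.  1. 18.]]
'''
# Inspect the meaning of each extracted feature
# (get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out())
print(vec.get_feature_names_out().tolist())
'''
['city=Dubai', 'city=London', 'city=San Francisco', 'temperature']
'''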