The Pipeline mechanism and FeatureUnion in sklearn

1. Using Pipeline

A Pipeline chains several estimators into a single estimator. This is useful because data processing usually follows a fixed sequence of steps, for example feature selection -> normalization -> classification.
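As a minimal sketch of the feature selection -> normalization -> classification chain described above, the following assumes the iris dataset and uses SelectKBest (k=2), StandardScaler and LogisticRegression as illustrative stand-ins for the three stages; the step names "select", "scale" and "clf" are arbitrary.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Three stages chained into one estimator:
# feature selection -> normalization -> classification
pipe = Pipeline([
    ("select", SelectKBest(f_classif, k=2)),  # keep the 2 highest-scoring features
    ("scale", StandardScaler()),              # standardize the selected features
    ("clf", LogisticRegression()),            # final classifier
])

pipe.fit(X, y)                # one call fits all three stages in order
print(pipe.predict(X[:3]))    # predictions go through the same stages
print(round(pipe.score(X, y), 3))
```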

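Pipeline serves two purposes:

  • Convenience: you only call fit and predict once on your data to train the whole sequence of estimators.
  • Joint parameter selection: you can run a grid search over the parameters of all estimators in the pipeline at once.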
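To make the "convenience" point concrete, here is a small sketch (assuming the iris data and an arbitrary scaler -> PCA -> logistic regression chain) showing that one fit/predict call on the pipeline does the same work as calling fit_transform on each intermediate step by hand and then fitting the last step.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# One fit/predict call on the pipeline...
pipe = Pipeline([("scale", StandardScaler()),
                 ("pca", PCA(n_components=2)),
                 ("clf", LogisticRegression())])
pipe.fit(X, y)
pipe_pred = pipe.predict(X)

# ...matches doing every step manually:
# fit_transform on intermediate steps, fit on the final estimator.
scaler = StandardScaler()
pca = PCA(n_components=2)
clf = LogisticRegression()
Xt = pca.fit_transform(scaler.fit_transform(X))
clf.fit(Xt, y)
# At predict time, only transform (not fit) is applied to the new data
manual_pred = clf.predict(pca.transform(scaler.transform(X)))

print(np.array_equal(pipe_pred, manual_pred))
```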

Note: every estimator in a Pipeline except the last must be a transformer (i.e. implement transform); the last estimator can be of any type (transformer, classifier, regressor, etc.).
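A quick sketch of the transformer requirement, assuming the iris data and two arbitrary classifiers: putting a classifier in an intermediate position is rejected with a TypeError. Depending on the scikit-learn version the check runs either when the pipeline is built or when it is fit, so both are inside the try block.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# LogisticRegression has no transform(), so it cannot be an intermediate step.
try:
    bad = Pipeline([("clf1", LogisticRegression()), ("clf2", SVC())])
    bad.fit(X, y)
    rejected = False
except TypeError as exc:
    rejected = True
    print(exc)  # message explains that intermediate steps must be transformers
```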

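If the last estimator is a classifier, the whole pipeline can be used as a classifier; if the last estimator is a clusterer, the whole pipeline can be used as a clusterer. In general, the pipeline exposes the methods provided by its last step.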
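For the clusterer case, a minimal sketch (assuming the iris features, with scaling followed by KMeans with 3 clusters as an illustrative choice): because the last step has fit_predict, the pipeline has fit_predict too.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, _ = load_iris(return_X_y=True)  # labels are not used for clustering

# Last step is a clusterer, so the pipeline acts as a clusterer:
# fit_predict scales the data, then runs KMeans on the scaled features.
pipe = Pipeline([("scale", StandardScaler()),
                 ("kmeans", KMeans(n_clusters=3, n_init=10, random_state=0))])
labels = pipe.fit_predict(X)

print(labels[:10])
print(len(set(labels)))
```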

from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

from sklearn.pipeline import Pipeline

estimator=[('pca', PCA()),
           ('clf', LogisticRegression())
           ]
pipe=Pipeline(estimator)
print(pipe)
#Output from an older scikit-learn version; since 0.23 only non-default parameters are printed, i.e. Pipeline(steps=[('pca', PCA()), ('clf', LogisticRegression())])
#Pipeline(steps=[('pca', PCA(copy=True, iterated_power='auto', n_components=None, random_state=None,svd_solver='auto', tol=0.0, whiten=False)), ('clf', LogisticRegression(C=1.0, class_weight=None, dual=False,fit_intercept=True,intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,penalty='l2', random_state=None, solver='liblinear', tol=0.0001,verbose=0, warm_start=False))])
print(pipe.steps[0])
#('pca', PCA(copy=True, iterated_power='auto', n_components=None, random_state=None,  svd_solver='auto', tol=0.0, whiten=False))
print(pipe.named_steps['pca'])
#PCA(copy=True, iterated_power='auto', n_components=None, random_state=None,  svd_solver='auto', tol=0.0, whiten=False)
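Besides steps and named_steps, scikit-learn 0.21 and later also let you index a pipeline directly by position or by step name, and slice it into a shorter pipeline. A small sketch, reusing the same two-step pipeline as above:

```python
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

pipe = Pipeline([("pca", PCA()), ("clf", LogisticRegression())])

first = pipe[0]        # by position: same object as pipe.steps[0][1]
by_name = pipe["pca"]  # by name: same object as pipe.named_steps["pca"]
head = pipe[:1]        # slicing returns a new Pipeline containing only the first step

print(first is by_name)
print(type(head).__name__, [name for name, _ in head.steps])
```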

The parameters of the estimators in a pipeline are accessed with the <estimator>__<parameter> syntax (step name, two underscores, parameter name):

#Change a parameter and print the result
print(pipe.set_params(clf__C=10))
#Pipeline(steps=[('pca', PCA(copy=True, iterated_power='auto', n_components=None, random_state=None,svd_solver='auto', tol=0.0, whiten=False)), ('clf', LogisticRegression(C=10, class_weight=None, dual=False,fit_intercept=True,intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,penalty='l2', random_state=None, solver='liblinear', tol=0.0001,verbose=0, warm_start=False))])
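The same double-underscore names appear in get_params(), which is how grid search discovers them. A short sketch on the pipeline from above (the values 10 and 2 are arbitrary):

```python
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

pipe = Pipeline([("pca", PCA()), ("clf", LogisticRegression())])

# get_params() (deep=True by default) lists each step's parameters as <step>__<param>
params = pipe.get_params()
print("clf__C" in params, "pca__n_components" in params)

# set_params uses the same names and modifies the step objects themselves
pipe.set_params(clf__C=10, pca__n_components=2)
print(pipe.named_steps["clf"].C, pipe.named_steps["pca"].n_components)
```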

Since the steps have parameters, grid search can be used to tune them:

from sklearn.model_selection import GridSearchCV
params=dict(pca__n_components=[2,5,10],clf__C=[0.1,1,10,100])  # C must be strictly positive
grid_search=GridSearchCV(pipe,param_grid=params)
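The snippet above only constructs the search object. A runnable sketch that actually fits it, assuming the iris dataset: since iris has only 4 features, n_components is limited to [2, 3] here, and max_iter=1000 is an arbitrary choice to avoid convergence warnings.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

X, y = load_iris(return_X_y=True)
pipe = Pipeline([("pca", PCA()), ("clf", LogisticRegression(max_iter=1000))])

# Each key is <step>__<parameter>; all 2 x 3 combinations are cross-validated.
params = dict(pca__n_components=[2, 3], clf__C=[0.1, 1, 10])
grid_search = GridSearchCV(pipe, param_grid=params, cv=5)
grid_search.fit(X, y)

print(grid_search.best_params_)
print(round(grid_search.best_score_, 3))
```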

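Individual steps can themselves be replaced through parameters, and any step other than the last can be set to None to skip it (newer versions also accept the string 'passthrough'):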

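from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
# pca=None skips the PCA step; clf swaps the classifier; clf__C tunes whichever classifier is in use
params=dict(pca=[None,PCA(5),PCA(10)],clf=[SVC(),LogisticRegression()],
            clf__C=[0.1,10,100])
grid_search=GridSearchCV(pipe,param_grid=params)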
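A runnable version of step replacement, assuming the iris dataset (so PCA sizes 2 and 3 instead of 5 and 10) and using 'passthrough' to skip the PCA step. scikit-learn replaces the clf step before applying clf__C, so C is set on whichever classifier is currently in the grid point.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
pipe = Pipeline([("pca", PCA()), ("clf", LogisticRegression(max_iter=1000))])

# Whole steps are searchable: "passthrough" skips PCA entirely,
# and the clf step alternates between two classifier types.
params = dict(pca=["passthrough", PCA(2), PCA(3)],
              clf=[SVC(), LogisticRegression(max_iter=1000)],
              clf__C=[0.1, 10, 100])
grid_search = GridSearchCV(pipe, param_grid=params, cv=5)
grid_search.fit(X, y)

print(grid_search.best_params_)
```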

make_pipeline is a shorthand for building a pipeline: it takes a variable number of estimators and returns a Pipeline, naming each step automatically after its class (in lowercase).

from sklearn.pipeline import make_pipeline
from sklearn.naive_bayes import MultinomialNB
from sklearn.preprocessing import Binarizer
print(make_pipeline(Binarizer(),MultinomialNB()))

#Pipeline(steps=[('binarizer', Binarizer(copy=True, threshold=0.0)), ('multinomialnb', MultinomialNB(alpha=1.0, class_prior=None, fit_prior=True))])
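A small sketch of the automatic naming: step names are the lowercased class names, and when the same class appears more than once make_pipeline appends a numeric suffix to keep the names unique (the duplicated StandardScaler below is only there to trigger that behavior).

```python
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import Binarizer, StandardScaler

pipe = make_pipeline(Binarizer(), MultinomialNB())
names = [name for name, _ in pipe.steps]
print(names)

# Duplicate classes get "-1", "-2", ... suffixes
dup = make_pipeline(StandardScaler(), StandardScaler())
dup_names = [name for name, _ in dup.steps]
print(dup_names)
```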

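2. Using FeatureUnion

FeatureUnion: composite feature spaces

FeatureUnion combines several transformer objects into a new transformer that concatenates their outputs. A FeatureUnion takes a list of transformer objects; during fitting, each transformer is fit to the data independently, and during transformation their outputs are concatenated side by side into one larger feature matrix.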

from sklearn.pipeline import FeatureUnion
from sklearn.decomposition import PCA
from sklearn.decomposition import KernelPCA
estimators=[('linear_pca',PCA()),('kernel_pca',KernelPCA())]
combined=FeatureUnion(estimators)
print(combined)

#FeatureUnion(n_jobs=1, transformer_list=[('linear_pca', PCA(copy=True, iterated_power='auto', n_components=None, random_state=None,  svd_solver='auto', tol=0.0, whiten=False)), ('kernel_pca', KernelPCA(alpha=1.0, coef0=1, copy_X=True, degree=3, eigen_solver='auto',     fit_inverse_transform=False, gamma=None, kernel='linear',     kernel_params=None, max_iter=None, n_components=None, n_jobs=1,  random_state=None, remove_zero_eig=False, tol=0))],transformer_weights=None)
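To see the concatenation, a sketch assuming the iris data, with 2 linear PCA components and 3 RBF kernel PCA components chosen arbitrarily: the union's output has 2 + 3 = 5 columns, and the first two are exactly what PCA alone would produce.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA, KernelPCA
from sklearn.pipeline import FeatureUnion

X, _ = load_iris(return_X_y=True)

# Each transformer sees the same input; their outputs are stacked column-wise.
combined = FeatureUnion([("linear_pca", PCA(n_components=2)),
                         ("kernel_pca", KernelPCA(n_components=3, kernel="rbf"))])
Xt = combined.fit_transform(X)
print(X.shape, "->", Xt.shape)

# The first columns come from the linear PCA transformer
same_as_pca = np.allclose(Xt[:, :2], PCA(n_components=2).fit_transform(X))
print(same_as_pca)
```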

Like Pipeline, FeatureUnion has a shorthand constructor, make_union, which does not require naming each estimator explicitly.
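A minimal make_union sketch with the same two transformers as above (component counts chosen arbitrarily); as with make_pipeline, the names are generated from the lowercased class names.

```python
from sklearn.decomposition import PCA, KernelPCA
from sklearn.pipeline import make_union

union = make_union(PCA(n_components=2), KernelPCA(n_components=3))
names = [name for name, _ in union.transformer_list]
print(names)
```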

Setting FeatureUnion parameters

#Change a parameter: 'drop' removes the kernel_pca transformer (older versions accepted None here; recent versions expect 'drop')
print(combined.set_params(kernel_pca='drop'))

#Recent scikit-learn output:
#FeatureUnion(transformer_list=[('linear_pca', PCA()), ('kernel_pca', 'drop')])
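The effect of dropping a transformer shows up in the output width. A sketch assuming the iris data and the same arbitrary component counts (2 linear, 3 kernel): after setting kernel_pca to 'drop', only the linear PCA columns remain.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA, KernelPCA
from sklearn.pipeline import FeatureUnion

X, _ = load_iris(return_X_y=True)
combined = FeatureUnion([("linear_pca", PCA(n_components=2)),
                         ("kernel_pca", KernelPCA(n_components=3, kernel="rbf"))])
shape_before = combined.fit_transform(X).shape

# A dropped transformer contributes no columns to the concatenated output
combined.set_params(kernel_pca="drop")
shape_after = combined.fit_transform(X).shape

print(shape_before, shape_after)
```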

Another good article on Pipeline: http://blog.csdn.net/lanchunhui/article/details/50521648
