1. Usage of Pipeline
A Pipeline chains multiple estimators into a single estimator. The motivation is that data processing usually involves a fixed sequence of consecutive steps, e.g. feature selection -> normalization -> classification.
A Pipeline provides two services: convenience (you only have to call fit and predict once on your data to fit the whole sequence of estimators) and joint parameter selection (you can grid search over the parameters of all estimators in the pipeline at once).
Note: all estimators in a Pipeline except the last one must be transformers; the last estimator can be of any type (transformer, classifier, regressor).
If the last estimator is a classifier, the whole pipeline can be used as a classifier; if the last estimator is a clusterer, the whole pipeline can be used as a clusterer.
```python
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

estimators = [('pca', PCA()), ('clf', LogisticRegression())]
pipe = Pipeline(estimators)
print(pipe)
# Pipeline(steps=[('pca', PCA(copy=True, iterated_power='auto', n_components=None, ...)),
#                 ('clf', LogisticRegression(C=1.0, penalty='l2', ...))])
print(pipe.steps[0])
# ('pca', PCA(copy=True, iterated_power='auto', n_components=None, ...))
print(pipe.named_steps['pca'])
# PCA(copy=True, iterated_power='auto', n_components=None, ...)
```
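Since the last step above is a classifier, the whole pipeline behaves like one. A minimal sketch (the iris dataset and the `n_components`/`max_iter` values are illustrative choices, not from the original):

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

X, y = load_iris(return_X_y=True)
pipe = Pipeline([('pca', PCA(n_components=2)),
                 ('clf', LogisticRegression(max_iter=200))])
pipe.fit(X, y)              # PCA is fit_transformed first, then the classifier is fit
print(pipe.score(X, y))     # accuracy on the training data, like any classifier
print(pipe.predict(X[:3]))  # predictions go through PCA.transform, then clf.predict
```

Calling `fit` on the pipeline fits each transformer in turn on the transformed output of the previous one, so no intermediate arrays need to be managed by hand.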
In a pipeline, an estimator's parameters are accessed with the `<estimator>__<parameter>` syntax.
```python
# Modify a parameter and print the updated pipeline
print(pipe.set_params(clf__C=10))
# Pipeline(steps=[('pca', PCA(copy=True, iterated_power='auto', ...)),
#                 ('clf', LogisticRegression(C=10, penalty='l2', ...))])
```
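A short sketch to confirm that the `step__param` syntax really updates the nested estimator in place (the pipeline here is rebuilt so the snippet is self-contained):

```python
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

pipe = Pipeline([('pca', PCA()), ('clf', LogisticRegression())])
pipe.set_params(clf__C=10)
print(pipe.named_steps['clf'].C)    # the nested estimator's C is now 10
print(pipe.get_params()['clf__C'])  # the same value read back through get_params
```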
Since these parameters exist, they can be tuned with grid search.
```python
from sklearn.model_selection import GridSearchCV

params = dict(pca__n_components=[2, 5, 10], clf__C=[0.1, 1, 10, 100])
grid_search = GridSearchCV(pipe, param_grid=params)
```
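A hedged sketch of actually running such a grid search end to end; the synthetic dataset, `cv=3`, and `max_iter=500` are illustrative assumptions, not from the original:

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

X, y = make_classification(n_samples=100, n_features=20, random_state=0)
pipe = Pipeline([('pca', PCA()), ('clf', LogisticRegression(max_iter=500))])
# Each key addresses one step's parameter via the step__param syntax
params = dict(pca__n_components=[2, 5, 10], clf__C=[0.1, 1, 10])
grid_search = GridSearchCV(pipe, param_grid=params, cv=3)
grid_search.fit(X, y)  # refits the whole pipeline for every parameter combination
print(grid_search.best_params_)
```

Because the search space covers parameters of both steps, the PCA dimensionality and the regularization strength are selected jointly rather than one after the other.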
An entire step can also be swapped out via the parameter grid, and any non-final step can be skipped by setting it to None:
```python
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

params = dict(pca=[None, PCA(5), PCA(10)],
              clf=[SVC(), LogisticRegression()],
              clf__C=[0.1, 10, 100])
grid_search = GridSearchCV(pipe, param_grid=params)
The function make_pipeline is a shorthand for constructing a pipeline: it accepts a variable number of estimators and returns a pipeline, filling in each estimator's name automatically.
```python
from sklearn.pipeline import make_pipeline
from sklearn.naive_bayes import MultinomialNB
from sklearn.preprocessing import Binarizer

print(make_pipeline(Binarizer(), MultinomialNB()))
# Pipeline(steps=[('binarizer', Binarizer(copy=True, threshold=0.0)),
#                 ('multinomialnb', MultinomialNB(alpha=1.0, class_prior=None, fit_prior=True))])
```
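The auto-generated names are the lowercased class names, so they can still be used with `named_steps` and in `step__param` grid-search keys. A quick sketch:

```python
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import Binarizer

pipe = make_pipeline(Binarizer(), MultinomialNB())
# Step names are derived from the class names, lowercased
print([name for name, _ in pipe.steps])  # ['binarizer', 'multinomialnb']
```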
2. Usage of FeatureUnion

FeatureUnion: composite feature spaces.

FeatureUnion combines several transformer objects into a new transformer that concatenates their outputs. A FeatureUnion takes a list of transformer objects.
```python
from sklearn.pipeline import FeatureUnion
from sklearn.decomposition import PCA, KernelPCA

estimators = [('linear_pca', PCA()), ('kernel_pca', KernelPCA())]
combined = FeatureUnion(estimators)
print(combined)
# FeatureUnion(n_jobs=1,
#              transformer_list=[('linear_pca', PCA(copy=True, ...)),
#                                ('kernel_pca', KernelPCA(alpha=1.0, kernel='linear', ...))],
#              transformer_weights=None)
```
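A small sketch of what the concatenation looks like in practice; the random 10x5 input and the `n_components=2` values are illustrative assumptions:

```python
import numpy as np
from sklearn.decomposition import PCA, KernelPCA
from sklearn.pipeline import FeatureUnion

X = np.random.RandomState(0).rand(10, 5)
combined = FeatureUnion([('linear_pca', PCA(n_components=2)),
                         ('kernel_pca', KernelPCA(n_components=2))])
# Each transformer is fit on X; their outputs are stacked side by side
X_new = combined.fit_transform(X)
print(X_new.shape)  # (10, 4): 2 PCA columns + 2 KernelPCA columns
```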
Like Pipeline, FeatureUnion has a simpler shorthand constructor, make_union, which does not require explicitly naming each estimator.
Setting FeatureUnion parameters
```python
# Modify a parameter: setting a transformer to None drops it from the union
print(combined.set_params(kernel_pca=None))
# FeatureUnion(n_jobs=1,
#              transformer_list=[('linear_pca', PCA(copy=True, ...)),
#                                ('kernel_pca', None)],
#              transformer_weights=None)
```
Another good article on Pipeline: http://blog.csdn.net/lanchunhui/article/details/50521648