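Many frameworks provide a Pipeline mechanism: a sequence of operations is wrapped into a single workflow, and calling it runs the steps in the planned order. For example, netty has ChannelPipeline, and TensorFlow's computation graph follows the same idea.
Here is a brief look at how to use Pipeline in sklearn: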
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.impute import SimpleImputer
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Preprocessor for categorical features: impute with the most frequent value, then one-hot encode
categorical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='most_frequent')),
    ('onehot', OneHotEncoder(handle_unknown='ignore'))
])

# Preprocessor for numerical features (strategy='constant' fills missing numeric values with 0 by default)
numerical_transformer = SimpleImputer(strategy='constant')

# Apply the numerical and categorical preprocessors to their respective columns
preprocessor = ColumnTransformer(
    transformers=[
        ('num', numerical_transformer, ['Age']),
        ('cat', categorical_transformer, ['Embarked'])
    ])

# Define the Pipeline from the preprocessor and the chosen model
my_pipeline = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('model', RandomForestClassifier(n_estimators=100, random_state=0))
])

# Use the pipeline (X and y are assumed to be already loaded: a feature DataFrame with 'Age' and 'Embarked' columns, and its labels)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.2, random_state=0)
my_pipeline.fit(X_train.copy(), y_train.copy())  # Train; pass copies if you want to be sure preprocessing leaves the original data untouched
preds = my_pipeline.predict(X_valid)  # Predict
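The snippet above leaves X and y undefined, so here is a minimal self-contained sketch of the same pipeline that can be run as is. The toy DataFrame, its random labels, and the injected missing values are all made up for illustration (they are assumptions, not the original dataset); only the 'Age' and 'Embarked' column names come from the code above.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data standing in for the real X and y
rng = np.random.RandomState(0)
n = 100
X = pd.DataFrame({
    "Age": rng.uniform(1, 80, size=n),
    "Embarked": pd.Series(rng.choice(["S", "C", "Q"], size=n), dtype=object),
})
y = pd.Series(rng.randint(0, 2, size=n))

# Inject some missing values so the imputers have work to do
X.loc[::10, "Age"] = np.nan
X.loc[5::20, "Embarked"] = np.nan

# Same structure as the pipeline above
categorical_transformer = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="most_frequent")),
    ("onehot", OneHotEncoder(handle_unknown="ignore")),
])
numerical_transformer = SimpleImputer(strategy="constant")
preprocessor = ColumnTransformer(transformers=[
    ("num", numerical_transformer, ["Age"]),
    ("cat", categorical_transformer, ["Embarked"]),
])
my_pipeline = Pipeline(steps=[
    ("preprocessor", preprocessor),
    ("model", RandomForestClassifier(n_estimators=100, random_state=0)),
])

X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Keep a snapshot to check whether fitting alters the training frame
X_train_before = X_train.copy()

my_pipeline.fit(X_train, y_train)      # preprocessing + model training in one call
preds = my_pipeline.predict(X_valid)   # the same preprocessing is reapplied, then the model predicts

# Individual steps stay reachable by the names given in `steps`
fitted_model = my_pipeline.named_steps["model"]
```

Because the labels here are random, the predictions themselves mean nothing; the point is only that one fit call and one predict call drive every step. Comparing X_train with X_train_before afterwards is a quick way to check, on your own setup, whether the defensive copy is actually needed.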