Simplifying the Modeling Process with sklearn's Pipeline

 

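Many frameworks provide a Pipeline mechanism: a series of operations is wrapped into a single flow and, when invoked, executed in the planned order. For example, netty has ChannelPipeline, and TensorFlow's computation graph works along the same lines.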
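To make the general idea concrete, here is a minimal sketch of a pipeline in plain Python. The `ToyPipeline` class and its `run` method are hypothetical names made up for this illustration. They are not part of sklearn, netty, or TensorFlow, and the sketch only shows the "named steps executed in order" idea, not how any of those frameworks implement it.

```python
class ToyPipeline:
    """Hypothetical minimal pipeline: holds named steps and runs them in order."""

    def __init__(self, steps):
        # steps is a list of (name, callable) pairs, similar in shape
        # to the (name, estimator) pairs that sklearn's Pipeline accepts.
        self.steps = list(steps)

    def run(self, value):
        # Feed the output of each step into the next one.
        for _name, func in self.steps:
            value = func(value)
        return value


pipeline = ToyPipeline([
    ("strip", str.strip),   # remove surrounding whitespace
    ("lower", str.lower),   # normalise case
    ("split", str.split),   # break into words
])

result = pipeline.run("  Hello Pipeline World  ")
print(result)  # ['hello', 'pipeline', 'world']
```

The caller only sees one `run` call, and the order of the steps is fixed once when the pipeline is built.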

The following briefly shows how to use Pipeline in sklearn:

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.impute import SimpleImputer
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Preprocessor for categorical features:
# fill missing values with the most frequent category, then one-hot encode.
categorical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='most_frequent')),
    ('onehot', OneHotEncoder(handle_unknown='ignore'))
])

# Preprocessor for numerical features: fill missing values with a constant.
numerical_transformer = SimpleImputer(strategy='constant')

# Apply the numerical and categorical preprocessors to their respective columns.
preprocessor = ColumnTransformer(
    transformers=[
        ('num', numerical_transformer, ['Age']),
        ('cat', categorical_transformer, ['Embarked'])
    ])

# Define the Pipeline from the preprocessor and the chosen model.
my_pipeline = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('model', RandomForestClassifier(n_estimators=100, random_state=0))
])

# Use the pipeline.
# X and y are not defined in this snippet: X is assumed to be a pandas DataFrame
# containing 'Age' and 'Embarked' columns, and y the matching labels.
X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.2, random_state=0)

# Train. Copies are passed so the original data is certain to stay untouched by preprocessing.
my_pipeline.fit(X_train.copy(), y_train.copy())

# Predict.
preds = my_pipeline.predict(X_valid)
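Because the snippet above does not define X and y, the following is a self-contained sketch that can be run end to end. The data is synthetic and randomly generated, which is an assumption made purely for illustration: it only mimics the 'Age' and 'Embarked' columns and is not the dataset the article had in mind, so the scores it produces mean nothing. The `cross_val_score` call at the end is an addition that hints at why a pipeline is convenient: the preprocessing steps are refit inside each fold together with the model.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Synthetic stand-in data (an assumption, not the article's dataset).
rng = np.random.default_rng(0)
n = 100
age = rng.integers(1, 80, size=n).astype(float)
embarked = rng.choice(["S", "C", "Q"], size=n).astype(object)
# Knock out some values so the imputers have something to do.
age[::10] = np.nan
embarked[5::20] = np.nan
X = pd.DataFrame({"Age": age, "Embarked": embarked})
y = pd.Series(rng.integers(0, 2, size=n))  # random binary labels

# Same structure as the pipeline in the article.
categorical_transformer = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="most_frequent")),
    ("onehot", OneHotEncoder(handle_unknown="ignore")),
])
numerical_transformer = SimpleImputer(strategy="constant")
preprocessor = ColumnTransformer(transformers=[
    ("num", numerical_transformer, ["Age"]),
    ("cat", categorical_transformer, ["Embarked"]),
])
my_pipeline = Pipeline(steps=[
    ("preprocessor", preprocessor),
    ("model", RandomForestClassifier(n_estimators=100, random_state=0)),
])

# Fit on the training split and predict on the validation split.
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.2, random_state=0
)
my_pipeline.fit(X_train, y_train)
preds = my_pipeline.predict(X_valid)

# Cross-validate the whole pipeline: imputation and encoding are
# refit on the training part of every fold, together with the model.
cv_scores = cross_val_score(my_pipeline, X, y, cv=5)
print(len(preds), cv_scores.mean())
```

Passing the whole pipeline to `cross_val_score`, rather than preprocessing X once up front, keeps statistics such as the most frequent category from being computed on the held-out fold.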