Examined ensemble methods
- Averaging (or blending)
- Weighted averaging
- Conditional averaging
- Bagging
- Boosting
- Stacking
- StackNet
Averaging ensemble methods
For example, suppose we have a variable called age, as in a person's age, that we are trying to predict. We have two models:
- Model 1: better for targets below 50
- Model 2: better for targets above 50
What happens if we try to combine them?
Averaging (or blending)
- (model1 + model2) / 2

$R^2$ rises to 0.95, an improvement over before. The averaged model is not better than each single model in the region where that model is strong, but on average it performs better. Perhaps there is an even better combination? Let's try weighted averaging.
Weighted averaging
- (model1 x 0.7 + model2 x 0.3)

It does not look as good as the previous result.
Conditional averaging
- Take each model where it performs best

Ideally, we would like to get a result like this, using each model in the region where it is strong; a sketch of all three schemes follows below.
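A minimal numpy sketch of all three schemes, assuming two hypothetical prediction arrays and the age-50 cutoff from the example above:
import numpy as np
# hypothetical predictions from the two models on the same samples
preds1 = np.array([23.0, 41.0, 55.0, 68.0])  # model 1: stronger below 50
preds2 = np.array([25.0, 44.0, 52.0, 71.0])  # model 2: stronger above 50
# averaging (blending)
avg = (preds1 + preds2) / 2
# weighted averaging
weighted = preds1 * 0.7 + preds2 * 0.3
# conditional averaging: take each model where it is strong,
# here using model 1's own prediction to decide the side of the cutoff
conditional = np.where(preds1 < 50, preds1, preds2)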
Bagging
Why Bagging
There are two main sources of error in modeling:
- 1. Error due to bias (underfitting)
- 2. Error due to variance (overfitting)
By averaging slightly different versions of the same model, we make sure the predictions do not have very high variance, which usually makes the ensemble generalize better.
Parameters that control bagging?
- Changing the seed
- Row (sub)sampling or bootstrapping
- Shuffling
- Column (sub)sampling
- Model-specific parameters
- Number of models (or bags)
- (Optionally) parallelism
Examples of bagging
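The lecture's own example is not reproduced here; below is a minimal sketch of the idea, assuming numpy-array inputs (the model choice and bag count are illustrative):
import numpy as np
from sklearn.tree import DecisionTreeRegressor
def bagging_predict(X_train, y_train, X_test, n_bags=10, seed=0):
    # fix the seed so the bootstrap samples are reproducible
    rng = np.random.RandomState(seed)
    n = len(X_train)
    preds = np.zeros(len(X_test))
    for _ in range(n_bags):
        # row (sub)sampling with replacement, i.e. bootstrapping
        idx = rng.randint(0, n, n)
        model = DecisionTreeRegressor()
        model.fit(X_train[idx], y_train[idx])
        preds += model.predict(X_test)
    # average the predictions over all bags
    return preds / n_bags
sklearn's BaggingRegressor implements the same idea and also covers column (sub)sampling via its max_features argument.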

Boosting
Boosting is a form of weighted averaging of models, where each model is built sequentially by taking the performance of the previous models into account.
Weight based boosting

Suppose we have a tabular dataset with four features, call them x0, x1, x2 and x3, and we want to use these features to predict a target variable y. Call the predictions pred; these predictions have some error, and we can compute the absolute errors, |y - pred|. Based on them we can generate a new column or vector: here we create a weight column equal to 1 plus the absolute error. There are of course different ways to compute this weight; this is just an example. All that remains is to fit new models on the same features, each time also including this weight column. This is how models are added sequentially.
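A minimal sketch of this weight-based loop, using sklearn's sample_weight support (the 1 + |error| weighting and the tree depth are just this example's choices, not a full AdaBoost):
import numpy as np
from sklearn.tree import DecisionTreeRegressor
def weight_based_boosting(X, y, n_rounds=3):
    weights = np.ones(len(y))           # start with all rows weighted equally
    models = []
    for _ in range(n_rounds):
        model = DecisionTreeRegressor(max_depth=2)
        model.fit(X, y, sample_weight=weights)  # fit, respecting the weight column
        pred = model.predict(X)
        weights = 1 + np.abs(y - pred)  # upweight the rows we predicted badly
        models.append(model)
    return models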
Weight based boosting parameters
- Learning rate (or shrinkage or eta)
- Trust each model only a little:
predictionN = pred0*eta + pred1*eta + ... + predN*eta
- Number of estimators
- If you double the number of estimators, halve eta accordingly
- Input model - can be anything that accepts weights
- Sub boosting type:
- AdaBoost - good implementation in sklearn (Python); see the snippet below
- LogitBoost - good implementation in Weka (Java)
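For reference, the sklearn implementation mentioned above can be used like this (the hyperparameter values are placeholders):
from sklearn.ensemble import AdaBoostRegressor
model = AdaBoostRegressor(n_estimators=100, learning_rate=0.1)
# then model.fit(X, y) and model.predict(X_new) as usual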
Residual based boosting
We do the same thing on the same dataset. After obtaining the predictions pred, we compute the errors, error = y - pred, and then use error as the new target y to fit a second model, giving new predictions new_pred.
Take Rownum=1 as an example: the final prediction = 0.75 + 0.20 = 0.95, which is closer to the true value of 1.
This approach is very effective and reduces the error nicely.
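A minimal sketch of that loop with trees, following the prediction formula given in the parameters list below (eta, tree depth and round count are illustrative):
import numpy as np
from sklearn.tree import DecisionTreeRegressor
def residual_boosting(X, y, n_rounds=10, eta=0.1):
    pred = np.zeros(len(y))
    models = []
    for i in range(n_rounds):
        residual = y - pred             # the error becomes the new target
        model = DecisionTreeRegressor(max_depth=2)
        model.fit(X, residual)
        # the first model is taken as-is; later ones are shrunk by eta
        pred += model.predict(X) if i == 0 else eta * model.predict(X)
        models.append(model)
    return models, pred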
Residual based boosting parameters
- Learning rate (or shrinkage or eta)
predictionN = pred0 + pred1*eta + ... + predN*eta
- In the previous example, if eta were 0.1, then prediction = 0.75 + 0.2*(0.1) = 0.77
- Number of estimators
- Row (sub)sampling
- Column (sub)sampling
- Input model - trees usually work best; see the LightGBM example below
- Sub boosting type:
- Full gradient based
- Dart
Residual based favourite implementations
- XGBoost
- LightGBM
- H2O's GBM
- CatBoost
- sklearn's GBM
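As one concrete mapping, the parameters listed above correspond to LightGBM arguments roughly as follows (the values are placeholders, not tuned recommendations):
import lightgbm as lgb
model = lgb.LGBMRegressor(
    learning_rate=0.05,     # eta / shrinkage
    n_estimators=500,       # number of estimators
    subsample=0.8,          # row (sub)sampling
    subsample_freq=1,       # needed for subsample to take effect in LightGBM
    colsample_bytree=0.8,   # column (sub)sampling
    boosting_type='dart',   # 'gbdt' for full gradient based, 'dart' for Dart
)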
Stacking
Methodology
- Wolpert in 1992 introduced stacking. It involves:
- Splitting the train set into two disjoint sets.
- Training several base learners on the first part.
- Making predictions with the base learners on the second (validation) part.
- Using those predictions as inputs to train a higher-level (meta) learner.
Concrete steps
Suppose we have three datasets A, B and C, where the target variable y is known for A and B. We fit the base models on A, make predictions on both B and C, and stack those predictions to form new datasets B1 and C1. We then train a meta model on B1 (whose targets we know) and use it to predict on C1, as in the code below.
Stacking example
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
import numpy as np
from sklearn.model_selection import train_test_split
train = ''  # your training set
test = ''   # your test set
y = ''      # your target variable
# split train data in 2 parts: training and validation
training, valid, ytraining, yvalid = train_test_split(train, y, test_size=0.5)
# specify models
model1 = RandomForestRegressor()
model2 = LinearRegression()
#fit models
model1.fit(training, ytraining)
model2.fit(training, ytraining)
# make predictions for validation
preds1 = model1.predict(valid)
preds2 = model2.predict(valid)
# make predictions for test data
test_preds1 = model1.predict(test)
test_preds2 = model2.predict(test)
# Form a new dataset for valid and test by stacking the predictions
stacked_predictions = np.column_stack((preds1, preds2))
stacked_test_predictions = np.column_stack((test_preds1, test_preds2))
# specify meta model
meta_model = LinearRegression()
meta_model.fit(stacked_predictions, yvalid)
# make predictions on the stacked predictions of the test data
final_predictions = meta_model.predict(stacked_test_predictions)
Stacking (past) example

As you can see, the result is very close to what we got with conditional averaging. It just does less well near 50, which makes sense: the meta model never sees the target variable, so it cannot identify the exact cutoff at 50; it can only infer it from the base models' inputs.
Things to be mindful of
- With time sensitive data - respect time
- If your data has a time element, you need to set up your stacking so that it respects time.
- Diversity as important as performance
- Single-model performance matters, but the diversity of the models is just as important. You do not need to worry too much when a model is bad or weak: stacking can extract the best from each prediction. So the real question about every model you build is what information it brings, even if it is weak overall.
- Diversity may come from:
- Different algorithms
- Different input features
- Performance plateauing after N models
- Meta model is normally modest
StackNet
https://github.com/kaz-Anova/StackNet
Ensembling Tips and Tricks
$1^{st}$ level tips
- Diversity based on algorithms:
- 2-3 gradient boosted trees (lightgbm, xgboost, H2O, catboost)
- 2-3 Neural nets (keras, pytorch)
- 1-2 ExtraTrees/RandomForest (sklearn)
- 1-2 linear models as in logistic/ridge regression, linear svm (sklearn)
- 1-2 knn models (sklearn)
- 1 Factorization machine (libfm)
- 1 svm with nonlinear kernel (like RBF) if size/memory allows (sklearn)
- Diversity based on input data:
- Categorical features: One hot, label encoding, target encoding, likelihood encoding, frequency or counts
- Numerical features: outliers, binning, derivatives, percentiles, scaling
- Interactions: col1*/+-col2, groupby, unsupervised
$2^{nd}$ level tips
Additional materials
