1. First, import the packages:
import numpy as np
import xgboost as xgb
from sklearn.metrics import mean_absolute_error
2. Use the following function to train XGBoost with cross-validation:
bst_cv = xgb.cv(xgb_params, dtrain, num_boost_round=50, nfold=3, seed=0,
                feval=xg_eval_mae, maximize=False, early_stopping_rounds=10)
3. cv parameter notes: the first argument of cv sets the parameters of the XGBoost booster, for example:
xgb_params = {
    'seed': 0,
    'eta': 0.1,
    'colsample_bytree': 0.5,
    'silent': 1,
    'subsample': 0.5,
    'objective': 'reg:linear',
    'max_depth': 5,
    'min_child_weight': 3
}
Briefly: seed is the random seed; eta is the learning rate (shrinkage); colsample_bytree is the fraction of features sampled per tree; silent controls logging; subsample is the fraction of training rows sampled per tree; objective is the learning objective; max_depth is the maximum tree depth; min_child_weight is the minimum sum of instance weight required in a child node.
4. cv parameter notes: dtrain is the training set, obtained with the DMatrix function below:
dtrain = xgb.DMatrix(train_x, train_y)
5. cv parameter notes: the feval argument is a custom evaluation (error) function:
def xg_eval_mae(yhat, dtrain):
    y = dtrain.get_label()
    # the labels are log-transformed, so map back with np.exp before computing MAE
    return 'mae', mean_absolute_error(np.exp(y), np.exp(yhat))
6. cv parameter notes: nfold is the number of cross-validation folds; early_stopping_rounds stops training after that many rounds without improvement; num_boost_round is the number of boosted trees to add.
7. bst_cv is the result returned by cv; it is a pandas DataFrame whose columns are the per-round train/test means and standard deviations of the evaluation metric, such as train-mae-mean, train-mae-std, test-mae-mean and test-mae-std.
8. Custom evaluation functions: for details see this blog: https://blog.csdn.net/wl_ss/article/details/78685984
def customedscore(preds, dtrain):
    label = dtrain.get_label()
    pred = [int(i >= 0.5) for i in preds]
    confusion_matrixs = confusion_matrix(label, pred)
    recall = float(confusion_matrixs[0][0]) / float(confusion_matrixs[0][1] + confusion_matrixs[0][0])
    precision = float(confusion_matrixs[0][0]) / float(confusion_matrixs[1][0] + confusion_matrixs[0][0])
    F = 5 * precision * recall / (2 * precision + 3 * recall) * 100
    return 'FSCORE', float(F)
A custom evaluation function of this form can be passed to the feval argument of XGBoost's cv or train functions.
There is another way to define an evaluation function, as follows:
def mae_score(y_true, y_pred):
    return mean_absolute_error(y_true=np.exp(y_true), y_pred=np.exp(y_pred))
A function defined this way can be used for the scoring parameter of GridSearchCV (it must first be wrapped with sklearn.metrics.make_scorer, since scoring expects a scorer object rather than a bare metric function).
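A minimal sketch of that wrapping, with synthetic data and a plain scikit-learn regressor standing in for the XGBoost model (the log-transformed target is an assumption, as above):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.metrics import mean_absolute_error, make_scorer
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

def mae_score(y_true, y_pred):
    # labels are assumed log-transformed, as in the text
    return mean_absolute_error(y_true=np.exp(y_true), y_pred=np.exp(y_pred))

X, raw_y = make_regression(n_samples=200, n_features=5, random_state=0)
y = np.log(raw_y - raw_y.min() + 1)

# greater_is_better=False because a lower MAE is better;
# GridSearchCV therefore reports the score with its sign flipped
scorer = make_scorer(mae_score, greater_is_better=False)
grid = GridSearchCV(DecisionTreeRegressor(random_state=0),
                    {'max_depth': [2, 3]}, scoring=scorer, cv=3)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```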
First, initialize the parameter values:
xgb1 = XGBClassifier(max_depth=3, learning_rate=0.1, n_estimators=5000,
                     silent=False, objective='binary:logistic', booster='gbtree',
                     n_jobs=4, gamma=0, min_child_weight=1, subsample=0.8,
                     colsample_bytree=0.8, seed=7)
Use the cv function to find the optimal value of the parameter n_estimators:
cv_result = xgb.cv(xgb1.get_xgb_params(), dtrain,
                   num_boost_round=xgb1.get_xgb_params()['n_estimators'],
                   nfold=5, metrics='auc', early_stopping_rounds=50,
                   callbacks=[xgb.callback.early_stop(50),
                              xgb.callback.print_evaluation(period=1, show_stdv=True)])
# the optimal n_estimators is the number of boosting rounds kept: cv_result.shape[0]
# note: early_stop / print_evaluation are from older xgboost releases; newer
# releases provide xgb.callback.EarlyStopping and xgb.callback.EvaluationMonitor
# grid-search max_depth and min_child_weight
param_grid = {'max_depth': [1, 2, 3, 4, 5],
              'min_child_weight': [1, 2, 3, 4, 5]}
# note: the iid argument was removed in scikit-learn 0.24; drop it on newer versions
grid_search = GridSearchCV(xgb1, param_grid, scoring='roc_auc', iid=False, cv=5)
grid_search.fit(train[feature_name], train['label'])
print('best_params:', grid_search.best_params_)
print('best_score:', grid_search.best_score_)
First, fix the parameters tuned above, as shown below:
xgb1 = XGBClassifier(max_depth=2, learning_rate=0.1, n_estimators=33,
                     silent=False, objective='binary:logistic', booster='gbtree',
                     n_jobs=4, gamma=0, min_child_weight=9, subsample=0.8,
                     colsample_bytree=0.8, seed=7)
Then continue the grid search:
# grid-search gamma
param_grid = {'gamma': [1, 2, 3, 4, 5, 6, 7, 8, 9]}
grid_search = GridSearchCV(xgb1, param_grid, scoring='roc_auc', iid=False, cv=5)
grid_search.fit(train[feature_name], train['label'])
print('best_params:', grid_search.best_params_)
print('best_score:', grid_search.best_score_)
# grid-search subsample and colsample_bytree
param_grid = {'subsample': [i / 10.0 for i in range(5, 11)],
              'colsample_bytree': [i / 10.0 for i in range(5, 11)]}
grid_search = GridSearchCV(xgb1, param_grid, scoring='roc_auc', iid=False, cv=5)
grid_search.fit(train[feature_name], train['label'])
print('best_params:', grid_search.best_params_)
print('best_score:', grid_search.best_score_)
# grid-search the L2 regularization term reg_lambda
param_grid = {'reg_lambda': [i / 10.0 for i in range(1, 11)]}
grid_search = GridSearchCV(xgb1, param_grid, scoring='roc_auc', iid=False, cv=5)
grid_search.fit(train[feature_name], train['label'])
print('best_params:', grid_search.best_params_)
print('best_score:', grid_search.best_score_)
Finally, we use a lower learning rate together with more trees; cv can again be used to carry out this step:
xgb1 = XGBClassifier(max_depth=2, learning_rate=0.01, n_estimators=5000,
                     silent=False, objective='binary:logistic', booster='gbtree',
                     n_jobs=4, gamma=2.1, min_child_weight=9, subsample=0.8,
                     colsample_bytree=0.8, seed=7)
For more details on parameter tuning, see the following links:
https://www.cnblogs.com/TimVerion/p/11436001.html
http://www.pianshen.com/article/3311175716/