Detailed notes on the parameters of LightGBM's sklearn and native interfaces, with tuning pointers

class lightgbm.LGBMClassifier(boosting_type='gbdt', num_leaves=31, max_depth=-1, learning_rate=0.1, n_estimators=10, max_bin=255, subsample_for_bin=200000, objective=None, min_split_gain=0.0, min_child_weight=0.001, min_child_samples=20, subsample=1.0, subsample_freq=1, colsample_bytree=1.0, reg_alpha=0.0, reg_lambda=0.0, random_state=None, n_jobs=-1, silent=True, **kwargs)
 
 
boosting_type (string, optional (default="gbdt"))

"gbdt": Gradient Boosting Decision Tree

"dart": Dropouts meet Multiple Additive Regression Trees

"goss": Gradient-based One-Side Sampling

"rf": Random Forest

 
num_leaves (int, optional (default=31)) Maximum number of leaves per base learner; should satisfy num_leaves <= 2^max_depth
max_depth (int, optional (default=-1)) Maximum depth of each base learner; -1 means no limit. When the model overfits, lower max_depth first
learning_rate (float, optional (default=0.1)) Boosting learning rate
n_estimators (int, optional (default=10)) Number of base learners
max_bin (int, optional (default=255)) Maximum number of bins that feature values are bucketed into, i.e. the k of the histogram algorithm
subsample_for_bin (int, optional (default=200000)) Number of samples used to construct the bins
objective (string, callable or None, optional (default=None))

Defaults:

'regression' for LGBMRegressor,

'binary' or 'multiclass' for LGBMClassifier,

'lambdarank' for LGBMRanker.
 
min_split_gain (float, optional (default=0.)) Minimum loss reduction required to make a further split on a leaf node of the tree
min_child_weight (float, optional (default=1e-3)) Minimum sum of instance weight (hessian) needed in a child (leaf)
min_child_samples (int, optional (default=20)) Minimum number of records a leaf node must have
subsample (float, optional (default=1.)) Fraction of the training data sampled for each base learner
subsample_freq (int, optional (default=1)) Frequency of subsampling; <=0 means disabled
colsample_bytree (float, optional (default=1.)) Subsample ratio of columns when constructing each tree
reg_alpha (float, optional (default=0.)) L1 regularization term on weights
reg_lambda (float, optional (default=0.)) L2 regularization term on weights
random_state (int or None, optional (default=None))
silent (bool, optional (default=True))
n_jobs (int, optional (default=-1))
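The num_leaves <= 2^max_depth constraint mentioned above can be sketched as a small helper. This helper is illustrative only (not a lightgbm API): it caps the requested leaf count by what the depth limit allows.

```python
def capped_num_leaves(num_leaves, max_depth):
    """Cap num_leaves at 2**max_depth; max_depth <= 0 (e.g. -1) means no depth limit."""
    if max_depth <= 0:
        return num_leaves
    return min(num_leaves, 2 ** max_depth)

print(capped_num_leaves(31, -1))  # 31: depth unlimited, leaf count unchanged
print(capped_num_leaves(31, 4))   # 16: a depth-4 tree has at most 2**4 leaves
print(capped_num_leaves(31, 6))   # 31: 2**6 = 64 > 31, so no capping needed
```

Using num_leaves well above 2^max_depth has no effect other than inviting over-fitting warnings, which is why the two are usually tuned together.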
 ######################################################################################################

The table below (reconstructed here from the official LightGBM parameters-tuning guide) lists the parameters worth adjusting for each of three goals — faster speed, better accuracy, and controlling over-fitting:

Faster speed: use a smaller max_bin; use bagging via subsample and subsample_freq; use feature sub-sampling via colsample_bytree
Better accuracy: use a larger max_bin; use a larger num_leaves (may cause over-fitting); use a smaller learning_rate with a larger n_estimators; use more training data; try dart
Against over-fitting: use a smaller max_bin and num_leaves; raise min_child_samples and min_child_weight; use bagging and feature sub-sampling; add regularization via reg_alpha, reg_lambda and min_split_gain; limit max_depth
########################################################################################### 

Class attributes:

n_features_ int  Number of features
classes_ array of shape = [n_classes]  Array of class labels (classification only)
n_classes_ int  Number of classes (classification only)
best_score_ dict or None  Best score of the fitted model
best_iteration_ int or None  Best iteration of the fitted model, if early_stopping_rounds was specified
objective_ string or callable  The concrete objective used while fitting the model
booster_ Booster  The underlying Booster of this model
evals_result_ dict or None  Evaluation results, if early_stopping_rounds was specified
feature_importances_ array of shape = [n_features]  Feature importances
 ###########################################################################################

Class methods:

fit(X, y, sample_weight=None, init_score=None, eval_set=None, eval_names=None, eval_sample_weight=None, eval_init_score=None, eval_metric='logloss', early_stopping_rounds=None, verbose=True, feature_name='auto', categorical_feature='auto', callbacks=None)

 

X  array-like or sparse matrix of shape = [n_samples, n_features]  Feature matrix
y  array-like of shape = [n_samples]  The target values (class labels in classification, real numbers in regression)
sample_weight  array-like of shape = [n_samples] or None, optional (default=None)  Sample weights; can be set e.g. with np.where
init_score  array-like of shape = [n_samples] or None, optional (default=None)  Init score of training data
group  array-like of shape = [n_samples] or None, optional (default=None)  Group data of training data
eval_set  list or None, optional (default=None)  A list of (X, y) tuple pairs to use as validation sets for early stopping
eval_names  list of strings or None, optional (default=None)  Names of eval_set
eval_sample_weight  list of arrays or None, optional (default=None)  Weights of eval data
eval_init_score  list of arrays or None, optional (default=None)  Init score of eval data
eval_group  list of arrays or None, optional (default=None)  Group data of eval data
eval_metric  string, list of strings, callable or None, optional (default="logloss")  e.g. "mae", "mse", ...
early_stopping_rounds  int or None, optional (default=None)  Stop iterating once the validation score has not improved for this many rounds
verbose  bool, optional (default=True)
feature_name  list of strings or 'auto', optional (default="auto")  If 'auto' and data is a pandas DataFrame, the column names are used
categorical_feature  list of strings or ints, or 'auto', optional (default="auto")  If 'auto' and data is a pandas DataFrame, pandas categorical columns are used
callbacks  list of callback functions or None, optional (default=None)
###############################################################################################
predict_proba(X, raw_score=False, num_iteration=0)

X  array-like or sparse matrix of shape = [n_samples, n_features]  Input feature matrix
raw_score  bool, optional (default=False)  Whether to predict raw scores
num_iteration  int, optional (default=0)  Limit the number of iterations used in prediction; 0 means use all trees
Returns: predicted_probability  The predicted probability of each class for each sample
Return type: array-like of shape = [n_samples, n_classes]
Parameters for handling class imbalance:

1. A simple approach is to set is_unbalance to True, or to set scale_pos_weight; only one of the two can be used. Setting is_unbalance to True sets the weight of the negative samples to (number of positive samples) / (number of negative samples). This parameter only works for binary classification.

2. Define a custom evaluation function:

https://cloud.tencent.com/developer/article/1357671
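The two points above can be sketched with numpy alone: computing a scale_pos_weight value from the label counts, and writing a custom eval_metric in the (name, value, is_higher_better) form that the sklearn interface accepts. The function name and the 9x penalty here are illustrative choices, not lightgbm APIs.

```python
import numpy as np

# 9:1 imbalanced binary labels.
y = np.array([0] * 90 + [1] * 10)
scale_pos_weight = (y == 0).sum() / (y == 1).sum()
print(scale_pos_weight)  # 9.0: up-weight the rare positive class by the count ratio

def weighted_error(y_true, y_pred):
    """Custom eval_metric: penalize errors on the positive class 9x more."""
    y_hat = (y_pred > 0.5).astype(int)
    w = np.where(y_true == 1, 9.0, 1.0)
    err = np.sum(w * (y_hat != y_true)) / np.sum(w)
    # lightgbm custom metrics return (name, value, is_higher_better)
    return "weighted_error", err, False
```

Either value can then be passed as LGBMClassifier(scale_pos_weight=...) or fit(..., eval_metric=weighted_error), but not together with is_unbalance=True.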

 

Summary of how LightGBM works:

http://www.cnblogs.com/gczr/p/9024730.html

Paper translations:

https://blog.csdn.net/u010242233/article/details/79769950
https://zhuanlan.zhihu.com/p/42939089

How categorical variables are handled:

https://blog.csdn.net/anshuai_aw1/article/details/83275299

A comparison of CatBoost, LightGBM and XGBoost:

https://blog.csdn.net/LrS62520kV/article/details/79620615
