Detailed guide to LightGBM's sklearn-interface and native-interface parameters, with tuning tips

class lightgbm.LGBMClassifier(boosting_type='gbdt', num_leaves=31, max_depth=-1, learning_rate=0.1, n_estimators=10, max_bin=255, subsample_for_bin=200000, objective=None, min_split_gain=0.0, min_child_weight=0.001, min_child_samples=20, subsample=1.0, subsample_freq=1, colsample_bytree=1.0, reg_alpha=0.0, reg_lambda=0.0, random_state=None, n_jobs=-1, silent=True, **kwargs)

boosting_type	(string, optional (default="gbdt"))	"gbdt": Gradient Boosting Decision Tree; "dart": Dropouts meet Multiple Additive Regression Trees; "goss": Gradient-based One-Side Sampling; "rf": Random Forest
num_leaves	(int, optional (default=31))	Maximum number of leaves in each base learner (tree)	keep num_leaves <= 2^max_depth
max_depth	(int, optional (default=-1))	Maximum depth of each base learner; -1 means no limit	if the model over-fits, reduce max_depth first
learning_rate	(float, optional (default=0.1))	Boosting learning rate
n_estimators	(int, optional (default=10))	Number of base learners (boosting rounds)
max_bin	(int, optional (default=255))	Maximum number of bins that feature values are bucketed into, i.e. the k of the feature histograms
subsample_for_bin	(int, optional (default=200000))	Number of samples used to construct the bins
objective	(string, callable or None, optional (default=None))	Default: 'regression' for LGBMRegressor, 'binary' or 'multiclass' for LGBMClassifier, 'lambdarank' for LGBMRanker
min_split_gain	(float, optional (default=0.))	Minimum loss reduction required to make a further split on a leaf node
min_child_weight	(float, optional (default=1e-3))	Minimum sum of instance weight (hessian) needed in a child (leaf)
min_child_samples	(int, optional (default=20))	Minimum number of samples (records) in a leaf
subsample	(float, optional (default=1.))	Fraction of the training data sampled during training
subsample_freq	(int, optional (default=1))	Frequency of subsampling; <=0 disables it
colsample_bytree	(float, optional (default=1.))	Subsample ratio of columns when constructing each tree
reg_alpha	(float, optional (default=0.))	L1 regularization term on weights
reg_lambda	(float, optional (default=0.))	L2 regularization term on weights
random_state	(int or None, optional (default=None))	Random number seed
silent	(bool, optional (default=True))	Whether to suppress messages while running boosting
n_jobs	(int, optional (default=-1))	Number of parallel threads

######################################################################################################

The table below lists the parameters that can be adjusted for three goals: faster speed, better accuracy, and dealing with over-fitting:
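A sketch of the three goals expressed in sklearn-interface parameter names. The direction of each change (larger or smaller) follows the official LightGBM "Parameters Tuning" guide: bagging, feature sub-sampling and a smaller max_bin for speed; a larger max_bin, a smaller learning_rate with more trees and a larger num_leaves for accuracy; smaller trees, bigger leaves, sampling and regularization against over-fitting. The concrete numbers are hypothetical starting points, not recommendations.

```python
# Baseline shared by all three variants (hypothetical values).
base = {"num_leaves": 31, "max_bin": 255, "learning_rate": 0.1, "n_estimators": 100}

faster_speed = {
    **base,
    "subsample": 0.8, "subsample_freq": 1,   # bagging: each tree sees a row sample
    "colsample_bytree": 0.8,                 # feature sub-sampling per tree
    "max_bin": 63,                           # fewer histogram bins
}

better_accuracy = {
    **base,
    "max_bin": 511,                                # more histogram bins
    "learning_rate": 0.02, "n_estimators": 1000,   # smaller steps, more trees
    "num_leaves": 63,                              # larger trees (watch for over-fitting)
}

less_overfitting = {
    **base,
    "max_bin": 63,
    "num_leaves": 15,                                   # smaller trees
    "max_depth": 5,                                     # explicit depth limit
    "min_child_samples": 50, "min_child_weight": 1e-2,  # larger leaves
    "subsample": 0.7, "subsample_freq": 1,              # bagging
    "colsample_bytree": 0.7,                            # feature sub-sampling
    "reg_alpha": 0.1, "reg_lambda": 1.0, "min_split_gain": 0.01,  # regularization
}

# Show only what each variant changes relative to the baseline.
for name, p in [("speed", faster_speed), ("accuracy", better_accuracy),
                ("over-fitting", less_overfitting)]:
    print(name, {k: v for k, v in p.items() if base.get(k) != v})
```

Each dict can be passed straight to the constructor, e.g. `lgb.LGBMClassifier(**less_overfitting)`.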

###########################################################################################

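Class attributes:

n_features_	int	Number of features
classes_	array of shape = [n_classes]	Array of class labels (classification only)
n_classes_	int	Number of classes (classification only)
best_score_	dict or None	Best score of the fitted model
best_iteration_	int or None	Best iteration of the fitted model, if early_stopping_rounds has been specified
objective_	string or callable	The concrete objective used while fitting the model
booster_	Booster	The underlying Booster of this model
evals_result_	dict or None	The evaluation results, if early_stopping_rounds has been specified
feature_importances_	array of shape = [n_features]	Feature importances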

###########################################################################################

Class methods:

fit(X, y, sample_weight=None, init_score=None, eval_set=None, eval_names=None, eval_sample_weight=None, eval_init_score=None, eval_metric='logloss', early_stopping_rounds=None, verbose=True, feature_name='auto', categorical_feature='auto', callbacks=None)

X	array-like or sparse matrix of shape = [n_samples, n_features]	Feature matrix
y	array-like of shape = [n_samples]	The target values (class labels in classification, real numbers in regression)
sample_weight	array-like of shape = [n_samples] or None, optional (default=None)	Sample weights; can be built with np.where
init_score	array-like of shape = [n_samples] or None, optional (default=None)	Init score of training data
group	array-like of shape = [n_samples] or None, optional (default=None)	Group data of training data (only used for ranking)
eval_set	list or None, optional (default=None)	A list of (X, y) tuple pairs to use as validation sets for early stopping
eval_names	list of strings or None, optional (default=None)	Names of eval_set
eval_sample_weight	list of arrays or None, optional (default=None)	Weights of eval data
eval_init_score	list of arrays or None, optional (default=None)	Init score of eval data
eval_group	list of arrays or None, optional (default=None)	Group data of eval data (only used for ranking)
eval_metric	string, list of strings, callable or None, optional (default="logloss")	e.g. "mae", "mse", ...
early_stopping_rounds	int or None, optional (default=None)	Stop training if the validation score has not improved for this many rounds
verbose	bool, optional (default=True)	If True and an eval_set is used, print the evaluation progress
feature_name	list of strings or 'auto', optional (default="auto")	If ‘auto’ and data is pandas DataFrame, data columns names are used
categorical_feature	list of strings or int, or 'auto', optional (default="auto")	If ‘auto’ and data is pandas DataFrame, pandas categorical columns are used
callbacks	list of callback functions or None, optional (default=None)	List of callback functions applied at each iteration

###############################################################################################

predict_proba(X, raw_score=False, num_iteration=0)

X	array-like or sparse matrix of shape = [n_samples, n_features]	Input features matrix
raw_score	bool, optional (default=False)	Whether to predict raw scores
num_iteration	int, optional (default=0)	Limit number of iterations in the prediction; defaults to 0 (use all trees).
Returns	predicted_probability	The predicted probability for each class for each sample.
Return type	array-like of shape = [n_samples, n_classes]

Parameters for handling class imbalance:

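1. A simple approach is to set either is_unbalance=True or scale_pos_weight; only one of the two can be used. With is_unbalance=True, LightGBM up-weights the minority class by (majority count / minority count): in the usual case where positives are the minority, each positive sample gets weight negatives/positives, and if negatives are the minority, each negative sample gets weight positives/negatives. This parameter only applies to binary classification.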

2. Custom evaluation functions:

https://cloud.tencent.com/developer/article/1357671
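A sketch covering both points above. `is_unbalance_weights` is a hypothetical helper that mirrors, as an assumption about LightGBM's binary objective, how is_unbalance re-weights the two classes. `f1_eval` is a hypothetical custom metric in the sklearn-interface form `(y_true, y_pred) -> (eval_name, eval_result, is_higher_better)`; it assumes y_pred holds positive-class probabilities, which is what a built-in binary objective passes (a custom objective would pass raw scores instead).

```python
import numpy as np


def is_unbalance_weights(y):
    """Hypothetical helper mirroring is_unbalance=True for binary labels {0, 1}:
    the minority class is weighted by (majority count / minority count)."""
    y = np.asarray(y)
    n_pos = int((y == 1).sum())
    n_neg = int((y == 0).sum())
    w_neg, w_pos = 1.0, 1.0
    if n_pos > 0 and n_neg > 0:
        if n_pos > n_neg:
            w_neg = n_pos / n_neg      # negatives are the minority
        else:
            w_pos = n_neg / n_pos      # positives are the minority (or tie: 1.0)
    return np.where(y == 1, w_pos, w_neg)


def f1_eval(y_true, y_pred):
    """Custom eval metric: F1 at a 0.5 threshold, higher is better."""
    y_true = np.asarray(y_true)
    y_hat = (np.asarray(y_pred) >= 0.5).astype(int)
    tp = np.sum((y_hat == 1) & (y_true == 1))
    fp = np.sum((y_hat == 1) & (y_true == 0))
    fn = np.sum((y_hat == 0) & (y_true == 1))
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom > 0 else 0.0
    return "f1", float(f1), True


y = np.array([0] * 8 + [1] * 2)
print(is_unbalance_weights(y))           # negatives 1.0, positives 4.0
# With positives as the minority, the equivalent scale_pos_weight is neg/pos
# (set it instead of is_unbalance, never both).
print((y == 0).sum() / (y == 1).sum())   # 4.0

print(f1_eval([0, 1, 1, 0], [0.2, 0.9, 0.4, 0.6]))  # ('f1', 0.5, True)
```

In a real model these would be used as `lgb.LGBMClassifier(is_unbalance=True)` (or `scale_pos_weight=...`) and `clf.fit(X, y, eval_set=[(X_va, y_va)], eval_metric=f1_eval)`.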

Summary of how LightGBM works:

http://www.cnblogs.com/gczr/p/9024730.html

Paper translations: https://blog.csdn.net/u010242233/article/details/79769950, https://zhuanlan.zhihu.com/p/42939089

How categorical features are handled: https://blog.csdn.net/anshuai_aw1/article/details/83275299

Comparison of CatBoost, LightGBM and XGBoost:

https://blog.csdn.net/LrS62520kV/article/details/79620615