A Guide to Tuning scikit-learn Logistic Regression (LR)

Date: 2019-11-06


Credit scorecard modeling in Python (code included, recorded by the blog author)

https://study.163.com/course/introduction.htm?courseId=1005214003&utm_campaign=commission&utm_source=cp-400000000398149&utm_medium=sharehtml

Official scikit-learn parameter reference for logistic regression

https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html

sklearn.linear_model.LogisticRegression

class sklearn.linear_model.LogisticRegression(penalty='l2', dual=False, tol=0.0001, C=1.0, fit_intercept=True, intercept_scaling=1, class_weight=None, random_state=None, solver='warn', max_iter=100, multi_class='warn', verbose=0, warm_start=False, n_jobs=None, l1_ratio=None)

Logistic Regression (aka logit, MaxEnt) classifier.

In the multiclass case, the training algorithm uses the one-vs-rest (OvR) scheme if the 'multi_class' option is set to 'ovr', and uses the cross-entropy loss if the 'multi_class' option is set to 'multinomial'. (Currently the 'multinomial' option is supported only by the 'lbfgs', 'sag', 'saga' and 'newton-cg' solvers.)

This class implements regularized logistic regression using the 'liblinear' library, 'newton-cg', 'sag', 'saga' and 'lbfgs' solvers. Note that regularization is applied by default. It can handle both dense and sparse input. Use C-ordered arrays or CSR matrices containing 64-bit floats for optimal performance; any other input format will be converted (and copied).

The 'newton-cg', 'sag', and 'lbfgs' solvers support only L2 regularization with primal formulation, or no regularization. The 'liblinear' solver supports both L1 and L2 regularization, with a dual formulation only for the L2 penalty. The Elastic-Net regularization is only supported by the 'saga' solver.

Parameters:

penalty : str, 'l1', 'l2', 'elasticnet' or 'none', optional (default='l2')
    Used to specify the norm used in the penalization. The 'newton-cg', 'sag' and 'lbfgs' solvers support only l2 penalties. 'elasticnet' is only supported by the 'saga' solver. If 'none' (not supported by the liblinear solver), no regularization is applied.
    New in version 0.19: l1 penalty with SAGA solver (allowing 'multinomial' + L1)

dual : bool, optional (default=False)
    Dual or primal formulation. Dual formulation is only implemented for l2 penalty with liblinear solver. Prefer dual=False when n_samples > n_features.

tol : float, optional (default=1e-4)
    Tolerance for stopping criteria.

C : float, optional (default=1.0)
    Inverse of regularization strength; must be a positive float. Like in support vector machines, smaller values specify stronger regularization.

fit_intercept : bool, optional (default=True)
    Specifies if a constant (a.k.a. bias or intercept) should be added to the decision function.

intercept_scaling : float, optional (default=1)
    Useful only when the solver 'liblinear' is used and self.fit_intercept is set to True. In this case, x becomes [x, self.intercept_scaling], i.e. a "synthetic" feature with constant value equal to intercept_scaling is appended to the instance vector. The intercept becomes `intercept_scaling * synthetic_feature_weight`.
    Note! the synthetic feature weight is subject to l1/l2 regularization as all other features. To lessen the effect of regularization on synthetic feature weight (and therefore on the intercept) intercept_scaling has to be increased.

class_weight : dict or 'balanced', optional (default=None)
    Weights associated with classes in the form `{class_label: weight}`. If not given, all classes are supposed to have weight one.
    The "balanced" mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as `n_samples / (n_classes * np.bincount(y))`.
    Note that these weights will be multiplied with sample_weight (passed through the fit method) if sample_weight is specified.
    New in version 0.17: class_weight='balanced'

random_state : int, RandomState instance or None, optional (default=None)
    The seed of the pseudo random number generator to use when shuffling the data. If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by `np.random`. Used when `solver` == 'sag' or 'liblinear'.

solver : str, {'newton-cg', 'lbfgs', 'liblinear', 'sag', 'saga'}, optional (default='liblinear')
    Algorithm to use in the optimization problem.
    - For small datasets, 'liblinear' is a good choice, whereas 'sag' and 'saga' are faster for large ones.
    - For multiclass problems, only 'newton-cg', 'sag', 'saga' and 'lbfgs' handle multinomial loss; 'liblinear' is limited to one-versus-rest schemes.
    - 'newton-cg', 'lbfgs', 'sag' and 'saga' handle L2 or no penalty
    - 'liblinear' and 'saga' also handle L1 penalty
    - 'saga' also supports 'elasticnet' penalty
    - 'liblinear' does not handle no penalty
    Note that 'sag' and 'saga' fast convergence is only guaranteed on features with approximately the same scale. You can preprocess the data with a scaler from sklearn.preprocessing.
    New in version 0.17: Stochastic Average Gradient descent solver.
    New in version 0.19: SAGA solver.
    Changed in version 0.20: Default will change from 'liblinear' to 'lbfgs' in 0.22.

max_iter : int, optional (default=100)
    Maximum number of iterations taken for the solvers to converge.

multi_class : str, {'ovr', 'multinomial', 'auto'}, optional (default='ovr')
    If the option chosen is 'ovr', then a binary problem is fit for each label. For 'multinomial' the loss minimised is the multinomial loss fit across the entire probability distribution, even when the data is binary. 'multinomial' is unavailable when solver='liblinear'. 'auto' selects 'ovr' if the data is binary, or if solver='liblinear', and otherwise selects 'multinomial'.
    New in version 0.18: Stochastic Average Gradient descent solver for 'multinomial' case.
    Changed in version 0.20: Default will change from 'ovr' to 'auto' in 0.22.

verbose : int, optional (default=0)
    For the liblinear and lbfgs solvers set verbose to any positive number for verbosity.

warm_start : bool, optional (default=False)
    When set to True, reuse the solution of the previous call to fit as initialization, otherwise, just erase the previous solution. Useless for liblinear solver. See the Glossary.
    New in version 0.17: warm_start to support lbfgs, newton-cg, sag, saga solvers.

n_jobs : int or None, optional (default=None)
    Number of CPU cores used when parallelizing over classes if multi_class='ovr'. This parameter is ignored when the `solver` is set to 'liblinear' regardless of whether 'multi_class' is specified or not. `None` means 1 unless in a `joblib.parallel_backend` context. `-1` means using all processors. See Glossary for more details.

l1_ratio : float or None, optional (default=None)
    The Elastic-Net mixing parameter, with `0 <= l1_ratio <= 1`. Only used if `penalty='elasticnet'`. Setting `l1_ratio=0` is equivalent to using `penalty='l2'`, while setting `l1_ratio=1` is equivalent to using `penalty='l1'`. For `0 < l1_ratio < 1`, the penalty is a combination of L1 and L2.

Attributes:

classes_ : array, shape (n_classes, )
    A list of class labels known to the classifier.

coef_ : array, shape (1, n_features) or (n_classes, n_features)
    Coefficient of the features in the decision function.
    coef_ is of shape (1, n_features) when the given problem is binary. In particular, when `multi_class='multinomial'`, coef_ corresponds to outcome 1 (True) and `-coef_` corresponds to outcome 0 (False).

intercept_ : array, shape (1,) or (n_classes,)
    Intercept (a.k.a. bias) added to the decision function.
    If `fit_intercept` is set to False, the intercept is set to zero. `intercept_` is of shape (1,) when the given problem is binary. In particular, when `multi_class='multinomial'`, `intercept_` corresponds to outcome 1 (True) and `-intercept_` corresponds to outcome 0 (False).

n_iter_ : array, shape (n_classes,) or (1, )
    Actual number of iterations for all classes. If binary or multinomial, it returns only 1 element. For liblinear solver, only the maximum number of iteration across all classes is given.
    Changed in version 0.20: In SciPy <= 1.0.0 the number of lbfgs iterations may exceed `max_iter`. `n_iter_` will now report at most `max_iter`.

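1. Overview

scikit-learn offers three main entry points for logistic regression: the classes LogisticRegression and LogisticRegressionCV, and logistic_regression_path (a function rather than a class). The main difference between LogisticRegression and LogisticRegressionCV is that LogisticRegressionCV uses cross-validation to select the regularization coefficient C, whereas with LogisticRegression you must specify C yourself each time. Apart from cross-validation and the selection of C, the two classes are used in essentially the same way.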
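The difference between the two classes can be sketched as follows. This is a minimal illustration, not a recommended configuration: the synthetic data from `make_classification`, the candidate list `candidate_Cs`, and `max_iter=1000` are all assumptions chosen for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression, LogisticRegressionCV

# Synthetic binary data, used purely for illustration.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# LogisticRegression: the regularization coefficient C is chosen by hand.
manual = LogisticRegression(C=0.5, max_iter=1000).fit(X, y)

# LogisticRegressionCV: C is selected from the candidates by 5-fold cross-validation.
candidate_Cs = [0.01, 0.1, 1.0, 10.0]  # hypothetical grid
cv_model = LogisticRegressionCV(Cs=candidate_Cs, cv=5, max_iter=1000).fit(X, y)

# C_ holds the selected value; np.ravel copes with array or scalar storage.
chosen_C = float(np.ravel(cv_model.C_)[0])
print("manual C:", manual.C, "| C selected by cross-validation:", chosen_C)
```

After fitting, both objects expose the same `predict` and `predict_proba` methods, so the rest of a pipeline does not need to know which one was used.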

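logistic_regression_path is a special case: after fitting the data it cannot be used directly for prediction. It only computes the logistic regression coefficients for a range of regularization coefficients, and is mainly used during model selection. It is rarely needed in practice, so it is not discussed further below.

In addition, scikit-learn has had a class with a misleading name, RandomizedLogisticRegression. Although the name contains "logistic regression", it mainly uses L1-regularized logistic regression for feature selection. It belongs with dimensionality-reduction methods, not with the classification algorithms the term usually refers to.

The rest of this guide focuses on choosing the important parameters of LogisticRegression and LogisticRegressionCV. These parameters have the same meaning in both classes.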

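2. Regularization parameter: penalty

LogisticRegression and LogisticRegressionCV apply regularization by default. The penalty parameter is usually set to "l1" or "l2", corresponding to L1 and L2 regularization; the default is L2. (The parameter list quoted above also accepts 'elasticnet' and 'none'.)

If the main goal of tuning is only to reduce overfitting, L2 regularization is usually enough. If the model still overfits with L2, that is, it predicts poorly on unseen data, L1 regularization is worth trying. L1 is also useful when the model has a very large number of features and you want the coefficients of unimportant features driven to zero, which makes the coefficient vector sparse.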
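The sparsity effect of L1 can be checked with a small experiment. The sketch below assumes synthetic data in which only 5 of 50 features are informative, and a fairly strong regularization of `C=0.05`; both are illustrative choices, and the exact counts will vary with the data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# 50 features, of which only 5 carry signal (illustrative assumption).
X, y = make_classification(
    n_samples=300, n_features=50, n_informative=5, n_redundant=0, random_state=0
)

# Same solver and same C, only the penalty differs.
l2_model = LogisticRegression(penalty="l2", C=0.05, solver="liblinear").fit(X, y)
l1_model = LogisticRegression(penalty="l1", C=0.05, solver="liblinear").fit(X, y)

# Count coefficients that are exactly zero under each penalty.
n_zero_l2 = int(np.sum(l2_model.coef_ == 0))
n_zero_l1 = int(np.sum(l1_model.coef_ == 0))
print("zero coefficients with L2:", n_zero_l2, "| with L1:", n_zero_l1)
```

L2 shrinks coefficients toward zero without usually making them exactly zero, while L1 removes features from the model outright, which is why it doubles as a feature selector.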

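The choice of penalty constrains the choice of optimization algorithm for the loss function, that is, the solver parameter. With L2 regularization, any of the four solvers {'newton-cg', 'lbfgs', 'liblinear', 'sag'} can be used. With L1 regularization, 'liblinear' is the only one of these four that works. The reason is that the L1-regularized loss is not continuously differentiable, while 'newton-cg', 'lbfgs' and 'sag' all need continuous first or second derivatives of the loss; 'liblinear' has no such requirement. According to the documentation quoted above, the newer 'saga' solver also supports L1.

How these four solvers differ, and what the consequences are, is covered in the next section.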

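3. Optimization algorithm parameter: solver

The solver parameter determines how the logistic regression loss function is optimized. This section covers four of the available algorithms:

a) liblinear: uses the open-source liblinear library, which optimizes the loss iteratively with coordinate descent.

b) lbfgs: a quasi-Newton method. It optimizes the loss iteratively using an approximation of the second-derivative matrix (the Hessian) built from gradient information.

c) newton-cg: also a member of the Newton family. It optimizes the loss iteratively using the second-derivative matrix (the Hessian) of the loss.

d) sag: stochastic average gradient descent, a variant of gradient descent. Unlike ordinary gradient descent, each iteration computes the gradient from only part of the samples, which suits large datasets. SAG converges linearly, which is much faster than SGD. For more on SAG, see the blog post "Linearly convergent stochastic optimization algorithms: SAG and SVRG (stochastic gradient descent)".

As the descriptions show, newton-cg, lbfgs and sag all require continuous first or second derivatives of the loss, so they cannot be used with L1 regularization, which lacks a continuous derivative; they work only with L2. liblinear handles both L1 and L2.

Also, because sag uses only part of the samples in each gradient step, avoid it when the sample size is small. When the sample size is very large, say more than 100,000, sag is the first choice. However, sag cannot be used with L1 regularization, so with a large dataset that also needs L1 you face a trade-off among these four solvers: either subsample to reduce the data size, or fall back to L2. (The 'saga' solver listed in the documentation quoted above supports L1 as well.)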
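For the "many samples plus L1" situation, the documentation quoted earlier lists 'saga' as a solver that handles the L1 penalty and notes that 'sag' and 'saga' converge quickly only on features with similar scales. A minimal sketch under those assumptions follows; the dataset size, `C=0.1` and `max_iter=2000` are arbitrary illustrative values.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data standing in for a larger real dataset.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# Scale the features first, then fit an L1-penalized model with 'saga'.
model = make_pipeline(
    StandardScaler(),
    LogisticRegression(
        penalty="l1", solver="saga", C=0.1, max_iter=2000, random_state=0
    ),
)
model.fit(X, y)

train_acc = model.score(X, y)
print("training accuracy:", round(train_acc, 3))
```

Putting the scaler inside the pipeline means the same scaling is applied automatically at prediction time and inside any cross-validation loop.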

The official scikit-learn documentation gives the following guidance on choosing a solver:

In a nutshell, one may choose the solver with the following rules:

Case	Solver
Small dataset or L1 penalty	"liblinear"
Multinomial loss or large dataset	"lbfgs", "sag" or "newton-cg"
Very Large dataset	"sag"

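From the above, you might think that since newton-cg, lbfgs and sag have so many restrictions, liblinear is the obvious choice unless the dataset is large. Not so, because liblinear has a weakness of its own. Logistic regression comes in binary and multiclass forms. For multiclass problems, scikit-learn offers one-vs-rest (OvR) and 'multinomial', which this article refers to as many-vs-many (MvM); strictly speaking, as the documentation quoted above states, 'multinomial' minimizes a single cross-entropy loss over all classes jointly. MvM is generally somewhat more accurate than OvR. Unfortunately, liblinear supports only OvR, not MvM, so when a more accurate multiclass model is needed, liblinear is ruled out. Among the four solvers discussed here, that also means L1 regularization is unavailable for such a model.

In summary, liblinear supports L1 and L2 but only OvR for multiclass problems, while "lbfgs", "sag" and "newton-cg" support only L2 but handle both OvR and MvM.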
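The compatibility rules can be written down as a small lookup table. The sketch below encodes only what the documentation quoted earlier in this article states for that scikit-learn version; `SOLVER_PENALTIES`, `MULTINOMIAL_SOLVERS` and `compatible_solvers` are hypothetical helper names, not part of scikit-learn, and other releases may differ.

```python
# Penalties each solver accepts, per the documentation quoted in this article.
SOLVER_PENALTIES = {
    "liblinear": {"l1", "l2"},
    "newton-cg": {"l2", "none"},
    "lbfgs": {"l2", "none"},
    "sag": {"l2", "none"},
    "saga": {"l1", "l2", "elasticnet", "none"},
}

# Solvers that can minimize the multinomial (cross-entropy) loss.
MULTINOMIAL_SOLVERS = {"newton-cg", "lbfgs", "sag", "saga"}


def compatible_solvers(penalty, multinomial=False):
    """Return the sorted list of solvers that accept the given settings."""
    solvers = [
        name
        for name, penalties in SOLVER_PENALTIES.items()
        if penalty in penalties
    ]
    if multinomial:
        solvers = [name for name in solvers if name in MULTINOMIAL_SOLVERS]
    return sorted(solvers)


print(compatible_solvers("l1"))
print(compatible_solvers("l1", multinomial=True))
print(compatible_solvers("l2", multinomial=True))
```

A table like this is handy for validating a hyperparameter grid before launching a search, so that incompatible combinations are filtered out instead of failing at fit time.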

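How OvR and MvM differ is covered in the next section.

4. Classification scheme parameter: multi_class

The multi_class parameter determines the classification scheme. The two main values are ovr and multinomial, and the default is ovr.

ovr is the one-vs-rest (OvR) scheme mentioned earlier, and multinomial is what this article calls many-vs-many (MvM). For binary logistic regression the choice makes little practical difference; it matters mainly for multiclass logistic regression.

The idea behind OvR is simple: however many classes there are, the problem can be treated as a set of binary logistic regressions. For the decision on class K, take all samples of class K as positive examples and all remaining samples as negative examples, then fit a binary logistic regression to obtain the model for class K. The models for the other classes are obtained in the same way.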
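The OvR procedure described above can be spelled out by hand. This is a sketch of the idea using the iris dataset as a stand-in, not a description of scikit-learn's internal implementation; picking the class with the highest positive-class probability is one common way to combine the binary models.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
classes = np.unique(y)

# One binary model per class: class k is positive, everything else negative.
binary_models = [
    LogisticRegression(max_iter=1000).fit(X, (y == k).astype(int))
    for k in classes
]

# Column k holds each sample's estimated probability of belonging to class k.
scores = np.column_stack([m.predict_proba(X)[:, 1] for m in binary_models])

# Predict the class whose binary model is most confident.
ovr_pred = classes[np.argmax(scores, axis=1)]
ovr_acc = float(np.mean(ovr_pred == y))
print("training accuracy of the hand-built OvR classifier:", round(ovr_acc, 3))
```

With T classes this trains exactly T binary models, each on the full dataset.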

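MvM is more complex. As an illustration, consider its special case one-vs-one (OvO). If there are T classes, each round picks two of them, call them T1 and T2, gathers all samples labeled T1 or T2, treats T1 as the positive class and T2 as the negative class, and fits a binary logistic regression to obtain the model parameters. In total this requires T(T-1)/2 classifiers. Note that OvO only illustrates the many-vs-many idea; according to the documentation quoted above, scikit-learn's 'multinomial' option does not train pairwise models but fits one model by minimizing the cross-entropy loss over all classes.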
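The T(T-1)/2 count follows directly from enumerating unordered pairs of classes. The helper `ovo_subproblems` below is a hypothetical name introduced for this illustration; it only selects the samples for each pairwise problem and leaves the model fitting out.

```python
from itertools import combinations


def ovo_subproblems(y):
    """Yield (t1, t2, indices) for every unordered pair of classes in y."""
    classes = sorted(set(y))
    for t1, t2 in combinations(classes, 2):
        # Keep only the samples labeled t1 or t2 for this binary problem.
        idx = [i for i, label in enumerate(y) if label in (t1, t2)]
        yield t1, t2, idx


y = ["a", "b", "c", "d", "a", "c"]  # toy labels with T = 4 classes
problems = list(ovo_subproblems(y))
T = len(set(y))
print(len(problems), "binary problems for", T, "classes")
```

Each pairwise problem sees only the samples of its two classes, so the individual fits are smaller than in OvR, but their number grows quadratically with T.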

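As the descriptions show, OvR is simpler but usually classifies slightly worse (this holds for most sample distributions; under some distributions OvR may do better). MvM is more accurate, but slower than OvR.

With ovr, all four optimizers liblinear, newton-cg, lbfgs and sag can be used. With multinomial, only newton-cg, lbfgs and sag can be used (plus 'saga', according to the documentation quoted above).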
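To make the contrast concrete, the following NumPy sketch compares how class probabilities arise under the two views: the multinomial (cross-entropy) formulation normalizes all class scores jointly with a softmax, while OvR applies an independent sigmoid per class. The score matrix is made up for illustration and does not come from a fitted model.

```python
import numpy as np


def softmax(z):
    """Row-wise softmax; subtracting the row maximum avoids overflow."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


# Hypothetical linear scores for 2 samples and 3 classes.
scores = np.array([[2.0, 1.0, 0.1], [0.5, 2.5, -1.0]])
y_true = np.array([0, 1])

# Multinomial view: one joint distribution over the classes per sample.
p_multinomial = softmax(scores)

# OvR view: each class gets its own independent binary probability.
p_ovr = sigmoid(scores)

# Cross-entropy loss of the multinomial probabilities on the true labels.
cross_entropy = -np.mean(np.log(p_multinomial[np.arange(len(y_true)), y_true]))
print("softmax row sums:", p_multinomial.sum(axis=1))
print("sigmoid row sums:", p_ovr.sum(axis=1))
```

The softmax rows sum to one by construction, whereas the per-class sigmoid outputs do not, which is why OvR probabilities need a separate normalization step before they can be read as a distribution.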

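5. Class weight parameter: class_weight

The class_weight parameter sets the weight of each class in the classification model. It can be left unset, in which case weights are ignored, or equivalently every class has the same weight. If you do set it, you can either choose balanced and let the library compute the class weights, or supply the weights yourself. For a binary model with classes 0 and 1, for instance, class_weight={0:0.9, 1:0.1} gives class 0 a weight of 90% and class 1 a weight of 10%.

With class_weight set to balanced, the library computes the weights from the training sample counts: the more samples a class has, the lower its weight, and the fewer samples, the higher its weight.

According to the official scikit-learn documentation, the class weights for balanced are computed as:

n_samples / (n_classes * np.bincount(y)), where n_samples is the number of samples, n_classes is the number of classes, and np.bincount(y) gives the number of samples in each class. For example, with y=[1,0,0,1,1], np.bincount(y)=[2,3].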
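The formula can be evaluated directly with NumPy for the example above. This is only the arithmetic from the quoted formula, written out for y=[1,0,0,1,1]; the variable names are chosen for this illustration.

```python
import numpy as np

y = np.array([1, 0, 0, 1, 1])

n_samples = len(y)             # 5 samples in total
n_classes = len(np.unique(y))  # 2 distinct classes
counts = np.bincount(y)        # samples per class: [2, 3]

# Balanced weights: inversely proportional to class frequency.
weights = n_samples / (n_classes * counts)
print("class counts:", counts)
print("balanced weights:", weights)
```

Class 0 has 2 samples and gets weight 5 / (2 * 2) = 1.25, while class 1 has 3 samples and gets 5 / (2 * 3), roughly 0.83, so the rarer class receives the larger weight.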

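So what is class_weight for? Classification models often run into two kinds of problems.

The first is a high cost of misclassification. Take classifying users as legitimate or illegitimate: labeling an illegitimate user as legitimate is very costly. We would rather label a legitimate user as illegitimate, since that can be reviewed manually afterwards, than let an illegitimate user pass as legitimate. In this case, the weight of the illegitimate class can be raised appropriately.

The second is highly imbalanced samples. Suppose we have 10,000 binary samples of legitimate and illegitimate users, of which 9,995 are legitimate and only 5 are illegitimate. If weights are ignored, a model can predict every test sample as legitimate and reach a theoretical accuracy of 99.95%, which is meaningless. In this case, choose balanced and let the library automatically raise the weight of the illegitimate samples.

Raising the weight of a class means that, compared with ignoring weights, more samples are assigned to the high-weight class, which addresses both problems above.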
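The shift toward the high-weight class can be observed on a synthetic imbalanced dataset. The sketch below assumes a 95/5 class split generated by `make_classification`; the exact counts depend on the data and are only meant to show the direction of the effect.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Roughly 95% of samples in class 0 and 5% in class 1 (illustrative assumption).
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.95, 0.05], flip_y=0.0, random_state=0
)

plain = LogisticRegression(max_iter=1000).fit(X, y)
balanced = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X, y)

# Number of samples each model assigns to the minority class 1.
n_pos_plain = int(plain.predict(X).sum())
n_pos_balanced = int(balanced.predict(X).sum())
print("predicted as class 1 without weights:", n_pos_plain)
print("predicted as class 1 with balanced weights:", n_pos_balanced)
```

Accuracy alone hides this effect on imbalanced data, so metrics such as recall or precision on the minority class are usually more informative when comparing the two models.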

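Of course, for the second problem, imbalanced samples, you can also use the sample weight parameter sample_weight instead of class_weight. sample_weight is covered in the next section.

6. Sample weight parameter: sample_weight

The previous section mentioned the problem of imbalanced samples. Because the samples are imbalanced, the training sample is not an unbiased representation of the population, which may reduce the model's predictive power. In this situation, adjusting the sample weights is one way to try to fix the problem. There are two ways to adjust them: the first is to set class_weight to balanced; the second is to pass sample_weight when calling fit and set the weight of each sample yourself.

When both methods are used in a scikit-learn logistic regression, the effective weight of each sample is class_weight*sample_weight.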
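The product rule can be verified numerically: fitting with both class_weight and sample_weight should give the same coefficients as fitting with a single sample_weight that already contains the product. The sketch assumes synthetic data, random per-sample weights, and the 'lbfgs' solver; the weight values are arbitrary.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=400, n_features=8, random_state=0)

rng = np.random.RandomState(0)
sample_weight = rng.uniform(0.5, 2.0, size=len(y))  # arbitrary per-sample weights
class_weight = {0: 1.0, 1: 3.0}                     # arbitrary per-class weights

# Model A: let the library combine class_weight and sample_weight.
combined = LogisticRegression(solver="lbfgs", class_weight=class_weight, max_iter=1000)
combined.fit(X, y, sample_weight=sample_weight)

# Model B: multiply the two weights by hand and pass only sample_weight.
per_class = np.where(y == 1, class_weight[1], class_weight[0])
manual = LogisticRegression(solver="lbfgs", max_iter=1000)
manual.fit(X, y, sample_weight=sample_weight * per_class)

max_diff = float(np.max(np.abs(combined.coef_ - manual.coef_)))
print("largest coefficient difference:", max_diff)
```

The two fits agree up to numerical precision, which is consistent with the note in the quoted documentation that class weights are multiplied with sample_weight.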

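That concludes this summary of tuning the logistic regression classes in scikit-learn. A few parameters remain, such as the regularization parameter C (Cs in the cross-validated class) and the iteration limit max_iter, but they behave much as they do in other algorithm libraries, so they are not covered in detail here.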

Hands-on risk-control modeling in Python with LendingClub data (recorded by the blog author; catboost and lightgbm modeling; 2K ultra-HD resolution)

https://study.163.com/course/courseMain.htm?courseId=1005988013&share=2&shareId=400000000398149
