[Scikit-learn] 1.5 Generalized Linear Models - SGD for Classification

Date: 2019-11-18

Tags: scikit learn 1.5 generalized linear models sgd classification


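NB: Because of softmax, a neural network only looks like it is doing classification; it is actually fitting (regression), namely maximizing the likelihood.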

For multi-class classification, see: [Scikit-learn] 1.1 Generalized Linear Models - Logistic regression & Softmax

The perceptron is the simplest form of gradient-trained network.

Perceptron and SGDClassifier share the same underlying implementation. In fact, Perceptron() is equivalent to SGDClassifier(loss="perceptron", eta0=1, learning_rate="constant", penalty=None).
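The equivalence can be checked empirically. Below is a minimal sketch, assuming a recent scikit-learn and a small synthetic dataset from make_classification (illustrative data, not from the original post). Both estimators get the same random_state so that they shuffle the samples identically; if they really share one implementation, the learned weights should coincide.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import Perceptron, SGDClassifier

# Illustrative binary dataset (assumption: any small dataset would do)
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Same random_state so both estimators visit the samples in the same order
perc = Perceptron(random_state=0).fit(X, y)
sgd = SGDClassifier(loss="perceptron", eta0=1, learning_rate="constant",
                    penalty=None, random_state=0).fit(X, y)

# Compare the learned hyperplanes
same_coef = np.allclose(perc.coef_, sgd.coef_)
same_intercept = np.allclose(perc.intercept_, sgd.intercept_)
print(same_coef, same_intercept)
```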

1.5. Stochastic Gradient Descent

Loss functions

Some background knowledge is needed; see Stanford CS231n - CNN for Visual Recognition 2 - lecture3.

Reference: Stanford CS231n - CNN for Visual Recognition 2 - lecture3 Optimization

1. Computing the loss function

An example with a linear SVM classifier.

(1) Computing the loss: Multiclass SVM loss

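One batch contains three images, each of which gets its own vector of class scores; the loss is then computed per image and averaged over the batch.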

Each image's correct-class score is compared with its scores for the "other two" classes:

$$L = \frac{2.9 + 0 + 10.9}{3} = 4.6$$
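The per-image term being averaged is the multiclass SVM (hinge) loss $L_i = \sum_{j \neq y_i} \max(0, s_j - s_{y_i} + \Delta)$ with margin $\Delta = 1$, as in CS231n. Here is a minimal NumPy sketch; the score matrix is a made-up example, not the one from the lecture.

```python
import numpy as np

def multiclass_svm_loss(scores, y, delta=1.0):
    """Per-sample and mean multiclass SVM loss.

    scores: (n_samples, n_classes) array of class scores
    y:      (n_samples,) array of correct class indices
    """
    n = scores.shape[0]
    correct = scores[np.arange(n), y][:, None]           # s_{y_i} as a column
    margins = np.maximum(0.0, scores - correct + delta)  # max(0, s_j - s_{y_i} + delta)
    margins[np.arange(n), y] = 0.0                       # drop the j == y_i term
    per_sample = margins.sum(axis=1)
    return per_sample, per_sample.mean()

# Hypothetical scores: 3 images x 3 classes, correct classes 0, 1, 2
scores = np.array([[2.0, 1.0, 0.5],
                   [0.5, 3.0, 2.5],
                   [1.0, 1.5, 0.0]])
y = np.array([0, 1, 2])
per_sample, L = multiclass_svm_loss(scores, y)
print(per_sample, L)  # per-image losses 0, 0.5, 4.5; mean 5/3
```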

(2) Regularization

A typical example should convince you: we naturally prefer the latter one, w₂.
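The example is assumed here to be the classic CS231n setup: x = [1, 1, 1, 1], w₁ = [1, 0, 0, 0] and w₂ = [0.25, 0.25, 0.25, 0.25]. Both give the same score wᵀx = 1, so the data loss cannot tell them apart, but the L2 penalty R(w) = Σ w_k² is smaller for the spread-out w₂. A short sketch:

```python
import numpy as np

# Assumed CS231n-style example vectors
x = np.array([1.0, 1.0, 1.0, 1.0])
w1 = np.array([1.0, 0.0, 0.0, 0.0])
w2 = np.array([0.25, 0.25, 0.25, 0.25])

# Identical scores: the data loss sees no difference
score1 = w1 @ x
score2 = w2 @ x

# L2 regularization penalty R(w) = sum of squared weights
r1 = np.sum(w1 ** 2)
r2 = np.sum(w2 ** 2)
print(score1, score2, r1, r2)  # scores 1.0 and 1.0; penalties 1.0 and 0.25
```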

2. Other loss functions

Ref: Loss functions for classification
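Several of these losses are easiest to compare as functions of the margin z = y·f(x), with labels y ∈ {-1, +1}. A NumPy sketch follows; it uses the natural log for the logistic loss, and the modified Huber definition follows the scikit-learn documentation.

```python
import numpy as np

def classification_losses(z):
    """Common classification losses as functions of the margin z = y * f(x)."""
    z = np.asarray(z, dtype=float)
    hinge = np.maximum(0.0, 1.0 - z)
    return {
        "zero_one": (z <= 0).astype(float),         # 1 if misclassified (or on the boundary)
        "hinge": hinge,                             # linear SVM
        "perceptron": np.maximum(0.0, -z),          # perceptron
        "log": np.logaddexp(0.0, -z),               # log(1 + exp(-z)), logistic regression
        "squared_hinge": hinge ** 2,
        "modified_huber": np.where(z >= -1.0, hinge ** 2, -4.0 * z),
    }

z = np.array([-2.0, 0.0, 2.0])
losses = classification_losses(z)
for name, values in losses.items():
    print(f"{name:15s}", np.round(values, 3))
```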

3. Comparing the loss computations

(a) Softmax (cross-entropy) loss of a Softmax classifier:

(b) Hinge loss of a linear SVM classifier:
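To compare the two side by side, the sketch below evaluates both losses on the same hypothetical score vector (correct class 0) and then on the same scores scaled by 10. The hinge loss is already zero once every margin is met. The softmax loss stays positive and keeps shrinking as the gap widens.

```python
import numpy as np

def softmax_loss(scores, y):
    """Cross-entropy of the softmax probabilities for the correct class y."""
    shifted = scores - scores.max()                     # shift for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return -log_probs[y]

def hinge_loss(scores, y, delta=1.0):
    """Multiclass SVM loss: sum over j != y of max(0, s_j - s_y + delta)."""
    margins = np.maximum(0.0, scores - scores[y] + delta)
    margins[y] = 0.0
    return margins.sum()

scores = np.array([2.0, 1.0, 0.1])  # hypothetical class scores, correct class 0

svm_1, sm_1 = hinge_loss(scores, 0), softmax_loss(scores, 0)
svm_10, sm_10 = hinge_loss(10 * scores, 0), softmax_loss(10 * scores, 0)
print(svm_1, sm_1)    # hinge 0.0, softmax about 0.417
print(svm_10, sm_10)  # hinge still 0.0, softmax much closer to 0
```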

Get a feel for this with the interactive demo: http://vision.stanford.edu/teaching/cs231n-demos/linear-classify/

Gradient descent

1. Logistic regression

Two loss functions

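Step 1: the loss function of logistic regression can be the "score difference" (the gap between prediction and label), although other choices are possible too.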

Step 2: use that "score difference" to run gradient descent and optimize the parameters.

Two loss functions are commonly chosen:

(1) Least-squares loss: see "Logistic regression and gradient descent: the complete detailed derivation"

(2) Cross-entropy loss: see "Machine learning algorithms --- logistic regression and gradient descent (the orthodox approach)"
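Both options differ only in the gradient with respect to the linear score z = wᵀx + b. For cross-entropy it is p − y, which is one reading of the "score difference" above. For squared loss on the sigmoid output it is (p − y)·p·(1 − p). Below is a minimal batch gradient-descent sketch. The function names and the two-blob dataset are illustrative assumptions, not the referenced posts' code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_gd(X, y, loss="cross_entropy", lr=0.1, n_iter=1000):
    """Batch gradient descent for logistic regression, labels y in {0, 1}."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(n_iter):
        p = sigmoid(X @ w + b)               # predicted P(y = 1 | x)
        if loss == "cross_entropy":
            err = p - y                      # dL/dz for L = -[y log p + (1 - y) log(1 - p)]
        elif loss == "squared":
            err = (p - y) * p * (1.0 - p)    # dL/dz for L = 0.5 * (p - y)^2
        else:
            raise ValueError(f"unknown loss: {loss}")
        w -= lr * (X.T @ err) / n            # average gradient over the batch
        b -= lr * err.mean()
    return w, b

# Illustrative two-blob dataset (assumption, not from the post)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 1.0, (50, 2)), rng.normal(2.0, 1.0, (50, 2))])
y = np.r_[np.zeros(50), np.ones(50)]

accuracy = {}
for loss in ("cross_entropy", "squared"):
    w, b = fit_logistic_gd(X, y, loss=loss)
    accuracy[loss] = np.mean((sigmoid(X @ w + b) >= 0.5) == y)
print(accuracy)
```

Note that the squared-loss gradient carries the extra factor p(1 − p), which vanishes when the sigmoid saturates. This is one common argument for preferring cross-entropy.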

Two estimator interfaces

For Softmax, see: [Scikit-learn] 1.1 Generalized Linear Models - Logistic regression & Softmax

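LogisticRegression (cross-entropy loss, iterative solver) versus SGDClassifier(loss="log") (in scikit-learn 1.1 and later this loss is spelled loss="log_loss"; "log" was removed in 1.3)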

The major difference is the optimization algorithm:

Question: Liblinear/Coordinate Descent vs. Stochastic Gradient Descent.


If your problem is high dimensional (10K or more) and you have a large number of examples (100K or more), you should choose the latter; otherwise, LogisticRegression should be fine.

High-dimensional, larger data: stochastic gradient descent.

Otherwise: Liblinear/coordinate descent.

Either way, training is simply iterative.
Neither is a proper multinomial logistic regression model;

LogisticRegression does not care and simply computes the probability estimates of each OVR classifier and normalizes them to make sure they sum to one. You could do the same for SGDClassifier(loss='log'), but you have to implement it on your own. You should be aware that SGDClassifier(n_jobs > 1) uses multiple processes; thus, if your dataset (``X``) is too large (more than 50% of your RAM), you'll run into trouble.

2. Gradient descent in practice

SGD + Linear SVM classifier

```python
"""
=========================================
SGD: Maximum margin separating hyperplane
=========================================

Plot the maximum margin separating hyperplane within a two-class
separable dataset using a linear Support Vector Machines classifier
trained using SGD.
"""
print(__doc__)

import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import SGDClassifier
# sklearn.datasets.samples_generator was removed in scikit-learn 0.24
from sklearn.datasets import make_blobs

# we create 50 separable points
X, Y = make_blobs(n_samples=50, centers=2, random_state=0, cluster_std=0.60)

# generate the samples (above), then train right away (below)
# fit the model (max_iter replaces the removed n_iter parameter)
clf = SGDClassifier(loss="hinge", alpha=0.01, max_iter=200, fit_intercept=True)
clf.fit(X, Y)

# plot the line, the points, and the nearest vectors to the plane
xx = np.linspace(-1, 5, 10)
yy = np.linspace(-1, 5, 10)
X1, X2 = np.meshgrid(xx, yy)
Z = np.empty(X1.shape)
for (i, j), val in np.ndenumerate(X1):
    x1 = val
    x2 = X2[i, j]
    p = clf.decision_function([[x1, x2]])
    Z[i, j] = p[0]
levels = [-1.0, 0.0, 1.0]
linestyles = ['dashed', 'solid', 'dashed']
colors = 'k'
plt.contour(X1, X2, Z, levels, colors=colors, linestyles=linestyles)
plt.scatter(X[:, 0], X[:, 1], c=Y, cmap=plt.cm.Paired)
plt.axis('tight')
plt.show()
```

Result:

`Important parameters of SGDClassifier`

The loss function is set via the loss parameter. SGDClassifier supports the following loss functions:

loss="hinge": (soft-margin) linear Support Vector Machine,

loss="modified_huber": smoothed hinge loss,

loss="log": logistic regression (spelled loss="log_loss" in scikit-learn 1.1 and later),

and all of the regression losses (see the SGD regression section of the scikit-learn documentation).

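The first two loss functions above are lazy: they only update the model parameters when a sample violates the margin constraint. This makes training very efficient and may result in sparser models, even when the L2 penalty is used.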
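The laziness is easy to see in a from-scratch SGD loop for the hinge loss. This is a minimal sketch, not scikit-learn's actual implementation, and the function name sgd_hinge is hypothetical. The L2 shrinkage is applied at every step, but a loss gradient step happens only when y·f(x) < 1, so on well-separated (illustrative) data the counter stays far below the number of sample visits.

```python
import numpy as np

def sgd_hinge(X, y, lr=0.1, alpha=0.01, epochs=5, seed=0):
    """Plain SGD for an L2-regularized hinge loss; labels y in {-1, +1}."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    loss_updates = 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            margin = y[i] * (X[i] @ w + b)
            w *= 1.0 - lr * alpha            # L2 shrinkage on every step
            if margin < 1.0:                 # only margin violators move the model
                w += lr * y[i] * X[i]
                b += lr * y[i]
                loss_updates += 1
    return w, b, loss_updates

# Illustrative separable data (assumption, not from the post)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(3.0, 1.0, (50, 2)), rng.normal(-3.0, 1.0, (50, 2))])
y = np.r_[np.ones(50), -np.ones(50)]

w, b, n_updates = sgd_hinge(X, y)
accuracy = np.mean(np.sign(X @ w + b) == y)
print(n_updates, "loss updates out of", 5 * len(y), "sample visits; accuracy", accuracy)
```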

The penalty is set via the penalty parameter. SGD supports the following penalties:

penalty="l2": L2 norm penalty on coef_.

penalty="l1": L1 norm penalty on coef_.

penalty="elasticnet": Convex combination of L2 and L1; (1 - l1_ratio) * L2 + l1_ratio * L1.

Ref: [Scikit-learn] 1.1. Generalized Linear Models - from Linear Regression to L1&L2

Ref: [Scikit-learn] 1.1. Generalized Linear Models - Lasso Regression

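The default is penalty="l2". The L1 penalty leads to sparse solutions, driving most coefficients to zero. The elastic net addresses some deficiencies of the L1 penalty when the attributes are highly correlated. The parameter l1_ratio controls the convex combination of the L1 and L2 penalties.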
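To see the sparsity effect, fit the same model with each penalty and count the coefficients that end up exactly zero. The sketch below uses a synthetic dataset with 50 features, only 5 of them informative, and alpha=0.05. Both choices are illustrative assumptions; the exact counts depend on the data and on alpha.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.preprocessing import StandardScaler

# 50 features, only 5 informative (illustrative assumption)
X, y = make_classification(n_samples=1000, n_features=50, n_informative=5,
                           n_redundant=0, random_state=0)
X = StandardScaler().fit_transform(X)

n_zero = {}
for penalty in ("l2", "l1", "elasticnet"):
    # l1_ratio is only used by the elastic net penalty
    clf = SGDClassifier(penalty=penalty, alpha=0.05, l1_ratio=0.15,
                        random_state=0).fit(X, y)
    n_zero[penalty] = int(np.sum(clf.coef_ == 0.0))  # exactly-zero coefficients
print(n_zero)
```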

3. Multi-class classification

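SGDClassifier supports multi-class classification by combining multiple binary classifiers in a "one versus all" (OVA) scheme.

"Softmax regression vs. k binary classifiers: the choice depends on whether your classes are mutually exclusive."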

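For each of the $K$ classes, a binary classifier is learned that discriminates between that class and all the other $K-1$ classes. At prediction time, each classifier computes a confidence score (the signed distance to its hyperplane), and the class with the highest score is chosen.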
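The OVA prediction rule can be reproduced by hand from the fitted coefficients. This sketch uses iris for illustration: coef_ holds one "class k vs. the rest" hyperplane per row, and taking the argmax of the K scores should give the same labels as predict.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import SGDClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X = StandardScaler().fit_transform(X)
clf = SGDClassifier(alpha=0.001, random_state=0).fit(X, y)

# One OVA hyperplane per class
print(clf.coef_.shape)  # (3, 4) for iris: K = 3 classes, 4 features

# Confidence score of each OVA classifier: w_k . x + b_k
scores = X @ clf.coef_.T + clf.intercept_

# Pick the class whose classifier is most confident
manual_pred = clf.classes_[np.argmax(scores, axis=1)]
print((manual_pred == clf.predict(X)).all())
```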

Solving the linear classification problem with stochastic gradient descent:

```python
"""
========================================
Plot multi-class SGD on the iris dataset
========================================

Plot decision surface of multi-class SGD on iris dataset.
The hyperplanes corresponding to the three one-versus-all (OVA) classifiers
are represented by the dashed lines.
"""
print(__doc__)

import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.linear_model import SGDClassifier

# import some data to play with
iris = datasets.load_iris()
X = iris.data[:, :2]  # we only take the first two features. We could
                      # avoid this ugly slicing by using a two-dim dataset
y = iris.target
colors = "bry"

# shuffle
idx = np.arange(X.shape[0])
np.random.seed(13)
np.random.shuffle(idx)
X = X[idx]
y = y[idx]

# standardize
mean = X.mean(axis=0)
std = X.std(axis=0)
X = (X - mean) / std

h = .02  # step size in the mesh

# max_iter replaces the n_iter parameter removed from scikit-learn
clf = SGDClassifier(alpha=0.001, max_iter=100).fit(X, y)

# create a mesh to plot in
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))

# Plot the decision boundary. For that, we will assign a color to each
# point in the mesh [x_min, x_max]x[y_min, y_max].
Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
# Put the result into a color plot
Z = Z.reshape(xx.shape)
cs = plt.contourf(xx, yy, Z, cmap=plt.cm.Paired)
plt.axis('tight')

# Plot also the training points
for i, color in zip(clf.classes_, colors):
    mask = y == i
    plt.scatter(X[mask, 0], X[mask, 1], c=color, label=iris.target_names[i])
plt.title("Decision surface of multi-class SGD")
plt.axis('tight')

# Plot the three one-against-all classifiers
xmin, xmax = plt.xlim()
ymin, ymax = plt.ylim()
coef = clf.coef_
intercept = clf.intercept_


def plot_hyperplane(c, color):
    def line(x0):
        return (-(x0 * coef[c, 0]) - intercept[c]) / coef[c, 1]
    plt.plot([xmin, xmax], [line(xmin), line(xmax)], ls="--", color=color)


for i, color in zip(clf.classes_, colors):
    plot_hyperplane(i, color)
plt.legend()
plt.show()
```

Result:

4. Binary classification with sample weights

```python
"""
=====================
SGD: Weighted samples
=====================

Plot decision function of a weighted dataset, where the size of points
is proportional to its weight.
"""
print(__doc__)

import numpy as np
import matplotlib.pyplot as plt
from sklearn import linear_model

# we create 20 points
np.random.seed(0)
X = np.r_[np.random.randn(10, 2) + [1, 1], np.random.randn(10, 2)]
y = [1] * 10 + [-1] * 10
sample_weight = 100 * np.abs(np.random.randn(20))
# and assign a bigger weight to the first 10 samples (the y = 1 class)
sample_weight[:10] *= 10

# plot the weighted data points (scatter plot, size = weight)
xx, yy = np.meshgrid(np.linspace(-4, 5, 500), np.linspace(-4, 5, 500))
plt.figure()
plt.scatter(X[:, 0], X[:, 1], c=y, s=sample_weight, alpha=0.9, cmap=plt.cm.bone)

# fit the unweighted model (max_iter replaces the removed n_iter)
clf = linear_model.SGDClassifier(alpha=0.01, max_iter=100)
clf.fit(X, y)
Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
no_weights = plt.contour(xx, yy, Z, levels=[0], linestyles=['solid'])

# fit the weighted model
clf = linear_model.SGDClassifier(alpha=0.01, max_iter=100)
clf.fit(X, y, sample_weight=sample_weight)
Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
samples_weights = plt.contour(xx, yy, Z, levels=[0], linestyles=['dashed'])

# ContourSet.collections is deprecated in recent Matplotlib; use legend_elements()
no_weights_handles, _ = no_weights.legend_elements()
weights_handles, _ = samples_weights.legend_elements()
plt.legend([no_weights_handles[0], weights_handles[0]],
           ["no weights", "with weights"], loc="lower left")

plt.xticks(())
plt.yticks(())
plt.show()
```

Result:

End.

