In the previous post we implemented an adaptive linear neuron (Adaline) trained with gradient descent. That method uses all of the training samples for every update of the weight vector, so it is also called batch gradient descent. Now suppose the dataset contains a very large number of samples, say a million: with batch gradient descent, every single weight update has to process all million samples, so training takes a long time and is inefficient. Can we find a method that still follows the gradient, but does not need the entire dataset for every update? This is exactly what stochastic gradient descent (SGD) provides.
Stochastic gradient descent updates the weight vector using only a single training sample at a time:
\[ \Delta w = \eta (y^{(i)} - \phi(z^{(i)}))x^{(i)} \]
This method converges faster than batch gradient descent because the weights are updated much more frequently. Updating with a single sample also introduces randomness compared with using the full dataset, which helps the algorithm avoid getting stuck in local minima. Two practical points: the samples must be picked in random order, so shuffle the whole training set before every epoch to keep training stochastic; and the learning rate does not have to stay fixed, it can be decreased gradually as the number of iterations grows, which helps the algorithm converge.
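The AdalineSGD class implemented below keeps a fixed learning rate, so the following is only a minimal sketch of what a decaying schedule could look like; the function name and the constants c1 and c2 are hypothetical tuning parameters, not something from this post:

# Hypothetical decaying learning-rate schedule: eta shrinks as the
# iteration count t grows, which helps SGD settle near a minimum.
def decaying_eta(t, c1=1.0, c2=10.0):
    """Return the learning rate for iteration t (c1, c2 are tuning constants)."""
    return c1 / (t + c2)

# The learning rate for the first few iterations:
print([round(decaying_eta(t), 4) for t in range(5)])  # [0.1, 0.0909, 0.0833, ...]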
So now we have batch gradient descent, which uses all samples, and stochastic gradient descent, which uses a single sample. A compromise between the two is called mini-batch learning: each weight update uses a subset of the training samples.
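Mini-batch learning is not implemented in the class below, so here is only a rough sketch of one epoch of mini-batch updates using the same Adaline update rule; the function and parameter names (minibatch_epoch, batch_size) are illustrative assumptions, not part of the original implementation:

import numpy as np

def minibatch_epoch(X, y, w, eta=0.01, batch_size=20):
    """One epoch of mini-batch Adaline updates (illustrative sketch).

    w[0] is the bias and w[1:] are the feature weights, matching the class below.
    """
    indices = np.random.permutation(len(y))        # shuffle before slicing batches
    for start in range(0, len(y), batch_size):
        batch = indices[start:start + batch_size]
        output = np.dot(X[batch], w[1:]) + w[0]    # net input for the batch
        errors = y[batch] - output
        w[1:] += eta * X[batch].T.dot(errors)      # gradient step on the batch
        w[0] += eta * errors.sum()
    return w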
Next, let's implement Adaline with stochastic gradient descent:
import numpy as np
from numpy.random import seed


class AdalineSGD(object):
    """ADAptive LInear NEuron classifier.

    Parameters
    ----------
    eta : float
        Learning rate (between 0.0 and 1.0).
    n_iter : int
        Passes over the training dataset.

    Attributes
    ----------
    w_ : 1d-array
        Weights after fitting.
    errors_ : list
        Number of misclassifications in every epoch.
    shuffle : bool (default: True)
        Shuffle training data every epoch if True to prevent cycles.
    random_state : int (default: None)
        Set random state for shuffling and initializing the weights.
    """

    def __init__(self, eta=0.01, n_iter=10, shuffle=True, random_state=None):
        self.eta = eta
        self.n_iter = n_iter
        self.w_initialized = False
        self.shuffle = shuffle
        if random_state:
            seed(random_state)

    def fit(self, X, y):
        """Fit training data.

        :param X: {array-like}, shape = [n_samples, n_features]
        :param y: array-like, shape = [n_samples]
        :return: self : object
        """
        self._initialize_weights(X.shape[1])
        self.cost_ = []
        for i in range(self.n_iter):
            if self.shuffle:
                X, y = self._shuffle(X, y)
            cost = []
            for xi, target in zip(X, y):
                cost.append(self._update_weights(xi, target))
            avg_cost = sum(cost) / len(y)
            self.cost_.append(avg_cost)
        return self

    def partial_fit(self, X, y):
        """Fit training data without reinitializing the weights."""
        if not self.w_initialized:
            self._initialize_weights(X.shape[1])
        if y.ravel().shape[0] > 1:
            for xi, target in zip(X, y):
                self._update_weights(xi, target)
        else:
            self._update_weights(X, y)
        return self

    def _shuffle(self, X, y):
        """Shuffle training data"""
        r = np.random.permutation(len(y))
        return X[r], y[r]

    def _initialize_weights(self, m):
        """Initialize weights to zeros"""
        self.w_ = np.zeros(1 + m)
        self.w_initialized = True

    def _update_weights(self, xi, target):
        """Apply the Adaline learning rule to update the weights"""
        output = self.net_input(xi)
        error = (target - output)
        self.w_[1:] += self.eta * xi.dot(error)
        self.w_[0] += self.eta * error
        cost = 0.5 * error ** 2
        return cost

    def net_input(self, X):
        """Calculate net input"""
        return np.dot(X, self.w_[1:]) + self.w_[0]

    def activation(self, X):
        """Compute linear activation"""
        return self.net_input(X)

    def predict(self, X):
        """Return class label after unit step"""
        return np.where(self.activation(X) >= 0.0, 1, -1)
In the _shuffle method, numpy.random.permutation(len(y)) produces a random permutation of the integers 0 to len(y) - 1. Using this sequence to index both the feature matrix and the class-label vector reorders them consistently, which is what shuffles the samples.
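To make this concrete, here is a tiny standalone example of what np.random.permutation returns; the array size of 5 is chosen purely for illustration:

import numpy as np

# permutation(n) returns the integers 0..n-1 in random order;
# indexing an array with it reorders the rows in that random order.
r = np.random.permutation(5)
print(r)            # e.g. [3 0 4 1 2]
X_demo = np.arange(10).reshape(5, 2)
print(X_demo[r])    # rows of X_demo in the shuffled order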
Now let's train the model.
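The training call below uses the standardized Iris features X_std and the label vector y prepared in the previous post. As a hedged reminder of that setup (assuming the data is loaded from the UCI repository, with sepal length and petal length as the two features):

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# First 100 Iris samples: setosa vs. versicolor, two features.
df = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data', header=None)
y = np.where(df.iloc[0:100, 4].values == 'Iris-setosa', -1, 1)
X = df.iloc[0:100, [0, 2]].values

# Standardize each feature to zero mean and unit variance.
X_std = np.copy(X)
X_std[:, 0] = (X[:, 0] - X[:, 0].mean()) / X[:, 0].std()
X_std[:, 1] = (X[:, 1] - X[:, 1].mean()) / X[:, 1].std()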
ada = AdalineSGD(n_iter=15, eta=0.01, random_state=1)
ada.fit(X_std, y)
Then plot the decision regions and the training curve:
plot_decision_region(X_std, y, classifier=ada)
plt.title('Adaline - Stochastic Gradient Descent')
plt.xlabel('sepal length [standardized]')
plt.ylabel('petal length [standardized]')
plt.legend(loc='upper left')
plt.show()

plt.plot(range(1, len(ada.cost_) + 1), ada.cost_, marker='o')
plt.xlabel('Epochs')
plt.ylabel('Average Cost')
plt.show()
As the plots show, the average cost drops quickly, and after about 15 epochs the decision boundary is very similar to the one obtained with the batch gradient descent Adaline.
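The class also exposes a partial_fit method, which updates the weights without reinitializing them. This is useful for online learning, where new samples arrive after the initial fit. A minimal sketch, reusing the first training sample purely for illustration:

# Online learning: refine the already-fitted model with one more sample
# without restarting training from scratch.
ada.partial_fit(X_std[0, :], y[0])
print(ada.w_)  # weights after the incremental update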