5 Neural Networks (part two)
5.1 Cost Function
5.2 Back Propagation
5.3 Summary of Neural Networks
Continuing from the previous post, 4. Neural Networks (part one). This post first defines the cost function of a neural network, then introduces the Back Propagation (BP) algorithm, which efficiently computes the partial derivatives of the cost function with respect to the connection weights, and finally summarizes the overall process of training a neural network.
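The cost function itself appears as an image in the original post. As a reference, a sketch of the standard regularized formulation (consistent with the implementation in Code 3 below) for a K-class network with L layers is:

$$
J(\Theta) = -\frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K}\left[ y_k^{(i)} \log\left(h_\Theta(x^{(i)})\right)_k + \left(1-y_k^{(i)}\right)\log\left(1-\left(h_\Theta(x^{(i)})\right)_k\right)\right] + \frac{\lambda}{2m}\sum_{l=1}^{L-1}\sum_{i=1}^{s_l}\sum_{j=1}^{s_{l+1}}\left(\Theta_{ji}^{(l)}\right)^2
$$

where the regularization term sums over all connection weights except those attached to the bias units (in the code, the first column of each Theta matrix is excluded).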
(Note: for regularization-related material, see 3. Bayesian statistics and Regularization.)
(For detailed derivations, see the post on the back propagation algorithm, as well as Hung-yi Lee's machine learning course: YouTube, Bilibili.)
Figure 5-1 Steps of the BP algorithm
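Figure 5-1 is an image in the original; written out, the steps it summarizes (matching the implementation in Code 3 below) are roughly the following. For a training set $\{(x^{(i)}, y^{(i)})\}_{i=1}^{m}$, set $\Delta^{(l)} = 0$ for every layer $l$, then for each example $i$:

$$
\begin{aligned}
&a^{(1)} = x^{(i)}, \quad \text{forward propagate to obtain } a^{(l)},\ l = 2,\dots,L \\
&\delta^{(L)} = a^{(L)} - y^{(i)} \\
&\delta^{(l)} = \left(\Theta^{(l)}\right)^{T} \delta^{(l+1)} \odot g'\!\left(z^{(l)}\right), \quad l = L-1,\dots,2 \quad \text{(with the bias weights removed from } \Theta^{(l)}\text{)} \\
&\Delta^{(l)} := \Delta^{(l)} + \delta^{(l+1)} \left(a^{(l)}\right)^{T}
\end{aligned}
$$

Afterwards the partial derivatives are $\frac{\partial J}{\partial \Theta^{(l)}} = \frac{1}{m}\Delta^{(l)} + \frac{\lambda}{m}\Theta^{(l)}$, with no regularization applied to the bias column.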
When implementing the back propagation algorithm, the following points deserve attention.
Figure 5-2 Random initialization of the connection weights
Figure 5-3 Numerical approximation of the partial derivatives of the cost function
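Figures 5-2 and 5-3 are images in the original; the two formulas they correspond to (implemented in Codes 1 and 2 below) are the symmetry-breaking random initialization and the centered finite-difference approximation:

$$
\Theta^{(l)}_{ij} \sim U\left(-\epsilon_{\text{init}},\ \epsilon_{\text{init}}\right), \qquad \epsilon_{\text{init}} = \frac{\sqrt{6}}{\sqrt{L_{\text{in}} + L_{\text{out}}}}
$$

$$
\frac{\partial J(\theta)}{\partial \theta_p} \approx \frac{J(\theta + \epsilon\, e_p) - J(\theta - \epsilon\, e_p)}{2\epsilon}, \qquad \epsilon = 10^{-4}
$$

where $e_p$ is the $p$-th unit vector.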
Step 1: Design the neural network architecture.
The number of hidden-layer units is usually not fixed in advance.
A few empirical formulas are commonly used to choose the number of hidden-layer units (see the sketch below):
See https://www.zhihu.com/question/46530834
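The empirical formulas in the original are given as an image. The short Octave/MATLAB sketch below shows a few commonly quoted rules of thumb; these particular formulas are an assumption about typical candidates, not a reproduction of the image:

% A few commonly quoted heuristics for the hidden-layer size h, given
% n input units and m output units (formulas are illustrative assumptions).
n = 400;                     % e.g. 20x20 pixel input images
m = 10;                      % e.g. 10 digit classes
h1 = ceil(sqrt(n * m));      % geometric mean of input and output sizes
h2 = ceil(sqrt(n + m)) + 5;  % sqrt(n + m) + a, with a typically in [1, 10]
h3 = ceil(log2(n));          % logarithm of the number of input units
fprintf('candidate hidden-layer sizes: %d, %d, %d\n', h1, h2, h3);

In practice these are only starting points; the final hidden-layer size is usually chosen by cross-validation.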
In addition, the MNIST handwritten digit recognition page lists the results of training with different network architectures, for reference.
Step 2: Implement forward propagation (FP) and back propagation; this step includes the following sub-steps.
Step 3: Check the correctness of the computed partial derivatives with a numerical method (see Code 2 and the usage sketch after it).
Step 4: Use gradient descent or a more advanced optimization algorithm to find the connection weights that minimize the cost function (a training sketch is given after Code 3).
In step 4, since the cost function is non-convex, the optimization may get stuck in a local optimum. Such a local optimum is not necessarily much worse than the global one (see Figure 5-4), so in practice this is usually not a serious problem. Heuristic algorithms (such as simulated annealing or genetic algorithms) can also help escape local optima.
Figure 5-4 Getting stuck in a local optimum (not necessarily much worse than the global optimum)
Code 1: Randomly initialize the connection weights
function W = randInitializeWeights(L_in, L_out)
%RANDINITIALIZEWEIGHTS Randomly initialize the weights of a layer with L_in
%incoming connections and L_out outgoing connections
%   W = RANDINITIALIZEWEIGHTS(L_in, L_out) randomly initializes the weights
%   of a layer with L_in incoming connections and L_out outgoing
%   connections.
%
%   Note that W should be set to a matrix of size(L_out, 1 + L_in) as
%   the first column of W handles the "bias" terms
%

W = zeros(L_out, 1 + L_in);

% Instructions: Initialize W randomly so that we break the symmetry while
%               training the neural network.
%
% Note: The first column of W corresponds to the parameters for the bias units
%

epsilon_init = sqrt(6) / sqrt(L_out + L_in);
W = rand(L_out, 1 + L_in) * 2 * epsilon_init - epsilon_init;

end
Code 2: Numerically approximate the partial derivatives of the cost function with respect to the connection weights
function numgrad = computeNumericalGradient(J, theta)
%COMPUTENUMERICALGRADIENT Computes the gradient using "finite differences"
%and gives us a numerical estimate of the gradient.
%   numgrad = COMPUTENUMERICALGRADIENT(J, theta) computes the numerical
%   gradient of the function J around theta. Calling y = J(theta) should
%   return the function value at theta.

% Notes: The following code implements numerical gradient checking and
%        returns the numerical gradient. It sets numgrad(i) to (a numerical
%        approximation of) the partial derivative of J with respect to the
%        i-th input argument, evaluated at theta. (i.e., numgrad(i) should
%        be (approximately) the partial derivative of J with respect
%        to theta(i).)
%

numgrad = zeros(size(theta));
perturb = zeros(size(theta));
e = 1e-4;
for p = 1:numel(theta)
    % Set perturbation vector
    perturb(p) = e;
    % Centered-difference approximation of the p-th partial derivative
    numgrad(p) = (J(theta + perturb) - J(theta - perturb)) / (2*e);
    perturb(p) = 0;
end

end
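A minimal usage sketch of the gradient check follows; the small network sizes are made up for illustration, and nnCostFunction and randInitializeWeights are the functions from Codes 3 and 1. If BP is implemented correctly, the relative difference between the analytical and numerical gradients should be very small (e.g. below 1e-9).

% Gradient checking on a small, made-up network (sizes are illustrative).
input_layer_size = 3; hidden_layer_size = 5; num_labels = 3; m = 5; lambda = 0;
Theta1 = randInitializeWeights(input_layer_size, hidden_layer_size);
Theta2 = randInitializeWeights(hidden_layer_size, num_labels);
X = rand(m, input_layer_size);
y = mod(1:m, num_labels)' + 1;            % labels in 1..num_labels
nn_params = [Theta1(:); Theta2(:)];

costFunc = @(p) nnCostFunction(p, input_layer_size, hidden_layer_size, ...
                               num_labels, X, y, lambda);
[cost, grad] = costFunc(nn_params);
numgrad = computeNumericalGradient(costFunc, nn_params);

% The relative difference should be tiny (e.g. < 1e-9) if BP is correct.
rel_diff = norm(numgrad - grad) / norm(numgrad + grad);
fprintf('Cost: %f, relative difference: %e\n', cost, rel_diff);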
Code 3: Use FP and BP to compute the cost function of a neural network with one hidden layer, together with its partial derivatives with respect to the connection weights
function [J grad] = nnCostFunction(nn_params, ...
                                   input_layer_size, ...
                                   hidden_layer_size, ...
                                   num_labels, ...
                                   X, y, lambda)
%NNCOSTFUNCTION Implements the neural network cost function for a two layer
%neural network which performs classification
%   [J grad] = NNCOSTFUNCTION(nn_params, hidden_layer_size, num_labels, ...
%   X, y, lambda) computes the cost and gradient of the neural network. The
%   parameters for the neural network are "unrolled" into the vector
%   nn_params and need to be converted back into the weight matrices.
%
%   The returned parameter grad should be an "unrolled" vector of the
%   partial derivatives of the neural network.
%

% Reshape nn_params back into the parameters Theta1 and Theta2, the weight
% matrices for our 2 layer neural network: Theta1: 1->2; Theta2: 2->3
Theta1 = reshape(nn_params(1:hidden_layer_size * (input_layer_size + 1)), ...
                 hidden_layer_size, (input_layer_size + 1));
Theta2 = reshape(nn_params((1 + (hidden_layer_size * (input_layer_size + 1))):end), ...
                 num_labels, (hidden_layer_size + 1));

% Setup some useful variables
m = size(X, 1);
J = 0;
Theta1_grad = zeros(size(Theta1));
Theta2_grad = zeros(size(Theta2));

% Note: The vector y passed into the function is a vector of labels
%       containing values from 1..K. You need to map this vector into a
%       binary vector of 1's and 0's to be used with the neural network
%       cost function.
for i = 1:m
    % Compute the activations by Forward Propagation
    a1 = [1; X(i,:)'];
    z2 = Theta1 * a1;
    a2 = [1; sigmoid(z2)];
    z3 = Theta2 * a2;
    h = sigmoid(z3);

    yy = zeros(num_labels, 1);
    yy(y(i)) = 1;                         % ground-truth label vector for this example
    J = J + sum(-yy .* log(h) - (1-yy) .* log(1-h));

    % Back Propagation
    delta3 = h - yy;
    delta2 = (Theta2(:,2:end)' * delta3) .* sigmoidGradient(z2);  % exclude the weights of the bias unit
    Theta2_grad = Theta2_grad + delta3 * a2';
    Theta1_grad = Theta1_grad + delta2 * a1';
end

J = J / m + lambda * (sum(sum(Theta1(:,2:end) .^ 2)) + sum(sum(Theta2(:,2:end) .^ 2))) / (2*m);

Theta2_grad = Theta2_grad / m;
Theta2_grad(:,2:end) = Theta2_grad(:,2:end) + lambda * Theta2(:,2:end) / m;  % regularized gradient
Theta1_grad = Theta1_grad / m;
Theta1_grad(:,2:end) = Theta1_grad(:,2:end) + lambda * Theta1(:,2:end) / m;  % regularized gradient

% Unroll gradients
grad = [Theta1_grad(:) ; Theta2_grad(:)];

end
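Putting steps 1 through 4 together, a minimal end-to-end training sketch is shown below. It assumes the training data X, y are already loaded, that sigmoid/sigmoidGradient are available, and that fmincg is the optimizer shipped with the course exercises (Octave's fminunc can be used instead); the 400-25-10 architecture, lambda, and MaxIter are illustrative choices.

% End-to-end training sketch for a 400-25-10 network (sizes, lambda and
% MaxIter are illustrative; fmincg is the course-supplied optimizer).
input_layer_size = 400; hidden_layer_size = 25; num_labels = 10; lambda = 1;

initial_Theta1 = randInitializeWeights(input_layer_size, hidden_layer_size);
initial_Theta2 = randInitializeWeights(hidden_layer_size, num_labels);
initial_nn_params = [initial_Theta1(:); initial_Theta2(:)];

costFunc = @(p) nnCostFunction(p, input_layer_size, hidden_layer_size, ...
                               num_labels, X, y, lambda);

options = optimset('MaxIter', 100);
[nn_params, cost] = fmincg(costFunc, initial_nn_params, options);

% Recover the weight matrices and predict by taking the class with the
% largest output activation.
Theta1 = reshape(nn_params(1:hidden_layer_size * (input_layer_size + 1)), ...
                 hidden_layer_size, (input_layer_size + 1));
Theta2 = reshape(nn_params((1 + (hidden_layer_size * (input_layer_size + 1))):end), ...
                 num_labels, (hidden_layer_size + 1));
h1 = sigmoid([ones(size(X,1), 1) X] * Theta1');
h2 = sigmoid([ones(size(X,1), 1) h1] * Theta2');
[~, pred] = max(h2, [], 2);
fprintf('Training accuracy: %f\n', mean(double(pred == y)) * 100);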