5 Neural Networks (part two)
5.1 Cost Function
5.2 Back Propagation
5.3 Summary of Neural Networks
Continuing from the previous post, 4. Neural Networks (part one). This post first defines the cost function of a neural network, then introduces the Back Propagation (BP) algorithm, which efficiently computes the partial derivatives of the cost function with respect to the connection weights, and finally summarizes the overall process of training a neural network.
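The cost function itself appears as an image in the original post. As a reference, a sketch of the standard regularized formulation (consistent with the implementation in Code 3 below) for a K-class network with L layers is:

$$
J(\Theta) = -\frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K}\left[ y_k^{(i)} \log\left(h_\Theta(x^{(i)})\right)_k + \left(1-y_k^{(i)}\right)\log\left(1-\left(h_\Theta(x^{(i)})\right)_k\right)\right] + \frac{\lambda}{2m}\sum_{l=1}^{L-1}\sum_{i=1}^{s_l}\sum_{j=1}^{s_{l+1}}\left(\Theta_{ji}^{(l)}\right)^2
$$

where the regularization term sums over all connection weights except those attached to the bias units (in the code, the first column of each Theta matrix is excluded).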
(Note: for regularization-related material, see 3. Bayesian statistics and Regularization.)
(For detailed derivations, see the post on the back propagation algorithm, as well as Hung-yi Lee's machine learning course: YouTube, Bilibili.)
Figure 5-1 Steps of the BP algorithm
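Figure 5-1 is an image in the original; written out, the steps it summarizes (matching the implementation in Code 3 below) are roughly the following. For a training set $\{(x^{(i)}, y^{(i)})\}_{i=1}^{m}$, set $\Delta^{(l)} = 0$ for every layer $l$, then for each example $i$:

$$
\begin{aligned}
&a^{(1)} = x^{(i)}, \quad \text{forward propagate to obtain } a^{(l)},\ l = 2,\dots,L \\
&\delta^{(L)} = a^{(L)} - y^{(i)} \\
&\delta^{(l)} = \left(\Theta^{(l)}\right)^{T} \delta^{(l+1)} \odot g'\!\left(z^{(l)}\right), \quad l = L-1,\dots,2 \quad \text{(with the bias weights removed from } \Theta^{(l)}\text{)} \\
&\Delta^{(l)} := \Delta^{(l)} + \delta^{(l+1)} \left(a^{(l)}\right)^{T}
\end{aligned}
$$

Afterwards the partial derivatives are $\frac{\partial J}{\partial \Theta^{(l)}} = \frac{1}{m}\Delta^{(l)} + \frac{\lambda}{m}\Theta^{(l)}$, with no regularization applied to the bias column.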
When implementing the back propagation algorithm, the following points deserve attention.
Figure 5-2 Random initialization of the connection weights
Figure 5-3 Numerical approximation of the partial derivatives of the cost function
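Figures 5-2 and 5-3 are images in the original; the two formulas they correspond to (implemented in Codes 1 and 2 below) are the symmetry-breaking random initialization and the centered finite-difference approximation:

$$
\Theta^{(l)}_{ij} \sim U\left(-\epsilon_{\text{init}},\ \epsilon_{\text{init}}\right), \qquad \epsilon_{\text{init}} = \frac{\sqrt{6}}{\sqrt{L_{\text{in}} + L_{\text{out}}}}
$$

$$
\frac{\partial J(\theta)}{\partial \theta_p} \approx \frac{J(\theta + \epsilon\, e_p) - J(\theta - \epsilon\, e_p)}{2\epsilon}, \qquad \epsilon = 10^{-4}
$$

where $e_p$ is the $p$-th unit vector.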
Step 1: Design the neural network architecture.
The number of hidden-layer units is usually not fixed in advance.
A few empirical formulas are commonly used to choose the number of hidden-layer units (see the sketch below):
See https://www.zhihu.com/question/46530834
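The empirical formulas in the original are given as an image. The short Octave/MATLAB sketch below shows a few commonly quoted rules of thumb; these particular formulas are an assumption about typical candidates, not a reproduction of the image:

% A few commonly quoted heuristics for the hidden-layer size h, given
% n input units and m output units (formulas are illustrative assumptions).
n = 400;                     % e.g. 20x20 pixel input images
m = 10;                      % e.g. 10 digit classes
h1 = ceil(sqrt(n * m));      % geometric mean of input and output sizes
h2 = ceil(sqrt(n + m)) + 5;  % sqrt(n + m) + a, with a typically in [1, 10]
h3 = ceil(log2(n));          % logarithm of the number of input units
fprintf('candidate hidden-layer sizes: %d, %d, %d\n', h1, h2, h3);

In practice these are only starting points; the final hidden-layer size is usually chosen by cross-validation.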
In addition, the MNIST handwritten digit recognition page lists the results of training with different network architectures, for reference.
Step 2: Implement forward propagation (FP) and back propagation; this step includes the following sub-steps.
Step 3: Check the correctness of the computed partial derivatives with a numerical method (see Code 2 and the usage sketch after it).
Step 4: Use gradient descent or a more advanced optimization algorithm to find the connection weights that minimize the cost function (a training sketch is given after Code 3).
In step 4, since the cost function is non-convex, the optimization may get stuck in a local optimum. Such a local optimum is not necessarily much worse than the global one (see Figure 5-4), so in practice this is usually not a serious problem. Heuristic algorithms (such as simulated annealing or genetic algorithms) can also help escape local optima.
Figure 5-4 Getting stuck in a local optimum (not necessarily much worse than the global optimum)
Code 1: Randomly initialize the connection weights
function W = randInitializeWeights(L_in, L_out)
%RANDINITIALIZEWEIGHTS Randomly initialize the weights of a layer with L_in
%incoming connections and L_out outgoing connections
%   W = RANDINITIALIZEWEIGHTS(L_in, L_out) randomly initializes the weights
%   of a layer with L_in incoming connections and L_out outgoing
%   connections.
%
%   Note that W should be set to a matrix of size(L_out, 1 + L_in) as
%   the first column of W handles the "bias" terms
%

W = zeros(L_out, 1 + L_in);

% Instructions: Initialize W randomly so that we break the symmetry while
%               training the neural network.
%
% Note: The first column of W corresponds to the parameters for the bias units
%

epsilon_init = sqrt(6) / sqrt(L_out + L_in);
W = rand(L_out, 1 + L_in) * 2 * epsilon_init - epsilon_init;

end
Code 2: Numerically approximate the partial derivatives of the cost function with respect to the connection weights
function numgrad = computeNumericalGradient(J, theta)
%COMPUTENUMERICALGRADIENT Computes the gradient using "finite differences"
%and gives us a numerical estimate of the gradient.
%   numgrad = COMPUTENUMERICALGRADIENT(J, theta) computes the numerical
%   gradient of the function J around theta. Calling y = J(theta) should
%   return the function value at theta.

% Notes: The following code implements numerical gradient checking and
%        returns the numerical gradient. It sets numgrad(i) to (a numerical
%        approximation of) the partial derivative of J with respect to the
%        i-th input argument, evaluated at theta. (i.e., numgrad(i) should
%        be (approximately) the partial derivative of J with respect
%        to theta(i).)
%

numgrad = zeros(size(theta));
perturb = zeros(size(theta));
e = 1e-4;
for p = 1:numel(theta)
    % Set perturbation vector
    perturb(p) = e;
    % Centered-difference approximation of the p-th partial derivative
    numgrad(p) = (J(theta + perturb) - J(theta - perturb)) / (2*e);
    perturb(p) = 0;
end

end
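A minimal usage sketch of the gradient check follows; the small network sizes are made up for illustration, and nnCostFunction and randInitializeWeights are the functions from Codes 3 and 1. If BP is implemented correctly, the relative difference between the analytical and numerical gradients should be very small (e.g. below 1e-9).

% Gradient checking on a small, made-up network (sizes are illustrative).
input_layer_size = 3; hidden_layer_size = 5; num_labels = 3; m = 5; lambda = 0;
Theta1 = randInitializeWeights(input_layer_size, hidden_layer_size);
Theta2 = randInitializeWeights(hidden_layer_size, num_labels);
X = rand(m, input_layer_size);
y = mod(1:m, num_labels)' + 1;            % labels in 1..num_labels
nn_params = [Theta1(:); Theta2(:)];

costFunc = @(p) nnCostFunction(p, input_layer_size, hidden_layer_size, ...
                               num_labels, X, y, lambda);
[cost, grad] = costFunc(nn_params);
numgrad = computeNumericalGradient(costFunc, nn_params);

% The relative difference should be tiny (e.g. < 1e-9) if BP is correct.
rel_diff = norm(numgrad - grad) / norm(numgrad + grad);
fprintf('Cost: %f, relative difference: %e\n', cost, rel_diff);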
Code 3: Use FP and BP to compute the cost function of a neural network with one hidden layer, together with its partial derivatives with respect to the connection weights
function [J grad] = nnCostFunction(nn_params, ...
                                   input_layer_size, ...
                                   hidden_layer_size, ...
                                   num_labels, ...
                                   X, y, lambda)
%NNCOSTFUNCTION Implements the neural network cost function for a two layer
%neural network which performs classification
%   [J grad] = NNCOSTFUNCTION(nn_params, hidden_layer_size, num_labels, ...
%   X, y, lambda) computes the cost and gradient of the neural network. The
%   parameters for the neural network are "unrolled" into the vector
%   nn_params and need to be converted back into the weight matrices.
%
%   The returned parameter grad should be an "unrolled" vector of the
%   partial derivatives of the neural network.
%

% Reshape nn_params back into the parameters Theta1 and Theta2, the weight
% matrices for our 2 layer neural network: Theta1: 1->2; Theta2: 2->3
Theta1 = reshape(nn_params(1:hidden_layer_size * (input_layer_size + 1)), ...
                 hidden_layer_size, (input_layer_size + 1));
Theta2 = reshape(nn_params((1 + (hidden_layer_size * (input_layer_size + 1))):end), ...
                 num_labels, (hidden_layer_size + 1));

% Setup some useful variables
m = size(X, 1);
J = 0;
Theta1_grad = zeros(size(Theta1));
Theta2_grad = zeros(size(Theta2));

% Note: The vector y passed into the function is a vector of labels
%       containing values from 1..K. You need to map this vector into a
%       binary vector of 1's and 0's to be used with the neural network
%       cost function.
for i = 1:m
    % Compute the activations by Forward Propagation
    a1 = [1; X(i,:)'];
    z2 = Theta1 * a1;
    a2 = [1; sigmoid(z2)];
    z3 = Theta2 * a2;
    h = sigmoid(z3);

    yy = zeros(num_labels, 1);
    yy(y(i)) = 1;                         % ground-truth label vector for this example
    J = J + sum(-yy .* log(h) - (1-yy) .* log(1-h));

    % Back Propagation
    delta3 = h - yy;
    delta2 = (Theta2(:,2:end)' * delta3) .* sigmoidGradient(z2);  % exclude the weights of the bias unit
    Theta2_grad = Theta2_grad + delta3 * a2';
    Theta1_grad = Theta1_grad + delta2 * a1';
end

J = J / m + lambda * (sum(sum(Theta1(:,2:end) .^ 2)) + sum(sum(Theta2(:,2:end) .^ 2))) / (2*m);

Theta2_grad = Theta2_grad / m;
Theta2_grad(:,2:end) = Theta2_grad(:,2:end) + lambda * Theta2(:,2:end) / m;  % regularized gradient
Theta1_grad = Theta1_grad / m;
Theta1_grad(:,2:end) = Theta1_grad(:,2:end) + lambda * Theta1(:,2:end) / m;  % regularized gradient

% Unroll gradients
grad = [Theta1_grad(:) ; Theta2_grad(:)];

end
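Putting steps 1 through 4 together, a minimal end-to-end training sketch is shown below. It assumes the training data X, y are already loaded, that sigmoid/sigmoidGradient are available, and that fmincg is the optimizer shipped with the course exercises (Octave's fminunc can be used instead); the 400-25-10 architecture, lambda, and MaxIter are illustrative choices.

% End-to-end training sketch for a 400-25-10 network (sizes, lambda and
% MaxIter are illustrative; fmincg is the course-supplied optimizer).
input_layer_size = 400; hidden_layer_size = 25; num_labels = 10; lambda = 1;

initial_Theta1 = randInitializeWeights(input_layer_size, hidden_layer_size);
initial_Theta2 = randInitializeWeights(hidden_layer_size, num_labels);
initial_nn_params = [initial_Theta1(:); initial_Theta2(:)];

costFunc = @(p) nnCostFunction(p, input_layer_size, hidden_layer_size, ...
                               num_labels, X, y, lambda);

options = optimset('MaxIter', 100);
[nn_params, cost] = fmincg(costFunc, initial_nn_params, options);

% Recover the weight matrices and predict by taking the class with the
% largest output activation.
Theta1 = reshape(nn_params(1:hidden_layer_size * (input_layer_size + 1)), ...
                 hidden_layer_size, (input_layer_size + 1));
Theta2 = reshape(nn_params((1 + (hidden_layer_size * (input_layer_size + 1))):end), ...
                 num_labels, (hidden_layer_size + 1));
h1 = sigmoid([ones(size(X,1), 1) X] * Theta1');
h2 = sigmoid([ones(size(X,1), 1) h1] * Theta2');
[~, pred] = max(h2, [], 2);
fprintf('Training accuracy: %f\n', mean(double(pred == y)) * 100);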