This week mainly covered the backpropagation algorithm, which computes the partial derivatives of a neural network's cost function with respect to the parameters theta. The main task for the week is to implement that algorithm.

Algorithm
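For reference, the regularized cost that `nnCostFunction` evaluates, and the error terms used by backpropagation, can be written as follows (standard course notation for a network with one hidden layer and $K$ output units; the bias columns of $\Theta^{(1)}$ and $\Theta^{(2)}$ are excluded from the regularization term):

$$
J(\Theta) = \frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K}\Big[-y^{(i)}_k \log\big(h_\Theta(x^{(i)})\big)_k - \big(1 - y^{(i)}_k\big)\log\Big(1 - \big(h_\Theta(x^{(i)})\big)_k\Big)\Big] + \frac{\lambda}{2m}\sum_{l=1}^{2}\sum_{j}\sum_{i\ge 1}\big(\Theta^{(l)}_{j,i}\big)^2
$$

$$
\delta^{(3)} = a^{(3)} - y,\qquad
\delta^{(2)} = \Big(\big(\Theta^{(2)}\big)^{T}\delta^{(3)}\Big)\odot a^{(2)}\odot\big(1 - a^{(2)}\big)\ \text{(drop the bias entry)},\qquad
\frac{\partial J}{\partial \Theta^{(l)}} = \frac{1}{m}\sum_{t=1}^{m}\delta^{(l+1)}\big(a^{(l)}\big)^{T} + \frac{\lambda}{m}\,\Theta^{(l)}_{\text{(bias column zeroed)}}
$$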
```matlab
function [J grad] = nnCostFunction(nn_params, ...
                                   input_layer_size, ...
                                   hidden_layer_size, ...
                                   num_labels, ...
                                   X, y, lambda)
%NNCOSTFUNCTION Implements the neural network cost function for a two layer
%neural network which performs classification
%   [J grad] = NNCOSTFUNCTON(nn_params, hidden_layer_size, num_labels, ...
%   X, y, lambda) computes the cost and gradient of the neural network. The
%   parameters for the neural network are "unrolled" into the vector
%   nn_params and need to be converted back into the weight matrices.
%
%   The returned parameter grad should be a "unrolled" vector of the
%   partial derivatives of the neural network.
%

% Reshape nn_params back into the parameters Theta1 and Theta2, the weight matrices
% for our 2 layer neural network
Theta1 = reshape(nn_params(1:hidden_layer_size * (input_layer_size + 1)), ...
                 hidden_layer_size, (input_layer_size + 1));

Theta2 = reshape(nn_params((1 + (hidden_layer_size * (input_layer_size + 1))):end), ...
                 num_labels, (hidden_layer_size + 1));

% Setup some useful variables
m = size(X, 1);

% You need to return the following variables correctly
J = 0;
Theta1_grad = zeros(size(Theta1));
Theta2_grad = zeros(size(Theta2));

% ====================== YOUR CODE HERE ======================

% Remember to add the bias unit
X = [ones(m, 1) X];

% Convert each label y(i) into a 1 x num_labels one-hot row
Y = zeros(m, num_labels);
for i = 1:m
    Y(i, y(i)) = 1;
end

% Forward propagation
a2 = sigmoid(X * Theta1');
a2 = [ones(size(a2, 1), 1) a2];
a3 = sigmoid(a2 * Theta2');

J = sum(sum(-Y .* log(a3) - (1 - Y) .* log(1 - a3))) / m;

Theta1_withoutBias = Theta1(:, 2:end);
Theta2_withoutBias = Theta2(:, 2:end);

% Regularized cost function (bias columns are not regularized)
J = J + lambda * (sum(sum(Theta1_withoutBias .^ 2)) + sum(sum(Theta2_withoutBias .^ 2))) / (2 * m);

% Backpropagation
d1 = zeros(size(Theta1));
d2 = zeros(size(Theta2));

theta1_wtbias = Theta1;
theta1_wtbias(:, 1) = 0;
theta2_wtbias = Theta2;
theta2_wtbias(:, 1) = 0;

for t = 1:m
    yt  = Y(t, :);
    a3t = a3(t, :);
    a2t = a2(t, :);
    a1t = X(t, :);

    delta3 = a3t - yt;
    delta2 = delta3 * Theta2 .* (a2t .* (1 - a2t));
    delta2 = delta2(2:end);      % drop the bias term

    d2 = d2 + delta3' * a2t;
    d1 = d1 + delta2' * a1t;
end

% Regularized gradients (the zeroed-out bias columns add no penalty)
Theta1_grad = Theta1_grad + d1 / m + lambda * theta1_wtbias / m;
Theta2_grad = Theta2_grad + d2 / m + lambda * theta2_wtbias / m;

% -------------------------------------------------------------

% =========================================================================

% Unroll gradients
grad = [Theta1_grad(:) ; Theta2_grad(:)];

end
```
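Below is a minimal sketch (not part of the submission) for sanity-checking the gradient returned above against a two-sided finite difference. It assumes `nnCostFunction.m` and the exercise's `sigmoid.m` are on the Octave/MATLAB path; the tiny layer sizes and random data are made up purely for the check.

```matlab
% Small, made-up problem: sizes chosen only so the check runs fast
input_layer_size  = 3;
hidden_layer_size = 5;
num_labels        = 4;
m                 = 8;
lambda            = 1;

X = rand(m, input_layer_size);
y = mod(1:m, num_labels)' + 1;          % labels in 1..num_labels

% Unroll randomly initialised weights into a single parameter vector
Theta1 = rand(hidden_layer_size, input_layer_size + 1) * 0.2 - 0.1;
Theta2 = rand(num_labels, hidden_layer_size + 1) * 0.2 - 0.1;
nn_params = [Theta1(:); Theta2(:)];

[J, grad] = nnCostFunction(nn_params, input_layer_size, ...
                           hidden_layer_size, num_labels, X, y, lambda);

% Two-sided numerical gradient: (J(theta+e) - J(theta-e)) / (2e)
e = 1e-4;
numgrad = zeros(size(nn_params));
for p = 1:numel(nn_params)
    perturb = zeros(size(nn_params));
    perturb(p) = e;
    J_plus  = nnCostFunction(nn_params + perturb, input_layer_size, ...
                             hidden_layer_size, num_labels, X, y, lambda);
    J_minus = nnCostFunction(nn_params - perturb, input_layer_size, ...
                             hidden_layer_size, num_labels, X, y, lambda);
    numgrad(p) = (J_plus - J_minus) / (2 * e);
end

% The relative difference should be tiny (roughly 1e-9) if backprop is correct
disp(norm(numgrad - grad) / norm(numgrad + grad));
```

If the printed value is not close to zero, the analytic gradient and the numerical gradient disagree and the backpropagation loop should be revisited.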