"UFLDL Convolutional Neural Network" mainly explains how to apply the neural network learning methods discussed earlier to large images. There are two changes. First, the same weights are applied to every small patch of the large image to compute the hidden-layer features; this is called convolutional feature extraction. Second, the computed feature matrix is "shrunk": it is divided evenly into regions along both dimensions, and the mean (or maximum) of each region is taken as the output feature; this is called pooling. The main reason for doing this is to reduce the data size. For an 8x8 image the input layer has 64 units, whereas a 100x100 image has 1e4 input units, so for the same number of features the number of weight parameters to train grows quadratically. Once real images get large enough, training becomes very hard to run. So in the convolutional feature extraction step, the weights learned on small patches are used as shared weights and convolved with every small patch of the large image, which plays the role of the forward propagation; the result contains the features of every small neighborhood of the large image. The pooling step, which averages or takes the maximum over the oversized feature matrix, is even more obviously a "downsampling" of the feature matrix.
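To make the quadratic growth concrete, here is a rough count (the 100 hidden units below is just an illustrative number I am assuming, not a value from the exercise): a fully connected layer has \(hiddenSize \times inputSize\) weight parameters, so an \(8 \times 8\) input needs \(100 \times 64 = 6400\) weights, while a \(100 \times 100\) input needs \(100 \times 10^4 = 10^6\). Scaling the image side by roughly \(12\times\) multiplies the parameter count by roughly \(150\times\), which is exactly why sharing one small set of patch weights across the whole image matters.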
We know that for a typical neural network the dimensions of the trained weight parameters are written as \(W^{(1)}_{hiddenSize \times inputSize}, b^{(1)}_{hiddenSize \times 1}\), where \(inputSize = patchDim \times patchDim\) (per color channel). The original "UFLDL Convolutional Neural Network" text describes the process as "multiply \(W^{(1)}\) with every \(patchDim \times patchDim\) sub-image of the large image, convolving to obtain the \(f_{convolved}\) values, which gives the matrix after convolution." It does not explain clearly why the classic forward propagation turns into an image convolution, nor what is convolved with what. Let us work through this transformation step by step.
For a large image \(X_{r \times c}\), the idea is to take every sub-matrix \(X_{patchDim \times patchDim}\) and run the forward propagation with \(W^{(1)}\) exactly as for a small image, so we need \((r-patchDim+1)\times(c-patchDim+1)\) forward passes, each written as \(h_{hiddenSize \times 1}=\sigma( W^{(1)} \cdot X_{patchDim \times patchDim}(:)+b^{(1)}_{hiddenSize \times 1})\), where \((:)\) flattens the patch into a column vector. This yields a feature matrix of size \(hiddenSize \times (r-patchDim+1)\times(c-patchDim+1)\). To see how a convolution function can be used, we only need one exchange of order: in computing \(h_{hiddenSize \times 1}\), each row of \(W^{(1)}\) is dotted with the patch \(X_{patchDim \times patchDim}\) flattened into a column vector, and this loop over rows can be swapped with the loop over the \((r-patchDim+1)\times(c-patchDim+1)\) patch positions. In other words, take each row of \(W^{(1)}\), reshape it into a \(patchDim\times patchDim\) matrix, and compute the element-wise product and sum against every sub-matrix \(X_{patchDim \times patchDim}\) of \(X_{r \times c}\). Isn't this exactly a 2-D convolution! Yes, that is where the convolution comes from: it is essentially the forward propagation over every small neighborhood of the large image, and 2-D matrix convolution happens to be the right tool. (Strictly speaking, the sliding multiply-and-sum is a correlation; MATLAB's conv2 flips the kernel by definition, which is why the code below first rotates the feature by 180° with rot90.)
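To make the equivalence concrete, here is a tiny sanity check I am adding for illustration (random data, one channel, sigmoid and bias omitted; all sizes and variable names here are made up rather than taken from the exercise). It verifies that the explicit per-patch forward pass and `conv2` with a 180°-flipped kernel produce the same feature map.

```matlab
% Toy check: per-patch forward propagation == 2-D convolution with a flipped kernel
patchDim = 3;               % small patch size (illustrative)
imageDim = 7;               % "large" image size (illustrative)
W1 = rand(1, patchDim^2);   % one row of W^(1), i.e. one learned feature
X  = rand(imageDim);        % one-channel large image

% (1) Explicit forward pass at every patch position
outDim = imageDim - patchDim + 1;
h_loop = zeros(outDim);
for r = 1:outDim
    for c = 1:outDim
        patch = X(r:r+patchDim-1, c:c+patchDim-1);
        h_loop(r, c) = W1 * patch(:);    % dot product with the flattened patch
    end
end

% (2) The same via conv2: reshape the row into a patchDim x patchDim kernel
% and rotate it 180 degrees, because conv2 flips the kernel by definition
feature = reshape(W1, patchDim, patchDim);
h_conv  = conv2(X, rot90(feature, 2), 'valid');

disp(max(abs(h_loop(:) - h_conv(:))));   % ~0 up to floating-point noise
```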
With this understanding in hand, let us look at the implementation of the convolution; a couple of points are worth noting. The line `feature = W(featureNum,:,:,channel);` takes out one row of \(W^{(1)}_{hiddenSize \times inputSize}\), reshaped into a \(patchDim \times patchDim\) kernel for one channel; because `conv2` flips its kernel by definition, the feature is first rotated 180° with `rot90(squeeze(feature), 2)`. The 2-D convolution function then performs the forward propagation over each image `im = squeeze(images(:, :, channel, imageNum));` with a `'valid'` convolution, and the per-channel results are accumulated: `convolvedImage = convolvedImage + conv2(im, feature, 'valid');`.
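One more detail worth writing out explicitly is how the whitening preprocessing is folded into the convolution. During training, every patch \(x\) was first preprocessed as \(ZCAWhite \cdot (x - meanPatch)\), so the hidden activation is

\[
\sigma\left(W^{(1)} ZCAWhite\,(x - meanPatch) + b^{(1)}\right)
= \sigma\left((W^{(1)} ZCAWhite)\,x + b^{(1)} - (W^{(1)} ZCAWhite)\,meanPatch\right),
\]

which is why the code below precomputes `W = W * ZCAWhite` and `substractMean = W * meanPatch`, and subtracts `substractMean(featureNum)` only once per feature, after the per-channel convolutions have been accumulated.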
The pooling part is fairly straightforward and will not be described at length; a minimal sketch follows below. Following the steps of cnnExercise.m, apart from reusing the earlier sparse autoencoder, softmax, and stacked autoencoder code, we need to write cnnConvolve.m and cnnPool.m; the complete code is at https://github.com/codgeek/deeplearning.
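As a rough illustration of what cnnPool.m has to do, here is my own minimal sketch of mean pooling over non-overlapping square regions (the interface follows the convolvedFeatures layout used above; the version in the repository linked above may be organized differently, and max pooling would simply replace mean with max):

```matlab
function pooledFeatures = cnnPool(poolDim, convolvedFeatures)
% Mean-pool each convolved feature map over non-overlapping poolDim x poolDim regions.
% convolvedFeatures: numFeatures x numImages x convolvedDim x convolvedDim
numFeatures  = size(convolvedFeatures, 1);
numImages    = size(convolvedFeatures, 2);
convolvedDim = size(convolvedFeatures, 3);
poolLen      = floor(convolvedDim / poolDim);   % number of pooled rows/cols

pooledFeatures = zeros(numFeatures, numImages, poolLen, poolLen);
for imageNum = 1:numImages
    for featureNum = 1:numFeatures
        featureMap = squeeze(convolvedFeatures(featureNum, imageNum, :, :));
        for pr = 1:poolLen
            for pc = 1:poolLen
                region = featureMap((pr-1)*poolDim+1 : pr*poolDim, ...
                                    (pc-1)*poolDim+1 : pc*poolDim);
                pooledFeatures(featureNum, imageNum, pr, pc) = mean(region(:));
            end
        end
    end
end
end
```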
```matlab
function convolvedFeatures = cnnConvolve(patchDim, numFeatures, images, W, b, ZCAWhite, meanPatch)
%cnnConvolve Returns the convolution of the features given by W and b with
%the given images
%
% Parameters:
%  patchDim - patch (feature) dimension
%  numFeatures - number of features
%  images - large images to convolve with, matrix in the form
%           images(r, c, channel, image number)
%  W, b - W, b for features from the sparse autoencoder
%  ZCAWhite, meanPatch - ZCAWhitening and meanPatch matrices used for
%                        preprocessing
%
% Returns:
%  convolvedFeatures - matrix of convolved features in the form
%                      convolvedFeatures(featureNum, imageNum, imageRow, imageCol)

numImages = size(images, 4);
imageDim = size(images, 1);
imageChannels = size(images, 3);

% Precompute the matrices that will be used during the convolution. Recall
% that you need to take into account the whitening and mean subtraction steps.
W = W * ZCAWhite; % W*(ZCAWhite*(X - meanPatch)) equals (W*ZCAWhite)*X - (W*ZCAWhite)*meanPatch
substractMean = W * meanPatch;
W = reshape(W, numFeatures, patchDim, patchDim, imageChannels);

convolvedFeatures = zeros(numFeatures, numImages, imageDim - patchDim + 1, imageDim - patchDim + 1);
for imageNum = 1:numImages
  for featureNum = 1:numFeatures

    convolvedImage = zeros(imageDim - patchDim + 1, imageDim - patchDim + 1);
    for channel = 1:imageChannels
      % Obtain the feature (patchDim x patchDim) needed during the convolution
      feature = W(featureNum, :, :, channel); % each row of W is one of the numFeatures learned features

      % Flip the feature matrix because of the definition of convolution
      feature = rot90(squeeze(feature), 2);

      % Obtain the image
      im = squeeze(images(:, :, channel, imageNum));

      % Convolve "feature" with "im", using a 'valid' convolution,
      % and add the result to convolvedImage
      convolvedImage = convolvedImage + conv2(im, feature, 'valid'); % (imageDim - patchDim + 1) x (imageDim - patchDim + 1)
    end

    % Subtract the bias unit (correcting for the mean subtraction as well),
    % then apply the sigmoid function to get the hidden activation
    % substractMean: numFeatures x 1
    convolvedImage = sigmoid(convolvedImage + b(featureNum) - substractMean(featureNum));

    % The convolved feature is the sum of the convolved values for all channels
    convolvedFeatures(featureNum, imageNum, :, :) = convolvedImage;
  end
end

end

function sigm = sigmoid(x)
  sigm = 1 ./ (1 + exp(-x));
end
```
The dataset comes from the STL-10 dataset, together with STL10Features.mat, the feature parameters learned on downsampled 8x8 patches of this dataset in the previous note, "UFLDL Deep Learning Notes (5): Linear Decoders for Autoencoders". A comparison of the images before and after downsampling is shown below.
With the same parameters as the exercise instructions, the input is 64x64x3 color images in four classes (airplane, car, cat, dog). Running the main code file cnnExercise.m gives a prediction accuracy of 80.4%, which matches the exercise's reference result. The classification accuracy is actually not that high; one reason is the large downsampling ratio: after downsampling, the images are basically impossible for the human eye to classify, yet through feature learning followed by convolutional network learning, a reasonable level of accuracy can still be reached.
Now look at the classification accuracy on the full ImageNet dataset; the statistics come from "An Analysis of Deep Neural Network Models for Practical Applications". As of early 2017, even the best deep neural network (DNN) algorithms reach no more than about 80% classification accuracy, so how did we hit 80% so easily? Is Andrew Ng really that good? Don't celebrate too early: the experiment above used a dataset with only four image classes, not even all 10 classes of STL-10, let alone the far more numerous classes of ImageNet. O(∩_∩)O haha~
The next step is to train the algorithm on all 10 classes of STL-10, as well as on ImageNet data. STL-10 contains 100,000 unlabeled images, plus 500 training and 800 test images per class. ImageNet is far larger, with 14,197,122 images in 21,841 synsets; 1.2 million of the images have SIFT features, which can be taken as the number of labeled images. Hopefully the ever-growing volume of data will keep pushing the classification accuracy higher!