UFLDL Deep Learning Notes (6): Convolutional Neural Networks

1. Main Idea

The UFLDL tutorial section "Convolutional Neural Network" explains how to apply the neural network learning methods discussed earlier to large images. There are two changes. First, the same weights are applied to every small patch of the large image to compute hidden-layer features; this is called convolutional feature extraction. Second, the resulting feature matrix is "shrunk": it is partitioned into regions along both dimensions, and the mean (or maximum) of each region is taken as the output feature; this is called pooling. The main motivation is to reduce the data scale: for an 8x8 image the input layer has 64 units, while a 100x100 image has 1e4 input units, so for the same number of features the number of weight parameters to train grows quadratically. Once real images get large enough, training becomes impractical. Therefore, in the convolutional feature extraction step, the weights learned on small patches are shared and convolved with every small patch of the large image, which plays the role of forward propagation; the result contains the features of every small neighborhood of the large image. The pooling step, which takes the mean or maximum over an oversized feature matrix, is plainly a "downsampling" of that matrix.
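To make the quadratic growth concrete, here is the parameter count for a single fully connected layer, assuming a hypothetical hiddenSize of 100 hidden units (the 100 is an illustrative assumption, not a number from the tutorial):

\[
\underbrace{100 \times (8 \times 8) = 6{,}400}_{8\times 8\ \text{input}} \quad \text{vs.} \quad \underbrace{100 \times (100 \times 100) = 10^6}_{100\times 100\ \text{input}}
\]

Sharing one \(patchDim \times patchDim\) weight set across all patch locations keeps the parameter count at the small-patch scale no matter how large the image is.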

2. How Convolution Relates to Forward Propagation

We know that for a typical neural network, the trained weight parameters have dimensions \(W^{(1)}_{hiddenSize \times inputSize}\), \(b^{(1)}_{hiddenSize \times 1}\), where \(inputSize = patchDim \times patchDim \times imageChannels\) (the color channels are unrolled, as discussed below). The original "UFLDL Convolutional Neural Network" text describes the process as "multiply \(W^{(1)}\) with every \(patchDim \times patchDim\) sub-image of the large image to obtain the \(f_{convolved}\) values; this convolution yields the convolved matrix". It does not explain clearly why classic forward propagation turns into an image convolution, nor what is convolved with what. Let us work through this transformation step by step.

For a large image \(X_{r \times c}\), the natural idea is to take every sub-matrix \(X_{patchDim \times patchDim}\) and run forward propagation with \(W^{(1)}\), treating it as a small image, so \((r-patchDim+1)\times(c-patchDim+1)\) forward propagations are needed in total, each expressed as \(h_{hiddenSize \times 1}=\sigma( W^{(1)} \, X_{patchDim \times patchDim}(:)+b^{(1)}_{hiddenSize \times 1})\). This yields a feature tensor of size \(hiddenSize \times (r-patchDim+1)\times(c-patchDim+1)\). One reordering is needed before the convolution function becomes visible. In computing \(h_{hiddenSize \times 1}\), each row of \(W^{(1)}\) takes a dot product with \(X_{patchDim \times patchDim}\) flattened into a column vector; this inner loop can be swapped with the outer loop over the \((r-patchDim+1)\times(c-patchDim+1)\) patch positions. In other words, take each row of \(W^{(1)}\), reshape it into a \(patchDim\times patchDim\) matrix, and compute its elementwise product-and-sum with every sub-matrix \(X_{patchDim \times patchDim}\) of \(X_{r \times c}\). Isn't that exactly a 2D convolution! Yes, that is where the convolution comes from: it is essentially forward propagation over every small neighborhood of the large image, which happens to be expressible with the 2D matrix convolution tool (strictly speaking it is a cross-correlation, which is why the code below flips the feature with rot90 before calling conv2).
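As a sanity check, here is a minimal Octave/MATLAB sketch, not from the tutorial (the toy sizes and the bias-free, sigmoid-free comparison are assumptions for illustration), showing that the patch-by-patch dot product and conv2 with a flipped kernel produce the same pre-activation map for one feature:

% minimal equivalence check: forward prop over patches vs conv2
r = 6; c = 6; patchDim = 3;            % assumed toy sizes
X = rand(r, c);                        % a toy "large image", one channel
w = rand(1, patchDim*patchDim);        % one row of W^(1), i.e. one feature

% 1) explicit forward propagation over every patch (no sigmoid, no bias)
direct = zeros(r-patchDim+1, c-patchDim+1);
for i = 1:(r-patchDim+1)
  for j = 1:(c-patchDim+1)
    patch = X(i:i+patchDim-1, j:j+patchDim-1);
    direct(i,j) = w * patch(:);        % dot product with flattened patch
  end
end

% 2) the same values computed as one 2D convolution with the flipped feature
feature = reshape(w, patchDim, patchDim);
viaConv = conv2(X, rot90(feature, 2), 'valid');

disp(max(abs(direct(:) - viaConv(:))));  % ~0 up to floating-point error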

3. Code Implementation

With the understanding above, let us look at the implementation of the convolution computation. Two points deserve attention:

  • feature = W(featureNum,:,:,channel); this line extracts one row of \(W^{(1)}_{hiddenSize \times inputSize}\), which is then used for forward propagation over each image im = squeeze(images(:, :, channel, imageNum)); via the 2D convolution function: conv2(im, feature, 'valid').
  • Why are the convolutions over an image's three RGB channels accumulated? Because when the weights were trained, the \(inputSize\) of \(W^{(1)}_{hiddenSize \times inputSize}\) came from unrolling the RGB channels into a single vector. In the forward propagation over a small patch of the large image, the dot product of each row of \(W^{(1)}\) with \(X_{patchDim \times patchDim}\) is the accumulation over the three RGB segments, so the feature responses of the three channels must be summed: convolvedImage = convolvedImage + conv2(im, feature, 'valid'); (see the sketch right after this list).
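Here is a minimal sketch of why the per-channel sum equals the flattened dot product (the toy sizes and column-major unrolling are assumptions; in practice the unrolling order must match how the training patches were flattened):

patchDim = 3; channels = 3;
patch = rand(patchDim, patchDim, channels);   % one RGB patch
w = rand(1, patchDim*patchDim*channels);      % one row of W^(1)

% dot product with the unrolled patch, as done during training
full = w * patch(:);

% the same value accumulated channel by channel
Wr = reshape(w, patchDim, patchDim, channels);
acc = 0;
for ch = 1:channels
  acc = acc + sum(sum(Wr(:,:,ch) .* patch(:,:,ch)));
end
disp(abs(full - acc));   % ~0: summing channels reproduces the dot product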

The pooling part is fairly intuitive and is not expanded here (a minimal mean-pooling sketch follows the cnnConvolve listing below). Following the steps in cnnExercise.m, apart from reusing the earlier sparse autoencoder, softmax, and stacked autoencoder code, we need to write cnnConvolve.m and cnnPool.m; the full code is at https://github.com/codgeek/deeplearning

function convolvedFeatures = cnnConvolve(patchDim, numFeatures, images, W, b, ZCAWhite, meanPatch)
%cnnConvolve Returns the convolution of the features given by W and b with
%the given images
%
% Parameters:
%  patchDim - patch (feature) dimension
%  numFeatures - number of features
%  images - large images to convolve with, matrix in the form
%           images(r, c, channel, image number)
%  W, b - W, b for features from the sparse autoencoder
%  ZCAWhite, meanPatch - ZCAWhitening and meanPatch matrices used for
%                        preprocessing
%
% Returns:
%  convolvedFeatures - matrix of convolved features in the form
%                      convolvedFeatures(featureNum, imageNum, imageRow, imageCol)

numImages = size(images, 4);
imageDim = size(images, 1);
imageChannels = size(images, 3);
% -------------------- YOUR CODE HERE --------------------
% Precompute the matrices that will be used during the convolution. Recall
% that you need to take into account the whitening and mean subtraction
% steps
W = W * ZCAWhite; % W*(ZCAWhite*(X - meanPatch)) equals (W*ZCAWhite)*X - (W*ZCAWhite)*meanPatch
substractMean = W * meanPatch;
W = reshape(W,numFeatures, patchDim, patchDim, imageChannels);

convolvedFeatures = zeros(numFeatures, numImages, imageDim - patchDim + 1, imageDim - patchDim + 1);
for imageNum = 1:numImages
 for featureNum = 1:numFeatures
    convolvedImage = zeros(imageDim - patchDim + 1, imageDim - patchDim + 1);
    for channel = 1:imageChannels
      % Obtain the feature (patchDim x patchDim) needed during the convolution
      feature = W(featureNum,:,:,channel); % each row of W is one of the numFeatures learned features
      % ------------------------
      % Flip the feature matrix: conv2 implements true convolution (it flips
      % its kernel), while we want correlation with the learned feature
      feature = rot90(squeeze(feature),2);

      % Obtain the image
      im = squeeze(images(:, :, channel, imageNum));
      % Convolve "feature" with "im", adding the result to convolvedImage
      % be sure to do a 'valid' convolution
      % ---- YOUR CODE HERE ----
      convolvedImage = convolvedImage + conv2(im, feature, 'valid');% (imageDim - patchDim + 1) X (imageDim - patchDim + 1)
     % ------------------------
    end
    % Subtract the bias unit (correcting for the mean subtraction as well)
    % Then, apply the sigmoid function to get the hidden activation
    % ---- YOUR CODE HERE ----
    % substractMean = (W * ZCAWhite) * meanPatch, precomputed above: numFeatures x 1
    convolvedImage = sigmoid(convolvedImage + b(featureNum) - substractMean(featureNum));
    % ------------------------
    % The convolved feature is the sum of the convolved values for all channels
    convolvedFeatures(featureNum, imageNum, :, :) = convolvedImage;
  end
end
end

function sigm = sigmoid(x)
  sigm = 1 ./ (1 + exp(-x));
end
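Since pooling is not expanded above, here is a minimal mean-pooling sketch in the spirit of cnnPool.m (the signature follows the exercise scaffold; the non-overlapping regions and the evenly divisible layout are the exercise's assumptions):

function pooledFeatures = cnnPool(poolDim, convolvedFeatures)
%cnnPool Pools the convolved features by averaging over
% non-overlapping poolDim x poolDim regions
numFeatures = size(convolvedFeatures, 1);
numImages = size(convolvedFeatures, 2);
convolvedDim = size(convolvedFeatures, 3);
resultDim = floor(convolvedDim / poolDim);
pooledFeatures = zeros(numFeatures, numImages, resultDim, resultDim);
for row = 1:resultDim
  for col = 1:resultDim
    % mean over one poolDim x poolDim region, for all features and images
    region = convolvedFeatures(:, :, ...
        (row-1)*poolDim+1:row*poolDim, (col-1)*poolDim+1:col*poolDim);
    pooledFeatures(:, :, row, col) = mean(mean(region, 3), 4);
  end
end
end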

4. Illustrations and Results

The data come from the STL-10 dataset, together with the feature parameters STL10Features.mat trained in the previous note, "UFLDL Deep Learning Notes (5): Linear Decoders with Autoencoders", on downsampled 8x8 patches of this dataset. A comparison of the images before and after downsampling is shown below.

Using the same parameters as the exercise instructions, the input is 64x64x3 color images in four classes (airplane, car, cat, dog). Running the main script cnnExercise.m gives a prediction accuracy of 80.4%, matching the exercise's reference result. The classification accuracy is actually not that high; one reason is the large downsampling ratio: the downsampled images are barely classifiable by the human eye, yet through feature learning followed by convolutional network learning a reasonable accuracy can still be reached.


For comparison, consider the classification accuracies on the full ImageNet dataset, taken from An Analysis of Deep Neural Network Models for Practical Applications. As of early 2017, even the best deep neural network (DNN) algorithms reached only about 80% classification accuracy. How did we hit 80% so easily; is Andrew Ng really that brilliant? Don't celebrate too soon: the experiment above used a dataset with only four image classes, not even the full 10 classes of STL-10, let alone the far more numerous classes of ImageNet. O(∩_∩)O haha~
The next step is to train on all 10 STL-10 classes, and then on ImageNet data. STL-10 contains 100,000 unlabeled images, plus 500 training images and 800 test images per class. ImageNet is far larger, with 14,197,122 images in 21,841 synsets; 1.2 million of the images come with SIFT features, which can be read as the number of annotated images. Hopefully the ever-growing amounts of data will keep pushing classification accuracy higher!
