"UFLDL Convolutional Neural Network" mainly explains how to apply the neural network learning methods discussed earlier to large images. There are two changes. First, the same weights are applied to every small patch of the large image to compute the hidden-layer features; this is called convolutional feature extraction. Second, the computed feature matrix is "shrunk": it is divided evenly into regions along both dimensions, and the mean (or maximum) of each region is taken as the output feature; this is called pooling. The main reason for doing this is to reduce the data size. For an 8x8 image the input layer has 64 units, whereas a 100x100 image has 1e4 input units, so for the same number of features the number of weight parameters to train grows quadratically. Once real images get large enough, training becomes very hard to run. So in the convolutional feature extraction step, the weights learned on small patches are used as shared weights and convolved with every small patch of the large image, which plays the role of the forward propagation; the result contains the features of every small neighborhood of the large image. The pooling step, which averages or takes the maximum over the oversized feature matrix, is even more obviously a "downsampling" of the feature matrix.
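To make the quadratic growth concrete, here is a rough count (the 100 hidden units below is just an illustrative number I am assuming, not a value from the exercise): a fully connected layer has \(hiddenSize \times inputSize\) weight parameters, so an \(8 \times 8\) input needs \(100 \times 64 = 6400\) weights, while a \(100 \times 100\) input needs \(100 \times 10^4 = 10^6\). Scaling the image side by roughly \(12\times\) multiplies the parameter count by roughly \(150\times\), which is exactly why sharing one small set of patch weights across the whole image matters.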
We know that for a typical neural network the dimensions of the trained weight parameters are written as \(W^{(1)}_{hiddenSize \times inputSize}, b^{(1)}_{hiddenSize \times 1}\), where \(inputSize = patchDim \times patchDim\) (per color channel). The original "UFLDL Convolutional Neural Network" text describes the process as "multiply \(W^{(1)}\) with every \(patchDim \times patchDim\) sub-image of the large image, convolving to obtain the \(f_{convolved}\) values, which gives the matrix after convolution." It does not explain clearly why the classic forward propagation turns into an image convolution, nor what is convolved with what. Let us work through this transformation step by step.
For a large image \(X_{r \times c}\), the idea is to take every sub-matrix \(X_{patchDim \times patchDim}\) and run the forward propagation with \(W^{(1)}\) exactly as for a small image, so we need \((r-patchDim+1)\times(c-patchDim+1)\) forward passes, each written as \(h_{hiddenSize \times 1}=\sigma( W^{(1)} \cdot X_{patchDim \times patchDim}(:)+b^{(1)}_{hiddenSize \times 1})\), where \((:)\) flattens the patch into a column vector. This yields a feature matrix of size \(hiddenSize \times (r-patchDim+1)\times(c-patchDim+1)\). To see how a convolution function can be used, we only need one exchange of order: in computing \(h_{hiddenSize \times 1}\), each row of \(W^{(1)}\) is dotted with the patch \(X_{patchDim \times patchDim}\) flattened into a column vector, and this loop over rows can be swapped with the loop over the \((r-patchDim+1)\times(c-patchDim+1)\) patch positions. In other words, take each row of \(W^{(1)}\), reshape it into a \(patchDim\times patchDim\) matrix, and compute the element-wise product and sum against every sub-matrix \(X_{patchDim \times patchDim}\) of \(X_{r \times c}\). Isn't this exactly a 2-D convolution! Yes, that is where the convolution comes from: it is essentially the forward propagation over every small neighborhood of the large image, and 2-D matrix convolution happens to be the right tool. (Strictly speaking, the sliding multiply-and-sum is a correlation; MATLAB's conv2 flips the kernel by definition, which is why the code below first rotates the feature by 180° with rot90.)
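To make the equivalence concrete, here is a tiny sanity check I am adding for illustration (random data, one channel, sigmoid and bias omitted; all sizes and variable names here are made up rather than taken from the exercise). It verifies that the explicit per-patch forward pass and `conv2` with a 180°-flipped kernel produce the same feature map.

```matlab
% Toy check: per-patch forward propagation == 2-D convolution with a flipped kernel
patchDim = 3;               % small patch size (illustrative)
imageDim = 7;               % "large" image size (illustrative)
W1 = rand(1, patchDim^2);   % one row of W^(1), i.e. one learned feature
X  = rand(imageDim);        % one-channel large image

% (1) Explicit forward pass at every patch position
outDim = imageDim - patchDim + 1;
h_loop = zeros(outDim);
for r = 1:outDim
    for c = 1:outDim
        patch = X(r:r+patchDim-1, c:c+patchDim-1);
        h_loop(r, c) = W1 * patch(:);    % dot product with the flattened patch
    end
end

% (2) The same via conv2: reshape the row into a patchDim x patchDim kernel
% and rotate it 180 degrees, because conv2 flips the kernel by definition
feature = reshape(W1, patchDim, patchDim);
h_conv  = conv2(X, rot90(feature, 2), 'valid');

disp(max(abs(h_loop(:) - h_conv(:))));   % ~0 up to floating-point noise
```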
With this understanding in hand, let us look at the implementation of the convolution; a couple of points are worth noting. The line `feature = W(featureNum,:,:,channel);` takes out one row of \(W^{(1)}_{hiddenSize \times inputSize}\), reshaped into a \(patchDim \times patchDim\) kernel for one channel; because `conv2` flips its kernel by definition, the feature is first rotated 180° with `rot90(squeeze(feature), 2)`. The 2-D convolution function then performs the forward propagation over each image `im = squeeze(images(:, :, channel, imageNum));` with a `'valid'` convolution, and the per-channel results are accumulated: `convolvedImage = convolvedImage + conv2(im, feature, 'valid');`.
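One more detail worth writing out explicitly is how the whitening preprocessing is folded into the convolution. During training, every patch \(x\) was first preprocessed as \(ZCAWhite \cdot (x - meanPatch)\), so the hidden activation is

\[
\sigma\left(W^{(1)} ZCAWhite\,(x - meanPatch) + b^{(1)}\right)
= \sigma\left((W^{(1)} ZCAWhite)\,x + b^{(1)} - (W^{(1)} ZCAWhite)\,meanPatch\right),
\]

which is why the code below precomputes `W = W * ZCAWhite` and `substractMean = W * meanPatch`, and subtracts `substractMean(featureNum)` only once per feature, after the per-channel convolutions have been accumulated.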
The pooling part is fairly straightforward and will not be described at length; a minimal sketch follows below. Following the steps of cnnExercise.m, apart from reusing the earlier sparse autoencoder, softmax, and stacked autoencoder code, we need to write cnnConvolve.m and cnnPool.m; the complete code is at https://github.com/codgeek/deeplearning.
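As a rough illustration of what cnnPool.m has to do, here is my own minimal sketch of mean pooling over non-overlapping square regions (the interface follows the convolvedFeatures layout used above; the version in the repository linked above may be organized differently, and max pooling would simply replace mean with max):

```matlab
function pooledFeatures = cnnPool(poolDim, convolvedFeatures)
% Mean-pool each convolved feature map over non-overlapping poolDim x poolDim regions.
% convolvedFeatures: numFeatures x numImages x convolvedDim x convolvedDim
numFeatures  = size(convolvedFeatures, 1);
numImages    = size(convolvedFeatures, 2);
convolvedDim = size(convolvedFeatures, 3);
poolLen      = floor(convolvedDim / poolDim);   % number of pooled rows/cols

pooledFeatures = zeros(numFeatures, numImages, poolLen, poolLen);
for imageNum = 1:numImages
    for featureNum = 1:numFeatures
        featureMap = squeeze(convolvedFeatures(featureNum, imageNum, :, :));
        for pr = 1:poolLen
            for pc = 1:poolLen
                region = featureMap((pr-1)*poolDim+1 : pr*poolDim, ...
                                    (pc-1)*poolDim+1 : pc*poolDim);
                pooledFeatures(featureNum, imageNum, pr, pc) = mean(region(:));
            end
        end
    end
end
end
```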
```matlab
function convolvedFeatures = cnnConvolve(patchDim, numFeatures, images, W, b, ZCAWhite, meanPatch)
%cnnConvolve Returns the convolution of the features given by W and b with
%the given images
%
% Parameters:
%  patchDim - patch (feature) dimension
%  numFeatures - number of features
%  images - large images to convolve with, matrix in the form
%           images(r, c, channel, image number)
%  W, b - W, b for features from the sparse autoencoder
%  ZCAWhite, meanPatch - ZCAWhitening and meanPatch matrices used for
%                        preprocessing
%
% Returns:
%  convolvedFeatures - matrix of convolved features in the form
%                      convolvedFeatures(featureNum, imageNum, imageRow, imageCol)

numImages = size(images, 4);
imageDim = size(images, 1);
imageChannels = size(images, 3);

% Precompute the matrices that will be used during the convolution. Recall
% that you need to take into account the whitening and mean subtraction steps.
W = W * ZCAWhite; % W*(ZCAWhite*(X - meanPatch)) equals (W*ZCAWhite)*X - (W*ZCAWhite)*meanPatch
substractMean = W * meanPatch;
W = reshape(W, numFeatures, patchDim, patchDim, imageChannels);

convolvedFeatures = zeros(numFeatures, numImages, imageDim - patchDim + 1, imageDim - patchDim + 1);
for imageNum = 1:numImages
  for featureNum = 1:numFeatures

    convolvedImage = zeros(imageDim - patchDim + 1, imageDim - patchDim + 1);
    for channel = 1:imageChannels
      % Obtain the feature (patchDim x patchDim) needed during the convolution
      feature = W(featureNum, :, :, channel); % each row of W is one of the numFeatures learned features

      % Flip the feature matrix because of the definition of convolution
      feature = rot90(squeeze(feature), 2);

      % Obtain the image
      im = squeeze(images(:, :, channel, imageNum));

      % Convolve "feature" with "im", using a 'valid' convolution,
      % and add the result to convolvedImage
      convolvedImage = convolvedImage + conv2(im, feature, 'valid'); % (imageDim - patchDim + 1) x (imageDim - patchDim + 1)
    end

    % Subtract the bias unit (correcting for the mean subtraction as well),
    % then apply the sigmoid function to get the hidden activation
    % substractMean: numFeatures x 1
    convolvedImage = sigmoid(convolvedImage + b(featureNum) - substractMean(featureNum));

    % The convolved feature is the sum of the convolved values for all channels
    convolvedFeatures(featureNum, imageNum, :, :) = convolvedImage;
  end
end

end

function sigm = sigmoid(x)
  sigm = 1 ./ (1 + exp(-x));
end
```
The dataset comes from the STL-10 dataset, together with STL10Features.mat, the feature parameters learned on downsampled 8x8 patches of this dataset in the previous note, "UFLDL Deep Learning Notes (5): Linear Decoders for Autoencoders". A comparison of the images before and after downsampling is shown below.
With the same parameters as the exercise instructions, the input is 64x64x3 color images in four classes (airplane, car, cat, dog). Running the main code file cnnExercise.m gives a prediction accuracy of 80.4%, which matches the exercise's reference result. The classification accuracy is actually not that high; one reason is the large downsampling ratio: after downsampling, the images are basically impossible for the human eye to classify, yet through feature learning followed by convolutional network learning, a reasonable level of accuracy can still be reached.
Now look at the classification accuracy on the full ImageNet dataset; the statistics come from "An Analysis of Deep Neural Network Models for Practical Applications". As of early 2017, even the best deep neural network (DNN) algorithms reach no more than about 80% classification accuracy, so how did we hit 80% so easily? Is Andrew Ng really that good? Don't celebrate too early: the experiment above used a dataset with only four image classes, not even all 10 classes of STL-10, let alone the far more numerous classes of ImageNet. O(∩_∩)O haha~
The next step is to train the algorithm on all 10 classes of STL-10, as well as on ImageNet data. STL-10 contains 100,000 unlabeled images, plus 500 training and 800 test images per class. ImageNet is far larger, with 14,197,122 images in 21,841 synsets; 1.2 million of the images have SIFT features, which can be taken as the number of labeled images. Hopefully the ever-growing volume of data will keep pushing the classification accuracy higher!