Deep Learning Study Notes (8): Understanding CNNs (Convolutional Neural Networks)

I have basically finished Andrew Ng's lecture notes. Andrew explains things in a wonderfully accessible way, but it left me wanting more; he covers too little. Having just finished the chapter on convolution and pooling, I went digging into CNN-related material on my own.

One thing was not clear to me while reading the notes: they only cover a single convolution and a single pooling step. The first convolution is easy to understand because it operates on one image, but after one round of convolution and pooling that image turns into many feature maps. What do we do when we convolve again? So I went and skimmed some CNN papers.
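Before getting to that question, here is a minimal sketch (my own illustration, not code from the notes or from the toolbox discussed below) of what one convolution plus 2x2 mean pooling does to a single image: it produces several smaller feature maps, one per kernel. The image size, kernel count, and kernel size are assumptions roughly matching LeNet-5's first layer.

% Sketch: one convolution + 2x2 mean pooling turns one image into several feature maps.
img = rand(28, 28);                         % dummy 28x28 input image
nKernels = 6;                               % assumed: 6 kernels, as in LeNet-5's C1
maps = cell(1, nKernels);
for j = 1 : nKernels
    k = rand(5, 5) - 0.5;                   % dummy 5x5 kernel
    c = conv2(img, k, 'valid');             % 24x24 convolved map
    p = conv2(c, ones(2) / 4, 'valid');     % average over every 2x2 window
    maps{j} = p(1 : 2 : end, 1 : 2 : end);  % keep every 2nd row/col -> 12x12 pooled map
end
% 'maps' now holds 6 feature maps of 12x12 each; the next convolution layer has to deal
% with all 6 of them at once, which is exactly the question above.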

The most classic CNN example is probably LeNet-5 for digit recognition. See Yann LeCun's page http://yann.lecun.com/exdb/lenet/index.html, as well as the tutorial: http://deeplearning.net/tutorial/lenet.html.

In addition, a fairly detailed Chinese blog post on CNNs (if you would rather not read English, just read this one): http://blog.csdn.net/zouxy09/article/details/8781543.

Both of these give the CNN structure diagram shown below.

[Figure: LeNet-5 structure diagram]

I will not go through what each layer means here; see the resources mentioned above.

After studying the structure diagram I basically got it. A few points that I did not understand at first are worth spelling out:

1. The number of feature maps in each C (convolution) layer.

For example, C1 has 6 and C3 has 16. These appear to be empirical values, or relatively good values found by experiment; many resources do not explain this clearly. Note, though, that later layers generally have more feature maps than earlier ones.

2. The later C layers. For example, the connection from S2 to C3 is not one-to-one.

That is, it is not the case that every feature map in S2 is convolved with each of the 16 kernels that follow; only a subset of them is. The figure below should make this easy to understand:

[Figure: S2-to-C3 connection table]

The rows are the 6 feature maps of S2, the columns are the 16 kernels (output maps) of C3, and an X means the two are connected. For example, kernel 0 is used only on the first 3 feature maps; summing (or weighted-averaging) those 3 convolution results gives the first feature map of C3 (map 0 in the table's numbering). How this particular connection table was chosen, I do not know either; presumably it also came from experience or from extensive experiments. This point is still not entirely clear to me... Of course, connecting everything is also perfectly possible; it just means summing or weighted-averaging over all 6 of S2's maps (which is what kernel 15 does). A small sketch of this selective summation follows below.
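Here is a minimal sketch (my own code, not from the toolbox or the LeNet-5 paper) of that selective wiring: one C3 map is built only from the S2 maps it is connected to. The connection list, map sizes, and kernel size are assumptions; the sum, bias, and sigmoid mirror what cnnff.m does further down.

% Sketch: build one C3 map from the subset of S2 maps it connects to.
S2 = arrayfun(@(i) rand(14, 14), 1 : 6, 'UniformOutput', false);   % dummy 14x14 S2 maps
conn = [1 2 3];                        % assumed: this C3 map connects to S2 maps 1..3 (1-based)
K = arrayfun(@(i) rand(5, 5) - 0.5, 1 : numel(conn), 'UniformOutput', false);  % one 5x5 kernel per connection
b = 0;                                 % bias of this C3 map
z = zeros(10, 10);                     % valid convolution: 14 - 5 + 1 = 10
for idx = 1 : numel(conn)
    z = z + conv2(S2{conn(idx)}, K{idx}, 'valid');   % sum only over the connected maps
end
C3_map = 1 ./ (1 + exp(-(z + b)));     % add bias, pass through a sigmoid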

3. Where each layer's convolution kernels come from.

In Andrew's notes, we first learn 100 features from small patches with a sparse autoencoder (100 hidden units), and those then effectively serve as 100 convolution kernels (I am not sure this understanding is correct). So how are the kernels of the later convolution layers obtained? Does every layer run another autoencoder on the feature maps produced by the previous pooling (downsampling) step? This made me go look at the CNN code in the toolbox (DeepLearnToolbox), but it does not seem to follow the same approach:

Below is the code of cnnsetup.m:

function net = cnnsetup(net, x, y)
    inputmaps = 1;
    mapsize = size(squeeze(x(:, :, 1)));

    for l = 1 : numel(net.layers)   %  layer
        if strcmp(net.layers{l}.type, 's')
            mapsize = mapsize / net.layers{l}.scale;
            assert(all(floor(mapsize)==mapsize), ['Layer ' num2str(l) ' size must be integer. Actual: ' num2str(mapsize)]);
            for j = 1 : inputmaps
                net.layers{l}.b{j} = 0;
            end
        end
        if strcmp(net.layers{l}.type, 'c')
            mapsize = mapsize - net.layers{l}.kernelsize + 1;
            fan_out = net.layers{l}.outputmaps * net.layers{l}.kernelsize ^ 2;
            for j = 1 : net.layers{l}.outputmaps  %  output map
                fan_in = inputmaps * net.layers{l}.kernelsize ^ 2;
                for i = 1 : inputmaps  %  input map
                    net.layers{l}.k{i}{j} = (rand(net.layers{l}.kernelsize) - 0.5) * 2 * sqrt(6 / (fan_in + fan_out));
                end
                net.layers{l}.b{j} = 0;
            end
            inputmaps = net.layers{l}.outputmaps;
        end
    end
    % 'onum' is the number of labels, that's why it is calculated using size(y, 1). If you have 20 labels so the output of the network will be 20 neurons.
    % 'fvnum' is the number of output neurons at the last layer, the layer just before the output layer.
    % 'ffb' is the biases of the output neurons.
    % 'ffW' is the weights between the last layer and the output neurons. Note that the last layer is fully connected to the output layer, that's why the size of the weights is (onum * fvnum)
    fvnum = prod(mapsize) * inputmaps;
    onum = size(y, 1);

    net.ffb = zeros(onum, 1);
    net.ffW = (rand(onum, fvnum) - 0.5) * 2 * sqrt(6 / (onum + fvnum));
end


Here inputmaps is the number of feature maps in the previous layer and outputmaps is the number of feature maps in the current layer. One line of the code is:

net.layers{l}.k{i}{j} = (rand(net.layers{l}.kernelsize) - 0.5) * 2 * sqrt(6 / (fan_in + fan_out));

This line initializes the convolution kernels: it generates a random kernelsize x kernelsize kernel whose entries are drawn uniformly from the range ±sqrt(6 / (fan_in + fan_out)). Here i and j run over inputmaps and outputmaps respectively, so one kernel is initialized for every input-map/output-map connection. A quick numeric check of that range follows below.
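As a check of that range (my own snippet; the values of kernelsize, inputmaps and outputmaps are assumed to match the S2-to-C3 layer of LeNet-5), the entries always land inside the bound:

% Assumed example values: 5x5 kernels between 6 input maps and 16 output maps.
kernelsize = 5; inputmaps = 6; outputmaps = 16;
fan_in  = inputmaps  * kernelsize ^ 2;      % 150
fan_out = outputmaps * kernelsize ^ 2;      % 400
bound   = sqrt(6 / (fan_in + fan_out));     % about 0.1044
k = (rand(kernelsize) - 0.5) * 2 * bound;   % same expression as in cnnsetup.m
disp([min(k(:)) max(k(:)) bound]);          % min and max stay within +/- bound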

Next, part of cnnff.m, i.e. the feed-forward pass:

function net = cnnff(net, x)
    n = numel(net.layers);
    net.layers{1}.a{1} = x;
    inputmaps = 1;

    for l = 2 : n   %  for each layer
        if strcmp(net.layers{l}.type, 'c')
            %  !!below can probably be handled by insane matrix operations
            for j = 1 : net.layers{l}.outputmaps   %  for each output map
                %  create temp output map
                z = zeros(size(net.layers{l - 1}.a{1}) - [net.layers{l}.kernelsize - 1 net.layers{l}.kernelsize - 1 0]);
                for i = 1 : inputmaps   %  for each input map
                    %  convolve with corresponding kernel and add to temp output map
                    z = z + convn(net.layers{l - 1}.a{i}, net.layers{l}.k{i}{j}, 'valid');
                end
                %  add bias, pass through nonlinearity
                net.layers{l}.a{j} = sigm(z + net.layers{l}.b{j});
            end
            %  set number of input maps to this layers number of outputmaps
            inputmaps = net.layers{l}.outputmaps;
        elseif strcmp(net.layers{l}.type, 's')
            %  downsample
            for j = 1 : inputmaps
                z = convn(net.layers{l - 1}.a{j}, ones(net.layers{l}.scale) / (net.layers{l}.scale ^ 2), 'valid');   %  !! replace with variable
                net.layers{l}.a{j} = z(1 : net.layers{l}.scale : end, 1 : net.layers{l}.scale : end, :);
            end
        end
    end

    %  concatenate all end layer feature maps into vector
    net.fv = [];
    for j = 1 : numel(net.layers{n}.a)
        sa = size(net.layers{n}.a{j});
        net.fv = [net.fv; reshape(net.layers{n}.a{j}, sa(1) * sa(2), sa(3))];
    end
    %  feedforward into output perceptrons
    net.o = sigm(net.ffW * net.fv + repmat(net.ffb, 1, size(net.fv, 2)));

end


The convolution-layer code does indeed use the kernels that were initialized in advance:

for j = 1 : net.layers{l}.outputmaps   %  for each output map
    %  create temp output map
    z = zeros(size(net.layers{l - 1}.a{1}) - [net.layers{l}.kernelsize - 1 net.layers{l}.kernelsize - 1 0]);
    for i = 1 : inputmaps   %  for each input map
        %  convolve with corresponding kernel and add to temp output map
        z = z + convn(net.layers{l - 1}.a{i}, net.layers{l}.k{i}{j}, 'valid');
    end
    %  add bias, pass through nonlinearity
    net.layers{l}.a{j} = sigm(z + net.layers{l}.b{j});
end


Here every input map is connected to every output map (the inner loop runs over all inputmaps), rather than using the selective connections described in point 2.
