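Implementing R-CNN with MATLAB's Built-in Tools

Platform: MATLAB R2016b

MATLAB ships with a pretrained network, cifar10Net, that can be used as a starting point for deep learning.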

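Image Labeling

Here, MATLAB's built-in trainingImageLabeler tool is used to label the ROIs in the images.

Choose Add Images to load the training images (several images can be added at once). Right-click in the ROI Label area to change a label's color and name. To train on multiple classes, click Add ROI Label to add more labels.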

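Once all images are labeled, click Export ROIs to obtain a table (or struct) variable. Save it with the command

save('file', 'variable');

The R-CNN training function expects the labels as a table, so if your dataset was exported as a struct, use

data = struct2table(file);

to convert the struct to a table.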
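The save and conversion steps can be sketched end to end as follows. This is only an illustration: the variable name `roiData`, the file names, and the box values are made up, and the field names simply mirror the `imageFilename` and `tire` columns used in this post.

```matlab
% Hypothetical labels standing in for a struct exported by the labeler.
roiData(1,1).imageFilename = 'img001.jpg';
roiData(1,1).tire = [30 40 50 50];
roiData(2,1).imageFilename = 'img002.jpg';
roiData(2,1).tire = [10 20 60 60; 100 120 40 40];

% Save the variable to a MAT-file, then load it back.
matFile = [tempname '.mat'];
save(matFile, 'roiData');
loaded = load(matFile);

% Convert the struct array to the table form used for training.
data = struct2table(loaded.roiData);
delete(matFile);
```

Each element of the struct array becomes one row of the table, and each field becomes one column.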

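imageFilename holds the path where each image is stored. tire holds the tires labeled in that image, stored as a matrix in which each row gives the top-left corner (x, y) of an ROI followed by its size (width, height).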
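To make the box layout concrete, here is a small sketch that builds a one-row table by hand and reads the boxes back. The file name and coordinates are invented for illustration; a real table would come from the exported labels.

```matlab
% Hypothetical table row; in practice this comes from the exported labels.
imageFilename = {'img001.jpg'};
tire = {[30 40 50 50; 100 120 40 60]};
data = table(imageFilename, tire);

boxes = data.tire{1};        % M-by-4, one row per labeled tire
topLeft = boxes(:, 1:2);     % (x, y) of each ROI's top-left corner
boxSize = boxes(:, 3:4);     % (width, height) of each ROI
boxArea = boxSize(:, 1) .* boxSize(:, 2);
```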

R-CNN Training

Let's take a look at the network structure:

load('rcnnStopSigns.mat','cifar10Net');
cifar10Net.Layers

This produces the following output:

ans = 

15x1 Layer array with layers:

 1   'imageinput'    Image Input             32x32x3 images with 'zerocenter' normalization
 2   'conv'          Convolution             32 5x5x3 convolutions with stride [1  1] and padding [2  2]
 3   'relu'          ReLU                    ReLU
 4   'maxpool'       Max Pooling             3x3 max pooling with stride [2  2] and padding [0  0]
 5   'conv_1'        Convolution             32 5x5x32 convolutions with stride [1  1] and padding [2  2]
 6   'relu_1'        ReLU                    ReLU
 7   'maxpool_1'     Max Pooling             3x3 max pooling with stride [2  2] and padding [0  0]
 8   'conv_2'        Convolution             64 5x5x32 convolutions with stride [1  1] and padding [2  2]
 9   'relu_2'        ReLU                    ReLU
10   'maxpool_2'     Max Pooling             3x3 max pooling with stride [2  2] and padding [0  0]
11   'fc'            Fully Connected         64 fully connected layer
12   'relu_3'        ReLU                    ReLU
13   'fc_1'          Fully Connected         10 fully connected layer
14   'softmax'       Softmax                 softmax
15   'classoutput'   Classification Output   cross-entropy with 'airplane', 'automobile', and 8 other classes

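The output shows that the network has only three convolutional layers. We need to fine-tune it. Since I am training only one class (the tire) and the provided data also contains unlabeled images, the output size of the fully connected layer has to be changed to 2 (one output for the tire class and one for background). That layer is followed by a softmax layer and a classificationLayer, and then the training options are defined: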

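% Keep every layer except the last three (fc_1, softmax, classoutput).
x = cifar10Net.Layers(1:end-3);

% New output layers with 2 outputs (tire and background).
lastlayers = [
    fullyConnectedLayer(2, 'Name', 'fc8', 'WeightLearnRateFactor', 1, 'BiasLearnRateFactor', 1)
    softmaxLayer('Name', 'softmax')
    classificationLayer('Name', 'classification')
    ];

% Combine the kept layers with the new output layers.
layers = [x; lastlayers];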

options = trainingOptions('sgdm', ...
 'MiniBatchSize', 32, ...
 'InitialLearnRate', 1e-6, ...
 'MaxEpochs', 100);
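The output size of 2 used for the new fully connected layer can also be derived from the label table instead of being hard-coded. The sketch below assumes a ground-truth table whose first column is the file name and whose remaining columns are object classes. The table contents are made up, and `numClasses` and `numOutputs` are illustrative names.

```matlab
% Hypothetical ground-truth table with a single object class, "tire".
imageFilename = {'img001.jpg'};
tire = {[30 40 50 50]};
data = table(imageFilename, tire);

% Every column after the first is one object class.
numClasses = width(data) - 1;

% One extra output for the background class.
numOutputs = numClasses + 1;

fc = fullyConnectedLayer(numOutputs, 'Name', 'fc8');
```

With more labeled classes in the table, the same two lines give the matching output size.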

R-CNN training is done mainly with the trainRCNNObjectDetector function:

detector = trainRCNNObjectDetector(groundTruth, network, options)

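groundTruth - A table with two or more columns. The first column must contain the image file names. The images can be grayscale or truecolor, in any format supported by imread. Each remaining column must contain M-by-4 matrices of [x, y, width, height] bounding boxes giving the object locations in each image. Each column represents a single object class, e.g. person, car, dog. This is the data produced earlier by labeling with trainingImageLabeler.

network - The CNN network structure.

options - The network training parameters, including the initial learning rate, the number of epochs, the mini-batch size, and so on.

Besides these three arguments, there are also the following name-value parameters: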

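'PositiveOverlapRange' - A two-element vector specifying a range of bounding box overlap ratios between 0 and 1. Region proposals whose overlap with a ground-truth box (i.e. a box drawn during image labeling) falls within this range are used as positive training samples. Default: [0.5 1]

'NegativeOverlapRange' - A two-element vector specifying a range of bounding box overlap ratios between 0 and 1. Region proposals whose overlap with a ground-truth box falls within this range are used as negative training samples. Default: [0.1 0.5]

Before training, R-CNN extracts many candidate boxes (region proposals) from the training images. Those that meet the positive criterion are used as positive training samples, and those that meet the negative criterion are used as negative training samples.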
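As a rough illustration of how the two overlap ranges split proposals into positive and negative samples, the sketch below computes a plain intersection over union by hand with rectint. All boxes are invented, and the treatment of the range boundaries is an assumption rather than the toolbox's documented behavior. The Computer Vision System Toolbox function bboxOverlapRatio computes this kind of ratio directly.

```matlab
% Hypothetical boxes in [x y width height] form, for illustration only.
groundTruthBox = [10 10 100 100];
proposals = [20 20 100 100;     % heavy overlap with the labeled box
             50 50 100 100;     % partial overlap
             300 300 50 50];    % no overlap

% Intersection area of the labeled box with each proposal (1-by-3).
interArea = rectint(groundTruthBox, proposals);

% Intersection over union for each proposal.
gtArea = groundTruthBox(3) * groundTruthBox(4);
propArea = (proposals(:, 3) .* proposals(:, 4))';
iou = interArea ./ (gtArea + propArea - interArea);

% Default ranges from the text; boundary handling here is an assumption.
positiveRange = [0.5 1];
negativeRange = [0.1 0.5];
isPositive = iou >= positiveRange(1) & iou <= positiveRange(2);
isNegative = iou >= negativeRange(1) & iou <  negativeRange(2);
```

Here the first proposal is a positive sample, the second is a negative sample, and the third falls outside both ranges and is not used.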

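'NumStrongestRegions' - The maximum number of strongest region proposals used to generate training samples (i.e. the number of candidate boxes that are finally kept). Lower this value to speed up processing at the cost of training accuracy. Set it to inf to use all region proposals. Default: 2000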
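Putting the pieces together, a training call might look like the sketch below. It assumes the workspace already holds `data` (the label table), `layers` (an assumed name for the combined layer array passed as the network), and `options` from the earlier steps. The name-value pairs are set to the defaults quoted above, so they could just as well be left out. The guard only skips training when those inputs are missing.

```matlab
% Name-value arguments set to the defaults quoted above.
trainArgs = {'PositiveOverlapRange', [0.5 1], ...
             'NegativeOverlapRange', [0.1 0.5], ...
             'NumStrongestRegions', 2000};

% Train only when the inputs from the earlier steps are in the workspace.
if exist('data', 'var') && exist('layers', 'var') && exist('options', 'var')
    myRCNN = trainRCNNObjectDetector(data, layers, options, trainArgs{:});
    save('myRCNN.mat', 'myRCNN');   % file name matches the detection script
end
```

Saving the detector under the name myRCNN keeps it compatible with the detection script, which loads myRCNN.mat.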

Then run detection with the trained detector:

clear;
tic;
load myRCNN.mat;   % loads the trained detector variable myRCNN
detectedImg = imread('cars_train_croped(227_227)\08031.jpg');

[bbox, score, label] = detect(myRCNN, detectedImg, 'MiniBatchSize', 20);

imshow(detectedImg);

% Keep only detections with a confidence score above 0.1.
idx = find(score > 0.1);
bbox = bbox(idx, :);
n = numel(idx);

% Draw every kept box on the same output image.
de = detectedImg;
for i = 1:n
    annotation = sprintf('%s: (Confidence = %f)', label(idx(i)), score(idx(i)));
    de = insertObjectAnnotation(de, 'rectangle', bbox(i,:), annotation);
end

figure
imshow(de);
toc;

Reference blogs:
https://blog.csdn.net/qq_33801763/article/details/77185457
https://blog.csdn.net/mr_curry/article/details/53160914
https://blog.csdn.net/u014096352/article/details/72854077