1. Preface
FCN (Fully Convolutional Network) was the first approach to use a CNN end to end for semantic segmentation. Paper: Fully Convolutional Networks for Semantic Segmentation
Reference code: https://github.com/shelhamer/fcn.berkeleyvision.org
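A fully convolutional network relies on three main techniques:
1. Convolutionalization (Convolutional)
2. Upsampling (Upsample)
3. Skip structure (Skip Layer)
To keep things easy to follow, I use the simplest variant, voc-fcn-alexnet, as the example. It uses only the first two techniques and has no skip structure.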
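To make the first technique concrete, here is a minimal sketch in plain Python. The toy sizes and all function names are my own and do not come from the FCN repository. It assumes a fully connected layer that reads a flattened IN_CH x K x K input, reshapes each weight row into an IN_CH x K x K convolution kernel, and compares the two layers.

```python
import random


def fc_forward(weights, x_flat):
    """Fully connected layer: one dot product per output unit."""
    return [sum(w * v for w, v in zip(row, x_flat)) for row in weights]


def conv_forward(kernels, image, k):
    """Stride-1 convolution without padding (cross-correlation, as in Caffe).

    kernels: [out_ch][in_ch][k][k], image: [in_ch][H][W].
    Returns [out_ch][H - k + 1][W - k + 1].
    """
    in_ch, height, width = len(image), len(image[0]), len(image[0][0])
    out = []
    for kern in kernels:
        plane = []
        for y in range(height - k + 1):
            row = []
            for x in range(width - k + 1):
                row.append(sum(
                    kern[c][dy][dx] * image[c][y + dy][x + dx]
                    for c in range(in_ch)
                    for dy in range(k)
                    for dx in range(k)
                ))
            plane.append(row)
        out.append(plane)
    return out


def flatten(image):
    """Flatten [in_ch][H][W] in channel, row, column order."""
    return [v for plane in image for row in plane for v in row]


def random_image(in_ch, size):
    return [[[random.uniform(-1, 1) for _ in range(size)]
             for _ in range(size)] for _ in range(in_ch)]


random.seed(0)
IN_CH, K, OUT = 2, 3, 4  # toy sizes; fc6 in the post uses a 6x6 kernel

# Weights of a fully connected layer over a flattened IN_CH x K x K input.
fc_weights = [[random.uniform(-1, 1) for _ in range(IN_CH * K * K)]
              for _ in range(OUT)]

# Convolutionalization: reshape every weight row into an IN_CH x K x K kernel.
kernels = [
    [[[row[(c * K + dy) * K + dx] for dx in range(K)]
      for dy in range(K)] for c in range(IN_CH)]
    for row in fc_weights
]

# On a K x K input the convolution yields a 1x1 map equal to the FC output.
small = random_image(IN_CH, K)
fc_out = fc_forward(fc_weights, flatten(small))
conv_small = conv_forward(kernels, small, K)

# On a larger input the same kernels slide and yield a spatial map of outputs.
large = random_image(IN_CH, 5)
conv_large = conv_forward(kernels, large, K)
print(len(conv_large), len(conv_large[0]), len(conv_large[0][0]))
```

The converted layer no longer requires a fixed input size: a 5x5 input simply produces a 3x3 grid of outputs, one per 3x3 window.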
2. The train.prototxt file of voc-fcn-alexnet
layer {
  name: "data"
  type: "Python"
  top: "data"
  top: "label"
  python_param {
    module: "voc_layers"
    layer: "SBDDSegDataLayer"
    param_str: "{\'sbdd_dir\': \'../data/sbdd/dataset\', \'seed\': 1337, \'split\': \'train\', \'mean\': (104.00699, 116.66877, 122.67892)}"
  }
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  convolution_param { num_output: 96 pad: 100 kernel_size: 11 group: 1 stride: 4 }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "conv1"
  top: "conv1"
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param { pool: MAX kernel_size: 3 stride: 2 }
}
layer {
  name: "norm1"
  type: "LRN"
  bottom: "pool1"
  top: "norm1"
  lrn_param { local_size: 5 alpha: 0.0001 beta: 0.75 }
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "norm1"
  top: "conv2"
  convolution_param { num_output: 256 pad: 2 kernel_size: 5 group: 2 stride: 1 }
}
layer {
  name: "relu2"
  type: "ReLU"
  bottom: "conv2"
  top: "conv2"
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "conv2"
  top: "pool2"
  pooling_param { pool: MAX kernel_size: 3 stride: 2 }
}
layer {
  name: "norm2"
  type: "LRN"
  bottom: "pool2"
  top: "norm2"
  lrn_param { local_size: 5 alpha: 0.0001 beta: 0.75 }
}
layer {
  name: "conv3"
  type: "Convolution"
  bottom: "norm2"
  top: "conv3"
  convolution_param { num_output: 384 pad: 1 kernel_size: 3 group: 1 stride: 1 }
}
layer {
  name: "relu3"
  type: "ReLU"
  bottom: "conv3"
  top: "conv3"
}
layer {
  name: "conv4"
  type: "Convolution"
  bottom: "conv3"
  top: "conv4"
  convolution_param { num_output: 384 pad: 1 kernel_size: 3 group: 2 stride: 1 }
}
layer {
  name: "relu4"
  type: "ReLU"
  bottom: "conv4"
  top: "conv4"
}
layer {
  name: "conv5"
  type: "Convolution"
  bottom: "conv4"
  top: "conv5"
  convolution_param { num_output: 256 pad: 1 kernel_size: 3 group: 2 stride: 1 }
}
layer {
  name: "relu5"
  type: "ReLU"
  bottom: "conv5"
  top: "conv5"
}
layer {
  name: "pool5"
  type: "Pooling"
  bottom: "conv5"
  top: "pool5"
  pooling_param { pool: MAX kernel_size: 3 stride: 2 }
}
layer {
  name: "fc6"
  type: "Convolution"
  bottom: "pool5"
  top: "fc6"
  convolution_param { num_output: 4096 pad: 0 kernel_size: 6 group: 1 stride: 1 }
}
layer {
  name: "relu6"
  type: "ReLU"
  bottom: "fc6"
  top: "fc6"
}
layer {
  name: "drop6"
  type: "Dropout"
  bottom: "fc6"
  top: "fc6"
  dropout_param { dropout_ratio: 0.5 }
}
layer {
  name: "fc7"
  type: "Convolution"
  bottom: "fc6"
  top: "fc7"
  convolution_param { num_output: 4096 pad: 0 kernel_size: 1 group: 1 stride: 1 }
}
layer {
  name: "relu7"
  type: "ReLU"
  bottom: "fc7"
  top: "fc7"
}
layer {
  name: "drop7"
  type: "Dropout"
  bottom: "fc7"
  top: "fc7"
  dropout_param { dropout_ratio: 0.5 }
}
layer {
  name: "score_fr"
  type: "Convolution"
  bottom: "fc7"
  top: "score_fr"
  param { lr_mult: 1 decay_mult: 1 }
  param { lr_mult: 2 decay_mult: 0 }
  convolution_param { num_output: 21 pad: 0 kernel_size: 1 }
}
layer {
  name: "upscore"
  type: "Deconvolution"
  bottom: "score_fr"
  top: "upscore"
  param { lr_mult: 0 }
  convolution_param { num_output: 21 bias_term: false kernel_size: 63 stride: 32 }
}
layer {
  name: "score"
  type: "Crop"
  bottom: "upscore"
  bottom: "data"
  top: "score"
  crop_param { axis: 2 offset: 18 }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "score"
  bottom: "label"
  top: "loss"
  loss_param { ignore_label: 255 normalize: true }
}
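The upscore layer has kernel_size 63, stride 32, no bias, and lr_mult 0, so its weights stay fixed during training. As far as I know, the reference repository fills such layers with bilinear interpolation weights before training (through its surgery.py helper); treat that as an assumption here. The sketch below, in plain Python with function names of my own, builds the 1D bilinear kernel of that size and applies a stride-32 transposed convolution to a 16-sample signal (16 is the heatmap width for a 500-pixel input). The 2D kernel would be the outer product of this 1D kernel with itself.

```python
def bilinear_kernel_1d(size):
    """Triangle-shaped weights used for bilinear upsampling."""
    factor = (size + 1) // 2
    center = factor - 1 if size % 2 == 1 else factor - 0.5
    return [1 - abs(i - center) / factor for i in range(size)]


def deconv_1d(signal, kernel, stride):
    """Transposed convolution: each input sample stamps a scaled kernel."""
    out = [0.0] * (stride * (len(signal) - 1) + len(kernel))
    for j, value in enumerate(signal):
        for i, weight in enumerate(kernel):
            out[j * stride + i] += value * weight
    return out


kernel = bilinear_kernel_1d(63)                        # kernel_size: 63
upscored = deconv_1d([1.0] * 16, kernel, stride=32)    # stride: 32
cropped = upscored[18:18 + 500]                        # Crop with offset: 18

print(len(upscored), len(cropped))
```

A constant input stays constant away from the borders, which is what an interpolating upsampler should do. The 543 outputs are then trimmed to 500 starting at offset 18.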
3. Network structure
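Assume the input image is 500x500.
Following train.prototxt gives the network structure in the figure above. Besides the five original convolutional layers, the last three layers (fully connected in AlexNet) are also converted into convolutional layers. score_fr is the last of these convolutional layers, and its output is also called the heatmap: a coarse, high-level feature map with one channel per class. Once the heatmap is obtained, the last and most important step is to upsample it (by deconvolution) and crop the result, so that the output has the same size as the original image.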
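The spatial sizes for a 500x500 input can be traced by hand from the kernel, stride, and pad values in train.prototxt. The following is a minimal sketch in plain Python with helper names of my own. It assumes Caffe's rounding rules (convolution rounds down, pooling rounds up) and that the Crop layer keeps a window of the input's size starting at the given offset. ReLU, LRN, and Dropout layers are left out because they do not change the size.

```python
def conv_out(size, kernel, stride=1, pad=0):
    # Caffe convolution rounds the output size down.
    return (size + 2 * pad - kernel) // stride + 1


def pool_out(size, kernel, stride):
    # Caffe pooling rounds the output size up (no padding in this net).
    return -(-(size - kernel) // stride) + 1


def deconv_out(size, kernel, stride):
    # Transposed convolution without padding.
    return stride * (size - 1) + kernel


# (layer name, size function, parameters) copied from train.prototxt.
LAYERS = [
    ("conv1", conv_out, dict(kernel=11, stride=4, pad=100)),
    ("pool1", pool_out, dict(kernel=3, stride=2)),
    ("conv2", conv_out, dict(kernel=5, stride=1, pad=2)),
    ("pool2", pool_out, dict(kernel=3, stride=2)),
    ("conv3", conv_out, dict(kernel=3, stride=1, pad=1)),
    ("conv4", conv_out, dict(kernel=3, stride=1, pad=1)),
    ("conv5", conv_out, dict(kernel=3, stride=1, pad=1)),
    ("pool5", pool_out, dict(kernel=3, stride=2)),
    ("fc6", conv_out, dict(kernel=6, stride=1, pad=0)),
    ("fc7", conv_out, dict(kernel=1, stride=1, pad=0)),
    ("score_fr", conv_out, dict(kernel=1, stride=1, pad=0)),
    ("upscore", deconv_out, dict(kernel=63, stride=32)),
]


def trace(input_size, crop_offset=18):
    """Return (layer name, spatial size) pairs for a square input."""
    sizes = [("data", input_size)]
    size = input_size
    for name, size_fn, params in LAYERS:
        size = size_fn(size, **params)
        sizes.append((name, size))
    # The crop window must fit inside the upsampled map.
    assert crop_offset + input_size <= size, "crop window exceeds upscore"
    sizes.append(("score", input_size))
    return sizes


for name, size in trace(500):
    print(f"{name:9s}{size}")
```

For 500x500 this gives 173 after conv1, 86 after pool1, 43 after pool2, 21 after pool5, a 16x16 heatmap from fc6 through score_fr, 543 after upscore, and 500 after the crop.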
4. Loss function
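The loss function of this network is SoftmaxWithLoss. It first applies softmax to obtain, for every pixel, the probability of each class. There are 21 classes in total, so every pixel has 21 probability values (the output has 21 channels). It then takes the log of the probability assigned to each pixel's true class, averages these values over the pixels, and negates the result: loss = -(1/N) * sum_i log p_i(y_i), where N is the number of pixels counted, y_i is the true class of pixel i, and p_i(y_i) is the softmax probability of that class at pixel i.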
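The computation can be sketched in a few lines of plain Python. This is an illustration with names of my own, not Caffe's implementation. It assumes that pixels labelled with ignore_label (255 in train.prototxt) are skipped and that, with normalize: true, the sum is divided by the number of remaining pixels.

```python
import math

IGNORE_LABEL = 255  # ignore_label from train.prototxt


def softmax(scores):
    """Numerically stable softmax over one pixel's class scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_with_loss(scores, labels, ignore_label=IGNORE_LABEL):
    """Mean negative log probability of the true class over valid pixels.

    scores: one list of class scores per pixel.
    labels: one class index per pixel.
    """
    total, count = 0.0, 0
    for pixel_scores, label in zip(scores, labels):
        if label == ignore_label:
            continue  # assumed: ignored pixels do not contribute
        total -= math.log(softmax(pixel_scores)[label])
        count += 1
    return total / max(count, 1)


# Three pixels with 21 equal scores each; the last pixel is ignored.
scores = [[0.0] * 21 for _ in range(3)]
labels = [0, 15, IGNORE_LABEL]
loss = softmax_with_loss(scores, labels)
print(loss)
```

With equal scores every class gets probability 1/21, so the loss equals log(21), the value to expect from an untrained classifier over 21 classes.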