A First Look at Deep Learning: Object Recognition

Date: 2019-11-07

Tags: deep learning, object recognition


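Object detection: concept and characteristics

Let's start with the concept: what is object detection?

Object detection means finding the targets in a given image. If I am looking for people, for example, the people have to be recognized and located in the image. That is object detection.

The input is a whole image; the output is the class of each detected object together with its location. This is easy for a human but very hard for a computer.

Take the following example:

This image shows the result of knife detection. The program found two targets (knives), drew a box around each, and printed the probability, learned from training, that each one is a knife.

To get the basic concepts straight: artificial intelligence includes machine learning, which covers many methods such as classification and clustering. One of them, currently the most widely deployed in computer vision, is deep learning, a learning method based on deep convolutional neural networks. It can be used for classification, object recognition, semantic segmentation and more.

Classification: given an image, the program says what it is.
Object recognition: besides saying what it is, the program marks it with a rectangular box.
Segmentation: more complex; besides saying what it is, the program marks its complete outline (pixel-level object recognition).

Today I will introduce the classification part of this knife detection program, i.e. how the program recognizes that what is inside the box is a knife.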

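The deep learning approach

A digital image is really a matrix. Each number in the matrix is a pixel value (0-255, with three layers for RGB) that represents intensity. Put simply, the deep neural network is given a large number of labeled samples, in this example a few hundred pictures of knives, each labeled as a knife, and through statistical computation it learns the general pattern shared by the samples.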
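To make the "image as a matrix" idea concrete, here is a minimal sketch in plain Python 3. The 2x2 image and the `to_grayscale` helper are made up for illustration; real code would normally load an image with a library such as PIL or NumPy.

```python
# A tiny 2x2 RGB "image" as nested lists: image[row][col] = [R, G, B].
# Every channel value is an integer in the range 0..255.
image = [
    [[255, 0, 0], [0, 255, 0]],
    [[0, 0, 255], [128, 128, 128]],
]

height = len(image)
width = len(image[0])
channels = len(image[0][0])


def to_grayscale(img):
    """Average the three channels of each pixel into one intensity value."""
    return [[sum(pixel) // len(pixel) for pixel in row] for row in img]


gray = to_grayscale(image)
print(height, width, channels)
print(gray)
```

The network never sees "a knife"; it only sees numbers like these, which is why it needs many labeled samples to learn a pattern.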

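Then, at test time, a new image is fed into the network. After the network's computation, if its score fits the pattern learned from the few hundred knife samples, it is judged to be a knife; otherwise it is not.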

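1. Layer structure of a convolutional neural network

First, an example:

The layers of a convolutional neural network are shown in the figure below:

In the figure above, the CNN's job is this: given an image that may show a car or a horse, and if a car then of unknown type, the model has to decide what exactly is in the image and output a result: if it is a car, which kind of car.

Leftmost:

Data input layer: takes the input image.

In the middle:

CONV: convolution layer, a linear multiply-and-sum.
RELU: activation layer, the neurons in the figure; in essence a function.
POOL: pooling layer; in short, it takes the average or the maximum over a region.

Rightmost:

FC: fully connected layer.

Of these parts, the convolution layer is the core of a CNN.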
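The "linear multiply-and-sum" of the CONV layer can be sketched in a few lines of plain Python 3. This is an illustrative toy (no padding, stride 1, one channel), not how Caffe implements it; the `conv2d` name, the 4x4 input and the 2x2 kernel are assumptions chosen for the example.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image; at each position multiply
    element-wise and sum (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output


image = [
    [1, 2, 3, 0],
    [0, 1, 2, 3],
    [3, 0, 1, 2],
    [2, 3, 0, 1],
]
kernel = [
    [1, 0],
    [0, 1],
]
feature_map = conv2d(image, kernel)
print(feature_map)  # a 3x3 feature map
```

In a real network the kernel values are the weights that training adjusts, and each layer applies many kernels at once.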

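2. Convolution layer

Neuron

To describe a neural network, let's start with the neuron. The network below consists of a single "neuron", illustrated here:

Concretely, the inputs Xi are weighted (on an image this is exactly a convolution), summed, and fed into the "circle". The circle is the activation function, and its result is the output. That completes the computation of one neuron in a neural network.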
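A single neuron can be sketched as follows in plain Python 3. The sigmoid activation, the bias term and the sample numbers are assumptions for illustration; the text above only specifies "weight, sum, then activation".

```python
import math


def sigmoid(z):
    """A classic activation function that squashes any value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs, weights, bias, activation):
    """Weight each input, sum everything up, then apply the activation."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(weighted_sum)


# 1*2 + 2*(-1) + 0 = 0, and sigmoid(0) = 0.5
output = neuron([1.0, 2.0], [2.0, -1.0], 0.0, sigmoid)
print(output)
```

Swapping `sigmoid` for another function changes the behaviour of the neuron without touching the weighted sum.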

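3. ReLU layer

This is just the neuron described above.

The activation function turns the linear transformations inside a neural network into a nonlinear one. As seen above, however complex the network design and however many layers it has, weighted sums alone remain a linear transformation. An activation function is needed to extract features better: it retains the input features and maps them forward.
ReLU: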
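ReLU itself is simply max(0, x). The sketch below (plain Python 3, with made-up weights) also illustrates the point about linearity: two stacked linear layers collapse into one linear function, while a ReLU between them does not.

```python
def relu(x):
    """Rectified linear unit: pass positive values through, clamp the rest to 0."""
    return x if x > 0 else 0.0


def linear(x, w, b):
    """A one-input 'layer': weight times input plus bias."""
    return w * x + b


def two_linear_layers(x):
    # -3 * (2x + 1) + 0.5 simplifies to -6x - 2.5, still a straight line.
    return linear(linear(x, 2.0, 1.0), -3.0, 0.5)


def with_relu(x):
    # The same two layers with a ReLU in between: no longer a straight line.
    return linear(relu(linear(x, 2.0, 1.0)), -3.0, 0.5)


for x in (-2.0, 1.0):
    print(x, two_linear_layers(x), with_relu(x))
```

For x = 1.0 both versions agree, but for x = -2.0 the ReLU clamps the first layer's negative output, so the results differ.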

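In an artificial neural network, every connection carries a weight W, every neuron applies an activation function, and in a traditional neural network every layer is fully connected.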

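4. Pooling layer

Because the data produced by convolution is very large, the features at different positions within a local region of the image can be aggregated statistically. This operation is called "pooling" (also known as subsampling).

There are two kinds of pooling:

Max pooling: the largest value in the selected region becomes the pooled value.
Average pooling: the mean of the selected region becomes the pooled value.

Both aim to reduce the amount of data by replacing a group of pixels with a single pixel.

Max pooling works as shown in the figure below:

The figure shows taking the maximum of each region: in the left part, the top-left 2x2 block has maximum 6, the top-right 2x2 block has maximum 8, the bottom-left 2x2 block has maximum 3, and the bottom-right 2x2 block has maximum 4, which gives the result in the right part: 6 8 3 4.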
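The pooling step can be reproduced with a short plain Python 3 sketch. The 4x4 input below is an assumption: it is chosen so that its 2x2 block maxima are 6, 8, 3 and 4 as described above, but the figure itself is not available here, so the other values are illustrative.

```python
def pool2d(matrix, size=2, mode="max"):
    """Non-overlapping pooling: reduce every size x size block to one value."""
    reducer = max if mode == "max" else (lambda values: sum(values) / len(values))
    pooled = []
    for i in range(0, len(matrix) - size + 1, size):
        row = []
        for j in range(0, len(matrix[0]) - size + 1, size):
            window = [matrix[i + di][j + dj]
                      for di in range(size) for dj in range(size)]
            row.append(reducer(window))
        pooled.append(row)
    return pooled


feature_map = [
    [1, 1, 2, 4],
    [5, 6, 7, 8],
    [3, 2, 1, 0],
    [1, 2, 3, 4],
]
max_pooled = pool2d(feature_map, mode="max")
avg_pooled = pool2d(feature_map, mode="average")
print(max_pooled)
print(avg_pooled)
```

Sixteen values shrink to four in both modes; max pooling keeps the strongest response in each block, while average pooling keeps the block mean.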

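That said, when conditions allow it is better not to pool: pooling reduces the amount of data, but it also discards detail and so lowers the precision of the sampled data.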

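5. Fully connected layer (FC)

As the name suggests, all the image features computed by the preceding locally connected layers are fed into this layer. Its structure resembles the connections of a traditional artificial neural network: all the data is gathered together and computed on.

Intuitively, it turns 2-D into 1-D and computes scores. Some networks have several fully connected layers: the output of the first is 1-D but may still be too long, so another fully connected layer converts it to a length that matches the number of classes, acting much like a linear classifier.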
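The "2-D to 1-D, then scores" idea can be sketched in plain Python 3 as below. The weights, the two class names and the softmax step that turns scores into probabilities are assumptions for illustration; a trained network would learn the weights.

```python
import math


def flatten(feature_map):
    """Turn a 2-D feature map into a 1-D list of features."""
    return [value for row in feature_map for value in row]


def fully_connected(inputs, weights, biases):
    """One output per class: a weighted sum over all input features."""
    return [sum(x * w for x, w in zip(inputs, row)) + b
            for row, b in zip(weights, biases)]


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


pooled = [[6, 8], [3, 4]]          # e.g. the output of a pooling layer
features = flatten(pooled)          # 2-D -> 1-D
weights = [
    [0.1, 0.1, 0.1, 0.1],           # hypothetical weights for "background"
    [0.2, 0.2, 0.0, 0.0],           # hypothetical weights for "knife"
]
biases = [0.0, 0.0]
scores = fully_connected(features, weights, biases)
probabilities = softmax(scores)
print(features, scores, probabilities)
```

The class with the highest probability is the prediction, and that probability is the kind of number printed next to a detection box.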

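For a beginner like me this part is the hardest to understand, but once it clicks everything suddenly becomes clear. I consider it the core of the whole article.

The flow of an object recognition demo can be summarized in a few points:

Build the neural network
Prepare a few hundred images of knives
Train the network so that it learns to recognize knives, producing a knife model
Test new images with the model, which gives the probability that each one is a knife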

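A simple demo

caffe-ssd is an end-to-end object detection framework that is well suited to beginners. The steps are as follows:

1. Installing SSD

Download caffe; if you have not configured it before, see blog.csdn.net/baobei0112/…

In the home directory, fetch the SSD code; once the download finishes there is a caffe folder.

git clone github.com/weiliu89/ca…
cd caffe
git checkout ssd (if "branch" appears in the output, the checkout succeeded)

Enter the downloaded caffe directory and copy the config file

cd /home/usrname/caffe
cp Makefile.config.example Makefile.config
Then edit Makefile.config the same way you did for your earlier caffe setup.

Build caffe

make all -j4
make test
make runtest

Then build the Python wrapper with make pycaffe; no errors means it succeeded.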

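2. Preparing the dataset

Prepare annotated image data. Use labelImage (look up how to use it) to annotate the images; each annotated image gets an xml file of the same name that stores the box positions, as shown below:

Under the data folder, create a VOC-format directory for your own training data (VOC_knife):
Put all the annotated xml files into Annotations and all the images into PNGImages. Inside PNGImages create two folders, trainval and test, and distribute the images in PNGImages between them by some ratio (there is no fixed value; split as you see fit). In other words, every image exists once in PNGImages and once more in trainval or test, which makes it easier for the scripts below to generate the dataset lists: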
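The split can also be done with a small script instead of by hand. The following is a minimal Python 3 sketch under stated assumptions: the `split_dataset` helper, the 80/20 ratio and the fixed random seed are all hypothetical choices, not part of caffe-ssd.

```python
import os
import random
import shutil


def split_dataset(image_dir, trainval_ratio=0.8, seed=0):
    """Copy every .png in image_dir into image_dir/trainval or image_dir/test.

    The originals stay in image_dir, so each image ends up existing twice,
    as described above. Returns the file names assigned to each subset.
    """
    images = sorted(f for f in os.listdir(image_dir) if f.lower().endswith(".png"))
    random.Random(seed).shuffle(images)  # fixed seed keeps the split reproducible
    n_trainval = int(len(images) * trainval_ratio)
    subsets = {"trainval": images[:n_trainval], "test": images[n_trainval:]}
    for subset, names in subsets.items():
        target = os.path.join(image_dir, subset)
        os.makedirs(target, exist_ok=True)
        for name in names:
            shutil.copy(os.path.join(image_dir, name), os.path.join(target, name))
    return subsets


# Example call (the path is a placeholder, adjust it to your own layout):
# split_dataset("xxx/data/VOC_knife/PNGImages", trainval_ratio=0.8)
```

Copying rather than moving keeps the full image set in PNGImages, which is what the list-generating scripts further down expect.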

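ImageSets should contain the following 4 text files:

First, labelmap.prototxt, written in the following format for your project. If there are several classes, add them by hand one after another. Note that the background class 0 must always be included: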
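As a hedged illustration of that format, the Python 3 sketch below generates label map text in the layout used by the stock labelmap_voc.prototxt of the SSD branch (an `item` block with `name`, `label` and `display_name`). The `make_labelmap` helper is hypothetical, and it assumes the display name equals the class name; check the output against your own labelmap before use.

```python
def make_labelmap(class_names):
    """Build label map text: background as label 0, then one item per class."""
    entries = [("none_of_the_above", 0, "background")]
    entries += [(name, index, name) for index, name in enumerate(class_names, start=1)]
    blocks = []
    for name, label, display_name in entries:
        blocks.append(
            'item {\n'
            '  name: "%s"\n'
            '  label: %d\n'
            '  display_name: "%s"\n'
            '}' % (name, label, display_name)
        )
    return "\n".join(blocks) + "\n"


labelmap_text = make_labelmap(["knife"])
print(labelmap_text)
```

With a single knife class this yields two items, labels 0 and 1, which matches the "number of classes + 1" rule used later in the training script.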

Next come the trainval and test text files. Generate them with the script below; only the image and xml paths in the script need to be changed.

#! /usr/bin/python  
# -*- coding:UTF-8 -*-  
 
import os, sys  
import glob  
# Paths of the training and test image folders
trainval_dir = "xxx/data/VOC_knife/PNGImages/trainval"  
test_dir = "xxx/data/VOC_knife/PNGImages/test"  
 
trainval_img_lists = glob.glob(trainval_dir + '/*.png')    # all .png files in trainval
trainval_img_names = []       # file names without extension
for item in trainval_img_lists:
    temp1, temp2 = os.path.splitext(os.path.basename(item))
    trainval_img_names.append(temp1)
 
test_img_lists = glob.glob(test_dir + '/*.png')   # all .png files in test
test_img_names = []  
for item in test_img_lists:  
    temp1, temp2 = os.path.splitext(os.path.basename(item))  
    test_img_names.append(temp1)  
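# Image and xml paths
dist_img_dir = "data/VOC_knife/PNGImages"  # image path written into the txt files; PNGImages holds every image in addition to the trainval and test folders, so the path stops at PNGImages
dist_anno_dir = "/data/VOC_knife/Annotations"  # xml path written into the txt files; write it starting from the first folder under the caffe root directory

trainval_fd = open("xxx/VOC_knife/ImageSets/trainval.txt", 'w')  # where to save the list and under which name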
test_fd = open("xxx/VOC_knife/ImageSets/test.txt", 'w')  
   
for item in trainval_img_names:  
    trainval_fd.write(dist_img_dir + '/' + str(item) + '.png' + ' ' + dist_anno_dir + '/' + str(item) + '.xml\n')  
 
for item in test_img_names:
    test_fd.write(dist_img_dir + '/' + str(item) + '.png' + ' ' + dist_anno_dir + '/' + str(item) + '.xml\n')

# Close both files so that everything is flushed to disk
trainval_fd.close()
test_fd.close()

Finally the test_name_size.txt file, which is also generated by a script

#! /usr/bin/python
# -*- coding:UTF-8 -*-
import os, sys
import glob
from PIL import Image  # used to read the images

# Path of the test images
img_dir = "xxx/VOC_knife/PNGImages/test"

# Collect all png images under the given path
img_lists = glob.glob(img_dir + '/*.png')

# Create the output file at the given path
test_name_size = open('xxx/VOC_knife/test_name_size_knife.txt', 'w')

for item in img_lists:
    img = Image.open(item)
    width, height = img.size
    temp1, temp2 = os.path.splitext(os.path.basename(item))
    test_name_size.write(temp1 + ' ' + str(height) + ' ' + str(width) + '\n')  # one line per image: name height width

test_name_size.close()

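The VOC-format data is now ready; once it is converted from VOC to LMDB format it can be used for training.

3. Training the caffe model

Train with the following script; the paths and the solver.prototxt settings in it need to be modified: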

from __future__ import print_function  
import caffe  
from caffe.model_libs import *  
from google.protobuf import text_format  
 
 
import math  
import os  
import shutil  
import stat  
import subprocess  
import sys  
 
 
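# Append extra convolutional layers after the base network. (To avoid clashes with the base network's layer names, the names could be derived from the base network's last layer.) The implementation can be followed in ~/caffe/python/caffe/model_libs.py. SSD is essentially driven by two files, ssd_pascal.py and model_libs.py; the rest is the functional modules in caffe's underlying code.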
def AddExtraLayers(net, use_batchnorm=True, lr_mult=1):  
    use_relu = True  
 
 
    # Add additional convolutional layers.  
    # 19 x 19  
    # First extra conv layer (conv6_1): 256 kernels of size 1x1, pad 0, stride 1.
    from_layer = net.keys()[-1]  # last layer of the base network, used as the input of conv6_1
 
 
    # TODO(weiliu89): Construct the name using the last layer to avoid duplication.  
    # 10 x 10  
    out_layer = "conv6_1"  
    ConvBNLayer(net, from_layer, out_layer, use_batchnorm, use_relu, 256, 1, 0, 1,  
        lr_mult=lr_mult)  
    # conv6_1 done.
    # Second extra conv layer (conv6_2): 512 kernels of size 3x3, pad 1, stride 2.
    from_layer = out_layer
    out_layer = "conv6_2"
    ConvBNLayer(net, from_layer, out_layer, use_batchnorm, use_relu, 512, 3, 1, 2,
        lr_mult=lr_mult)
    # conv6_2 done.
    # 5 x 5  
    from_layer = out_layer  
    out_layer = "conv7_1"  
    ConvBNLayer(net, from_layer, out_layer, use_batchnorm, use_relu, 128, 1, 0, 1,  
      lr_mult=lr_mult)  
    # conv7_1 done.
    from_layer = out_layer
    out_layer = "conv7_2"
    ConvBNLayer(net, from_layer, out_layer, use_batchnorm, use_relu, 256, 3, 1, 2,
      lr_mult=lr_mult)
    # conv7_2 done.
    # 3 x 3  
    from_layer = out_layer  
    out_layer = "conv8_1"  
    ConvBNLayer(net, from_layer, out_layer, use_batchnorm, use_relu, 128, 1, 0, 1,  
      lr_mult=lr_mult)  
    # conv8_1 done.
    from_layer = out_layer
    out_layer = "conv8_2"
    ConvBNLayer(net, from_layer, out_layer, use_batchnorm, use_relu, 256, 3, 0, 1,
      lr_mult=lr_mult)
    # conv8_2 done.
    # 1 x 1  
    from_layer = out_layer  
    out_layer = "conv9_1"  
    ConvBNLayer(net, from_layer, out_layer, use_batchnorm, use_relu, 128, 1, 0, 1,  
      lr_mult=lr_mult)  
    # conv9_1 done.
    from_layer = out_layer
    out_layer = "conv9_2"
    ConvBNLayer(net, from_layer, out_layer, use_batchnorm, use_relu, 256, 3, 0, 1,
      lr_mult=lr_mult)
    # conv9_2 done.
    return net  
 
 
 
 
### Adjust the following parameters as needed ###
# Path to the caffe code.
# We assume the script is run from the caffe root directory.
caffe_root = os.getcwd()  # the caffe root directory


# Set run_soon to True to start training as soon as all the training files have been generated.
run_soon = True
# Set resume_training to True to continue from the latest snapshot (for example after training stopped halfway).
# If False, training starts from the pretrained model defined below, ignoring any half-trained model.
resume_training = True
# If True, old model snapshots are removed; otherwise they are kept.

remove_old_models = False

# Database file for the training data. Created by data/VOC0712/create_data.sh
train_data = "examples/VOC0712/VOC0712_trainval_lmdb"
# Database file for the test data. Created by data/VOC0712/create_data.sh
test_data = "examples/VOC0712/VOC0712_test_lmdb"
# Specify the batch sampler.
resize_width = 300  
resize_height = 300  
resize = "{}x{}".format(resize_width, resize_height)  
batch_sampler = [  
        {  
                'sampler': {  
                        },  
                'max_trials': 1,  
                'max_sample': 1,  
        },  
        {  
                'sampler': {  
                        'min_scale': 0.3,  
                        'max_scale': 1.0,  
                        'min_aspect_ratio': 0.5,  
                        'max_aspect_ratio': 2.0,  
                        },  
                'sample_constraint': {  
                        'min_jaccard_overlap': 0.1,  
                        },  
                'max_trials': 50,  
                'max_sample': 1,  
        },  
        {  
                'sampler': {  
                        'min_scale': 0.3,  
                        'max_scale': 1.0,  
                        'min_aspect_ratio': 0.5,  
                        'max_aspect_ratio': 2.0,  
                        },  
                'sample_constraint': {  
                        'min_jaccard_overlap': 0.3,  
                        },  
                'max_trials': 50,  
                'max_sample': 1,  
        },  
        {  
                'sampler': {  
                        'min_scale': 0.3,  
                        'max_scale': 1.0,  
                        'min_aspect_ratio': 0.5,  
                        'max_aspect_ratio': 2.0,  
                        },  
                'sample_constraint': {  
                        'min_jaccard_overlap': 0.5,  
                        },  
                'max_trials': 50,  
                'max_sample': 1,  
        },  
        {  
                'sampler': {  
                        'min_scale': 0.3,  
                        'max_scale': 1.0,  
                        'min_aspect_ratio': 0.5,  
                        'max_aspect_ratio': 2.0,  
                        },  
                'sample_constraint': {  
                        'min_jaccard_overlap': 0.7,  
                        },  
                'max_trials': 50,  
                'max_sample': 1,  
        },  
        {  
                'sampler': {  
                        'min_scale': 0.3,  
                        'max_scale': 1.0,  
                        'min_aspect_ratio': 0.5,  
                        'max_aspect_ratio': 2.0,  
                        },  
                'sample_constraint': {  
                        'min_jaccard_overlap': 0.9,  
                        },  
                'max_trials': 50,  
                'max_sample': 1,  
        },  
        {  
                'sampler': {  
                        'min_scale': 0.3,  
                        'max_scale': 1.0,  
                        'min_aspect_ratio': 0.5,  
                        'max_aspect_ratio': 2.0,  
                        },  
                'sample_constraint': {  
                        'max_jaccard_overlap': 1.0,  
                        },  
                'max_trials': 50,  
                'max_sample': 1,  
        },  
 
        ]  
 
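# The block above is the data augmentation described in the paper. I did not fully understand this part; see ~/caffe/src/caffe/util/sampler.cpp for the details.

# Transformation parameters follow. Their meaning is defined in ~/caffe/src/caffe/proto/caffe.proto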
 
train_transform_param = {
        'mirror': True,
        'mean_value': [104, 117, 123],  # per-channel mean values
        'resize_param': {  # resize policy used by the data transformer
                'prob': 1,  # probability of applying this resize policy
                'resize_mode': P.Resize.WARP,  # resize mode, an enum defined in caffe.proto
                'height': resize_height,
                'width': resize_width,
                'interp_mode': [  # interpolation modes used when resizing, an enum
                        P.Resize.LINEAR,
                        P.Resize.AREA,
                        P.Resize.NEAREST,
                        P.Resize.CUBIC,
                        P.Resize.LANCZOS4,
                        ],
                },
        'distort_param': {  # distortion policy used by the data transformer
                'brightness_prob': 0.5,  # probability of adjusting brightness
                'brightness_delta': 32,  # amount added to the pixel values, drawn from [-delta, delta]; possible values lie in [0, 255]; 32 is recommended
                'contrast_prob': 0.5,  # probability of adjusting contrast
                'contrast_lower': 0.5,  # lower bound of the random contrast factor; 0.5 is recommended
                'contrast_upper': 1.5,  # upper bound of the random contrast factor; 1.5 is recommended
                'hue_prob': 0.5,  # probability of adjusting hue
                'hue_delta': 18,  # amount added to the hue channel, drawn from [-delta, delta]; possible values lie in [0, 180]; 36 is recommended
                'saturation_prob': 0.5,  # probability of adjusting saturation
                'saturation_lower': 0.5,  # lower bound of the random saturation factor; 0.5 is recommended
                'saturation_upper': 1.5,  # upper bound of the random saturation factor; 1.5 is recommended
                'random_order_prob': 0.0,  # probability of randomly reordering the image channels
                },
        'expand_param': {  # expansion policy used by the data transformer
                'prob': 0.5,  # probability of applying this expansion policy
                'max_expand_ratio': 4.0,  # maximum ratio by which the image is expanded
                },
        'emit_constraint': {  # condition for emitting an annotation
            'emit_type': caffe_pb2.EmitConstraint.CENTER,  # an enum; CENTER is selected here
            }
        }
test_transform_param = {  # test-time transformation parameters, similar to the training ones
        'mean_value': [104, 117, 123],
        'resize_param': {
                'prob': 1,
                'resize_mode': P.Resize.WARP,
                'height': resize_height,
                'width': resize_width,
                'interp_mode': [P.Resize.LINEAR],
                },

        }
 
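# If True, use batch normalization for all newly added layers.
# Currently only the non-batch-norm version has been tested.
use_batchnorm = False   # whether to use batch normalization
lr_mult = 1    # learning-rate multiplier passed to the newly added layers
# Use a different initial learning rate.
if use_batchnorm:
    base_lr = 0.0004
else:
    # Learning rate when batch_size = 1, num_gpus = 1.
    base_lr = 0.00004   # use_batchnorm is False above, so this is the value to change when tuning the initial learning rate; after the scaling further down it becomes 0.001

# Set the names for your own model.
# Modify the job name if you want.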
job_name = "SSD_{}".format(resize)
# The name of the model. Modify it if you want.
model_name = "VGG_VOC0712_{}".format(job_name)


# Directory that stores the model .prototxt files.
save_dir = "models/VGGNet/VOC0712/{}".format(job_name)
# Directory that stores the model snapshots.
snapshot_dir = "models/VGGNet/VOC0712/{}".format(job_name)
# Directory that stores the job script and log file.
job_dir = "jobs/VGGNet/VOC0712/{}".format(job_name)
# Directory that stores the detection results.
output_result_dir = "{}/data/VOCdevkit/results/VOC2007/{}/Main".format(os.environ['HOME'], job_name)

# Model definition files.
train_net_file = "{}/train.prototxt".format(save_dir)
test_net_file = "{}/test.prototxt".format(save_dir)
deploy_net_file = "{}/deploy.prototxt".format(save_dir)
solver_file = "{}/solver.prototxt".format(save_dir)
# Snapshot prefix.
snapshot_prefix = "{}/{}".format(snapshot_dir, model_name)
# Job script path.
job_file = "{}/{}.sh".format(job_dir, model_name)

# Stores the name and size of the test images. Created by data/VOC0712/create_list.sh
name_size_file = "data/VOC0712/test_name_size.txt"
# The pretrained model. We use the fully convolutional reduced VGGNet.
pretrain_model = "models/VGGNet/VGG_ILSVRC_16_layers_fc_reduced.caffemodel"
# Stores the LabelMapItem entries.
label_map_file = "data/VOC0712/labelmap_voc.prototxt"
 
# Parameters of the MultiBoxLoss layer. The definitions are in ~/caffe/src/caffe/proto/caffe.proto
num_classes = 21  # number of classes to predict: your own class count + 1 (background)
share_location = True   # if True, bounding boxes are shared across classes
background_label_id=0   # label id of the background class, normally 0
train_on_diff_gt = True    # whether to use ground truth marked "difficult"; True by default
normalization_mode = P.Loss.VALID    # how the loss is normalized across the batch and spatial dimensions. An enum with four values: FULL divides by batch size times spatial dimensions (outputs with the ignore label are still counted); VALID divides by the number of output locations that do not carry ignore_label (it behaves like FULL when ignore_label is not set); BATCH_SIZE divides by the batch size; NONE does not normalize.
code_type = P.PriorBox.CENTER_SIZE     # bbox encoding. An enum defined with PriorBoxParameter: CORNER, CENTER_SIZE or CORNER_SIZE.
ignore_cross_boundary_bbox = False    # if True, cross-boundary bboxes (those extending outside the image) are ignored during matching. They are kept here; otherwise the priors generated at the feature-map borders would be useless.
mining_type = P.MultiBoxLoss.MAX_NEGATIVE   # mining type used during training, an enum with three values: NONE uses nothing, which leaves positives and negatives badly unbalanced; MAX_NEGATIVE selects negatives by score, which corresponds to the hard negative mining in the SSD paper; HARD_EXAMPLE selects hard examples following "Training Region-based Object Detectors with Online Hard Example Mining".
neg_pos_ratio = 3.  # ratio of negatives to positives, at most 3:1 as in the paper
loc_weight = (neg_pos_ratio + 1.) / 4.    # weight of the localization loss
multibox_loss_param = {        # parameters used by MultiBoxLossLayer
    'loc_loss_type': P.MultiBoxLoss.SMOOTH_L1,   # localization loss type, an enum: L2 or SMOOTH_L1
    'conf_loss_type': P.MultiBoxLoss.SOFTMAX,   # confidence loss type, an enum: SOFTMAX or LOGISTIC
    'loc_weight': loc_weight,
    'num_classes': num_classes,
    'share_location': share_location,
    'match_type': P.MultiBoxLoss.PER_PREDICTION,   # matching method used in training, an enum: BIPARTITE or PER_PREDICTION. With PER_PREDICTION, overlap_threshold determines the extra matched bboxes.
    'overlap_threshold': 0.5,   # overlap threshold, i.e. the IoU value
    'use_prior_for_matching': True,   # whether to use the priors for matching, normally True
    'background_label_id': background_label_id,   # label id of the background class, normally 0
    'use_difficult_gt': train_on_diff_gt,  # whether to use ground truth marked "difficult"; True by default
    'mining_type': mining_type,    # mining type used during training; see mining_type above
    'neg_pos_ratio': neg_pos_ratio,   # ratio of negatives to positives; see neg_pos_ratio above
    'neg_overlap': 0.5,   # upper overlap bound for unmatched predictions: a prediction whose overlap is below 0.5 counts as negative; Faster R-CNN uses 0.3
    'code_type': code_type,   # bbox encoding; see code_type above
    'ignore_cross_boundary_bbox': ignore_cross_boundary_bbox,  # see ignore_cross_boundary_bbox above
    }
loss_param = {   # parameters shared by loss layers
    'normalization': normalization_mode,    # loss normalization mode; see normalization_mode above
    }
 
# Parameters for generating priors.
# Minimum dimension of the input image.
min_dim = 300   # input size
# conv4_3 ==> 38 x 38
# fc7 ==> 19 x 19
# conv6_2 ==> 10 x 10
# conv7_2 ==> 5 x 5
# conv8_2 ==> 3 x 3
# conv9_2 ==> 1 x 1
mbox_source_layers = ['conv4_3', 'fc7', 'conv6_2', 'conv7_2', 'conv8_2', 'conv9_2']  # layers that produce prior boxes; this list can be changed, and many SSD variants adjust it
# in percent %
min_ratio = 20  # S_min = 0.2 and S_max = 0.9 from the paper, in percent; min_sizes and max_sizes are derived from them below
max_ratio = 90
# math.floor() returns the largest integer that is not greater than its argument.
step = int(math.floor((max_ratio - min_ratio) / (len(mbox_source_layers) - 2)))  # spacing between the ratios used in the loop below; equals 17 here
min_sizes = []  # min_sizes and max_sizes are filled in below
max_sizes = []
for ratio in xrange(min_ratio, max_ratio + 1, step):  # ratio takes the values 20, 37, 54, 71, 88 (xrange is Python 2)
  min_sizes.append(min_dim * ratio / 100.)
  max_sizes.append(min_dim * (ratio + step) / 100.)
min_sizes = [min_dim * 10 / 100.] + min_sizes
max_sizes = [min_dim * 20 / 100.] + max_sizes
steps = [8, 16, 32, 64, 100, 300]  # read this carefully: the stride, in input pixels, of the prior boxes generated by each source layer. Multiplying a prior's center coordinate by the step maps a feature-map position back to the input image. For example, conv4_3 outputs a 38x38 feature map for a 300x300 input, and 38 * 8 is about 300, so its step is 8. These values are for 300x300 training images.
aspect_ratios = [[2], [2, 3], [2, 3], [2, 3], [2], [2]]  # aspect ratios; the six entries correspond to the six layers that produce prior boxes. Compare the aspect_ratio parameter of each layer in the generated train.prototxt; the parameter is defined in caffe.proto, and model_libs.py shows how aspect_ratios is passed on to aspect_ratio.
# How the aspect ratios in the paper arise is covered in part 2 of the original author's post "SSD explained (1)", on the generation of prior boxes.
# L2 normalize conv4_3.
normalizations = [20, -1, -1, -1, -1, -1]  # normalize conv4_3. model_libs.py creates a Normalize layer (layer code in ~/caffe/src/layers/Normalize_layer.cpp). I did not work out why conv4_3 uses 20. Each number corresponds to one source layer; any value other than -1 creates a Normalize layer for it.
# Variance used to encode/decode prior bboxes.
if code_type == P.PriorBox.CENTER_SIZE:  # two cases, chosen by code_type, which was set above. One reading is that variance scales up the bbox regression targets: dividing by variance enlarges the error between predicted and ground-truth boxes, which increases the loss and the gradient and speeds up convergence. prior_variance is passed on as variance in model_libs.py and applied in prior_box_layer.cpp; see the *_mbox_priorbox layers in train.prototxt.
  prior_variance = [0.1, 0.1, 0.2, 0.2]
else:
  prior_variance = [0.1]
flip = True   # if True, every aspect ratio is also flipped: for an aspect ratio r, 1.0 / r is generated as well, giving {1, 2, 3, 1/2, 1/3}
clip = False  # clipping keeps the prior coordinates within [0, 1]; caffe.proto describes clip as "if true, will clip the prior so that it is within [0, 1]"
# Both of the parameters above are implemented in prior_box_layer.cpp.
 
# Solver parameters.
# Define which GPUs to use.
gpus = "0,1,2,3"  # ids of the GPUs to use; with a single GPU keep only "0", otherwise training fails
gpulist = gpus.split(",")  # list of GPU ids
num_gpus = len(gpulist)  # number of GPUs

# Divide the mini-batch among the GPUs.
batch_size = 32  # number of training samples per batch; keep it within the available memory
accum_batch_size = 32  # combined with batch_size to compute iter_size below
iter_size = accum_batch_size / batch_size  # with iter_size = 1 the weights are updated after every forward/backward pass; with 2, gradients from two passes are accumulated before one update. This lowers the memory needed per pass, which helps on weaker hardware, at the cost of longer training; the total number of iterations stays the same.
solver_mode = P.Solver.CPU
device_id = 0
batch_size_per_device = batch_size  # batch size handled by each device
if num_gpus > 0:
  batch_size_per_device = int(math.ceil(float(batch_size) / num_gpus))  # with several GPUs the batch is split evenly among them to speed up training
  iter_size = int(math.ceil(float(accum_batch_size) / (batch_size_per_device * num_gpus)))  # iter_size for the multi-GPU case; the value above is for a single device
  solver_mode = P.Solver.GPU
  device_id = int(gpulist[0])

if normalization_mode == P.Loss.NONE:  # without loss normalization, base_lr is divided by batch_size_per_device (8 with the settings above: 32 images over 4 GPUs)
  base_lr /= batch_size_per_device
elif normalization_mode == P.Loss.VALID:  # this is the mode selected above. With loc_weight = (neg_pos_ratio + 1.) / 4. = 1, base_lr becomes 25 * 0.00004 = 0.001, which is why the generated solver.prototxt has base_lr = 0.001. To lower the learning rate when training diverges, change the base_lr = 0.00004 defined above.
  base_lr *= 25. / loc_weight
elif normalization_mode == P.Loss.FULL:  # same idea as above
  # Roughly there are 2000 prior bboxes per image.
  # TODO(weiliu89): Estimate the exact number of priors.
  base_lr *= 2000.  # base_lr = 2000 * 0.00004 = 0.08

# Evaluate on the whole test set.
num_test_image = 4952  # number of images in the whole test set
test_batch_size = 8  # batch size used for testing
# Ideally test_batch_size should divide num_test_image; otherwise mAP will be slightly off the true value.
test_iter = int(math.ceil(float(num_test_image) / test_batch_size))  # number of test iterations needed to cover the whole test set, as in classification networks. Here 4952 / 8 = 619; if the division is not exact, the result is rounded up.
 
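solver_param = {  # values written to solver.prototxt; familiar to anyone who has trained with caffe
    # Train parameters
    'base_lr': base_lr,  # the base learning rate computed above
    'weight_decay': 0.0005,
    'lr_policy': "multistep",
    'stepvalue': [80000, 100000, 120000],  # multi-step decay: the learning rate is multiplied by gamma at these iterations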
    'gamma': 0.1,  
    'momentum': 0.9,  
    'iter_size': iter_size,  
    'max_iter': 120000,  
    'snapshot': 80000,  
    'display': 10,  
    'average_loss': 10,  
    'type': "SGD",  
    'solver_mode': solver_mode,  
    'device_id': device_id,  
    'debug_info': False,  
    'snapshot_after_train': True,  
    # Test parameters
    'test_iter': [test_iter],
    'test_interval': 10000,  # run a test pass and report the results every 10000 training iterations
    'eval_type': "detection",  
    'ap_version': "11point",  
    'test_initialization': False,  
    }  
 
# Parameters for generating the detection output.
det_out_param = {
    'num_classes': num_classes,  # number of classes
    'share_location': share_location,  # whether boxes are shared across classes
    'background_label_id': background_label_id,  # label id of the background class, 0 here
    'nms_param': {'nms_threshold': 0.45, 'top_k': 400},  # non-maximum suppression: threshold 0.45; top_k is the maximum number of results to keep. NMS removes redundant boxes by discarding lower-scoring boxes that overlap a higher-scoring one. See caffe.proto for the parameters.
    'save_output_param': {  # parameters for saving detection results, defined as SaveOutputParameter in caffe.proto
        'output_directory': output_result_dir,  # output directory, defined above; results are saved if it is not empty
        'output_name_prefix': "comp4_det_test_",  # output name prefix
        'output_format': "VOC",  # output format: VOC for PASCAL VOC, COCO for MS COCO
        'label_map_file': label_map_file,  # label map file, i.e. the labelmap.prototxt prepared for training. To save results, this file and the name-size file below must both be provided; otherwise saving is skipped.
        'name_size_file': name_size_file,  # path of the test_name_size.txt file, which lists the sizes of the test images
        'num_test_image': num_test_image,  # number of test images
        },
    'keep_top_k': 200,  # total number of bboxes kept per image after NMS; -1 keeps all of them
    'confidence_threshold': 0.01,  # only detections with confidence above this threshold are considered; if not provided, all boxes are considered
    'code_type': code_type,  # bbox encoding
    }

# Parameters for evaluating the detection results.
det_eval_param = {  # defined as DetectionEvaluateParameter in caffe.proto
    'num_classes': num_classes,  # number of classes
    'background_label_id': background_label_id,  # label id of the background class, 0
    'overlap_threshold': 0.5,  # overlap threshold, 0.5
    'evaluate_difficult_gt': False,  # if True, ground truth marked "difficult" is evaluated as well
    'name_size_file': name_size_file,  # path of test_name_size.txt
    }
 
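### Hopefully you do not need to change anything below ###
# Check files: make sure every file and dataset needed for training and validation is present.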
check_if_exist(train_data)  
check_if_exist(test_data)  
check_if_exist(label_map_file)  
check_if_exist(pretrain_model)  
make_if_not_exist(save_dir)  
make_if_not_exist(job_dir)  
make_if_not_exist(snapshot_dir)  
 
# Create the training net. Most of the work is done in model_libs.py.
net = caffe.NetSpec()
# CreateAnnotatedDataLayer() from model_libs.py creates the annotated data layer from the parameters given here.
net.data, net.label = CreateAnnotatedDataLayer(train_data, batch_size=batch_size_per_device,
        train=True, output_label=True, label_map_file=label_map_file,
        transform_param=train_transform_param, batch_sampler=batch_sampler)
# VGGNetBody() from model_libs.py creates the truncated VGG base network. model_libs.py provides four base networks (VGG, ZF, ResNet101 and ResNet152); see there for how each is called.
VGGNetBody(net, from_layer='data', fully_conv=True, reduced=True, dilated=True,
    dropout=False)  # from_layer: the base network takes its input from the data layer; fully_conv=True: use the fully convolutional form; reduced=True: selects the reduced parameter values for the converted fully connected layers; dilated=True: controls the pool5 layer in front of fc6 and, together with reduced, the parameters of the converted layers; dropout=False: no dropout layers

# Add the extra feature layers defined at the top of this file: conv6_1, conv6_2 and so on.
AddExtraLayers(net, use_batchnorm, lr_mult=lr_mult)

# CreateMultiBoxHead() creates the layers that produce and match prior boxes. All of its parameters are explained above; see also caffe.proto, model_libs.py and the cpp files of the layers involved. The layers include conv_mbox_conf, conv_mbox_loc, their perm and flat layers, and the conv_mbox_priorbox layers that generate the priors.
mbox_layers = CreateMultiBoxHead(net, data_layer='data', from_layers=mbox_source_layers,  
        use_batchnorm=use_batchnorm, min_sizes=min_sizes, max_sizes=max_sizes,  
        aspect_ratios=aspect_ratios, steps=steps, normalizations=normalizations,  
        num_classes=num_classes, share_location=share_location, flip=flip, clip=clip,  
        prior_variance=prior_variance, kernel_size=3, pad=1, lr_mult=lr_mult)  
 
# Create the MultiBoxLossLayer, i.e. the loss layer, which combines the confidence loss and the localization loss. The computation is implemented in multibox_loss_layer.cpp; multibox_loss_param and loss_param are defined above.
name = "mbox_loss"
mbox_layers.append(net.label)
net[name] = L.MultiBoxLoss(*mbox_layers, multibox_loss_param=multibox_loss_param,
        loss_param=loss_param, include=dict(phase=caffe_pb2.Phase.Value('TRAIN')),
        propagate_down=[True, True, False, False])  # propagate_down specifies whether to backpropagate to each bottom. If unspecified, Caffe infers for each input whether backpropagation is needed to compute parameter gradients. True forces backpropagation to that input; False skips it. The size must be 0 or equal to the number of bottoms. See propagate_down[0] to [3] in the cpp file.

with open(train_net_file, 'w') as f:  # write the layers defined above to the prototxt file
    print('name: "{}_train"'.format(model_name), file=f)
    print(net.to_proto(), file=f)
shutil.copy(train_net_file, job_dir)  # copy the generated train.prototxt to job_dir
 
# Create the test net. The first part is essentially the same as the training net.
net = caffe.NetSpec()  
net.data, net.label = CreateAnnotatedDataLayer(test_data, batch_size=test_batch_size,  
        train=False, output_label=True, label_map_file=label_map_file,  
        transform_param=test_transform_param)  
 
VGGNetBody(net, from_layer='data', fully_conv=True, reduced=True, dilated=True,  
    dropout=False)  
 
AddExtraLayers(net, use_batchnorm, lr_mult=lr_mult)  
 
mbox_layers = CreateMultiBoxHead(net, data_layer='data', from_layers=mbox_source_layers,  
        use_batchnorm=use_batchnorm, min_sizes=min_sizes, max_sizes=max_sizes,  
        aspect_ratios=aspect_ratios, steps=steps, normalizations=normalizations,  
        num_classes=num_classes, share_location=share_location, flip=flip, clip=clip,  
        prior_variance=prior_variance, kernel_size=3, pad=1, lr_mult=lr_mult)  
 
conf_name = "mbox_conf"  # name of the confidence layer, converted to probabilities below
if multibox_loss_param["conf_loss_type"] == P.MultiBoxLoss.SOFTMAX:  
  reshape_name = "{}_reshape".format(conf_name)  
  net[reshape_name] = L.Reshape(net[conf_name], shape=dict(dim=[0, -1, num_classes]))  
  softmax_name = "{}_softmax".format(conf_name)  
  net[softmax_name] = L.Softmax(net[reshape_name], axis=2)  
  flatten_name = "{}_flatten".format(conf_name)  
  net[flatten_name] = L.Flatten(net[softmax_name], axis=1)  
  mbox_layers[1] = net[flatten_name]  
elif multibox_loss_param["conf_loss_type"] == P.MultiBoxLoss.LOGISTIC:  
  sigmoid_name = "{}_sigmoid".format(conf_name)  
  net[sigmoid_name] = L.Sigmoid(net[conf_name])  
  mbox_layers[1] = net[sigmoid_name]  
 
# The following part exists only in the test net: the detection output and evaluation layers.
net.detection_out = L.DetectionOutput(*mbox_layers,  
    detection_output_param=det_out_param,  
    include=dict(phase=caffe_pb2.Phase.Value('TEST')))  
net.detection_eval = L.DetectionEvaluate(net.detection_out, net.label,  
    detection_evaluate_param=det_eval_param,  
    include=dict(phase=caffe_pb2.Phase.Value('TEST')))  
 
with open(test_net_file, 'w') as f:  # write test.prototxt
    print('name: "{}_test"'.format(model_name), file=f)  
    print(net.to_proto(), file=f)  
shutil.copy(test_net_file, job_dir)  
 
# Create the deploy net.
# Remove the first and last layer from the test net.
deploy_net = net
with open(deploy_net_file, 'w') as f:
    net_param = deploy_net.to_proto()
    # Remove the first (AnnotatedData) and last (DetectionEvaluate) layer from the test net.
    del net_param.layer[0]  # remove the first layer
    del net_param.layer[-1]  # remove the last layer
    net_param.name = '{}_deploy'.format(model_name)  # set the net name
    net_param.input.extend(['data'])  # declare data as the input
    net_param.input_shape.extend([
        caffe_pb2.BlobShape(dim=[1, 3, resize_height, resize_width])])  # input dimensions, specific to deploy.prototxt; [1, 3, 300, 300] here
    print(net_param, file=f)  # write to the file
shutil.copy(deploy_net_file, job_dir)  # copy to job_dir

# Create solver.prototxt.
solver = caffe_pb2.SolverParameter(  # collect the solver parameters defined above
        train_net=train_net_file,
        test_net=[test_net_file],
        snapshot_prefix=snapshot_prefix,
        **solver_param)

with open(solver_file, 'w') as f:  # write the parameters to solver.prototxt
    print(solver, file=f)
shutil.copy(solver_file, job_dir)  # copy to job_dir
 
max_iter = 0  # largest snapshot iteration found so far, initialised to 0
# Find the most recent snapshot, so that interrupted training can resume from it.
for file in os.listdir(snapshot_dir):  # scan the snapshot directory
  if file.endswith(".solverstate"):  # solver state files mark resumable snapshots
    basename = os.path.splitext(file)[0]
    iter = int(basename.split("{}_iter_".format(model_name))[1])
    if iter > max_iter:  # keep the largest iteration number
      max_iter = iter

# Build the training command.
train_src_param = '--weights="{}" \\\n'.format(pretrain_model)  # default: initialise the weights from the ImageNet-pretrained VGG16 model and fine-tune
if resume_training:
  if max_iter > 0:
    train_src_param = '--snapshot="{}_iter_{}.solverstate" \\\n'.format(snapshot_prefix, max_iter)  # resume from the most recent snapshot instead

if remove_old_models:
  # Remove every snapshot older than max_iter, so that only the most recent model is kept.
  for file in os.listdir(snapshot_dir):  # scan the model files
    if file.endswith(".solverstate"):  # solver state files
      basename = os.path.splitext(file)[0]
      iter = int(basename.split("{}_iter_".format(model_name))[1])  # iteration of this snapshot
      if max_iter > iter:  # older than the most recent snapshot: delete it
        os.remove("{}/{}".format(snapshot_dir, file))
    if file.endswith(".caffemodel"):  # model weight files
      basename = os.path.splitext(file)[0]
      iter = int(basename.split("{}_iter_".format(model_name))[1])  # iteration of this model
      if max_iter > iter:  # older than the most recent snapshot: delete it
        os.remove("{}/{}".format(snapshot_dir, file))
 
# Create the job file.
with open(job_file, 'w') as f:  # write the training command into a shell script (.sh), which is executed below
  f.write('cd {}\n'.format(caffe_root))  
  f.write('./build/tools/caffe train \\\n')  
  f.write('--solver="{}" \\\n'.format(solver_file))  
  f.write(train_src_param)  
  if solver_param['solver_mode'] == P.Solver.GPU:  
    f.write('--gpu {} 2>&1 | tee {}/{}.log\n'.format(gpus, job_dir, model_name))  
  else:  
    f.write('2>&1 | tee {}/{}.log\n'.format(job_dir, model_name))  
 
# Copy this script to job_dir.
py_file = os.path.abspath(__file__)  
shutil.copy(py_file, job_dir)  
 
# Run the job.
os.chmod(job_file, stat.S_IRWXU)  
if run_soon:  
  subprocess.call(job_file, shell=True) 
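
The resume logic in the script above can be isolated into a small sketch. It is not part of caffe-ssd: the helper names `latest_snapshot_iter` and `training_source_flag` and the file names are hypothetical, and the only assumption carried over from the script is that snapshots are named `<model_name>_iter_<N>.solverstate`.

```python
import re


def latest_snapshot_iter(filenames, model_name):
    """Return the highest N among '<model_name>_iter_<N>.solverstate' names, or 0."""
    pattern = re.compile(r'^{}_iter_(\d+)\.solverstate$'.format(re.escape(model_name)))
    max_iter = 0
    for name in filenames:
        match = pattern.match(name)
        if match:
            max_iter = max(max_iter, int(match.group(1)))
    return max_iter


def training_source_flag(filenames, model_name, snapshot_prefix, pretrain_model,
                         resume_training=True):
    """Choose between resuming from a snapshot and fine-tuning from pretrained weights."""
    max_iter = latest_snapshot_iter(filenames, model_name)
    if resume_training and max_iter > 0:
        return '--snapshot="{}_iter_{}.solverstate"'.format(snapshot_prefix, max_iter)
    return '--weights="{}"'.format(pretrain_model)


# Hypothetical file names, for illustration only.
files = ['VGG_knife_SSD_300x300_iter_5000.solverstate',
         'VGG_knife_SSD_300x300_iter_10000.solverstate',
         'VGG_knife_SSD_300x300_iter_10000.caffemodel']
print(training_source_flag(files, 'VGG_knife_SSD_300x300',
                           'snapshots/VGG_knife_SSD_300x300', 'pretrained.caffemodel'))
```

With no `.solverstate` file present, the function falls back to the `--weights` flag, which is the same default the script uses for a fresh run.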

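The parameters and paths that need to be changed are marked with comments in the script. When setting the number of classes, add 1 for the background class: for example, with a single object class, set it to 2.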
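
To make the "+1 for background" rule concrete, here is a minimal sketch that builds a label map and the matching class count from a list of object classes. The helper name `build_labelmap` is hypothetical, and the text layout is assumed to follow the `labelmap_voc.prototxt` example shipped with caffe-ssd, where label 0 is the background entry.

```python
def build_labelmap(class_names):
    """Return (num_classes, labelmap_text) for the given object class names."""
    # Label 0 is reserved for the background class.
    entries = [('none_of_the_above', 0, 'background')]
    for index, name in enumerate(class_names, start=1):
        entries.append((name, index, name))
    blocks = []
    for name, label, display_name in entries:
        blocks.append(
            'item {\n'
            '  name: "%s"\n'
            '  label: %d\n'
            '  display_name: "%s"\n'
            '}\n' % (name, label, display_name))
    # num_classes counts the background entry as well.
    return len(entries), ''.join(blocks)


num_classes, labelmap_text = build_labelmap(['knife'])
print(num_classes)
print(labelmap_text)
```

For the single-class knife detector this yields two entries, background and knife, so the class count passed to the training script is 2.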

4. Run object detection on images with your own model

# -*- coding: utf-8 -*-
import os
import timeit

import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
from PIL import ImageDraw

plt.rcParams['figure.figsize'] = (10, 10)
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

# Make sure that the working directory is caffe_root.
caffe_root = './'
# Directory containing the images to run detection on; change it to your own path.
img_dir = 'models/knife/test1/'
os.chdir(caffe_root)
import sys
sys.path.insert(0, 'python')
from google.protobuf import text_format
from caffe.proto import caffe_pb2

import caffe
#from _ensemble import *

caffe.set_device(0)
caffe.set_mode_gpu()
# Locations of the deploy definition, the trained weights, and the label map.
model_def = 'models/knife/model-v1/SSD_300x300/deploy.prototxt'
model_weights = 'models/knife/model-v1/SSD_300x300/VGG_knife_SSD_300x300_iter_150000.caffemodel'
voc_labelmap_file = caffe_root+'data/VOC_knife/ImageSets/labelmap_knife.prototxt'
# Directories where the annotated results are saved.
save_dir = 'models/knife/result1-150000/'
txt_dir = 'models/knife/result1-150000/'
#f = open (r'out_3d.txt','w')

if not os.path.exists(txt_dir):
    os.makedirs(txt_dir)
if not os.path.exists(save_dir):
    os.makedirs(save_dir)
# Parse the label map so that numeric labels can be mapped to display names.
labelmap = caffe_pb2.LabelMap()
with open(voc_labelmap_file, 'r') as f:
    text_format.Merge(str(f.read()), labelmap)

net = caffe.Net(model_def,      # defines the structure of the model
                model_weights,  # contains the trained weights
                caffe.TEST)     # use test mode (e.g., don't perform dropout)

# input preprocessing: 'data' is the name of the input blob == net.inputs[0]
transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
transformer.set_transpose('data', (2, 0, 1))
transformer.set_mean('data', np.array([104,117,123])) # mean pixel
transformer.set_raw_scale('data', 255)  # the reference model operates on images in [0,255] range instead of [0,1]
transformer.set_channel_swap('data', (2,1,0))  # the reference model has channels in BGR order instead of RGB

# set net to batch size of 1

image_width = 300
image_height = 300

net.blobs['data'].reshape(1,3,image_height,image_width)
    
def get_labelname(labelmap, labels):
    """Map numeric labels to the display names defined in the label map."""
    num_labels = len(labelmap.item)
    labelnames = []
    if not isinstance(labels, list):
        labels = [labels]
    for label in labels:
        found = False
        for i in range(num_labels):  # range works in both Python 2 and 3
            if label == labelmap.item[i].label:
                found = True
                labelnames.append(labelmap.item[i].display_name)
                break
        assert found, 'label {} is not in the label map'.format(label)
    return labelnames
 
im_names = list(os.walk(img_dir))[0][2]

for im_name in im_names:

    img_file = img_dir + im_name
    image = caffe.io.load_image(img_file)
    
    transformed_image = transformer.preprocess('data', image)
    net.blobs['data'].data[...] = transformed_image
    
    #t1 = timeit.Timer("net.forward()","from __main__ import net")
    #print t1.timeit(2)

    # Forward pass.
    detections = net.forward()['detection_out']
    
    # Parse the outputs.
    det_label = detections[0,0,:,1]
    det_conf = detections[0,0,:,2]
    det_xmin = detections[0,0,:,3]
    det_ymin = detections[0,0,:,4]
    det_xmax = detections[0,0,:,5]
    det_ymax = detections[0,0,:,6]

    # Keep detections with a confidence of at least 0.15.
    top_indices = [i for i, conf in enumerate(det_conf) if conf >= 0.15]
    top_conf = det_conf[top_indices]
    top_label_indices = det_label[top_indices].tolist()
    top_labels = get_labelname(labelmap, top_label_indices)
    top_xmin = det_xmin[top_indices]
    top_ymin = det_ymin[top_indices]
    top_xmax = det_xmax[top_indices]
    top_ymax = det_ymax[top_indices]   

    #colors = plt.cm.hsv(np.linspace(0, 1, 21)).tolist()
 
    #img = Image.open(img_dir + "%06d.jpg"%(img_idx))
    img = Image.open(img_file)
    draw = ImageDraw.Draw(img)       
    for i in range(top_conf.shape[0]):
        xmin = top_xmin[i] * image.shape[1]
        ymin = top_ymin[i] * image.shape[0]
        xmax = top_xmax[i] * image.shape[1]
        ymax = top_ymax[i] * image.shape[0]
        
        h = float(ymax - ymin)
        w = float(xmax - xmin)
        #if (w==0) or (h==0):
        #   continue
        #if (h/w >=2)and((xmin<10)or(xmax > 1230)):
        #   continue
        
        score = top_conf[i]
        label_num = top_label_indices[i]
        if score > 0.3:
            draw.line(((xmin,ymin),(xmin,ymax),(xmax,ymax),(xmax,ymin),(xmin,ymin)),fill=(0,255,0))
            draw.text((xmin,ymin),'%s%.2f'%(top_labels[i], score),fill=(255,255,255))
        #elif score > 0.02:
        #    draw.line(((xmin,ymin),(xmin,ymax),(xmax,ymax),(xmax,ymin),(xmin,ymin)),fill=(255,0,255))
        #    draw.text((xmin,ymin),'%.2f'%(score),fill=(255,255,255))
        
    #img.save(save_dir+"%06d.jpg"%(img_idx))
    img.save(save_dir+im_name) 
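
The post-processing step in the script can be tried without Caffe. The sketch below assumes, as the indexing in the script suggests, that each detection row is `[image_id, label, score, xmin, ymin, xmax, ymax]` with corner coordinates normalized to [0, 1]. The helper name `to_pixel_boxes` and the sample rows are made up for illustration.

```python
def to_pixel_boxes(detections, image_width, image_height, min_score=0.3):
    """Filter detection rows by score and convert normalized corners to pixels.

    Each row is [image_id, label, score, xmin, ymin, xmax, ymax], with the
    four coordinates normalized to [0, 1].
    """
    boxes = []
    for image_id, label, score, xmin, ymin, xmax, ymax in detections:
        if score <= min_score:
            continue  # too weak to draw
        boxes.append({
            'label': int(label),
            'score': score,
            'box': (xmin * image_width, ymin * image_height,
                    xmax * image_width, ymax * image_height),
        })
    return boxes


# Made-up rows for a 600x400 image: one confident detection, one weak one.
rows = [
    [0, 1, 0.92, 0.25, 0.5, 0.75, 1.0],
    [0, 1, 0.05, 0.0, 0.0, 0.5, 0.5],
]
for det in to_pixel_boxes(rows, image_width=600, image_height=400):
    print(det)
```

Only the first row passes the 0.3 threshold used for drawing in the script; its normalized corners are scaled by the image width and height to give pixel coordinates.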