The National Day holiday is over in a flash. I spent it taking my girlfriend around, and although my finances are verging on crisis, no matter: if I can't afford food I'll just read books — spiritual nourishment counts too, haha! Joking aside, it's time to settle down and get back to work, and continue the blog series explaining the Faster-RCNN code. This post studies the model-preparation part, corresponding to the directory /simple-faster-rcnn-pytorch-master/model/utils/ in the code. As the name suggests, utils usually holds configuration and utility files. Let's open the directory and take a careful look:
There are roughly these folders. The NMS folder holds the non-maximum suppression code; I won't presume to lecture on it here — if you're interested, dig into what non-maximum suppression actually does and how it's implemented in code. I mainly studied the two files bbox_tools.py and creator_tools.py. Below is a flow chart of the code, to give a rough picture of the structure:
The code in bbox_tools.py consists mainly of four functions. loc2bbox(src_bbox, loc) and bbox2loc(src_bbox, dst_bbox) are a pair with exactly opposite roles: from the parameters of loc2bbox() you can tell that, given source boxes and location offsets, it computes the target boxes; conversely, from its parameters, bbox2loc(src_bbox, dst_bbox) takes source boxes and reference boxes and computes the location offsets between them! From its name, bbox_iou can be guessed to compute the intersection-over-union between two sets of bboxes, and generate_anchor_base() generates the 9 basic anchors around a base point. ratios=[0.5,1,2] and anchor_scales=[8,16,32] are the aspect ratios and scales; 3×3 combinations give exactly 9 anchors!
With that rough picture of what each function does, let's look at how the code actually implements them:
1. def loc2bbox(src_bbox, loc)
```python
def loc2bbox(src_bbox, loc):
    """Decode bounding boxes from bounding box offsets and scales.

    Given bounding box offsets and scales computed by
    :meth:`bbox2loc`, this function decodes the representation to
    coordinates in 2D image coordinates.

    Given scales and offsets :math:`t_y, t_x, t_h, t_w` and a bounding
    box whose center is :math:`(y, x) = p_y, p_x` and size :math:`p_h, p_w`,
    the decoded bounding box's center :math:`\\hat{g}_y`, :math:`\\hat{g}_x`
    and size :math:`\\hat{g}_h`, :math:`\\hat{g}_w` are calculated
    by the following formulas.

    * :math:`\\hat{g}_y = p_h t_y + p_y`
    * :math:`\\hat{g}_x = p_w t_x + p_x`
    * :math:`\\hat{g}_h = p_h \\exp(t_h)`
    * :math:`\\hat{g}_w = p_w \\exp(t_w)`

    The decoding formulas are used in works such as R-CNN [#]_.

    The output is same type as the type of the inputs.

    .. [#] Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik. \
    Rich feature hierarchies for accurate object detection and semantic \
    segmentation. CVPR 2014.

    Args:
        src_bbox (array): A coordinates of bounding boxes.
            Its shape is :math:`(R, 4)`. These coordinates are
            :math:`p_{ymin}, p_{xmin}, p_{ymax}, p_{xmax}`.
        loc (array): An array with offsets and scales.
            The shapes of :obj:`src_bbox` and :obj:`loc` should be same.
            This contains values :math:`t_y, t_x, t_h, t_w`.

    Returns:
        array:
        Decoded bounding box coordinates. Its shape is :math:`(R, 4)`. \
        The second axis contains four values \
        :math:`\\hat{g}_{ymin}, \\hat{g}_{xmin},
        \\hat{g}_{ymax}, \\hat{g}_{xmax}`.

    """

    if src_bbox.shape[0] == 0:
        return xp.zeros((0, 4), dtype=loc.dtype)

    src_bbox = src_bbox.astype(src_bbox.dtype, copy=False)

    src_height = src_bbox[:, 2] - src_bbox[:, 0]
    src_width = src_bbox[:, 3] - src_bbox[:, 1]
    src_ctr_y = src_bbox[:, 0] + 0.5 * src_height
    src_ctr_x = src_bbox[:, 1] + 0.5 * src_width

    dy = loc[:, 0::4]
    dx = loc[:, 1::4]
    dh = loc[:, 2::4]
    dw = loc[:, 3::4]

    ctr_y = dy * src_height[:, xp.newaxis] + src_ctr_y[:, xp.newaxis]
    ctr_x = dx * src_width[:, xp.newaxis] + src_ctr_x[:, xp.newaxis]
    h = xp.exp(dh) * src_height[:, xp.newaxis]
    w = xp.exp(dw) * src_width[:, xp.newaxis]

    dst_bbox = xp.zeros(loc.shape, dtype=loc.dtype)
    dst_bbox[:, 0::4] = ctr_y - 0.5 * h
    dst_bbox[:, 1::4] = ctr_x - 0.5 * w
    dst_bbox[:, 2::4] = ctr_y + 0.5 * h
    dst_bbox[:, 3::4] = ctr_x + 0.5 * w

    return dst_bbox
```
Looking at the loc2bbox code: the opening if is just a guard that returns an empty array when src_bbox has no rows — not part of the main logic. Then src_height = src_bbox[:, 2] - src_bbox[:, 0] computes the source boxes' heights. Why [:, 2] - [:, 0]? Because for the regression we have to convert from the (top-left, bottom-right) corner representation to the (center, height, width) representation, and the boxes are stored as (ymin, xmin, ymax, xmax), so subtracting the first column from the third naturally gives the height h; the width w comes the same way from the fourth and second columns. Next the centers are solved for: ctr_y = ymin + 0.5 * h and ctr_x = xmin + 0.5 * w, the intuitive midpoint. Then dy = loc[:, 0::4], dx = loc[:, 1::4], dh = loc[:, 2::4], dw = loc[:, 3::4] pull out the four regression parameters dy, dx, dh, dw that correct the source boxes, and the following formulas turn the source boxes' coordinates into the corrected centers and sizes:

ĝ_y = p_h · t_y + p_y,  ĝ_x = p_w · t_x + p_x,  ĝ_h = p_h · exp(t_h),  ĝ_w = p_w · exp(t_w).
With the target boxes' positions determined, the function finally converts the centers and sizes back to the (top-left, bottom-right) corner representation, and loc2bbox is complete.
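To make the decoding concrete, here is a minimal sanity check with made-up numbers; it assumes xp is plain numpy (the repo aliases numpy/cupy as xp) and that loc2bbox is importable from the repo layout discussed above:

```python
import numpy as np
from model.utils.bbox_tools import loc2bbox  # path per the repo layout

src_bbox = np.array([[0., 0., 16., 16.]], dtype=np.float32)  # (ymin, xmin, ymax, xmax)
loc = np.array([[0.5, 0.5, np.log(2.), np.log(2.)]],
               dtype=np.float32)                             # (ty, tx, th, tw)

# Center (8, 8) shifts by 0.5 * 16 = 8 in both axes; height and width double to 32.
print(loc2bbox(src_bbox, loc))  # -> [[ 0.  0. 32. 32.]]
```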
2. def bbox2loc(src_bbox, dst_bbox)
This function computes the ground-truth values used for regression. Put plainly: if you want me to predict the real object's position from an anchor, I have to learn, and to learn you must give me a target offset I can plug into the loss function! That is exactly what bbox2loc computes. With the source box written as center (p_x, p_y) and size (p_w, p_h) and the ground-truth box as (g_x, g_y, g_w, g_h), it follows these formulas:

t_x = (g_x − p_x) / p_w,  t_y = (g_y − p_y) / p_h,  t_w = log(g_w / p_w),  t_h = log(g_h / p_h).
Read the code and you'll find that's exactly what it does. First, as before, it computes the center coordinates and the height and width of the source (predicted) boxes, converting from the corner representation to the center one to get Px, Py, Pw, Ph; the same conversion applied to the ground_truth boxes gives the Gx, Gy, Gw, Gh of the formulas above. Then eps = xp.finfo(height.dtype).eps fetches the smallest representable positive number, and height and width are clamped to at least eps so they stay strictly positive (the divisions and logarithms that follow would otherwise blow up). Finally the formulas above give the offsets ty, tx, th, tw, completing the conversion from bbox to loc!
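Since the post doesn't reproduce the body of bbox2loc, here is a sketch that follows the formulas and steps just described; it should match the repo's logic, but treat it as an illustration rather than a verbatim copy:

```python
import numpy as xp  # the repo uses numpy (or cupy) under the alias xp

def bbox2loc(src_bbox, dst_bbox):
    # Source (anchor/predicted) boxes: (ymin, xmin, ymax, xmax) -> center/size.
    height = src_bbox[:, 2] - src_bbox[:, 0]
    width = src_bbox[:, 3] - src_bbox[:, 1]
    ctr_y = src_bbox[:, 0] + 0.5 * height
    ctr_x = src_bbox[:, 1] + 0.5 * width

    # Ground-truth boxes, same conversion (the G's in the formulas above).
    base_height = dst_bbox[:, 2] - dst_bbox[:, 0]
    base_width = dst_bbox[:, 3] - dst_bbox[:, 1]
    base_ctr_y = dst_bbox[:, 0] + 0.5 * base_height
    base_ctr_x = dst_bbox[:, 1] + 0.5 * base_width

    # Clamp height/width to the smallest positive float so the divisions
    # and logarithms below are safe.
    eps = xp.finfo(height.dtype).eps
    height = xp.maximum(height, eps)
    width = xp.maximum(width, eps)

    dy = (base_ctr_y - ctr_y) / height
    dx = (base_ctr_x - ctr_x) / width
    dh = xp.log(base_height / height)
    dw = xp.log(base_width / width)

    # Stack into (R, 4) in the order (ty, tx, th, tw).
    loc = xp.vstack((dy, dx, dh, dw)).transpose()
    return loc
```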
3. def bbox_iou(bbox_a, bbox_b)
```python
def bbox_iou(bbox_a, bbox_b):
    """Calculate the Intersection of Unions (IoUs) between bounding boxes.

    IoU is calculated as a ratio of area of the intersection
    and area of the union.

    This function accepts both :obj:`numpy.ndarray` and :obj:`cupy.ndarray` as
    inputs. Please note that both :obj:`bbox_a` and :obj:`bbox_b` need to be
    same type.
    The output is same type as the type of the inputs.

    Args:
        bbox_a (array): An array whose shape is :math:`(N, 4)`.
            :math:`N` is the number of bounding boxes.
            The dtype should be :obj:`numpy.float32`.
        bbox_b (array): An array similar to :obj:`bbox_a`,
            whose shape is :math:`(K, 4)`.
            The dtype should be :obj:`numpy.float32`.

    Returns:
        array:
        An array whose shape is :math:`(N, K)`. \
        An element at index :math:`(n, k)` contains IoUs between \
        :math:`n` th bounding box in :obj:`bbox_a` and :math:`k` th bounding \
        box in :obj:`bbox_b`.

    """
    if bbox_a.shape[1] != 4 or bbox_b.shape[1] != 4:
        raise IndexError

    # top left
    tl = xp.maximum(bbox_a[:, None, :2], bbox_b[:, :2])
    # bottom right
    br = xp.minimum(bbox_a[:, None, 2:], bbox_b[:, 2:])

    area_i = xp.prod(br - tl, axis=2) * (tl < br).all(axis=2)
    area_a = xp.prod(bbox_a[:, 2:] - bbox_a[:, :2], axis=1)
    area_b = xp.prod(bbox_b[:, 2:] - bbox_b[:, :2], axis=1)
    return area_i / (area_a[:, None] + area_b - area_i)
```
As the name says, this function computes the IoU of two sets of bboxes. The IoU is simply the intersection-over-union: the area where the two boxes intersect divided by the area of their union. As a formula:

IoU(A, B) = area(A ∩ B) / area(A ∪ B) = area_i / (area_a + area_b − area_i).
That expression should be intuitive enough, and the whole function follows exactly this idea. In the code: first, if either input fails the .shape[1] == 4 check, the bbox array is malformed and an IndexError is raised directly. Then, for every pair of boxes, it takes the element-wise maximum of the two top-left corners and the minimum of the two bottom-right corners — which is precisely the intersection rectangle, since boxes are represented by top-left and bottom-right corners — while the factor (tl < br).all(axis=2) zeroes out pairs that don't overlap at all. numpy.prod, which returns the product of array elements along a given axis, then gives area_i (the intersection area) and area_a, area_b (the areas of the two boxes), and the formula area_i / (area_a + area_b − area_i) yields the IoU between every pair of boxes!
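A quick sanity check of bbox_iou with two hand-picked boxes (assuming xp = numpy in the module):

```python
import numpy as np
from model.utils.bbox_tools import bbox_iou  # path per the repo layout

bbox_a = np.array([[0., 0., 2., 2.]], dtype=np.float32)
bbox_b = np.array([[0., 1., 2., 3.]], dtype=np.float32)

# intersection = 2 * 1 = 2, union = 4 + 4 - 2 = 6  ->  IoU = 1/3
print(bbox_iou(bbox_a, bbox_b))  # -> [[0.33333334]]
```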
4. def generate_anchor_base(base_size=16, ratios=[0.5, 1, 2], anchor_scales=[8, 16, 32])

```python
def generate_anchor_base(base_size=16, ratios=[0.5, 1, 2],
                         anchor_scales=[8, 16, 32]):
    """Generate anchor base windows by enumerating aspect ratio and scales.

    Generate anchors that are scaled and modified to the given aspect ratios.
    Area of a scaled anchor is preserved when modifying to the given aspect
    ratio.

    :obj:`R = len(ratios) * len(anchor_scales)` anchors are generated by this
    function.
    The :obj:`i * len(anchor_scales) + j` th anchor corresponds to an anchor
    generated by :obj:`ratios[i]` and :obj:`anchor_scales[j]`.

    For example, if the scale is :math:`8` and the ratio is :math:`0.25`,
    the width and the height of the base window will be stretched by :math:`8`.
    For modifying the anchor to the given aspect ratio,
    the height is halved and the width is doubled.

    Args:
        base_size (number): The width and the height of the reference window.
        ratios (list of floats): This is ratios of width to height of
            the anchors.
        anchor_scales (list of numbers): This is areas of anchors.
            Those areas will be the product of the square of an element in
            :obj:`anchor_scales` and the original area of the reference
            window.

    Returns:
        ~numpy.ndarray:
        An array of shape :math:`(R, 4)`.
        Each element is a set of coordinates of a bounding box.
        The second axis corresponds to
        :math:`(y_{min}, x_{min}, y_{max}, x_{max})` of a bounding box.

    """
    py = base_size / 2.
    px = base_size / 2.

    anchor_base = np.zeros((len(ratios) * len(anchor_scales), 4),
                           dtype=np.float32)
    for i in six.moves.range(len(ratios)):
        for j in six.moves.range(len(anchor_scales)):
            h = base_size * anchor_scales[j] * np.sqrt(ratios[i])
            w = base_size * anchor_scales[j] * np.sqrt(1. / ratios[i])

            index = i * len(anchor_scales) + j
            anchor_base[index, 0] = py - h / 2.
            anchor_base[index, 1] = px - w / 2.
            anchor_base[index, 2] = py + h / 2.
            anchor_base[index, 3] = px + w / 2.
    return anchor_base
```
This function produces the 9 base anchor boxes referenced to the (0, 0) position — (0, 0) meaning the first cell of the extracted feature map — just as the name generate_anchor_base suggests.
Now the parameters. base_size=16 means the basic anchor is really a 16×16 window, which then gets adjusted by the different scales and aspect ratios. ratios gives the aspect (height-to-width) ratios 0.5, 1 and 2, and anchor_scales is the extra multiplier on top of base_size: in this code it corresponds to three areas, (16×8)², (16×16)² and (16×32)² — that is, squares of side 128, 256 and 512. Three areas times three aspect ratios gives exactly the 9 kinds of anchor. A diagram:
(Figure from the 三年一夢 blog; link at the end of this post.)
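To put concrete numbers on those 9 anchors, a small check computed from the function above with its default arguments (the first anchor corresponds to ratio 0.5, scale 8):

```python
import numpy as np
from model.utils.bbox_tools import generate_anchor_base  # path per the repo layout

anchor_base = generate_anchor_base()  # base_size=16, ratios=[0.5, 1, 2], anchor_scales=[8, 16, 32]
print(anchor_base.shape)              # (9, 4)

# First anchor: h = 128 / sqrt(2) ~ 90.5, w = 128 * sqrt(2) ~ 181.0, centered on (8, 8).
print(anchor_base[0])                 # ~ [-37.25 -82.51  53.25  98.51]
```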
This is actually where the key idea of Faster-rcnn shows itself. How do you do object detection? How do you make sure not a single object is missed? By enumeration — not enumerating the image, but enumerating the feature map: slide over the extracted feature map (a 3×3 convolution visits each feature position) and generate 9 anchors of different sizes and aspect ratios at every position, striving to box every object in the image. The anchors are then sent into the 9×2 and 9×4 heads for classification and regression, which correct them further, so that with very high probability every object in the image gets boxed! Some post-processing follows, such as non-maximum suppression to suppress the anchors that box the same object repeatedly, producing a clean visual result.
If you read generate_anchor_base carefully you may spot something odd in this line: anchor_base = np.zeros((len(ratios) * len(anchor_scales), 4), dtype=np.float32). Why is anchor_base created with np.zeros, so that the initial coordinates are all (0, 0, 0, 0)? Exactly — this function only ever produces the 9 anchors referenced to the first cell of the feature map, and says nothing about generating the anchors for the whole image! So where are all the anchors produced? The answer: in model/region_proposal_network!!
Let's go look at that code. Judging by the names, the function we want is clearly _enumerate_shifted_anchor. Let's see how it produces every anchor of the whole feature map!
```python
def _enumerate_shifted_anchor(anchor_base, feat_stride, height, width):
    # Enumerate all shifted anchors:
    #
    # add A anchors (1, A, 4) to
    # cell K shifts (K, 1, 4) to get
    # shift anchors (K, A, 4)
    # reshape to (K*A, 4) shifted anchors
    # return (K*A, 4)

    # !TODO: add support for torch.CudaTensor
    # xp = cuda.get_array_module(anchor_base)
    # it seems that it can't be boosted using GPU
    import numpy as xp
    shift_y = xp.arange(0, height * feat_stride, feat_stride)
    shift_x = xp.arange(0, width * feat_stride, feat_stride)
    shift_x, shift_y = xp.meshgrid(shift_x, shift_y)
    shift = xp.stack((shift_y.ravel(), shift_x.ravel(),
                      shift_y.ravel(), shift_x.ravel()), axis=1)

    A = anchor_base.shape[0]
    K = shift.shape[0]
    anchor = anchor_base.reshape((1, A, 4)) + \
             shift.reshape((1, K, 4)).transpose((1, 0, 2))
    anchor = anchor.reshape((K * A, 4)).astype(np.float32)
    return anchor
```
First, the idea of generating anchors for every point of the feature map. Just as the code comments describe, the feature-map coordinates are first scaled up by 16 to map back onto the original image. Why 16? Because the feature map comes out of the original image after four poolings, so it has been shrunk 16×. That corresponds to the line
shift_y / shift_x = xp.arange(0, height * feat_stride, feat_stride), where feat_stride = 16 is exactly that magnification factor. Everyone knows arange(); the net effect is that both axes are stretched 16× back to the original image's size. Then shift_x, shift_y = xp.meshgrid(shift_x, shift_y) builds the matrices of horizontal and vertical offsets, through which every point of the feature map can find its exact mapped position in the original image!
Now the implementation in detail. Honestly — maybe my Python skills just aren't up to it — even on a second reading I had to ponder this function for quite a while before it clicked, so in the spirit of not misleading anyone, haha:
First, shift_y = xp.arange(0, height * feat_stride, feat_stride) was already covered: it produces a single row running from 0 to height * feat_stride in steps of feat_stride; likewise shift_x produces a row from 0 to width * feat_stride. And now comes the key part!
shift_x, shift_y = xp.meshgrid(shift_x, shift_y) — meshgrid is a grid-drawing function (feel free to look up the details). It returns two matrices of the same shape: the first uses shift_x as its rows, repeated once per element of shift_y, and the second runs shift_y down its columns, with one column per element of shift_x. My wording may not be crystal clear, so look at the example:
What comes out as X and Y: the big X takes the row x and stacks it, with as many rows as y has elements, while Y runs y down the columns, with as many columns as x has elements!!!
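Here is that example as runnable numpy, with toy values of my own choosing:

```python
import numpy as np

x = np.array([0, 16, 32])
y = np.array([0, 16])
X, Y = np.meshgrid(x, y)
# X = [[ 0 16 32]      Y = [[ 0  0  0]
#      [ 0 16 32]]          [16 16 16]]
# X repeats the row x once per element of y;
# Y runs y down the columns, once per element of x.
```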
Right after that comes shift = xp.stack((shift_y.ravel(), shift_x.ravel(), shift_y.ravel(), shift_x.ravel()), axis=1).
After the transformation just now, the big X and Y already hold the same number of elements — the matrix shapes show as much, they're identical. X.ravel() flattens each into a single row, so shift_x and shift_y now have the same length, equal to the product of the feature map's height and width (the number of feature-map pixels). The difference: shift_x now holds the horizontal offsets read off row by row, while shift_y holds the corresponding vertical offsets! Drawn as a picture it looks like this:
(Just a quick hand-drawn sketch; an electronic version will be added later.)
In the end the shift variable in the code becomes an array with one row per feature-map pixel and 4 columns — and you can see the code below indeed reshapes it to the (1, K, 4) format. Next, A = anchor_base.shape[0] reads the number of base anchors, which here equals 9 since there are 9 base_anchors, and K = shift.shape[0] reads the total number of feature-map positions — i.e. the number of rows of the four-column array we drew. You might wonder why stack four columns at all; I think it's because an anchor is represented by its top-left and bottom-right corners, hence four coordinates, and each pair of columns is exactly the (vertical, horizontal) offset of one point — so the four columns carry the offsets of the two corner points.
anchor = anchor_base.reshape((1, A, 4)) + shift.reshape((1, K, 4)).transpose((1, 0, 2)) — and here fate finally, mercilessly, strikes: this is the crux of the whole function. The coordinates of the 9 base anchors are added to the offsets by broadcasting, (1, A, 4) + (K, 1, 4) → (K, A, 4), producing the coordinates of all the anchors in one stroke; the four columns mean the top-left corner and the bottom-right corner get their offsets added in lockstep. Sweep down it once and every anchor is out: K feature points, each with its A (= 9) base anchors, hence the final reshape((K * A, 4)) — and with it, all of the anchors.
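A quick shape check (a sketch: it assumes a hypothetical 2×2 feature map and that we can call the module-private function directly):

```python
from model.utils.bbox_tools import generate_anchor_base
from model.region_proposal_network import _enumerate_shifted_anchor

anchor_base = generate_anchor_base()                       # (9, 4)
anchor = _enumerate_shifted_anchor(anchor_base, 16, 2, 2)  # feat_stride=16, height=width=2
print(anchor.shape)                # (2 * 2 * 9, 4) = (36, 4)
print(anchor[9] - anchor_base[0])  # second cell: shifted by [ 0. 16.  0. 16.]
```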
Finally, one more diagram:
(Figure from the 機器之心 (Synced) blog.)
Neighbouring feature points are 16 pixels apart on the original image. Honestly, I don't think this picture alone really conveys how _enumerate_shifted_anchor runs — it's just a sketch to help the intuition!
(Figure panels: an anchor point with its 9 anchors · a single anchor point shown on one image · all the anchors of all anchor points over the whole image.)
One last question: why map the feature map back to the size of the original image at all? Because the objects you want to box live in the original image, while you pick the anchors on the feature map. Pooling keeps the relative positions of features unchanged but shrinks the size to 1/16 of the original, meaning one feature-map point carries the information of 16×16 original-image pixels, and the receptive fields of the original image and the feature map are not the same. Since the anchors exist to box objects in the original image, without the remap a single base_size anchor would already cover most of the feature map, and that kind of object detection would be meaningless. One reason the paper's authors chose this network structure is precisely to keep the correspondence between feature map and original image explicit, making it easy to remap back to the original image for anchor selection and proposal generation!
The creator_tools.py file also deserves a detailed walkthrough, because it covers essentially all of the network's optimization machinery. Each of its three creators has its own distinct role, so before diving into the code proper I want to explain a little theory. First, the code skeleton, to meet today's protagonists:
The rough structure is as follows; you can see three main classes: 1. ProposalTargetCreator, 2. AnchorTargetCreator, 3. ProposalCreator.
Let's start with AnchorTargetCreator. What is it for? It supplies the training samples for Faster-RCNN's own RPN network: the RPN trains and learns precisely on the samples AnchorTargetCreator produces, which is what makes its predicted anchor classes and positions accurate. Anchors only become real RoIs after position correction, and the labeled samples AnchorTargetCreator produces are exactly what the RPN trains on — self-correction, self-improvement!
So what standard does AnchorTargetCreator use to pick its samples?
Answer: we saw earlier that _enumerate_shifted_anchor produces 20000-odd anchors on one image, and AnchorTargetCreator picks 256 of them to use for the binary classification, while all of them get position-regression targets! It provides the ground truth for the predictions, and the selection rules are:
1. For each Ground_truth bounding_box, select from the anchors the one with the highest overlap with it as a sample!
2. From the remaining anchors, select those whose overlap with a Ground_truth bounding_box exceeds 0.7 as samples; note the number of positives must not exceed 128.
3. Randomly select from the remaining anchors ones whose overlap with every gt_bbox is below 0.3 as negative samples, with positives and negatives totaling 256.
PS: note that each sampled anchor's gt_label is either 1 or 0, which is how the binary classification is realized; and when the regression loss is computed, only positive samples contribute — negatives don't take part.
With these rules in hand, let's look at the AnchorTargetCreator code:
```python
class AnchorTargetCreator(object):
    """Assign the ground truth bounding boxes to anchors.

    Assigns the ground truth bounding boxes to anchors for training Region
    Proposal Networks introduced in Faster R-CNN [#]_.

    Offsets and scales to match anchors to the ground truth are
    calculated using the encoding scheme of
    :func:`model.utils.bbox_tools.bbox2loc`.

    .. [#] Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun. \
    Faster R-CNN: Towards Real-Time Object Detection with \
    Region Proposal Networks. NIPS 2015.

    Args:
        n_sample (int): The number of regions to produce.
        pos_iou_thresh (float): Anchors with IoU above this
            threshold will be assigned as positive.
        neg_iou_thresh (float): Anchors with IoU below this
            threshold will be assigned as negative.
        pos_ratio (float): Ratio of positive regions in the
            sampled regions.

    """

    def __init__(self,
                 n_sample=256,
                 pos_iou_thresh=0.7, neg_iou_thresh=0.3,
                 pos_ratio=0.5):
        self.n_sample = n_sample
        self.pos_iou_thresh = pos_iou_thresh
        self.neg_iou_thresh = neg_iou_thresh
        self.pos_ratio = pos_ratio

    def __call__(self, bbox, anchor, img_size):
        """Assign ground truth supervision to sampled subset of anchors.

        Types of input arrays and output arrays are same.

        Here are notations.

        * :math:`S` is the number of anchors.
        * :math:`R` is the number of bounding boxes.

        Args:
            bbox (array): Coordinates of bounding boxes. Its shape is
                :math:`(R, 4)`.
            anchor (array): Coordinates of anchors. Its shape is
                :math:`(S, 4)`.
            img_size (tuple of ints): A tuple :obj:`H, W`, which
                is a tuple of height and width of an image.

        Returns:
            (array, array):

            #NOTE: it's scale not only offset
            * **loc**: Offsets and scales to match the anchors to \
                the ground truth bounding boxes. Its shape is :math:`(S, 4)`.
            * **label**: Labels of anchors with values \
                :obj:`(1=positive, 0=negative, -1=ignore)`. Its shape \
                is :math:`(S,)`.

        """

        img_H, img_W = img_size

        n_anchor = len(anchor)
        inside_index = _get_inside_index(anchor, img_H, img_W)
        anchor = anchor[inside_index]
        argmax_ious, label = self._create_label(
            inside_index, anchor, bbox)

        # compute bounding box regression targets
        loc = bbox2loc(anchor, bbox[argmax_ious])

        # map up to original set of anchors
        label = _unmap(label, n_anchor, inside_index, fill=-1)
        loc = _unmap(loc, n_anchor, inside_index, fill=0)

        return loc, label
```
First it reads the image's size, then len(anchor) gives the number of anchors, generally around 20000. Next, _get_inside_index(anchor, img_H, img_W) throws away every anchor that sticks out of the image, masking in only those fully inside; then self._create_label(inside_index, anchor, bbox) picks out the qualifying positives and negatives (roughly 128 of each) and attaches the corresponding labels; finally bbox2loc computes the offsets between the anchors and the gt bboxes as the regression targets!
Let's unfold _create_label and look at it closely:
It first initializes label and sets every entry to -1 with label.fill(-1), then calls _calc_ious(anchor, bbox, inside_index) to produce argmax_ious, max_ious and gt_argmax_ious. Then come the judgments: label[max_ious < self.neg_iou_thresh] = 0 marks negatives; label[gt_argmax_ious] = 1 marks positives directly — gt_argmax_ious being, for each gt_bbox, the anchor overlapping it the most; and label[max_ious >= self.pos_iou_thresh] = 1 also marks positives — max_ious above 0.7 means positive, below 0.3 means negative, matching the rules stated at the start one for one. The program then has one more check: if the count of label == 1 exceeds pos_ratio * n_sample, i.e. these rules selected too many positives, it calls np.random.choice(pos_index, size=(len(pos_index) - n_pos), replace=False) and resets the chosen surplus to -1 — randomly discarding some positives while the quota stays fixed. The same trick randomly discards surplus negatives. Finally the pair argmax_ious, label is returned!
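For reference, a condensed sketch of the positive-sample capping just described; it mirrors the logic of _create_label, with a toy label array standing in for the real one:

```python
import numpy as np

# Toy stand-in for the label array over the inside-image anchors
# (1 = positive, 0 = negative, -1 = ignore).
label = np.random.choice([0, 1], size=3000)

n_pos = int(0.5 * 256)                        # pos_ratio * n_sample
pos_index = np.where(label == 1)[0]
if len(pos_index) > n_pos:
    disable_index = np.random.choice(
        pos_index, size=(len(pos_index) - n_pos), replace=False)
    label[disable_index] = -1                 # surplus positives become "ignore"

# Negatives fill whatever quota the positives left open.
n_neg = 256 - np.sum(label == 1)
neg_index = np.where(label == 0)[0]
if len(neg_index) > n_neg:
    disable_index = np.random.choice(
        neg_index, size=(len(neg_index) - n_neg), replace=False)
    label[disable_index] = -1
```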
While writing this post there was actually one line I couldn't explain at first: loc = bbox2loc(anchor, bbox[argmax_ious]). Why pair anchor with argmax_ious in bbox2loc??? Which boxes go with which — one-to-one or one-to-many?? Then it clicked: argmax_ious was produced, per anchor and in order, by taking the argmax over each row of the IoU matrix! argmax_ious is just the column index (which gt box), so bbox[argmax_ious] selects, for each anchor, the gt bbox it overlaps the most — each anchor naturally pairs with its own best match. The other question: why compute bbox2loc for all the anchors? There are 20000 of them!! Haha — remember what my "Answer" line said? 256 are selected for the binary classification, and all of them get position regression! This line is exactly where that happens.
And what is ProposalCreator for? What ProposalCreator does is the generation of the RoIs, and the entire process involves only forward computation with no backpropagation, so it can be computed with just numpy and Tensors! So what does the selection pipeline look like?
1. For each image, use the FeatureMap to compute the foreground probability of the roughly H/16 × W/16 × 9 ≈ 20000 anchors together with their position parameters — this is just the forward pass of the RPN network, nothing surprising.
2. From those, select the 12000 anchors with the highest foreground probability.
3. Use the position-regression parameters to correct the positions of these 12000 anchors.
4. Apply non-maximum suppression and select 2000 RoIs!
Reading the flow through confirms it: there really is only forward computation. (These numbers are the training-time settings; at test time it's 6000 and 300.)
Let's look at the code:
```python
class ProposalCreator:
    # unNOTE: I'll make it undifferential
    # unTODO: make sure it's ok
    # It's ok
    """Proposal regions are generated by calling this object.

    The :meth:`__call__` of this object outputs object detection proposals by
    applying estimated bounding box offsets
    to a set of anchors.

    This class takes parameters to control number of bounding boxes to
    pass to NMS and keep after NMS.
    If the parameters are negative, it uses all the bounding boxes supplied
    or keep all the bounding boxes returned by NMS.

    This class is used for Region Proposal Networks introduced in
    Faster R-CNN [#]_.

    .. [#] Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun. \
    Faster R-CNN: Towards Real-Time Object Detection with \
    Region Proposal Networks. NIPS 2015.

    Args:
        nms_thresh (float): Threshold value used when calling NMS.
        n_train_pre_nms (int): Number of top scored bounding boxes
            to keep before passing to NMS in train mode.
        n_train_post_nms (int): Number of top scored bounding boxes
            to keep after passing to NMS in train mode.
        n_test_pre_nms (int): Number of top scored bounding boxes
            to keep before passing to NMS in test mode.
        n_test_post_nms (int): Number of top scored bounding boxes
            to keep after passing to NMS in test mode.
        force_cpu_nms (bool): If this is :obj:`True`,
            always use NMS in CPU mode. If :obj:`False`,
            the NMS mode is selected based on the type of inputs.
        min_size (int): A parameter to determine the threshold on
            discarding bounding boxes based on their sizes.

    """

    def __init__(self,
                 parent_model,
                 nms_thresh=0.7,
                 n_train_pre_nms=12000,
                 n_train_post_nms=2000,
                 n_test_pre_nms=6000,
                 n_test_post_nms=300,
                 min_size=16
                 ):
        self.parent_model = parent_model
        self.nms_thresh = nms_thresh
        self.n_train_pre_nms = n_train_pre_nms
        self.n_train_post_nms = n_train_post_nms
        self.n_test_pre_nms = n_test_pre_nms
        self.n_test_post_nms = n_test_post_nms
        self.min_size = min_size

    def __call__(self, loc, score,
                 anchor, img_size, scale=1.):
        """input should be ndarray
        Propose RoIs.

        Inputs :obj:`loc, score, anchor` refer to the same anchor when indexed
        by the same index.

        On notations, :math:`R` is the total number of anchors. This is equal
        to product of the height and the width of an image and the number of
        anchor bases per pixel.

        Type of the output is same as the inputs.

        Args:
            loc (array): Predicted offsets and scaling to anchors.
                Its shape is :math:`(R, 4)`.
            score (array): Predicted foreground probability for anchors.
                Its shape is :math:`(R,)`.
            anchor (array): Coordinates of anchors. Its shape is
                :math:`(R, 4)`.
            img_size (tuple of ints): A tuple :obj:`height, width`,
                which contains image size after scaling.
            scale (float): The scaling factor used to scale an image after
                reading it from a file.

        Returns:
            array:
            An array of coordinates of proposal boxes.
            Its shape is :math:`(S, 4)`. :math:`S` is less than
            :obj:`self.n_test_post_nms` in test time and less than
            :obj:`self.n_train_post_nms` in train time. :math:`S` depends on
            the size of the predicted bounding boxes and the number of
            bounding boxes discarded by NMS.

        """
        # NOTE: when test, remember
        # faster_rcnn.eval()
        # to set self.training = False
        if self.parent_model.training:
            n_pre_nms = self.n_train_pre_nms
            n_post_nms = self.n_train_post_nms
        else:
            n_pre_nms = self.n_test_pre_nms
            n_post_nms = self.n_test_post_nms

        # Convert anchors into proposal via bbox transformations.
        roi = loc2bbox(anchor, loc)

        # Clip predicted boxes to image.
        roi[:, slice(0, 4, 2)] = np.clip(
            roi[:, slice(0, 4, 2)], 0, img_size[0])
        roi[:, slice(1, 4, 2)] = np.clip(
            roi[:, slice(1, 4, 2)], 0, img_size[1])

        # Remove predicted boxes with either height or width < threshold.
        min_size = self.min_size * scale
        hs = roi[:, 2] - roi[:, 0]
        ws = roi[:, 3] - roi[:, 1]
        keep = np.where((hs >= min_size) & (ws >= min_size))[0]
        roi = roi[keep, :]
        score = score[keep]

        # Sort all (proposal, score) pairs by score from highest to lowest.
        # Take top pre_nms_topN (e.g. 6000).
        order = score.ravel().argsort()[::-1]
        if n_pre_nms > 0:
            order = order[:n_pre_nms]
        roi = roi[order, :]

        # Apply nms (e.g. threshold = 0.7).
        # Take after_nms_topN (e.g. 300).

        # unNOTE: something is wrong here!
        # TODO: remove cuda.to_gpu
        keep = non_maximum_suppression(
            cp.ascontiguousarray(cp.asarray(roi)),
            thresh=self.nms_thresh)
        if n_post_nms > 0:
            keep = keep[:n_post_nms]
        roi = roi[keep]
        return roi
```
Now let's walk the code against that pipeline:
At the start some parameters are initialized: nms_thresh = 0.7, different sample counts for training versus testing, min_size = 16 and so on. And sure enough, the call starts by setting different pre-/post-NMS counts for the training and testing cases. Then roi = loc2bbox(anchor, loc) uses the predicted correction values to correct the anchors — note it corrects all of them; the top-12000 selection actually happens slightly later than in the list above —
after which np.clip(roi[:, slice(0, 4, 2)], 0, img_size[0]) (and likewise for the x coordinates) clips all the produced rois to lie within the image! Then each roi's height and width are computed, and any roi whose height or width falls below the min_size we fixed at the start is masked out directly, keeping only the surviving rois. Their foreground scores are then sorted in descending order and only the top 12000/6000 are kept (the training and testing configurations respectively); after that the non-maximum-suppression function is called to suppress the duplicates and keep 2000/300, and the filtered RoIs are returned! That concludes the ProposalCreator walkthrough as well.
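Before moving on, a toy illustration of the clip + min_size filtering described above (plain numpy, made-up boxes):

```python
import numpy as np

roi = np.array([[-8., -8., 24., 600.],    # sticks out of the image, but big enough
                [10., 10., 12., 40.]],    # only 2 pixels tall -> dropped
               dtype=np.float32)
img_size = (500, 500)

roi[:, slice(0, 4, 2)] = np.clip(roi[:, slice(0, 4, 2)], 0, img_size[0])  # y coords
roi[:, slice(1, 4, 2)] = np.clip(roi[:, slice(1, 4, 2)], 0, img_size[1])  # x coords

hs = roi[:, 2] - roi[:, 0]
ws = roi[:, 3] - roi[:, 1]
keep = np.where((hs >= 16) & (ws >= 16))[0]   # min_size = 16, scale = 1
print(roi[keep])  # only the first box survives: [[  0.   0.  24. 500.]]
```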
Good — we've reached the last part that needs explaining. Whew, this post has taken forever — a whole day! So what does ProposalTargetCreator do? In short, it supplies GroundTruth samples for the RoIHead network to train itself on! And what is this RoIHead network? It's the one that takes the RoIs and predicts their n_class category scores plus the final detection positions — the network that produces the final output. Important or not? The quality of the final output rests entirely on it, so of course it's important! Likewise, its flow:
ProposalCreator produces 2000 RoIs, but not all of them are used for training; after filtering by this ProposalTargetCreator, 128 are produced for its own training, by the following rules:
1. RoIs whose IoU with a GroundTruth_bbox is at least 0.5: select some of them (32 in this experiment) as positive samples.
2. RoIs whose IoU with every GroundTruth_bbox falls in [neg_iou_thresh_lo, neg_iou_thresh_hi) — here [0, 0.5) — select some, namely 128 − 32 = 96, as negative samples.
3. These 128 samples are then used to train the RoIHead.
The corresponding code:
```python
class ProposalTargetCreator(object):
    """Assign ground truth bounding boxes to given RoIs.

    The :meth:`__call__` of this class generates training targets
    for each object proposal.
    This is used to train Faster RCNN [#]_.

    .. [#] Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun. \
    Faster R-CNN: Towards Real-Time Object Detection with \
    Region Proposal Networks. NIPS 2015.

    Args:
        n_sample (int): The number of sampled regions.
        pos_ratio (float): Fraction of regions that is labeled as a
            foreground.
        pos_iou_thresh (float): IoU threshold for a RoI to be considered as a
            foreground.
        neg_iou_thresh_hi (float): RoI is considered to be the background
            if IoU is in
            [:obj:`neg_iou_thresh_lo`, :obj:`neg_iou_thresh_hi`).
        neg_iou_thresh_lo (float): See above.

    """

    def __init__(self,
                 n_sample=128,
                 pos_ratio=0.25, pos_iou_thresh=0.5,
                 neg_iou_thresh_hi=0.5, neg_iou_thresh_lo=0.0
                 ):
        self.n_sample = n_sample
        self.pos_ratio = pos_ratio
        self.pos_iou_thresh = pos_iou_thresh
        self.neg_iou_thresh_hi = neg_iou_thresh_hi
        self.neg_iou_thresh_lo = neg_iou_thresh_lo  # NOTE: default 0.1 in py-faster-rcnn

    def __call__(self, roi, bbox, label,
                 loc_normalize_mean=(0., 0., 0., 0.),
                 loc_normalize_std=(0.1, 0.1, 0.2, 0.2)):
        """Assigns ground truth to sampled proposals.

        This function samples total of :obj:`self.n_sample` RoIs
        from the combination of :obj:`roi` and :obj:`bbox`.
        The RoIs are assigned with the ground truth class labels as well as
        bounding box offsets and scales to match the ground truth bounding
        boxes. As many as :obj:`pos_ratio * self.n_sample` RoIs are
        sampled as foregrounds.

        Offsets and scales of bounding boxes are calculated using
        :func:`model.utils.bbox_tools.bbox2loc`.
        Also, types of input arrays and output arrays are same.

        Here are notations.

        * :math:`S` is the total number of sampled RoIs, which equals \
            :obj:`self.n_sample`.
        * :math:`L` is number of object classes possibly including the \
            background.

        Args:
            roi (array): Region of Interests (RoIs) from which we sample.
                Its shape is :math:`(R, 4)`
            bbox (array): The coordinates of ground truth bounding boxes.
                Its shape is :math:`(R', 4)`.
            label (array): Ground truth bounding box labels. Its shape
                is :math:`(R',)`. Its range is :math:`[0, L - 1]`, where
                :math:`L` is the number of foreground classes.
            loc_normalize_mean (tuple of four floats): Mean values to normalize
                coordinates of bounding boxes.
            loc_normalize_std (tuple of four floats): Standard deviation of
                the coordinates of bounding boxes.

        Returns:
            (array, array, array):

            * **sample_roi**: Regions of interests that are sampled. \
                Its shape is :math:`(S, 4)`.
            * **gt_roi_loc**: Offsets and scales to match \
                the sampled RoIs to the ground truth bounding boxes. \
                Its shape is :math:`(S, 4)`.
            * **gt_roi_label**: Labels assigned to sampled RoIs. Its shape is \
                :math:`(S,)`. Its range is :math:`[0, L]`. The label with \
                value 0 is the background.

        """
        n_bbox, _ = bbox.shape

        roi = np.concatenate((roi, bbox), axis=0)

        pos_roi_per_image = np.round(self.n_sample * self.pos_ratio)
        iou = bbox_iou(roi, bbox)
        gt_assignment = iou.argmax(axis=1)
        max_iou = iou.max(axis=1)
        # Offset range of classes from [0, n_fg_class - 1] to [1, n_fg_class].
        # The label with value 0 is the background.
        gt_roi_label = label[gt_assignment] + 1

        # Select foreground RoIs as those with >= pos_iou_thresh IoU.
        pos_index = np.where(max_iou >= self.pos_iou_thresh)[0]
        pos_roi_per_this_image = int(min(pos_roi_per_image, pos_index.size))
        if pos_index.size > 0:
            pos_index = np.random.choice(
                pos_index, size=pos_roi_per_this_image, replace=False)

        # Select background RoIs as those within
        # [neg_iou_thresh_lo, neg_iou_thresh_hi).
        neg_index = np.where((max_iou < self.neg_iou_thresh_hi) &
                             (max_iou >= self.neg_iou_thresh_lo))[0]
        neg_roi_per_this_image = self.n_sample - pos_roi_per_this_image
        neg_roi_per_this_image = int(min(neg_roi_per_this_image,
                                         neg_index.size))
        if neg_index.size > 0:
            neg_index = np.random.choice(
                neg_index, size=neg_roi_per_this_image, replace=False)

        # The indices that we're selecting (both positive and negative).
        keep_index = np.append(pos_index, neg_index)
        gt_roi_label = gt_roi_label[keep_index]
        gt_roi_label[pos_roi_per_this_image:] = 0  # negative labels --> 0
        sample_roi = roi[keep_index]

        # Compute offsets and scales to match sampled RoIs to the GTs.
        gt_roi_loc = bbox2loc(sample_roi, bbox[gt_assignment[keep_index]])
        gt_roi_loc = ((gt_roi_loc - np.array(loc_normalize_mean, np.float32)
                       ) / np.array(loc_normalize_std, np.float32))

        return sample_roi, gt_roi_loc, gt_roi_label
```
Because these values are going into the big network for training — the position values in particular — the location coordinates are normalized (mean subtracted, divided by a standard deviation) on the way out.
First bbox.shape is read to get n_bbox, then bbox is concatenated onto the rois and the required number of positive samples is fixed. bbox_iou is called to compute the IoU matrix; taking the maximum along each row yields, per roi, the index of its best-matching gt box (gt_assignment) and the actual best IoU (max_iou). The labels selected through those indices then get +1, shifting the range from [0, n_fg_class − 1] to [1, n_fg_class] so that 0 can stand for background. Using max_iou the positive and negative samples are likewise picked out, randomly discarding some whenever too many qualify; the positive and negative index lists are then concatenated, their true labels gathered, and the negatives' labels uniformly set to 0 — at which point the labels of the filtered samples are settled. Then sample_roi is taken out, its offset relative to the assigned bbox is computed to get the loc, and finally sample_roi, gt_roi_loc and gt_roi_label are returned. Mission accomplished!
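To close the loop, here is how a trainer would call it — a usage sketch where roi stands in for ProposalCreator's output and bbox, label for one image's ground truth (the normalization values are the defaults shown in the code; ProposalTargetCreator is assumed imported from the creator file discussed above):

```python
import numpy as np

# Toy stand-ins: 2000 proposals, 2 ground-truth boxes with class labels.
roi = np.random.rand(2000, 4).astype(np.float32) * 250
roi[:, 2:] += roi[:, :2]                      # make ymax/xmax >= ymin/xmin
bbox = np.array([[10., 10., 100., 100.],
                 [200., 150., 400., 300.]], dtype=np.float32)
label = np.array([3, 7], dtype=np.int32)      # foreground classes in [0, L-1]

proposal_target_creator = ProposalTargetCreator()
sample_roi, gt_roi_loc, gt_roi_label = proposal_target_creator(
    roi, bbox, label,
    loc_normalize_mean=(0., 0., 0., 0.),
    loc_normalize_std=(0.1, 0.1, 0.2, 0.2))

print(sample_roi.shape)    # (128, 4)  boxes to feed the RoIHead
print(gt_roi_loc.shape)    # (128, 4)  normalized regression targets
print(gt_roi_label.shape)  # (128,)    class labels, 0 = background
```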
At last, the whole model-preparation part has been explained! If some term in the explanation is still unclear — say you don't know which part of the network RoIHead refers to — I can attach a framework diagram of the network (drawn by the original author, not me, of course) to reinforce the picture. Typing this much really is exhausting, so if reading it helped you understand the code even a little, do leave me a comment! Let's learn and improve together! Peace!
Reference blog: https://www.cnblogs.com/king-lps/p/8981222.html