Faster-RCNN TensorFlow implementation details

Date: 2019-12-11


tf-faster-rcnn on GitHub: https://github.com/endernewton/tf-faster-rcnn

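Backbone, e.g. VGG: the conv layers do not change the feature-map size, and each pooling layer outputs (w/2, h/2). There are 4 pooling layers, so the feature map is 1/16 the size of the original image.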

Detection: the RPN module

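For example, an arbitrary image is resized to 800*600 and fed through VGG. conv5_3 -> rpn_conv/3*3 -> rpn_relu yields a feature map of shape (1, 512, 50, 38). Two 1*1 convolutions follow: one classifies each of the 9 anchors at every position as foreground or background, giving (1, 18, 50, 38); the other predicts the anchor offsets, giving (1, 36, 50, 38). That makes 50 x 38 x 9 = 17100 anchors cut out of the original image. (rpn_bbox_pred (offsets) + rpn_cls_prob_reshape (foreground/background)) -> proposal_layer produces the refined proposals. After proposals that extend beyond the image are handled (the steps below clip them to the boundary), NMS and score ranking give the final bounding boxes. The output has shape (N, 4), where N depends on the NMS and score-ranking thresholds; each row holds the four coordinates of a bounding box.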
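The shapes quoted above can be checked with a few lines of arithmetic. This is only a sketch: the helper name `rpn_shapes` is made up for illustration, and rounding the feature-map size up is an assumption chosen because it reproduces the 38 in the text (600 / 16 = 37.5).

```python
import math

STRIDE = 16        # four 2x2 poolings: 2 ** 4
NUM_ANCHORS = 9    # anchors per feature-map position


def rpn_shapes(img_w, img_h, channels=512):
    """Return the tensor shapes quoted in the text for a given input size."""
    # Assumption: sizes are rounded up, which matches 600 / 16 -> 38.
    feat_w = math.ceil(img_w / STRIDE)
    feat_h = math.ceil(img_h / STRIDE)
    return {
        "rpn_feature": (1, channels, feat_w, feat_h),
        "rpn_cls_score": (1, NUM_ANCHORS * 2, feat_w, feat_h),
        "rpn_bbox_pred": (1, NUM_ANCHORS * 4, feat_w, feat_h),
        "total_anchors": feat_w * feat_h * NUM_ANCHORS,
    }


shapes = rpn_shapes(800, 600)
for name, value in shapes.items():
    print(name, value)
```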

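The proposal_layer steps are:
Generate the anchors and apply bbox regression to all of them using tx, ty, tw, th (anchor generation here is exactly the same as during training).
Sort the anchors by the input foreground softmax scores in descending order and keep the top pre_nms_topN (e.g. 6000), i.e. the position-refined foreground anchors.
Clip foreground anchors that extend beyond the image to the image boundary (so that proposals do not exceed the image during the later RoI pooling).
Discard very small foreground anchors (width < threshold or height < threshold).
Apply non-maximum suppression (NMS).
Sort the foreground anchors that survive NMS by foreground softmax score again and output the top post_nms_topN (e.g. 300) as proposals.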
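The steps listed above can be sketched in NumPy as follows. This is a simplified illustration, not the repository's code. The function names, the NMS threshold of 0.7 and the minimum size of 16 are assumptions. Boxes are (x1, y1, x2, y2) with width taken as x2 - x1.

```python
import numpy as np


def decode_boxes(anchors, deltas):
    """Apply (tx, ty, tw, th) to anchors given as (x1, y1, x2, y2)."""
    w = anchors[:, 2] - anchors[:, 0]
    h = anchors[:, 3] - anchors[:, 1]
    cx = anchors[:, 0] + 0.5 * w
    cy = anchors[:, 1] + 0.5 * h
    pcx, pcy = deltas[:, 0] * w + cx, deltas[:, 1] * h + cy
    pw, ph = np.exp(deltas[:, 2]) * w, np.exp(deltas[:, 3]) * h
    return np.stack([pcx - 0.5 * pw, pcy - 0.5 * ph,
                     pcx + 0.5 * pw, pcy + 0.5 * ph], axis=1)


def nms(boxes, scores, thresh):
    """Greedy NMS: indices of the kept boxes, highest score first."""
    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]
    keep = []
    while order.size > 0:
        i, rest = order[0], order[1:]
        keep.append(i)
        iw = np.maximum(0.0, np.minimum(x2[i], x2[rest]) - np.maximum(x1[i], x1[rest]))
        ih = np.maximum(0.0, np.minimum(y2[i], y2[rest]) - np.maximum(y1[i], y1[rest]))
        inter = iw * ih
        iou = inter / (areas[i] + areas[rest] - inter)
        order = rest[iou <= thresh]          # drop boxes overlapping box i too much
    return np.array(keep, dtype=np.int64)


def proposal_layer(anchors, deltas, fg_scores, im_w, im_h,
                   pre_nms_top_n=6000, post_nms_top_n=300,
                   nms_thresh=0.7, min_size=16):
    boxes = decode_boxes(anchors, deltas)                   # 1. bbox regression
    order = fg_scores.argsort()[::-1][:pre_nms_top_n]       # 2. sort, keep top N
    boxes, scores = boxes[order], fg_scores[order]
    boxes[:, 0::2] = np.clip(boxes[:, 0::2], 0, im_w - 1)   # 3. clip x1, x2
    boxes[:, 1::2] = np.clip(boxes[:, 1::2], 0, im_h - 1)   #    clip y1, y2
    big = ((boxes[:, 2] - boxes[:, 0] >= min_size) &
           (boxes[:, 3] - boxes[:, 1] >= min_size))         # 4. drop tiny boxes
    boxes, scores = boxes[big], scores[big]
    keep = nms(boxes, scores, nms_thresh)[:post_nms_top_n]  # 5. NMS, 6. top N
    return boxes[keep], scores[keep]


# Toy input: the first two anchors overlap heavily, the third is far away.
anchors = np.array([[0, 0, 31, 31], [2, 2, 33, 33], [100, 100, 163, 163]], dtype=float)
deltas = np.zeros((3, 4))
fg_scores = np.array([0.9, 0.8, 0.7])
props, scores_out = proposal_layer(anchors, deltas, fg_scores, im_w=800, im_h=600)
print(props, scores_out)
```

In this toy input the second anchor is suppressed by the first, so two proposals remain.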

1. Generate anchors: first create a base anchor, then shift it across the original image to produce all anchors in image coordinates.
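A minimal sketch of how 9 base anchors could be derived from one 16*16 base cell. The ratios (0.5, 1, 2) and scales (8, 16, 32) are assumptions taken from the usual Faster R-CNN defaults and are not stated in this article. Unlike the reference code, no rounding is applied, and `generate_base_anchors` is an illustrative name.

```python
import numpy as np


def generate_base_anchors(base_size=16, ratios=(0.5, 1.0, 2.0), scales=(8, 16, 32)):
    """Return (len(ratios) * len(scales), 4) anchors as (x1, y1, x2, y2).

    All anchors share the centre of the base cell [0, 0, base_size-1, base_size-1].
    """
    ctr = (base_size - 1) / 2.0
    area = float(base_size * base_size)
    anchors = []
    for ratio in ratios:                 # ratio = height / width
        w = np.sqrt(area / ratio)        # keep the area fixed while changing shape
        h = w * ratio
        for scale in scales:
            ws, hs = w * scale, h * scale
            anchors.append([ctr - ws / 2, ctr - hs / 2, ctr + ws / 2, ctr + hs / 2])
    return np.array(anchors)


base_anchors = generate_base_anchors()
print(base_anchors.shape)
print(base_anchors)
```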

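2. In the conv5_3 layer, the first feature point corresponds to point [0, 0] in the original image and the second to [16, 0]. Multiplying by the stride therefore gives the mapping between features on the feature map and anchors in the original image.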
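The stride mapping can be illustrated by shifting base anchors to every feature-map position. This is a sketch with an illustrative function name. The demo uses a single hypothetical base anchor so the offsets are easy to read; passing 9 base anchors would give the full anchor set.

```python
import numpy as np


def shift_anchors(base_anchors, feat_w, feat_h, stride=16):
    """Place every base anchor at every feature-map position."""
    shift_x = np.arange(feat_w) * stride
    shift_y = np.arange(feat_h) * stride
    sx, sy = np.meshgrid(shift_x, shift_y)            # each of shape (feat_h, feat_w)
    shifts = np.stack([sx.ravel(), sy.ravel(), sx.ravel(), sy.ravel()], axis=1)
    # (K, 1, 4) + (1, A, 4) -> (K, A, 4), then flatten to (K * A, 4)
    all_anchors = shifts[:, None, :] + base_anchors[None, :, :]
    return all_anchors.reshape(-1, 4)


base = np.array([[0.0, 0.0, 15.0, 15.0]])   # one hypothetical base anchor
anchors = shift_anchors(base, feat_w=50, feat_h=38)
print(anchors.shape)    # one anchor per feature-map position
print(anchors[:2])      # the second position is shifted by the stride along x
```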

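3. The first step of the RPN is a convolution over conv5_3 with a [3, 3, 512] kernel. Each pixel of conv5_3 corresponds to a region of roughly 16*16 in the original image, which is why the paper speaks of turning each centre pixel into a 256-D vector (the paper's figure for the ZF net). The TensorFlow implementation uses a deeper feature map here, with 512 channels, so [0, 0, :], for example, is a 512-D vector.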

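4. On top of this 512-D vector there are two convolutional branches. The first uses a [1, 1] kernel with 9*2 = 18 outputs: there are 9 anchors and each has two values, object or no object. Its output has the same height and width as conv5_3. It defines the objectness.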

5. Likewise, a second [1, 1] kernel on the same 512-D vector has 9*4 = 36 outputs: 9 anchors with 4 values each, the predicted tx, ty, tw, th. Its output also has the same height and width as conv5_3. It defines the predicted boxes (pred-box).
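Since a 1*1 convolution is just a matrix multiply over the channel axis at every position, the shapes of the two RPN heads in items 4 and 5 can be reproduced with NumPy. This is a sketch with random weights. The height x width x channels layout and the grouping of the 18 channels into (anchor, 2) pairs are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
feat_h, feat_w, depth, num_anchors = 38, 50, 512, 9

# Stand-in for the output of the 3x3 rpn_conv + ReLU: one 512-D vector per position.
rpn_feat = rng.standard_normal((feat_h, feat_w, depth)).astype(np.float32)

# Random weights for the two 1x1 heads (hypothetical values).
w_cls = (rng.standard_normal((depth, num_anchors * 2)) * 0.01).astype(np.float32)
w_box = (rng.standard_normal((depth, num_anchors * 4)) * 0.01).astype(np.float32)

rpn_cls_score = rpn_feat @ w_cls    # 2 objectness values per anchor
rpn_bbox_pred = rpn_feat @ w_box    # tx, ty, tw, th per anchor

# Softmax over the two values of each anchor gives the objectness probability.
scores = rpn_cls_score.reshape(feat_h, feat_w, num_anchors, 2)
exp = np.exp(scores - scores.max(axis=-1, keepdims=True))
rpn_cls_prob = exp / exp.sum(axis=-1, keepdims=True)

print(rpn_cls_score.shape, rpn_bbox_pred.shape, rpn_cls_prob.shape)
```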

6. proposal_layer

rois, roi_scores = self._proposal_layer(rpn_cls_prob, rpn_bbox_pred, "rois")
 rpn_cls_prob: the objectness values output by the RPN
 rpn_bbox_pred: the box values output by the RPN, i.e. tx, ty, tw, th

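This method first uses rpn_bbox_pred to compute the predicted coordinates of the anchors in the original image; rpn_bbox_pred holds tx, ty, tw, th, whose definitions are given in the paper. It then sorts by the objectness scores in rpn_cls_prob and keeps the desired number of proposals (6000 in the paper). NMS is applied to these to select the final proposals, and the method returns the proposals and their scores. Note: both the scores and the proposals are in sorted order.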

7. anchor_target_layer

rpn_labels = self._anchor_target_layer(rpn_cls_score, "anchor")
 rpn_cls_score: the raw foreground/background classification scores output by the RPN

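This method first filters out every anchor that crosses the image boundary, keeping only anchors that lie entirely inside the image. It then creates a labels array filled with -1 and computes the overlap between every anchor and every ground-truth box; the overlap is a 2-D array whose rows are anchors and whose columns are ground-truth boxes. For each anchor it takes the max-overlap: if it is above an upper threshold, the anchor is labelled 1 (contains an object); if it is below a lower threshold, the anchor is labelled 0. Next, for each ground-truth box it finds the index of the anchor with the largest overlap and sets that anchor to 1, so that no ground-truth box is left without an anchor.

Once every anchor has a label, the anchors and their max-overlap ground-truth boxes are used to compute the targets tx*, ty*, tw*, th*. Then bbox_inside_weights is set; it plays the role of pi* in Eq. (1) of the paper. bbox_outside_weights sets the relative weights of positive and negative samples over all samples. Because all of these operations run only on the in-bounds anchors, the results must be mapped back onto the full set of anchors, which is what the _unmap method does.

Finally the method returns:

rpn_labels: the ground-truth label of each anchor (object or no object), used to compute the loss.

rpn_bbox_targets: the ground-truth tx*, ty*, tw*, th* of each anchor, computed against the ground-truth box it overlaps most.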
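The labelling and target computation described above can be sketched as follows. The text only says "some threshold", so the thresholds 0.7 and 0.3 here are assumptions taken from the paper's usual values. The function names are illustrative, and the filtering of out-of-bounds anchors and the _unmap step are left out.

```python
import numpy as np


def iou_matrix(anchors, gt_boxes):
    """Pairwise IoU of shape (num_anchors, num_gt); boxes are (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = [anchors[:, i][:, None] for i in range(4)]
    gx1, gy1, gx2, gy2 = [gt_boxes[:, i][None, :] for i in range(4)]
    iw = np.clip(np.minimum(ax2, gx2) - np.maximum(ax1, gx1), 0, None)
    ih = np.clip(np.minimum(ay2, gy2) - np.maximum(ay1, gy1), 0, None)
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (gx2 - gx1) * (gy2 - gy1) - inter
    return inter / union


def label_anchors(anchors, gt_boxes, pos_thresh=0.7, neg_thresh=0.3):
    """Return labels (1 = object, 0 = background, -1 = ignored) and matched gt index."""
    overlaps = iou_matrix(anchors, gt_boxes)
    labels = np.full(len(anchors), -1, dtype=np.int64)
    argmax = overlaps.argmax(axis=1)          # best gt box for each anchor
    max_overlaps = overlaps.max(axis=1)
    labels[max_overlaps < neg_thresh] = 0
    labels[overlaps.argmax(axis=0)] = 1       # best anchor for each gt box
    labels[max_overlaps >= pos_thresh] = 1
    return labels, argmax


def encode_targets(anchors, gt):
    """Compute (tx*, ty*, tw*, th*) of each anchor against its matched gt box."""
    aw, ah = anchors[:, 2] - anchors[:, 0], anchors[:, 3] - anchors[:, 1]
    gw, gh = gt[:, 2] - gt[:, 0], gt[:, 3] - gt[:, 1]
    acx, acy = anchors[:, 0] + 0.5 * aw, anchors[:, 1] + 0.5 * ah
    gcx, gcy = gt[:, 0] + 0.5 * gw, gt[:, 1] + 0.5 * gh
    return np.stack([(gcx - acx) / aw, (gcy - acy) / ah,
                     np.log(gw / aw), np.log(gh / ah)], axis=1)


anchors = np.array([[0, 0, 10, 10], [5, 0, 15, 10],
                    [50, 50, 60, 60], [100, 100, 120, 110]], dtype=float)
gt_boxes = np.array([[0, 0, 10, 10], [100, 100, 120, 120]], dtype=float)
labels, argmax = label_anchors(anchors, gt_boxes)
targets = encode_targets(anchors, gt_boxes[argmax])
print(labels)
print(targets)
```

In this toy input the last anchor has an IoU of only 0.5 with the second ground-truth box, yet it is labelled 1 because it is that box's best match.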

8. proposal_target_layer

rois, _ = self._proposal_target_layer(rois, roi_scores, "rpn_rois")
rois: the rois produced by the method in step 6
roi_scores: the scores of the rois produced in step 6

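This method first computes fg_rois_per_image, the number of RoIs in a batch to treat as foreground; the rest are treated as background. It then computes the overlap between every RoI and every ground-truth box, returned as an array of shape [roi_size, gt_size]. Positive and negative samples are drawn at random from the RoIs: an RoI whose max-overlap exceeds a threshold counts as positive. Each positive sample gets a label, its class label, which is a single number since there are 20 classes. Background samples are drawn according to the configured ratio and have label 0.

The method calls _sample_rois, which returns:
labels: the class label of each RoI.
rois: the positive and negative samples selected from the original RoIs.
roi_scores: the objectness scores of the selected samples.
bbox_targets: an array target_nums of shape [num_rois, num_class*4]. Take one row, say target_nums[0]: it is a vector of length 80 for 20 classes (84 if the background class, label 0, is counted in num_class), initialised to all zeros. If the class of that RoI is 3, then target_nums[0, 3*4:(3*4+4)] is set to tx, ty, tw, th.

The rois returned by this method are passed on to the Fast R-CNN network that follows. The labels and boxes computed here serve as ground truth. The RoIs are taken from the conv5_3 feature map.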
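The class-specific layout of bbox_targets can be sketched like this. `expand_bbox_targets` is an illustrative name, and num_classes=20 simply follows the text's example of a length-80 row. The matching inside weights are included because they mark which four entries are actually used.

```python
import numpy as np


def expand_bbox_targets(labels, targets, num_classes):
    """Scatter each RoI's (tx, ty, tw, th) into the 4 slots of its class.

    labels:  (N,) integer class ids, 0 = background
    targets: (N, 4) regression targets
    Returns arrays of shape (N, 4 * num_classes).
    """
    n = len(labels)
    bbox_targets = np.zeros((n, 4 * num_classes), dtype=np.float32)
    bbox_inside_weights = np.zeros_like(bbox_targets)
    for i in np.where(labels > 0)[0]:        # background rows stay all zero
        start = 4 * labels[i]
        bbox_targets[i, start:start + 4] = targets[i]
        bbox_inside_weights[i, start:start + 4] = 1.0
    return bbox_targets, bbox_inside_weights


labels = np.array([3, 0])                    # one RoI of class 3, one background RoI
targets = np.array([[0.1, 0.2, 0.3, 0.4],
                    [9.0, 9.0, 9.0, 9.0]], dtype=np.float32)
bbox_targets, inside_w = expand_bbox_targets(labels, targets, num_classes=20)
print(bbox_targets.shape)
print(bbox_targets[0, 12:16])                # the slots for class 3
```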

9. Loss

# Excerpt from the network's loss-building method (TensorFlow 1.x).
# Assumes `import tensorflow as tf`; sigma_rpn is supplied by the caller.

# RPN, class loss
# Compute the RPN objectness loss: take the logits output by the RPN, keep only the
# anchors whose label (computed from the anchors and ground truth) is not -1,
# then compute the cross-entropy on those anchors.
rpn_cls_score = tf.reshape(self._predictions['rpn_cls_score_reshape'], [-1, 2])
rpn_label = tf.reshape(self._anchor_targets['rpn_labels'], [-1])
rpn_select = tf.where(tf.not_equal(rpn_label, -1))
rpn_cls_score = tf.reshape(tf.gather(rpn_cls_score, rpn_select), [-1, 2])
rpn_label = tf.reshape(tf.gather(rpn_label, rpn_select), [-1])
rpn_cross_entropy = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(logits=rpn_cls_score, labels=rpn_label))

# RPN, bbox loss
# The bbox loss uses smooth_l1_loss. rpn_bbox_inside_weights selects the boxes that
# contain an object, because not every anchor does. rpn_bbox_outside_weights sets the
# relative weights of the boxes labelled 1 and the boxes labelled 0.
rpn_bbox_pred = self._predictions['rpn_bbox_pred']
rpn_bbox_targets = self._anchor_targets['rpn_bbox_targets']
rpn_bbox_inside_weights = self._anchor_targets['rpn_bbox_inside_weights']
rpn_bbox_outside_weights = self._anchor_targets['rpn_bbox_outside_weights']

rpn_loss_box = self._smooth_l1_loss(rpn_bbox_pred, rpn_bbox_targets, rpn_bbox_inside_weights,
                                    rpn_bbox_outside_weights, sigma=sigma_rpn, dim=[1, 2, 3])

# RCNN, class loss
cls_score = self._predictions["cls_score"]
label = tf.reshape(self._proposal_targets["labels"], [-1])
cross_entropy = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(logits=cls_score, labels=label))

# RCNN, bbox loss
bbox_pred = self._predictions['bbox_pred']
bbox_targets = self._proposal_targets['bbox_targets']
bbox_inside_weights = self._proposal_targets['bbox_inside_weights']
bbox_outside_weights = self._proposal_targets['bbox_outside_weights']

loss_box = self._smooth_l1_loss(bbox_pred, bbox_targets, bbox_inside_weights, bbox_outside_weights)

# Store the four partial losses and their sum.
self._losses['cross_entropy'] = cross_entropy
self._losses['loss_box'] = loss_box
self._losses['rpn_cross_entropy'] = rpn_cross_entropy
self._losses['rpn_loss_box'] = rpn_loss_box

loss = cross_entropy + loss_box + rpn_cross_entropy + rpn_loss_box
self._losses['total_loss'] = loss

self._event_summaries.update(self._losses)

return loss
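The _smooth_l1_loss helper is called above but not shown. A NumPy sketch of what such a weighted smooth L1 loss typically computes is given below. It is an assumption about the helper's behaviour, not its actual code. In particular, summing over all non-batch axes stands in for the dim argument.

```python
import numpy as np


def smooth_l1_loss(pred, target, inside_w, outside_w, sigma=1.0):
    """Weighted smooth L1: quadratic near zero, linear elsewhere."""
    sigma2 = sigma ** 2
    diff = inside_w * (pred - target)        # inside weights mask out unused entries
    abs_diff = np.abs(diff)
    per_elem = np.where(abs_diff < 1.0 / sigma2,
                        0.5 * sigma2 * diff ** 2,
                        abs_diff - 0.5 / sigma2)
    weighted = outside_w * per_elem          # outside weights balance the samples
    # Sum over every axis except the batch axis, then average over the batch.
    return weighted.sum(axis=tuple(range(1, weighted.ndim))).mean()


pred = np.array([[0.5, 3.0]])
target = np.zeros((1, 2))
ones = np.ones((1, 2))
print(smooth_l1_loss(pred, target, ones, ones))               # small and large error
print(smooth_l1_loss(pred, target, np.zeros((1, 2)), ones))   # everything masked out
```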

10. RoI pooling

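For each RoI, mapping its coordinates from the original image to the feature map is simply a division by the image-to-feature scale factor, 16, which gives the box coordinates on the feature map. Because the boxes differ in size, the pooling is worked out backwards: for each output cell, compute the region of the feature map it corresponds to, and take the max within that region.

The bin size is the ratio of the RoI's width and height on the feature map to the pooled width and height [that is, the mapping between one point of the pooled output and one patch of the RoI; concretely, the RoI on the feature map is divided into pooled_w*pooled_h parts]. Since RoIs differ in size, the bin size is recomputed for every RoI. Finally, loop over the three dimensions channel, h, w of the pooled output, translate each bin to the RoI's position [hstart, wstart, hend, wend], and take the maximum over that bin.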
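The procedure described above can be sketched in NumPy. This follows the classic RoI max pooling scheme in simplified form. The function name and the default 7*7 output size are assumptions, and the channel loop is replaced by a vectorised max over all channels.

```python
import numpy as np


def roi_max_pool(feat, roi, pooled_h=7, pooled_w=7, spatial_scale=1.0 / 16):
    """feat: (C, H, W) feature map; roi: (x1, y1, x2, y2) in image coordinates."""
    channels, height, width = feat.shape
    # Map the RoI from image coordinates onto the feature map.
    x1, y1, x2, y2 = [int(round(v * spatial_scale)) for v in roi]
    roi_w, roi_h = max(x2 - x1 + 1, 1), max(y2 - y1 + 1, 1)
    # Bin size depends on the RoI, so it is recomputed for every RoI.
    bin_w, bin_h = roi_w / pooled_w, roi_h / pooled_h
    out = np.zeros((channels, pooled_h, pooled_w), dtype=feat.dtype)
    for ph in range(pooled_h):
        hstart = min(max(int(np.floor(ph * bin_h)) + y1, 0), height)
        hend = min(max(int(np.ceil((ph + 1) * bin_h)) + y1, 0), height)
        for pw in range(pooled_w):
            wstart = min(max(int(np.floor(pw * bin_w)) + x1, 0), width)
            wend = min(max(int(np.ceil((pw + 1) * bin_w)) + x1, 0), width)
            if hend > hstart and wend > wstart:      # empty bins stay 0
                out[:, ph, pw] = feat[:, hstart:hend, wstart:wend].max(axis=(1, 2))
    return out


feat = np.arange(64, dtype=np.float32).reshape(1, 8, 8)
pooled = roi_max_pool(feat, roi=(0, 0, 63, 63), pooled_h=2, pooled_w=2)
print(pooled[0])
```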

11. Input handling

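Inputs are usually resized to a fixed size. If arbitrary input sizes are required, the limiting factor is the fc layers. How can this be handled? Option 1: use SPP-net, whose spatial pyramid pooling produces a fixed-size feature. Option 2: replace the fc layers directly with global average pooling.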
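A small sketch of why option 2 removes the size constraint: global average pooling reduces a feature map of any spatial size to one value per channel, so the classifier that follows always sees a vector of the same length. The 512 channels and 21 outputs are hypothetical values chosen for the demo.

```python
import numpy as np


def global_average_pool(feat):
    """feat: (C, H, W) -> (C,), whatever H and W are."""
    return feat.mean(axis=(1, 2))


rng = np.random.default_rng(0)
w_cls = rng.standard_normal((512, 21))       # hypothetical classifier weights

logit_shapes = []
for h, w in [(38, 50), (25, 30)]:            # feature maps from two input sizes
    feat = rng.standard_normal((512, h, w))
    logits = global_average_pool(feat) @ w_cls
    logit_shapes.append(logits.shape)
    print(feat.shape, "->", logits.shape)
```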
