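At the end of the previous post we produced the feature maps used to compute anchor information, listed below. (In the source code, inference mode does not generate anchors inside the network; they are generated externally and fed in. Training mode generates them directly during the forward pass. In practice the two amount to the same thing. For anchor generation, see 『計算機視覺』Mask-RCNN_錨框生成.)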
    rpn_feature_maps = [P2, P3, P4, P5, P6]
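Next, based on these features, we first generate the per-anchor information: a foreground/background score for each anchor and a coordinate refinement for each anchor.
Continuing in the main function from the previous post, we instantiate the RPN model and apply it to each feature level: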
    # Anchors
    if mode == "training":
        ……
    else:
        anchors = input_anchors

    # RPN model. build_rpn_model returns a Keras Model object;
    # note that Keras Model objects are callable.
    rpn = build_rpn_model(config.RPN_ANCHOR_STRIDE,       # 1
                          len(config.RPN_ANCHOR_RATIOS),  # 3
                          config.TOP_DOWN_PYRAMID_SIZE)   # 256
    # Loop through pyramid layers
    layer_outputs = []  # list of lists
    for p in rpn_feature_maps:
        layer_outputs.append(rpn([p]))  # keep the RPN output for each pyramid level
The functions called inside the RPN module are as follows:
    ############################################################
    #  Region Proposal Network (RPN)
    ############################################################

    def rpn_graph(feature_map, anchors_per_location, anchor_stride):
        """Builds the computation graph of Region Proposal Network.

        feature_map: backbone features [batch, height, width, depth]
        anchors_per_location: number of anchors per pixel in the feature map
        anchor_stride: Controls the density of anchors. Typically 1 (anchors for
                       every pixel in the feature map), or 2 (every other pixel).

        Returns:
            rpn_class_logits: [batch, H * W * anchors_per_location, 2] Anchor classifier logits (before softmax)
            rpn_probs: [batch, H * W * anchors_per_location, 2] Anchor classifier probabilities.
            rpn_bbox: [batch, H * W * anchors_per_location, (dy, dx, log(dh), log(dw))] Deltas to be
                      applied to anchors.
        """
        # TODO: check if stride of 2 causes alignment issues if the feature map
        # is not even.
        # Shared convolutional base of the RPN
        shared = KL.Conv2D(512, (3, 3), padding='same', activation='relu',
                           strides=anchor_stride,
                           name='rpn_conv_shared')(feature_map)

        # Anchor Score. [batch, height, width, anchors per location * 2].
        x = KL.Conv2D(2 * anchors_per_location, (1, 1), padding='valid',
                      activation='linear', name='rpn_class_raw')(shared)

        # Reshape to [batch, anchors, 2]
        rpn_class_logits = KL.Lambda(
            lambda t: tf.reshape(t, [tf.shape(t)[0], -1, 2]))(x)
        # Output tensors to a Model must be Keras tensors, so the plain
        # tf.reshape call below would not work:
        # rpn_class_logits = tf.reshape(x, [tf.shape(x)[0], -1, 2])

        # Softmax on last dimension of BG/FG.
        rpn_probs = KL.Activation(
            "softmax", name="rpn_class_xxx")(rpn_class_logits)

        # Bounding box refinement. [batch, H, W, anchors per location * depth]
        # where depth is [x, y, log(w), log(h)]
        x = KL.Conv2D(anchors_per_location * 4, (1, 1), padding="valid",
                      activation='linear', name='rpn_bbox_pred')(shared)

        # Reshape to [batch, anchors, 4]
        rpn_bbox = KL.Lambda(lambda t: tf.reshape(t, [tf.shape(t)[0], -1, 4]))(x)

        return [rpn_class_logits, rpn_probs, rpn_bbox]


    def build_rpn_model(anchor_stride, anchors_per_location, depth):
        """Builds a Keras model of the Region Proposal Network.
        It wraps the RPN graph so it can be used multiple times with shared
        weights.

        anchors_per_location: number of anchors per pixel in the feature map
        anchor_stride: Controls the density of anchors. Typically 1 (anchors for
                       every pixel in the feature map), or 2 (every other pixel).
        depth: Depth of the backbone feature map.

        Returns a Keras Model object. The model outputs, when called, are:
        rpn_class_logits: [batch, H * W * anchors_per_location, 2] Anchor classifier logits (before softmax)
        rpn_probs: [batch, H * W * anchors_per_location, 2] Anchor classifier probabilities.
        rpn_bbox: [batch, H * W * anchors_per_location, (dy, dx, log(dh), log(dw))] Deltas to be
                  applied to anchors.
        """
        input_feature_map = KL.Input(shape=[None, None, depth],
                                     name="input_rpn_feature_map")
        # Returns [rpn_class_logits, rpn_probs, rpn_bbox];
        # here anchors_per_location = 3 and anchor_stride = 1.
        outputs = rpn_graph(input_feature_map, anchors_per_location, anchor_stride)
        return KM.Model([input_feature_map], outputs, name="rpn_model")
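The reshape performed by the Lambda layers in rpn_graph is easy to lose track of, so here is a minimal NumPy sketch of it. The tensor sizes and the random stand-in for the rpn_class_raw output are assumptions chosen for illustration; only the reshape and softmax steps mirror the source.

```python
import numpy as np

batch, height, width, anchors_per_location = 2, 4, 5, 3

# Stand-in for the output of the 1x1 "rpn_class_raw" convolution (random values).
rng = np.random.default_rng(0)
raw = rng.normal(size=(batch, height, width, 2 * anchors_per_location))

# Same reshape as the Lambda layer: [batch, H, W, 2*A] -> [batch, H*W*A, 2].
logits = raw.reshape(batch, -1, 2)

# Softmax over the last axis gives (background, foreground) probabilities.
shifted = logits - logits.max(axis=-1, keepdims=True)
probs = np.exp(shifted) / np.exp(shifted).sum(axis=-1, keepdims=True)

print(logits.shape)  # (2, 60, 2)
```

Each group of two consecutive channels at a pixel becomes one anchor row, so a 4 x 5 map with 3 anchors per location yields 60 anchors per image.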
Back in the main function, we concatenate and regroup the per-level anchor information, which so far is held as a list:
    # Loop through pyramid layers
    layer_outputs = []  # list of lists
    for p in rpn_feature_maps:
        layer_outputs.append(rpn([p]))  # keep the RPN output for each pyramid level
    # Concatenate layer outputs
    # Convert from list of lists of level outputs to list of lists
    # of outputs across levels.
    # e.g. [[a1, b1, c1], [a2, b2, c2]] => [[a1, a2], [b1, b2], [c1, c2]]
    output_names = ["rpn_class_logits", "rpn_class", "rpn_bbox"]
    outputs = list(zip(*layer_outputs))  # [[logits2,……6], [class2,……6], [bbox2,……6]]
    outputs = [KL.Concatenate(axis=1, name=n)(list(o))
               for o, n in zip(outputs, output_names)]

    # [batch, num_anchors, 2/4]
    # where num_anchors is the total number of anchors over all feature levels
    rpn_class_logits, rpn_class, rpn_bbox = outputs
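The purpose is simple. The original return value is [(logits2, class2, bbox2), (logits3, class3, bbox3), ……]. We first convert it to [[logits2,……6], [class2,……6], [bbox2,……6]], then concatenate the tensors in each inner list along axis 1 (the anchors dimension). This yields three tensors, each holding the classification or regression information for all anchors across the 5 feature levels for every image in the batch: [batch, anchors, 2 class scores or (dy, dx, log(dh), log(dw))].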
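As a rough illustration of the regrouping, the sketch below uses strings as stand-ins for tensors and then counts how many anchors end up on axis 1 after concatenation. The 1024 x 1024 input size and the strides of P2 to P6 are assumptions for the arithmetic, not values taken from this post.

```python
# Per-level outputs, using strings as stand-ins for tensors.
layer_outputs = [
    ["logits2", "class2", "bbox2"],
    ["logits3", "class3", "bbox3"],
    ["logits4", "class4", "bbox4"],
]
# zip(*...) turns "one list per level" into "one tuple per output kind".
per_output = list(zip(*layer_outputs))
print(per_output[0])  # ('logits2', 'logits3', 'logits4')

# Anchor count after concatenating along axis 1 (assumed configuration).
image_size = 1024             # assumption: square 1024 x 1024 input
strides = [4, 8, 16, 32, 64]  # assumption: strides of P2..P6
anchors_per_location = 3      # len(config.RPN_ANCHOR_RATIOS) in the call above
num_anchors = sum((image_size // s) ** 2 * anchors_per_location for s in strides)
print(num_anchors)  # 261888
```

Under these assumptions the RPN scores several hundred thousand anchors per image, which is why the later steps trim by score before running non-maximum suppression.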
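The previous step gave us information for all anchors. The goal now is to pick a specified number of anchors that are most likely to contain an object and use them as proposals, i.e. we want the boxes with the highest foreground scores from the binary classification above. Because of how anchors are generated, they are extremely numerous and overlap heavily, so on top of ranking by score we also want to remove near-duplicates (non-maximum suppression). That is the purpose of proposal generation.
Continuing in the main function, the following code enters the proposal generation stage: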
    # Generate proposals
    # Proposals are [batch, N, (y1, x1, y2, x2)] in normalized coordinates
    # and zero padded.
    # POST_NMS_ROIS_INFERENCE = 1000
    # POST_NMS_ROIS_TRAINING = 2000
    proposal_count = config.POST_NMS_ROIS_TRAINING if mode == "training"\
        else config.POST_NMS_ROIS_INFERENCE
    # [IMAGES_PER_GPU, num_rois, (y1, x1, y2, x2)]
    # IMAGES_PER_GPU takes the place of batch; from here on "batch" means IMAGES_PER_GPU
    rpn_rois = ProposalLayer(
        proposal_count=proposal_count,
        nms_threshold=config.RPN_NMS_THRESHOLD,  # 0.7
        name="ROI",
        config=config)([rpn_class, rpn_bbox, anchors])
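proposal_count is an integer specifying how many proposals to produce; if there are not enough, the output is padded with empty boxes whose coordinates are [0, 0, 0, 0].
Let's walk through ProposalLayer. At the start, it receives the three tensors [rpn_class, rpn_bbox, anchors] as inputs: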
    class ProposalLayer(KE.Layer):
        """Receives anchor scores and selects a subset to pass as proposals
        to the second stage. Filtering is done based on anchor scores and
        non-max suppression to remove overlaps. It also applies bounding
        box refinement deltas to anchors.

        Inputs:
            rpn_probs: [batch, num_anchors, (bg prob, fg prob)]
            rpn_bbox: [batch, num_anchors, (dy, dx, log(dh), log(dw))]
            anchors: [batch, num_anchors, (y1, x1, y2, x2)] anchors in normalized coordinates

        Returns:
            Proposals in normalized coordinates [batch, rois, (y1, x1, y2, x2)]
        """

        def __init__(self, proposal_count, nms_threshold, config=None, **kwargs):
            super(ProposalLayer, self).__init__(**kwargs)
            self.config = config
            self.proposal_count = proposal_count
            self.nms_threshold = nms_threshold

        def call(self, inputs):  # [rpn_class, rpn_bbox, anchors]
            # Box Scores. Use the foreground class confidence. [batch, num_rois, 2]->[batch, num_rois]
            scores = inputs[0][:, :, 1]
            # Box deltas. Coordinate refinements: (dy, dx, log(dh), log(dw)). [batch, num_rois, 4]
            deltas = inputs[1]
            deltas = deltas * np.reshape(self.config.RPN_BBOX_STD_DEV, [1, 1, 4])  # [ 0.1 0.1 0.2 0.2]
            # Anchors. Box coordinates: (y1, x1, y2, x2). [batch, num_rois, 4]
            anchors = inputs[2]
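The variable scores = inputs[0][:, :, 1] means we only need the foreground score of every candidate box.
We then take the n candidates with the highest foreground scores: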
    # Improve performance by trimming to top anchors by score
    # and doing the rest on the smaller subset.
    pre_nms_limit = tf.minimum(self.config.PRE_NMS_LIMIT, tf.shape(anchors)[1])
    # Given a matrix, top_k returns the top k of every row. [batch, top_k]
    ix = tf.nn.top_k(scores, pre_nms_limit, sorted=True,
                     name="top_anchors").indices
To extract the top-k anchors, we apply the same selection to all three inputs:
    # batch_slice:
    #   - splits the batched tensors into single images
    #   - takes the specified number of slices
    #   - applies the per-image function to each slice and stacks the results
    #     (the first dimension of the result is the number of slices specified
    #     in the previous step, not the input batch size)
    scores = utils.batch_slice([scores, ix], lambda x, y: tf.gather(x, y),
                               self.config.IMAGES_PER_GPU)
    deltas = utils.batch_slice([deltas, ix], lambda x, y: tf.gather(x, y),
                               self.config.IMAGES_PER_GPU)
    pre_nms_anchors = utils.batch_slice([anchors, ix], lambda a, x: tf.gather(a, x),
                                        self.config.IMAGES_PER_GPU,
                                        names=["pre_nms_anchors"])
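This uses batch_slice, a function that appears many times later on; I also tried rewriting it with tf's while_loop.
The function extends operations that only support a batch of 1 (in practice, functions that cannot take a batch dimension at all). As called here, tf.gather indexes along a single axis of its input, whereas scores is 2-D [batch, num_rois] and ix is also 2-D [batch, top_k]. We therefore slice both per image, apply the function to each pair of slices, and stack the results.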
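The slice, apply, and stack idea can be sketched in NumPy as follows. batch_slice_np is a hypothetical stand-in written for this illustration (it handles single-output functions only), and the score values are made up.

```python
import numpy as np

def batch_slice_np(inputs, fn, batch_size):
    """NumPy stand-in for batch_slice, for single-output functions only."""
    slices = [fn(*[x[i] for x in inputs]) for i in range(batch_size)]
    return np.stack(slices, axis=0)

scores = np.array([[0.1, 0.9, 0.5, 0.3],
                   [0.8, 0.2, 0.4, 0.6]])
top_k = 2
# Indices of the top_k scores per row, highest first (similar to tf.nn.top_k).
ix = np.argsort(-scores, axis=1)[:, :top_k]

# Per-image gather: each row of ix indexes the matching row of scores.
picked = batch_slice_np([scores, ix], lambda s, i: s[i], batch_size=2)
print(ix.tolist())      # [[1, 2], [0, 3]]
print(picked.tolist())  # [[0.9, 0.5], [0.8, 0.6]]
```

In NumPy the same result is available without a loop through np.take_along_axis(scores, ix, axis=1); the loop is kept here only because it mirrors what the slicing helper does.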
[Note] batch_slice is defined in utils.py, not model.py:
    # ## Batch Slicing
    # Some custom layers support a batch size of 1 only, and require a lot of work
    # to support batches greater than 1. This function slices an input tensor
    # across the batch dimension and feeds batches of size 1. Effectively,
    # an easy way to support batches > 1 quickly with little code modification.
    # In the long run, it's more efficient to modify the code to support large
    # batches and getting rid of this function. Consider this a temporary solution

    def batch_slice(inputs, graph_fn, batch_size, names=None):
        """Splits inputs into slices and feeds each slice to a copy of the given
        computation graph and then combines the results. It allows you to run a
        graph on a batch of inputs even if the graph is written to support one
        instance only.

        inputs: list of tensors. All must have the same first dimension length
        graph_fn: A function that returns a TF tensor that's part of a graph.
        batch_size: number of slices to divide the data into.
        names: If provided, assigns names to the resulting tensors.
        """
        if not isinstance(inputs, list):
            inputs = [inputs]

        outputs = []
        for i in range(batch_size):
            inputs_slice = [x[i] for x in inputs]
            output_slice = graph_fn(*inputs_slice)
            if not isinstance(output_slice, (tuple, list)):
                output_slice = [output_slice]
            outputs.append(output_slice)

        # The author's attempt at the same loop with tf.while_loop is sketched below.
        # Caveat: in graph mode the body is traced only once, so appending to a
        # Python list does not collect one slice per iteration; treat it as a sketch.
        # import tensorflow as tf
        # i = 0
        # outputs = []
        #
        # def cond(index):
        #     return index < batch_size  # returns a bool
        #
        # def body(index):
        #     inputs_slice = [x[index] for x in inputs]
        #     output_slice = graph_fn(*inputs_slice)
        #     if not isinstance(output_slice, (tuple, list)):
        #         output_slice = [output_slice]
        #     outputs.append(output_slice)
        #     index += 1
        #     return index  # return the value cond needs for the next check
        #
        # tf.while_loop(cond, body, [i])

        # Change outputs from a list of slices where each is
        # a list of outputs to a list of outputs and each has
        # a list of slices
        # The illustration below assumes graph_fn returns two tensors per call:
        # [[tensor11, tensor12], [tensor21, tensor22], ……]
        # => [(tensor11, tensor21, ……), (tensor12, tensor22, ……)]  (zip returns tuples)
        outputs = list(zip(*outputs))

        if names is None:
            names = [None] * len(outputs)

        # In general this stacks the slices back along the batch dimension
        # (the for loop above effectively split the batch apart)
        result = [tf.stack(o, axis=0, name=n)
                  for o, n in zip(outputs, names)]
        if len(result) == 1:
            result = result[0]

        return result
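In the RPN we obtained the box regression results for all anchors, rpn_bbox: [batch, anchors, (dy, dx, log(dh), log(dw))]. In the previous step we extracted the coordinates of the top-k anchors together with their regression values. Now we combine the two, using the RPN regression output to refine the top-k anchor coordinates: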
    # Apply deltas to anchors to get refined anchors.
    # [IMAGES_PER_GPU, top_k, (y1, x1, y2, x2)]
    boxes = utils.batch_slice([pre_nms_anchors, deltas],
                              lambda x, y: apply_box_deltas_graph(x, y),
                              self.config.IMAGES_PER_GPU,
                              names=["refined_anchors"])
The function is as follows:
    def apply_box_deltas_graph(boxes, deltas):
        """Applies the given deltas to the given boxes.
        boxes: [N, (y1, x1, y2, x2)] boxes to update
        deltas: [N, (dy, dx, log(dh), log(dw))] refinements to apply
        """
        # dy = (y_n - y_o)/h_o
        # dx = (x_n - x_o)/w_o
        # dh = h_n/h_o
        # dw = w_n/w_o
        # (y and x are box centers; deltas stores log(dh) and log(dw),
        # hence the tf.exp calls below)
        # Convert to y, x, h, w
        height = boxes[:, 2] - boxes[:, 0]
        width = boxes[:, 3] - boxes[:, 1]
        center_y = boxes[:, 0] + 0.5 * height
        center_x = boxes[:, 1] + 0.5 * width
        # Apply deltas
        center_y += deltas[:, 0] * height
        center_x += deltas[:, 1] * width
        height *= tf.exp(deltas[:, 2])
        width *= tf.exp(deltas[:, 3])
        # Convert back to y1, x1, y2, x2
        y1 = center_y - 0.5 * height
        x1 = center_x - 0.5 * width
        y2 = y1 + height
        x2 = x1 + width
        result = tf.stack([y1, x1, y2, x2], axis=1, name="apply_box_deltas_out")
        return result
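At this point the code has shown us what the four regression values actually mean. Subscript o denotes the original anchor and n the refined box; y and x are center coordinates. The network outputs log(dh) and log(dw) rather than dh and dw themselves, which is why the code applies tf.exp to the last two values:
dy = (y_n - y_o)/h_o
dx = (x_n - x_o)/w_o
dh = h_n/h_o
dw = w_n/w_o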
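To check these definitions numerically, here is a small NumPy sketch that encodes a target box relative to an anchor and then decodes it again. Note that height and width are handled on a log scale, matching the log(dh) and log(dw) entries in the docstring. The function names box_to_delta and apply_delta and the box values are hypothetical, and the sketch leaves out the RPN_BBOX_STD_DEV scaling.

```python
import numpy as np

def box_to_delta(anchor, target):
    """Encode target relative to anchor; both are (y1, x1, y2, x2)."""
    ah, aw = anchor[2] - anchor[0], anchor[3] - anchor[1]
    th, tw = target[2] - target[0], target[3] - target[1]
    acy, acx = anchor[0] + 0.5 * ah, anchor[1] + 0.5 * aw
    tcy, tcx = target[0] + 0.5 * th, target[1] + 0.5 * tw
    return np.array([(tcy - acy) / ah, (tcx - acx) / aw,
                     np.log(th / ah), np.log(tw / aw)])

def apply_delta(anchor, delta):
    """Decode, mirroring apply_box_deltas_graph for a single box."""
    h, w = anchor[2] - anchor[0], anchor[3] - anchor[1]
    cy = anchor[0] + 0.5 * h + delta[0] * h
    cx = anchor[1] + 0.5 * w + delta[1] * w
    h, w = h * np.exp(delta[2]), w * np.exp(delta[3])
    y1, x1 = cy - 0.5 * h, cx - 0.5 * w
    return np.array([y1, x1, y1 + h, x1 + w])

anchor = np.array([0.2, 0.2, 0.6, 0.6])
target = np.array([0.25, 0.1, 0.75, 0.5])
delta = box_to_delta(anchor, target)
restored = apply_delta(anchor, delta)
print(np.allclose(restored, target))  # True
```

Here the target center is shifted by a quarter of the anchor height downward and a quarter of the anchor width to the left, so the first two deltas come out close to 0.25 and -0.25, and the unchanged width gives a width delta close to 0.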
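Note that anchor coordinates actually live on a normalized image (SSD does the same, as covered in 『TensorFlow』SSD源碼學習_其三:錨框生成; that is, all anchors sit on a virtual canvas of width and height 1). After the refinement above this is no longer guaranteed, so we clip off the parts of the boxes that fall outside the canvas, keeping only the intersection of each box with the [0, 0, 1, 1] canvas.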
    # Clip to image boundaries. Since we're in normalized coordinates,
    # clip to 0..1 range. [IMAGES_PER_GPU, top_k, (y1, x1, y2, x2)]
    window = np.array([0, 0, 1, 1], dtype=np.float32)
    boxes = utils.batch_slice(boxes,  # boxes are the anchors after applying deltas
                              lambda x: clip_boxes_graph(x, window),
                              self.config.IMAGES_PER_GPU,
                              names=["refined_anchors_clipped"])
The clipping function is as follows:
    def clip_boxes_graph(boxes, window):
        """
        boxes: [N, (y1, x1, y2, x2)]
        window: [4] in the form y1, x1, y2, x2
        """
        # Split
        wy1, wx1, wy2, wx2 = tf.split(window, 4)
        y1, x1, y2, x2 = tf.split(boxes, 4, axis=1)
        # Clip
        y1 = tf.maximum(tf.minimum(y1, wy2), wy1)
        x1 = tf.maximum(tf.minimum(x1, wx2), wx1)
        y2 = tf.maximum(tf.minimum(y2, wy2), wy1)
        x2 = tf.maximum(tf.minimum(x2, wx2), wx1)
        clipped = tf.concat([y1, x1, y2, x2], axis=1, name="clipped_boxes")
        clipped.set_shape((clipped.shape[0], 4))
        return clipped
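The same clipping can be sketched in NumPy, which may make the min/max pattern easier to read. The box values are made up for illustration; only the [0, 0, 1, 1] window comes from the source.

```python
import numpy as np

boxes = np.array([[-0.1, 0.2, 0.5, 1.3],
                  [0.3, 0.3, 0.6, 0.6]])
window = np.array([0.0, 0.0, 1.0, 1.0])  # (y1, x1, y2, x2)

# Lower bounds repeat (wy1, wx1); upper bounds repeat (wy2, wx2).
lower = np.array([window[0], window[1], window[0], window[1]])
upper = np.array([window[2], window[3], window[2], window[3]])
clipped = np.clip(boxes, lower, upper)
print(clipped.tolist())  # [[0.0, 0.2, 0.5, 1.0], [0.3, 0.3, 0.6, 0.6]]
```

The first box sticks out past the top and right edges and is trimmed back to the canvas, while the second box lies fully inside and is returned unchanged.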
Finally we run non-maximum suppression so that the proposals do not overlap too heavily:
    # Filter out small boxes
    # According to Xinlei Chen's paper, this reduces detection accuracy
    # for small objects, so we're skipping it.

    # Non-max suppression
    def nms(boxes, scores):
        """
        Non-maximum suppression helper
        :param boxes: [top_k, (y1, x1, y2, x2)]
        :param scores: [top_k]
        :return:
        """
        indices = tf.image.non_max_suppression(
            boxes, scores, self.proposal_count,  # third argument: maximum number of boxes returned
            self.nms_threshold, name="rpn_non_max_suppression")
        proposals = tf.gather(boxes, indices)
        # Pad if needed: if too few boxes are returned, append (0, 0, 0, 0) rows
        # until the count reaches proposal_count
        padding = tf.maximum(self.proposal_count - tf.shape(proposals)[0], 0)
        # append all-zero rows at the end
        proposals = tf.pad(proposals, [(0, padding), (0, 0)])
        return proposals

    proposals = utils.batch_slice([boxes, scores], nms,
                                  self.config.IMAGES_PER_GPU)
    return proposals  # [IMAGES_PER_GPU, proposal_count, (y1, x1, y2, x2)]
That's right, TensorFlow already ships this as tf.image.non_max_suppression.
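For readers who want to see what happens inside such a call, below is a minimal greedy NMS sketch in NumPy, followed by the same zero padding as above. This is an illustrative reimplementation under simple assumptions, not TensorFlow's actual code; the function names iou and nms_and_pad and the sample boxes are hypothetical.

```python
import numpy as np

def iou(box, others):
    """IoU between one (y1, x1, y2, x2) box and an [N, 4] array of boxes."""
    y1 = np.maximum(box[0], others[:, 0])
    x1 = np.maximum(box[1], others[:, 1])
    y2 = np.minimum(box[2], others[:, 2])
    x2 = np.minimum(box[3], others[:, 3])
    inter = np.maximum(y2 - y1, 0) * np.maximum(x2 - x1, 0)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (others[:, 2] - others[:, 0]) * (others[:, 3] - others[:, 1])
    return inter / (area + areas - inter)

def nms_and_pad(boxes, scores, max_output, threshold):
    """Greedy NMS, then zero-pad the result to exactly max_output rows."""
    order = list(np.argsort(-scores))  # highest score first
    keep = []
    while order and len(keep) < max_output:
        best = order.pop(0)
        keep.append(best)
        if order:
            rest = np.array(order)
            overlaps = iou(boxes[best], boxes[rest])
            # Drop every remaining box that overlaps the kept one too much.
            order = list(rest[overlaps <= threshold])
    kept = boxes[keep]
    padding = max(max_output - kept.shape[0], 0)
    return np.pad(kept, [(0, padding), (0, 0)])

boxes = np.array([[0.0, 0.0, 0.5, 0.5],
                  [0.0, 0.0, 0.5, 0.55],
                  [0.5, 0.5, 1.0, 1.0]])
scores = np.array([0.9, 0.8, 0.7])
proposals = nms_and_pad(boxes, scores, max_output=4, threshold=0.7)
print(proposals.shape)  # (4, 4)
```

In this example the second box overlaps the first with an IoU of about 0.91 and is suppressed, the third box does not overlap and is kept, and two all-zero rows are appended to reach the requested count of 4.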
With that, we have obtained all of the proposals.