[Computer Vision] Mask-RCNN Inference Network, Part 3: RPN Anchor Processing and Proposal Generation

Date: 2019-11-08


1. RPN anchor information generation

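At the end of the previous post we produced the features used to compute the anchor information. (In inference mode the source code does not generate anchors inside the network; they are generated externally and fed in. In training mode they are generated directly during the forward pass. In practice there is little difference. Anchor generation is covered in the post "[Computer Vision] Mask-RCNN: Anchor Generation".)

        rpn_feature_maps = [P2, P3, P4, P5, P6]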

Next, we use these features to produce the anchor information: the foreground/background score of every anchor and the coordinate refinement of every anchor.

Continuing from the main function in the previous post, we instantiate the RPN model and apply it to the features of each pyramid level:

        # Anchors
        if mode == "training":
            ……
        else:
            anchors = input_anchors

        # RPN model: build_rpn_model returns a Keras Model object; note that Keras Model objects are callable
        rpn = build_rpn_model(config.RPN_ANCHOR_STRIDE,  # 1 3 256
                              len(config.RPN_ANCHOR_RATIOS), config.TOP_DOWN_PYRAMID_SIZE)
        # Loop through pyramid layers
        layer_outputs = []  # list of lists
        for p in rpn_feature_maps:
            layer_outputs.append(rpn([p]))  # store the RPN outputs for each pyramid level

The functions called by the RPN module are as follows:

############################################################
#  Region Proposal Network (RPN)
############################################################

def rpn_graph(feature_map, anchors_per_location, anchor_stride):
    """Builds the computation graph of Region Proposal Network.

    feature_map: backbone features [batch, height, width, depth]
    anchors_per_location: number of anchors per pixel in the feature map
    anchor_stride: Controls the density of anchors. Typically 1 (anchors for
                   every pixel in the feature map), or 2 (every other pixel).

    Returns:
        rpn_class_logits: [batch, H * W * anchors_per_location, 2] Anchor classifier logits (before softmax)
        rpn_probs: [batch, H * W * anchors_per_location, 2] Anchor classifier probabilities.
        rpn_bbox: [batch, H * W * anchors_per_location, (dy, dx, log(dh), log(dw))] Deltas to be
                  applied to anchors.
    """
    # TODO: check if stride of 2 causes alignment issues if the feature map
    # is not even.
    # Shared convolutional base of the RPN
    shared = KL.Conv2D(512, (3, 3), padding='same', activation='relu',
                       strides=anchor_stride,
                       name='rpn_conv_shared')(feature_map)

    # Anchor Score. [batch, height, width, anchors per location * 2].
    x = KL.Conv2D(2 * anchors_per_location, (1, 1), padding='valid',
                  activation='linear', name='rpn_class_raw')(shared)

    # Reshape to [batch, anchors, 2]
    rpn_class_logits = KL.Lambda(
        lambda t: tf.reshape(t, [tf.shape(t)[0], -1, 2]))(x)
    # Output tensors to a Model must be Keras tensors, so the plain tf.reshape below does not work
    # rpn_class_logits = tf.reshape(x, [tf.shape(x)[0], -1, 2])

    # Softmax on last dimension of BG/FG.
    rpn_probs = KL.Activation(
        "softmax", name="rpn_class_xxx")(rpn_class_logits)

    # Bounding box refinement. [batch, H, W, anchors per location * depth]
    # where depth is [dy, dx, log(dh), log(dw)]
    x = KL.Conv2D(anchors_per_location * 4, (1, 1), padding="valid",
                  activation='linear', name='rpn_bbox_pred')(shared)

    # Reshape to [batch, anchors, 4]
    rpn_bbox = KL.Lambda(lambda t: tf.reshape(t, [tf.shape(t)[0], -1, 4]))(x)

    return [rpn_class_logits, rpn_probs, rpn_bbox]


def build_rpn_model(anchor_stride, anchors_per_location, depth):
    """Builds a Keras model of the Region Proposal Network.
    It wraps the RPN graph so it can be used multiple times with shared
    weights.

    anchors_per_location: number of anchors per pixel in the feature map
    anchor_stride: Controls the density of anchors. Typically 1 (anchors for
                   every pixel in the feature map), or 2 (every other pixel).
    depth: Depth of the backbone feature map.

    Returns a Keras Model object. The model outputs, when called, are:
    rpn_class_logits: [batch, H * W * anchors_per_location, 2] Anchor classifier logits (before softmax)
    rpn_probs: [batch, H * W * anchors_per_location, 2] Anchor classifier probabilities.
    rpn_bbox: [batch, H * W * anchors_per_location, (dy, dx, log(dh), log(dw))] Deltas to be
                applied to anchors.
    """
    input_feature_map = KL.Input(shape=[None, None, depth],
                                 name="input_rpn_feature_map")
    # [rpn_class_logits, rpn_probs, rpn_bbox] input_feature_map 3 1
    outputs = rpn_graph(input_feature_map, anchors_per_location, anchor_stride)
    return KM.Model([input_feature_map], outputs, name="rpn_model")
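
To get a feel for the second dimension `H * W * anchors_per_location` of the RPN outputs, the short sketch below counts anchors per pyramid level. The values 1 and 3 come from the `# 1 3 256` note above; the 1024-pixel square input and the strides 4 to 64 for P2 to P6 are assumptions made for illustration and are not stated in this post.

```python
# Hypothetical settings for illustration only.
IMAGE_SIZE = 1024                      # assumed square input size
BACKBONE_STRIDES = [4, 8, 16, 32, 64]  # assumed strides of P2..P6
ANCHORS_PER_LOCATION = 3               # len(config.RPN_ANCHOR_RATIOS)
ANCHOR_STRIDE = 1                      # config.RPN_ANCHOR_STRIDE


def anchors_per_level(image_size, stride, anchor_stride, per_location):
    """Number of anchors the RPN scores on one pyramid level."""
    fmap = image_size // stride              # feature-map side length
    # A 'same'-padded conv with stride s yields ceil(n / s) positions per axis.
    positions = -(-fmap // anchor_stride)
    return positions * positions * per_location


counts = [anchors_per_level(IMAGE_SIZE, s, ANCHOR_STRIDE, ANCHORS_PER_LOCATION)
          for s in BACKBONE_STRIDES]
total_anchors = sum(counts)
print(counts)         # anchors on P2..P6
print(total_anchors)  # anchors over all levels
```

Under these assumptions the five levels contribute 196608, 49152, 12288, 3072 and 768 anchors, 261888 in total.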

Back in the main function, we concatenate and regroup the per-level anchor information, which arrives as a list:

        # Loop through pyramid layers
        layer_outputs = []  # list of lists
        for p in rpn_feature_maps:
            layer_outputs.append(rpn([p]))  # store the RPN outputs for each pyramid level
        # Concatenate layer outputs
        # Convert from list of lists of level outputs to list of lists
        # of outputs across levels.
        # e.g. [[a1, b1, c1], [a2, b2, c2]] => [[a1, a2], [b1, b2], [c1, c2]]
        output_names = ["rpn_class_logits", "rpn_class", "rpn_bbox"]
        outputs = list(zip(*layer_outputs))  # [[logits2,……6], [class2,……6], [bbox2,……6]]
        outputs = [KL.Concatenate(axis=1, name=n)(list(o))
                   for o, n in zip(outputs, output_names)]

        # [batch, num_anchors, 2/4]
        # num_anchors is the total number of anchors over all feature levels
        rpn_class_logits, rpn_class, rpn_bbox = outputs

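The goal is simple. The original return value is [(logits2, class2, bbox2), (logits3, class3, bbox3), ……]. We first convert it to [[logits2,……6], [class2,……6], [bbox2,……6]], then concatenate the tensors in each inner list along axis 1 (the anchors dimension). This yields three tensors, each holding the classification or regression information of all anchors over the 5 feature levels for every image in the batch, i.e. [batch, anchors, 2 class scores or (dy, dx, log(dh), log(dw))].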
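
The regrouping can be tried out without TensorFlow. In the sketch below, plain lists of strings stand in for tensors (one entry per anchor of a single image), and the helper `concat_anchor_axis` is a hypothetical stand-in for `KL.Concatenate(axis=1)`.

```python
# Each pyramid level returns (logits, probs, bbox); strings stand in for anchor rows.
layer_outputs = [
    (["logit_P2_a", "logit_P2_b"], ["prob_P2_a", "prob_P2_b"], ["bbox_P2_a", "bbox_P2_b"]),
    (["logit_P3_a"], ["prob_P3_a"], ["bbox_P3_a"]),
]

# [[a1, b1, c1], [a2, b2, c2]] -> [(a1, a2), (b1, b2), (c1, c2)]
grouped = list(zip(*layer_outputs))


def concat_anchor_axis(parts):
    """Join the per-level anchor lists into one list (stand-in for Concatenate)."""
    return [row for part in parts for row in part]


rpn_class_logits, rpn_class, rpn_bbox = [concat_anchor_axis(g) for g in grouped]
print(rpn_class)
```

After the transpose, each group holds one kind of output across levels, so concatenation simply appends the anchors of P3 after those of P2.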

2. Proposal generation

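In the previous step we obtained information for all anchors. The goal here is to select a specified number of anchors that are more likely to contain an object and use them as proposals, i.e. we want the boxes with the higher foreground scores from the binary classification above. Because of how anchors are generated, they are very numerous and overlap heavily, so on top of ranking by score we also want to remove near-duplicates (non-maximum suppression). That is the purpose of proposal generation.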

Back in the main function, the following code starts the proposal generation step:

        # Generate proposals
        # Proposals are [batch, N, (y1, x1, y2, x2)] in normalized coordinates
        # and zero padded.
        # POST_NMS_ROIS_INFERENCE = 1000
        # POST_NMS_ROIS_TRAINING = 2000
        proposal_count = config.POST_NMS_ROIS_TRAINING if mode == "training"\
            else config.POST_NMS_ROIS_INFERENCE
        # [IMAGES_PER_GPU, num_rois, (y1, x1, y2, x2)]
        # IMAGES_PER_GPU takes the place of batch; from here on, "batch" means IMAGES_PER_GPU
        rpn_rois = ProposalLayer(
            proposal_count=proposal_count,
            nms_threshold=config.RPN_NMS_THRESHOLD,  # 0.7
            name="ROI",
            config=config)([rpn_class, rpn_bbox, anchors])

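proposal_count is an integer that specifies how many proposals to generate. When there are too few, empty boxes with coordinates [0,0,0,0] are appended to pad the output.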

2.1 Initializing the ProposalLayer class

Let's walk through ProposalLayer. At the start it receives the three tensors [rpn_class, rpn_bbox, anchors] as inputs:

class ProposalLayer(KE.Layer):
    """Receives anchor scores and selects a subset to pass as proposals
    to the second stage. Filtering is done based on anchor scores and
    non-max suppression to remove overlaps. It also applies bounding
    box refinement deltas to anchors.

    Inputs:
        rpn_probs: [batch, num_anchors, (bg prob, fg prob)]
        rpn_bbox: [batch, num_anchors, (dy, dx, log(dh), log(dw))]
        anchors: [batch, num_anchors, (y1, x1, y2, x2)] anchors in normalized coordinates

    Returns:
        Proposals in normalized coordinates [batch, rois, (y1, x1, y2, x2)]
    """

    def __init__(self, proposal_count, nms_threshold, config=None, **kwargs):
        super(ProposalLayer, self).__init__(**kwargs)
        self.config = config
        self.proposal_count = proposal_count
        self.nms_threshold = nms_threshold

    def call(self, inputs):
        # [rpn_class, rpn_bbox, anchors]

        # Box Scores. Use the foreground class confidence. [batch, num_rois, 2]->[batch, num_rois]
        scores = inputs[0][:, :, 1]
        # Box deltas: coordinate refinements (dy, dx, log(dh), log(dw)). [batch, num_rois, 4]
        deltas = inputs[1]
        deltas = deltas * np.reshape(self.config.RPN_BBOX_STD_DEV, [1, 1, 4])  # [ 0.1  0.1  0.2  0.2]
        # Anchors: coordinates (y1, x1, y2, x2). [batch, num_rois, 4]
        anchors = inputs[2]

Here scores = inputs[0][:, :, 1], i.e. we only need the foreground score of every candidate box.
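
As a small illustration of these two preprocessing steps, the sketch below picks the foreground column and rescales the deltas using nested lists instead of tensors. The standard deviations are the values quoted in the code comment; the scores and deltas are made-up toy numbers.

```python
RPN_BBOX_STD_DEV = [0.1, 0.1, 0.2, 0.2]  # values quoted in the comment above

# Toy inputs: one image, two anchors.
rpn_class = [[[0.9, 0.1], [0.25, 0.75]]]                     # [batch, anchors, (bg, fg)]
rpn_bbox = [[[1.0, -2.0, 0.5, 0.0], [0.0, 0.0, 0.0, 1.0]]]   # [batch, anchors, 4]

# Equivalent of inputs[0][:, :, 1]: keep only the foreground probability.
scores = [[probs[1] for probs in image] for image in rpn_class]

# Equivalent of the broadcasted multiply by RPN_BBOX_STD_DEV.
deltas = [[[d * s for d, s in zip(row, RPN_BBOX_STD_DEV)] for row in image]
          for image in rpn_bbox]

print(scores)
print(deltas)
```

The multiplication undoes the normalization of the regression targets, so the deltas are back on the scale expected when they are applied to the anchors.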

2.2 Top-k anchor filtering

Next we take the n candidate boxes with the highest foreground scores:

        # Improve performance by trimming to top anchors by score
        # and doing the rest on the smaller subset.
        pre_nms_limit = tf.minimum(self.config.PRE_NMS_LIMIT, tf.shape(anchors)[1])
        # For a matrix input, top_k returns the top k of each row. [batch, top_k]
        ix = tf.nn.top_k(scores, pre_nms_limit, sorted=True,
                         name="top_anchors").indices

To extract the top-k anchors, we apply the same indices to all three inputs:

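        # batch_slice:
        #   splits the batched tensors into per-image slices,
        #   takes the specified number of slices,
        #   applies the per-image function to each slice and stacks the results (the first
        #   dimension of the result is the number of slices specified above, not the input batch size)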
        scores = utils.batch_slice([scores, ix], lambda x, y: tf.gather(x, y),
                                   self.config.IMAGES_PER_GPU)
        deltas = utils.batch_slice([deltas, ix], lambda x, y: tf.gather(x, y),
                                   self.config.IMAGES_PER_GPU)
        pre_nms_anchors = utils.batch_slice([anchors, ix], lambda a, x: tf.gather(a, x),
                                            self.config.IMAGES_PER_GPU,
                                            names=["pre_nms_anchors"])
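
The per-image top-k selection can be mimicked in plain Python. The helpers `top_k_indices` and `gather` below are hypothetical stand-ins for `tf.nn.top_k(...).indices` and `tf.gather`, and the limit of 3 is a toy value rather than the configured PRE_NMS_LIMIT.

```python
def top_k_indices(row, k):
    """Indices of the k largest values in one row, highest score first."""
    order = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
    return order[:k]


def gather(values, indices):
    """Pick the entries of values at the given indices, in that order."""
    return [values[i] for i in indices]


scores = [[0.2, 0.9, 0.5, 0.7], [0.6, 0.1, 0.8, 0.3]]            # [batch=2, anchors=4]
anchors = [["a0", "a1", "a2", "a3"], ["b0", "b1", "b2", "b3"]]   # stand-ins for boxes

pre_nms_limit = min(3, len(scores[0]))   # mirrors tf.minimum(limit, num_anchors)
ix = [top_k_indices(row, pre_nms_limit) for row in scores]

# Each image has its own index set, so the gather is done image by image.
top_scores = [gather(s, i) for s, i in zip(scores, ix)]
top_anchors = [gather(a, i) for a, i in zip(anchors, ix)]
print(ix, top_scores, top_anchors)
```

Because the same `ix` is reused for scores, deltas and anchors, the three outputs stay aligned row by row.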

Appendix: the batch_slice helper function

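The code above relies on a function that is also used heavily later on: batch_slice. I also tried rewriting its loop with tf.while_loop.

This function extends functions that only support a batch of 1 (in practice, functions that cannot take a batch dimension). The tf.gather call here indexes one tensor with one set of indices, whereas scores is 2-D [batch, num_rois] and ix is also 2-D [batch, top_k], with a different index set for each image. So we slice both along the batch dimension, apply the function to each pair of slices, and stack the results.

[Note] This function lives in utils.py, not model.py.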

# ## Batch Slicing
# Some custom layers support a batch size of 1 only, and require a lot of work
# to support batches greater than 1. This function slices an input tensor
# across the batch dimension and feeds batches of size 1. Effectively,
# an easy way to support batches > 1 quickly with little code modification.
# In the long run, it's more efficient to modify the code to support large
# batches and getting rid of this function. Consider this a temporary solution
def batch_slice(inputs, graph_fn, batch_size, names=None):
    """Splits inputs into slices and feeds each slice to a copy of the given
    computation graph and then combines the results. It allows you to run a
    graph on a batch of inputs even if the graph is written to support one
    instance only.

    inputs: list of tensors. All must have the same first dimension length
    graph_fn: A function that returns a TF tensor that's part of a graph.
    batch_size: number of slices to divide the data into.
    names: If provided, assigns names to the resulting tensors.
    """
    if not isinstance(inputs, list):
        inputs = [inputs]

    outputs = []
    for i in range(batch_size):
        inputs_slice = [x[i] for x in inputs]
        output_slice = graph_fn(*inputs_slice)
        if not isinstance(output_slice, (tuple, list)):
            output_slice = [output_slice]
        outputs.append(output_slice)

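    # A tf.while_loop version of the loop above might look like this.
    # Caveat: in graph mode the body is traced only once, so appending to a
    # Python list does not collect per-iteration results (a tf.TensorArray
    # would be needed); treat this as a sketch only.
    # import tensorflow as tf
    # i = 0
    # outputs = []
    #
    # def cond(index):
    #     return index < batch_size  # returns a bool
    #
    # def body(index):
    #     inputs_slice = [x[index] for x in inputs]
    #     output_slice = graph_fn(*inputs_slice)
    #     if not isinstance(output_slice, (tuple, list)):
    #         output_slice = [output_slice]
    #     outputs.append(output_slice)
    #     return index + 1  # the value cond checks on the next iteration
    #
    # tf.while_loop(cond, body, [i])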

    # Change outputs from a list of slices where each is
    # a list of outputs to a list of outputs and each has
    # a list of slices
    # The illustration below assumes graph_fn returns two tensors per call
    # [[tensor11, tensor12], [tensor21, tensor22], ……]
    # -> [(tensor11, tensor21, ……), (tensor12, tensor22, ……)]  zip yields tuples
    outputs = list(zip(*outputs))

    if names is None:
        names = [None] * len(outputs)

    # In short, this stacks the slices back along the batch dimension (the for loop above split the batch apart)
    result = [tf.stack(o, axis=0, name=n)
              for o, n in zip(outputs, names)]
    if len(result) == 1:
        result = result[0]

    return result
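
The split, apply, regroup pattern of batch_slice can be seen in isolation with a pure-Python analogue. `batch_slice_py` below is a hypothetical simplification: it works on nested lists, takes its inputs as a list, treats only tuples as multiple outputs, and "stacks" by building a list instead of calling tf.stack.

```python
def batch_slice_py(inputs, fn, batch_size):
    """Apply fn to each per-image slice and regroup the results by output."""
    outputs = []
    for i in range(batch_size):
        output_slice = fn(*[x[i] for x in inputs])   # slice i of every input
        if not isinstance(output_slice, tuple):
            output_slice = (output_slice,)
        outputs.append(output_slice)
    # [(a1, b1), (a2, b2)] -> [[a1, a2], [b1, b2]]
    result = [list(group) for group in zip(*outputs)]
    return result[0] if len(result) == 1 else result


scores = [[0.2, 0.9, 0.5], [0.6, 0.1, 0.8]]   # [batch=2, anchors=3]
ix = [[1, 2], [2, 0]]                          # per-image indices

# Single output: a per-image gather, restacked along the batch axis.
gathered = batch_slice_py([scores, ix], lambda s, i: [s[j] for j in i], 2)

# Two outputs per call: results are regrouped into one list per output.
lowest, highest = batch_slice_py([scores], lambda s: (min(s), max(s)), 2)
print(gathered, lowest, highest)
```

The per-image function never sees a batch dimension, which is exactly what lets single-image code be reused on batches.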

2.3 Initial refinement of anchor coordinates

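In the RPN we obtained box regression results for all anchors, rpn_bbox: [batch, anchors, (dy, dx, log(dh), log(dw))]. In the top-k filtering step we extracted the coordinates of the top-k anchors together with their regression deltas. Now we combine the two, using the RPN regression output to refine the top-k anchor coordinates: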

        # Apply deltas to anchors to get refined anchors.
        # [IMAGES_PER_GPU, top_k, (y1, x1, y2, x2)]
        boxes = utils.batch_slice([pre_nms_anchors, deltas],
                                  lambda x, y: apply_box_deltas_graph(x, y),
                                  self.config.IMAGES_PER_GPU,
                                  names=["refined_anchors"])

The function is as follows:

def apply_box_deltas_graph(boxes, deltas):
    """Applies the given deltas to the given boxes.
    boxes: [N, (y1, x1, y2, x2)] boxes to update
    deltas: [N, (dy, dx, log(dh), log(dw))] refinements to apply
    """
    # dy = (y_n - y_o)/h_o
    # dx = (x_n - x_o)/w_o
    # dh = h_n/h_o
    # dw = w_n/w_o

    # Convert to y, x, h, w
    height = boxes[:, 2] - boxes[:, 0]
    width = boxes[:, 3] - boxes[:, 1]
    center_y = boxes[:, 0] + 0.5 * height
    center_x = boxes[:, 1] + 0.5 * width
    # Apply deltas
    center_y += deltas[:, 0] * height
    center_x += deltas[:, 1] * width
    height *= tf.exp(deltas[:, 2])
    width *= tf.exp(deltas[:, 3])
    # Convert back to y1, x1, y2, x2
    y1 = center_y - 0.5 * height
    x1 = center_x - 0.5 * width
    y2 = y1 + height
    x2 = x1 + width
    result = tf.stack([y1, x1, y2, x2], axis=1, name="apply_box_deltas_out")
    return result

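At this point the code shows what the four regression values really mean. Subscript o denotes the original anchor and n the refined box, y and x are center coordinates, and the network outputs log(dh) and log(dw), which is why the code applies tf.exp:

dy = (y_n - y_o)/h_o

dx = (x_n - x_o)/w_o

dh = h_n/h_o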

dw = w_n/w_o
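
A worked example makes these relations concrete. The sketch below re-implements the same arithmetic as apply_box_deltas_graph for a single box in plain Python; the anchor and delta values are made up for illustration.

```python
import math


def apply_box_deltas(box, delta):
    """Refine one (y1, x1, y2, x2) box with (dy, dx, log(dh), log(dw))."""
    y1, x1, y2, x2 = box
    dy, dx, log_dh, log_dw = delta
    height, width = y2 - y1, x2 - x1
    center_y, center_x = y1 + 0.5 * height, x1 + 0.5 * width
    # Shift the center by a fraction of the original size, then rescale.
    center_y += dy * height
    center_x += dx * width
    height *= math.exp(log_dh)
    width *= math.exp(log_dw)
    new_y1, new_x1 = center_y - 0.5 * height, center_x - 0.5 * width
    return (new_y1, new_x1, new_y1 + height, new_x1 + width)


anchor = (0.25, 0.25, 0.75, 0.75)          # height = width = 0.5, center (0.5, 0.5)
delta = (0.5, 0.0, math.log(2.0), 0.0)     # move down by half a height, double the height
refined = apply_box_deltas(anchor, delta)
print(refined)   # approximately (0.25, 0.25, 1.25, 0.75)
```

The center moves from y = 0.5 to 0.75 and the height doubles to 1.0, so the refined box reaches y2 = 1.25 and extends past the normalized canvas.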

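Note that the anchor coordinates live on a normalized canvas (SSD does the same, as covered in the post "[TensorFlow] SSD Source Code Study, Part 3: Anchor Generation"; all anchors sit on a virtual canvas of width and height 1). After the refinement above this is no longer guaranteed, so we clip the parts of each box that fall outside the canvas, keeping only the intersection of the box with the [0,0,1,1] canvas.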

        # Clip to image boundaries. Since we're in normalized coordinates,
        # clip to 0..1 range. [IMAGES_PER_GPU, top_k, (y1, x1, y2, x2)]
        window = np.array([0, 0, 1, 1], dtype=np.float32)
        boxes = utils.batch_slice(boxes,  # boxes are the anchors after the deltas have been applied
                                  lambda x: clip_boxes_graph(x, window),
                                  self.config.IMAGES_PER_GPU,
                                  names=["refined_anchors_clipped"])

The clipping function is as follows:

def clip_boxes_graph(boxes, window):
    """
    boxes: [N, (y1, x1, y2, x2)]
    window: [4] in the form y1, x1, y2, x2
    """
    # Split
    wy1, wx1, wy2, wx2 = tf.split(window, 4)
    y1, x1, y2, x2 = tf.split(boxes, 4, axis=1)
    # Clip
    y1 = tf.maximum(tf.minimum(y1, wy2), wy1)
    x1 = tf.maximum(tf.minimum(x1, wx2), wx1)
    y2 = tf.maximum(tf.minimum(y2, wy2), wy1)
    x2 = tf.maximum(tf.minimum(x2, wx2), wx1)
    clipped = tf.concat([y1, x1, y2, x2], axis=1, name="clipped_boxes")
    clipped.set_shape((clipped.shape[0], 4))
    return clipped
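
The effect of the clipping is easy to check on a few boxes. The sketch below applies the same min/max logic as clip_boxes_graph to single boxes in plain Python; the sample boxes are made up.

```python
def clip_box(box, window=(0.0, 0.0, 1.0, 1.0)):
    """Clamp each coordinate of (y1, x1, y2, x2) into the window."""
    wy1, wx1, wy2, wx2 = window
    y1, x1, y2, x2 = box
    return (max(min(y1, wy2), wy1),
            max(min(x1, wx2), wx1),
            max(min(y2, wy2), wy1),
            max(min(x2, wx2), wx1))


overflow_bottom = clip_box((0.25, 0.25, 1.25, 0.75))   # bottom edge pulled back to 1.0
overflow_two_sides = clip_box((-0.2, 0.1, 0.3, 1.4))   # top and right edges clamped
fully_outside = clip_box((1.2, 1.3, 1.5, 1.6))         # collapses to a zero-area box
print(overflow_bottom, overflow_two_sides, fully_outside)
```

A box that lies entirely outside the canvas is not removed by this step; it collapses onto the border with zero area.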

2.4 Non-maximum suppression

Finally we apply non-maximum suppression so that the proposals do not overlap excessively:

        # Filter out small boxes
        # According to Xinlei Chen's paper, this reduces detection accuracy
        # for small objects, so we're skipping it.

        # Non-max suppression
        def nms(boxes, scores):
            """
            Non-maximum suppression helper
            :param boxes: [top_k, (y1, x1, y2, x2)]
            :param scores: [top_k]
            :return: [proposal_count, (y1, x1, y2, x2)], zero-padded
            """
            indices = tf.image.non_max_suppression(
                boxes, scores, self.proposal_count,  # third argument is the maximum number of boxes returned
                self.nms_threshold, name="rpn_non_max_suppression")
            proposals = tf.gather(boxes, indices)
            # Pad if needed: if too few boxes are returned, append (0,0,0,0) rows until the count is reached
            padding = tf.maximum(self.proposal_count - tf.shape(proposals)[0], 0)
            # append all-zero rows at the end
            proposals = tf.pad(proposals, [(0, padding), (0, 0)])
            return proposals
        proposals = utils.batch_slice([boxes, scores], nms,
                                      self.config.IMAGES_PER_GPU)
        return proposals  # [IMAGES_PER_GPU, proposal_count, (y1, x1, y2, x2)]

That's right, TensorFlow already provides this: tf.image.non_max_suppression
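
For readers who want to see what happens inside, here is a minimal pure-Python sketch of greedy non-maximum suppression followed by zero padding. It illustrates the general technique and is not TensorFlow's implementation; the helper names `iou` and `nms_with_padding`, the sample boxes and the proposal count of 4 are all made up for the example.

```python
def iou(a, b):
    """Intersection over union of two (y1, x1, y2, x2) boxes."""
    inter_h = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_w = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_h * inter_w
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms_with_padding(boxes, scores, proposal_count, iou_threshold):
    """Greedy NMS: keep the best box, drop boxes overlapping it too much, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if len(keep) == proposal_count:
            break
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    proposals = [boxes[i] for i in keep]
    # Pad with all-zero boxes so the output always has proposal_count rows.
    proposals += [(0.0, 0.0, 0.0, 0.0)] * (proposal_count - len(proposals))
    return proposals


boxes = [(0.0, 0.0, 0.5, 0.5), (0.0, 0.0, 0.5, 0.6), (0.5, 0.5, 1.0, 1.0)]
scores = [0.9, 0.8, 0.7]
proposals = nms_with_padding(boxes, scores, proposal_count=4, iou_threshold=0.7)
print(proposals)
```

The second box overlaps the first with an IoU of about 0.83, above the 0.7 threshold, so it is suppressed; the third box survives, and two zero rows fill the output up to four proposals.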

At this point we have obtained all the proposals.