Forked project repository: SSD
In the previous section we defined the vgg_300 network structure. To use it in practice, it must be paired with another key SSD component: the anchor grid built on the selected feature layers. In this project the vgg_300 network and the grid generation are wrapped together in a single class, so we start from class SSDNet.
Below is the initialization part of SSDNet. All of it was covered in the previous section: defining the network hyperparameters, building the vgg_300 network, and updating feat_shapes.
[Note 1]: Before the update, each element of feat_shapes is a 2-tuple (H, W); after the update it becomes a 3-tuple (H, W, C). This does not affect anything downstream, since a [1:3] slice is taken when the shapes are actually used.
[Note 2]: Although the parameters assume a 300x300 input image, in my tests a 304x304 input is required for the network outputs to match the feat_shapes listed below.
```python
SSDParams = namedtuple('SSDParameters', ['img_shape',
                                         'num_classes',
                                         'no_annotation_label',
                                         'feat_layers',
                                         'feat_shapes',
                                         'anchor_size_bounds',
                                         'anchor_sizes',
                                         'anchor_ratios',
                                         'anchor_steps',
                                         'anchor_offset',
                                         'normalizations',
                                         'prior_scaling'
                                         ])


class SSDNet(object):
    """Implementation of the SSD VGG-based 300 network.

    The default features layers with 300x300 image input are:
      conv4 ==> 38 x 38
      conv7 ==> 19 x 19
      conv8 ==> 10 x 10
      conv9 ==> 5 x 5
      conv10 ==> 3 x 3
      conv11 ==> 1 x 1
    The default image size used to train this network is 300x300.
    """
    default_params = SSDParams(
        img_shape=(300, 300),
        num_classes=21,
        no_annotation_label=21,
        feat_layers=['block4', 'block7', 'block8', 'block9', 'block10', 'block11'],
        feat_shapes=[(38, 38), (19, 19), (10, 10), (5, 5), (3, 3), (1, 1)],
        anchor_size_bounds=[0.15, 0.90],
        # anchor_size_bounds=[0.20, 0.90],
        anchor_sizes=[(21., 45.),
                      (45., 99.),
                      (99., 153.),
                      (153., 207.),
                      (207., 261.),
                      (261., 315.)],
        anchor_ratios=[[2, .5],
                       [2, .5, 3, 1./3],
                       [2, .5, 3, 1./3],
                       [2, .5, 3, 1./3],
                       [2, .5],
                       [2, .5]],
        anchor_steps=[8, 16, 32, 64, 100, 300],
        anchor_offset=0.5,
        normalizations=[1, -1, -1, -1, -1, -1],  # whether to normalize along H, W before the SSD layer processing
        prior_scaling=[0.1, 0.1, 0.2, 0.2]
        )

    def __init__(self, params=None):
        """Init the SSD net with some parameters. Use the default ones
        if none provided.
        """
        if isinstance(params, SSDParams):
            self.params = params
        else:
            self.params = SSDNet.default_params

    # ======================================================================= #
    def net(self, inputs,
            is_training=True,
            update_feat_shapes=True,
            dropout_keep_prob=0.5,
            prediction_fn=slim.softmax,
            reuse=None,
            scope='ssd_300_vgg'):
        """SSD network definition.
        Forward pass through the network; also tries to update
        self.params.feat_shapes according to the actual outputs.
        """
        r = ssd_net(inputs,
                    num_classes=self.params.num_classes,
                    feat_layers=self.params.feat_layers,
                    anchor_sizes=self.params.anchor_sizes,
                    anchor_ratios=self.params.anchor_ratios,
                    normalizations=self.params.normalizations,
                    is_training=is_training,
                    dropout_keep_prob=dropout_keep_prob,
                    prediction_fn=prediction_fn,
                    reuse=reuse,
                    scope=scope)
        # Update feature shapes (try at least!)
        if update_feat_shapes:
            # r[0]: predictions of each selected feature layer
            # feat_shapes: [(38, 38), (19, 19), (10, 10), (5, 5), (3, 3), (1, 1)]
            # Read each intermediate layer's shape (batch dim excluded);
            # if any dimension is None, fall back to the default feat_shapes.
            shapes = ssd_feat_shapes_from_net(r[0], self.params.feat_shapes)
            self.params = self.params._replace(feat_shapes=shapes)
        return r
```
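Note 1 is easier to see with a sketch of the shape-extraction step. The snippet below is a minimal re-implementation written from Note 1 and the comments in net(), not copied from the repo, so the real ssd_feat_shapes_from_net may differ in detail:

```python
import numpy as np

def feat_shapes_from_net_sketch(predictions, default_shapes=None):
    """Illustrative only: read (H, W, C) from each prediction tensor and
    fall back to the defaults whenever a static dimension is unknown."""
    feat_shapes = []
    for l in predictions:
        # Drop the batch dimension and keep the next three dims.
        if isinstance(l, np.ndarray):
            shape = list(l.shape)[1:4]
        else:
            shape = l.get_shape().as_list()[1:4]
        if None in shape:
            return default_shapes
        feat_shapes.append(shape)
    return feat_shapes
```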
The other key piece of the SSD network is generating the anchor boxes (the search grid). In this project, anchors are generated on six layers (blocks 4, 7, 8, 9, 10 and 11), with the following configuration:
Layer | Feature map size | Anchor ratios | Anchors per location | Total anchors |
---|---|---|---|---|
4 | [38,38] | [2,0.5] | 4 | 4 x 38 x 38 |
7 | [19,19] | [2,0.5,3,1/3] | 6 | 6 x 19 x 19 |
8 | [10,10] | [2,0.5,3,1/3] | 6 | 6 x 10 x 10 |
9 | [5,5] | [2,0.5,3,1/3] | 6 | 6 x 5 x 5 |
10 | [3,3] | [2,0.5] | 4 | 4 x 3 x 3 |
11 | [1,1] | [2,0.5] | 4 | 4 x 1 x 1 |
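A quick numeric check of the last two columns (the per-location count works out to len(ratios) + 2 for every layer):

```python
feat_shapes   = [(38, 38), (19, 19), (10, 10), (5, 5), (3, 3), (1, 1)]
anchor_ratios = [[2, .5], [2, .5, 3, 1./3], [2, .5, 3, 1./3],
                 [2, .5, 3, 1./3], [2, .5], [2, .5]]

total = 0
for (h, w), ratios in zip(feat_shapes, anchor_ratios):
    per_cell = len(ratios) + 2   # ratio-1 box + extra sqrt(s_k*s_k') box + one box per ratio
    total += per_cell * h * w
    print((h, w), per_cell, per_cell * h * w)
print(total)  # 8732, the anchor count usually quoted for SSD300
```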
The per-layer anchor generation logic is as follows:
1. Generate the coordinates of all grid center points and store them.
2. Generate one set of anchor heights and widths and store them.
3. Finally, pair this set of heights/widths with every center point to produce all anchors. This last step is not inside the anchor-generation function; it is only a logical step (a sketch follows this list).
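Step 3 is easiest to picture as plain NumPy broadcasting. The snippet below only illustrates that logical step (in the project the pairing actually happens later, e.g. when encoding ground truth), and anchors_to_corners is a made-up helper name:

```python
import numpy as np

def anchors_to_corners(y, x, h, w):
    """Pair every center with every (h, w) via broadcasting.
    y, x: (H, W, 1) center grids; h, w: (num_anchors,) box sizes."""
    ymin = y - h / 2.    # broadcasts to (H, W, num_anchors)
    xmin = x - w / 2.
    ymax = y + h / 2.
    xmax = x + w / 2.
    return np.stack([ymin, xmin, ymax, xmax], axis=-1)   # (H, W, num_anchors, 4)
```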
The number of (h, w) pairs equals the number of ratios plus 2, which is why the length of the third column above plus 2 gives the fourth column. We skip the exact math for now and look at how the generation functions are called (following the call stack):
```python
# Get the SSD network and its anchors.
ssd_class = nets_factory.get_network(FLAGS.model_name)                          # 'ssd_300_vgg'
ssd_params = ssd_class.default_params._replace(num_classes=FLAGS.num_classes)   # replace the class attribute
ssd_net = ssd_class(ssd_params)             # create the class instance
ssd_shape = ssd_net.params.img_shape        # read the class attribute, (300, 300)
ssd_anchors = ssd_net.anchors(ssd_shape)    # call the class method to build the anchor boxes
```
The method just delegates to another function, which feels a bit bloated, but it is probably meant so that other classes can reuse the same function, which is understandable.
```python
def anchors(self, img_shape, dtype=np.float32):
    """Compute the default anchor boxes, given an image shape.
    """
    return ssd_anchors_all_layers(img_shape,                  # (300, 300)
                                  self.params.feat_shapes,
                                  self.params.anchor_sizes,
                                  self.params.anchor_ratios,
                                  self.params.anchor_steps,   # [8, 16, 32, 64, 100, 300]
                                  self.params.anchor_offset,  # 0.5
                                  dtype)
```
Generate anchors for all of the specified feature layers:
```python
def ssd_anchors_all_layers(img_shape,
                           layers_shape,
                           anchor_sizes,
                           anchor_ratios,
                           anchor_steps,        # [8, 16, 32, 64, 100, 300]
                           offset=0.5,
                           dtype=np.float32):
    """Compute anchor boxes for all feature layers.
    """
    layers_anchors = []
    for i, s in enumerate(layers_shape):
        anchor_bboxes = ssd_anchor_one_layer(img_shape, s,
                                             anchor_sizes[i],
                                             anchor_ratios[i],
                                             anchor_steps[i],
                                             offset=offset,
                                             dtype=dtype)
        layers_anchors.append(anchor_bboxes)
    return layers_anchors
```
The arguments are:
```python
anchor_steps=[8, 16, 32, 64, 100, 300]
feat_shapes=[(38, 38), (19, 19), (10, 10), (5, 5), (3, 3), (1, 1)]
anchor_sizes=[(21., 45.), (45., 99.), (99., 153.), (153., 207.), (207., 261.), (261., 315.)]
anchor_ratios=[[2, .5], [2, .5, 3, 1./3], [2, .5, 3, 1./3], [2, .5, 3, 1./3], [2, .5], [2, .5]]
```
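Assuming the four lists above are assigned to plain variables with those names, a quick sketch of what the call returns: one (y, x, h, w) tuple per feature layer.

```python
anchors = ssd_anchors_all_layers((300, 300),
                                 feat_shapes,
                                 anchor_sizes,
                                 anchor_ratios,
                                 anchor_steps,
                                 offset=0.5)
for layer, (y, x, h, w) in zip([4, 7, 8, 9, 10, 11], anchors):
    print('block%d' % layer, y.shape, x.shape, h.shape, w.shape)
# block4: y and x are (38, 38, 1), h and w are (4,)
# block7: y and x are (19, 19, 1), h and w are (6,)
# ...
```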
The anchor generation logic for a single feature layer:
```python
def ssd_anchor_one_layer(img_shape,
                         feat_shape,
                         sizes,
                         ratios,
                         step,
                         offset=0.5,
                         dtype=np.float32):
    """Compute SSD default anchor boxes for one feature layer.

    Determine the relative position grid of the centers, and the relative
    width and height.

    Arguments:
      feat_shape: Feature shape, used for computing relative position grids;
      size: Absolute reference sizes;
      ratios: Ratios to use on these features;
      img_shape: Image shape, used for computing height, width relatively to the former;
      offset: Grid offset.

    Return:
      y, x, h, w: Relative x and y grids, and height and width.
    """
    # Compute the position grid: simple way.
    # y, x = np.mgrid[0:feat_shape[0], 0:feat_shape[1]]
    # y = (y.astype(dtype) + offset) / feat_shape[0]
    # x = (x.astype(dtype) + offset) / feat_shape[1]
    # Weird SSD-Caffe computation using steps values...
    # Generate the grid coordinates for the H, W of feat_shape.
    y, x = np.mgrid[0:feat_shape[0], 0:feat_shape[1]]
    # step * feat_shape is approximately img_shape, so the grid coordinates
    # lie in [0, 1]; rescaling recovers positions in the image.
    y = (y.astype(dtype) + offset) * step / img_shape[0]
    x = (x.astype(dtype) + offset) * step / img_shape[1]

    # Expand dims to support easy broadcasting.
    y = np.expand_dims(y, axis=-1)
    x = np.expand_dims(x, axis=-1)

    # Compute relative height and width.
    # Tries to follow the original implementation of SSD for the order.
    num_anchors = len(sizes) + len(ratios)
    h = np.zeros((num_anchors, ), dtype=dtype)
    w = np.zeros((num_anchors, ), dtype=dtype)
    # Add first anchor boxes with ratio=1.
    h[0] = sizes[0] / img_shape[0]
    w[0] = sizes[0] / img_shape[1]
    di = 1
    if len(sizes) > 1:
        h[1] = math.sqrt(sizes[0] * sizes[1]) / img_shape[0]
        w[1] = math.sqrt(sizes[0] * sizes[1]) / img_shape[1]
        di += 1
    for i, r in enumerate(ratios):
        h[i+di] = sizes[0] / img_shape[0] / math.sqrt(r)
        w[i+di] = sizes[0] / img_shape[1] * math.sqrt(r)
    return y, x, h, w
```
To make the logic clearer, we append the following test code to the end of ssd_vgg_300.py:
```python
if __name__ == '__main__':
    img = tf.placeholder(tf.float32, [1, 304, 304, 3])
    with slim.arg_scope(ssd_arg_scope()):
        ssd = SSDNet()
        r = ssd.net(img)
    ar = ssd_anchor_one_layer((300, 300), (38, 38), (21, 45), (2, 0.5), 8)

    import matplotlib.pyplot as plt
    plt.scatter(ar[0], ar[1], c='r', marker='.')
    plt.grid(True)
    plt.show()
```
This plots the center-point coordinates located on block4; the output figure is shown below.
```
ar[2]
Out[2]: array([ 0.07      ,  0.10246951,  0.04949747,  0.09899495], dtype=float32)
ar[3]
Out[3]: array([ 0.07      ,  0.10246951,  0.09899495,  0.04949747], dtype=float32)
```
We can see that all center points fall in the [0, 1] interval, while ar[2] and ar[3] are the anchor heights and widths.
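These four values follow directly from the h/w formulas in ssd_anchor_one_layer. For block4 we passed sizes=(21, 45), ratios=(2, 0.5) and img_shape=(300, 300), so:

```python
import math

size0, size1, img = 21., 45., 300.
h0 = w0 = size0 / img                            # 0.07        ratio-1 box
h1 = w1 = math.sqrt(size0 * size1) / img         # 0.10246951  extra sqrt(s_k * s_k') box
h2, w2 = h0 / math.sqrt(2), w0 * math.sqrt(2)    # 0.04949747, 0.09899495  ratio 2
h3, w3 = h0 / math.sqrt(.5), w0 * math.sqrt(.5)  # 0.09899495, 0.04949747  ratio 0.5
```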
Looking back at the function body, the center-point formula is y = (y.astype(dtype) + offset) * step / img_shape[0]. Since step * feat_shape is approximately img_shape, the grid coordinates end up between 0 and 1, and rescaling them recovers positions in the image. That is the purpose of the hyperparameter anchor_steps: it is used to scale the positions of the anchor centers.
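A quick look at step * feat_shape shows how approximate that relationship is, and it lines up with Note 2: blocks 4 and 7 multiply out to 304 rather than 300, which is presumably why a 304x304 input reproduces the listed feat_shapes exactly.

```python
anchor_steps = [8, 16, 32, 64, 100, 300]
feat_sides   = [38, 19, 10, 5, 3, 1]
print([step * side for step, side in zip(anchor_steps, feat_sides)])
# [304, 304, 320, 320, 300, 300]
```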
In addition, the extent of the boxes (their effective receptive field) is controlled by two parameters:
```python
anchor_sizes=[(21., 45.), (45., 99.), (99., 153.), (153., 207.), (207., 261.), (261., 315.)]
anchor_ratios=[[2, .5], [2, .5, 3, 1./3], [2, .5, 3, 1./3], [2, .5, 3, 1./3], [2, .5], [2, .5]]
```
With that, anchor generation is complete. In the next section we start from the data processing side of the detection task, to get a fuller picture of how SSD, and other object detection networks, work end to end.