Forked project repository: SSD
In the previous section we defined the vgg_300 network structure. To use it in practice, it must be paired with another key SSD component: the anchor grid built on the selected feature layers. In this project the vgg_300 network and the grid generation are wrapped together in a single class, so we start from class SSDNet.
Below is the initialization part of SSDNet. All of it was covered in the previous section: defining the network hyperparameters, building the vgg_300 network, and updating feat_shapes.
[Note 1]: Before the update, each element of feat_shapes is a 2-tuple (H, W); after the update it becomes a 3-tuple (H, W, C). This does not affect anything downstream, since a [1:3] slice is taken when the shapes are actually used.
[Note 2]: Although the parameters assume a 300x300 input image, in my tests a 304x304 input is required for the network outputs to match the feat_shapes listed below.
```python
SSDParams = namedtuple('SSDParameters', ['img_shape',
                                         'num_classes',
                                         'no_annotation_label',
                                         'feat_layers',
                                         'feat_shapes',
                                         'anchor_size_bounds',
                                         'anchor_sizes',
                                         'anchor_ratios',
                                         'anchor_steps',
                                         'anchor_offset',
                                         'normalizations',
                                         'prior_scaling'
                                         ])


class SSDNet(object):
    """Implementation of the SSD VGG-based 300 network.

    The default features layers with 300x300 image input are:
      conv4 ==> 38 x 38
      conv7 ==> 19 x 19
      conv8 ==> 10 x 10
      conv9 ==> 5 x 5
      conv10 ==> 3 x 3
      conv11 ==> 1 x 1
    The default image size used to train this network is 300x300.
    """
    default_params = SSDParams(
        img_shape=(300, 300),
        num_classes=21,
        no_annotation_label=21,
        feat_layers=['block4', 'block7', 'block8', 'block9', 'block10', 'block11'],
        feat_shapes=[(38, 38), (19, 19), (10, 10), (5, 5), (3, 3), (1, 1)],
        anchor_size_bounds=[0.15, 0.90],
        # anchor_size_bounds=[0.20, 0.90],
        anchor_sizes=[(21., 45.),
                      (45., 99.),
                      (99., 153.),
                      (153., 207.),
                      (207., 261.),
                      (261., 315.)],
        anchor_ratios=[[2, .5],
                       [2, .5, 3, 1./3],
                       [2, .5, 3, 1./3],
                       [2, .5, 3, 1./3],
                       [2, .5],
                       [2, .5]],
        anchor_steps=[8, 16, 32, 64, 100, 300],
        anchor_offset=0.5,
        normalizations=[1, -1, -1, -1, -1, -1],  # whether to normalize along H, W before the SSD layer processing
        prior_scaling=[0.1, 0.1, 0.2, 0.2]
        )

    def __init__(self, params=None):
        """Init the SSD net with some parameters. Use the default ones
        if none provided.
        """
        if isinstance(params, SSDParams):
            self.params = params
        else:
            self.params = SSDNet.default_params

    # ======================================================================= #
    def net(self, inputs,
            is_training=True,
            update_feat_shapes=True,
            dropout_keep_prob=0.5,
            prediction_fn=slim.softmax,
            reuse=None,
            scope='ssd_300_vgg'):
        """SSD network definition.
        Forward pass through the network; also tries to update
        self.params.feat_shapes according to the actual outputs.
        """
        r = ssd_net(inputs,
                    num_classes=self.params.num_classes,
                    feat_layers=self.params.feat_layers,
                    anchor_sizes=self.params.anchor_sizes,
                    anchor_ratios=self.params.anchor_ratios,
                    normalizations=self.params.normalizations,
                    is_training=is_training,
                    dropout_keep_prob=dropout_keep_prob,
                    prediction_fn=prediction_fn,
                    reuse=reuse,
                    scope=scope)
        # Update feature shapes (try at least!)
        if update_feat_shapes:
            # r[0]: predictions of each selected feature layer
            # feat_shapes: [(38, 38), (19, 19), (10, 10), (5, 5), (3, 3), (1, 1)]
            # Read each intermediate layer's shape (batch dim excluded);
            # if any dimension is None, fall back to the default feat_shapes.
            shapes = ssd_feat_shapes_from_net(r[0], self.params.feat_shapes)
            self.params = self.params._replace(feat_shapes=shapes)
        return r
```
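Note 1 is easier to see with a sketch of the shape-extraction step. The snippet below is a minimal re-implementation written from Note 1 and the comments in net(), not copied from the repo, so the real ssd_feat_shapes_from_net may differ in detail:

```python
import numpy as np

def feat_shapes_from_net_sketch(predictions, default_shapes=None):
    """Illustrative only: read (H, W, C) from each prediction tensor and
    fall back to the defaults whenever a static dimension is unknown."""
    feat_shapes = []
    for l in predictions:
        # Drop the batch dimension and keep the next three dims.
        if isinstance(l, np.ndarray):
            shape = list(l.shape)[1:4]
        else:
            shape = l.get_shape().as_list()[1:4]
        if None in shape:
            return default_shapes
        feat_shapes.append(shape)
    return feat_shapes
```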
The other key piece of the SSD network is generating the anchor boxes (the search grid). In this project, anchors are generated on six layers (blocks 4, 7, 8, 9, 10 and 11), with the following configuration:
Layer | Feature map size | Anchor ratios | Anchors per location | Total anchors |
---|---|---|---|---|
4 | [38,38] | [2,0.5] | 4 | 4 x 38 x 38 |
7 | [19,19] | [2,0.5,3,1/3] | 6 | 6 x 19 x 19 |
8 | [10,10] | [2,0.5,3,1/3] | 6 | 6 x 10 x 10 |
9 | [5,5] | [2,0.5,3,1/3] | 6 | 6 x 5 x 5 |
10 | [3,3] | [2,0.5] | 4 | 4 x 3 x 3 |
11 | [1,1] | [2,0.5] | 4 | 4 x 1 x 1 |
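A quick numeric check of the last two columns (the per-location count works out to len(ratios) + 2 for every layer):

```python
feat_shapes   = [(38, 38), (19, 19), (10, 10), (5, 5), (3, 3), (1, 1)]
anchor_ratios = [[2, .5], [2, .5, 3, 1./3], [2, .5, 3, 1./3],
                 [2, .5, 3, 1./3], [2, .5], [2, .5]]

total = 0
for (h, w), ratios in zip(feat_shapes, anchor_ratios):
    per_cell = len(ratios) + 2   # ratio-1 box + extra sqrt(s_k*s_k') box + one box per ratio
    total += per_cell * h * w
    print((h, w), per_cell, per_cell * h * w)
print(total)  # 8732, the anchor count usually quoted for SSD300
```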
The per-layer anchor generation logic is as follows:
1. Generate the coordinates of all grid center points and store them.
2. Generate one set of anchor heights and widths and store them.
3. Finally, pair this set of heights/widths with every center point to produce all anchors. This last step is not inside the anchor-generation function; it is only a logical step (a sketch follows this list).
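Step 3 is easiest to picture as plain NumPy broadcasting. The snippet below only illustrates that logical step (in the project the pairing actually happens later, e.g. when encoding ground truth), and anchors_to_corners is a made-up helper name:

```python
import numpy as np

def anchors_to_corners(y, x, h, w):
    """Pair every center with every (h, w) via broadcasting.
    y, x: (H, W, 1) center grids; h, w: (num_anchors,) box sizes."""
    ymin = y - h / 2.    # broadcasts to (H, W, num_anchors)
    xmin = x - w / 2.
    ymax = y + h / 2.
    xmax = x + w / 2.
    return np.stack([ymin, xmin, ymax, xmax], axis=-1)   # (H, W, num_anchors, 4)
```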
The number of (h, w) pairs equals the number of ratios plus 2, which is why the length of the third column above plus 2 gives the fourth column. We skip the exact math for now and look at how the generation functions are called (following the call stack):
```python
# Get the SSD network and its anchors.
ssd_class = nets_factory.get_network(FLAGS.model_name)                          # 'ssd_300_vgg'
ssd_params = ssd_class.default_params._replace(num_classes=FLAGS.num_classes)   # replace the class attribute
ssd_net = ssd_class(ssd_params)             # create the class instance
ssd_shape = ssd_net.params.img_shape        # read the class attribute, (300, 300)
ssd_anchors = ssd_net.anchors(ssd_shape)    # call the class method to build the anchor boxes
```
The method just delegates to another function, which feels a bit bloated, but it is probably meant so that other classes can reuse the same function, which is understandable.
```python
def anchors(self, img_shape, dtype=np.float32):
    """Compute the default anchor boxes, given an image shape.
    """
    return ssd_anchors_all_layers(img_shape,                  # (300, 300)
                                  self.params.feat_shapes,
                                  self.params.anchor_sizes,
                                  self.params.anchor_ratios,
                                  self.params.anchor_steps,   # [8, 16, 32, 64, 100, 300]
                                  self.params.anchor_offset,  # 0.5
                                  dtype)
```
Generate anchors for all of the specified feature layers:
```python
def ssd_anchors_all_layers(img_shape,
                           layers_shape,
                           anchor_sizes,
                           anchor_ratios,
                           anchor_steps,        # [8, 16, 32, 64, 100, 300]
                           offset=0.5,
                           dtype=np.float32):
    """Compute anchor boxes for all feature layers.
    """
    layers_anchors = []
    for i, s in enumerate(layers_shape):
        anchor_bboxes = ssd_anchor_one_layer(img_shape, s,
                                             anchor_sizes[i],
                                             anchor_ratios[i],
                                             anchor_steps[i],
                                             offset=offset,
                                             dtype=dtype)
        layers_anchors.append(anchor_bboxes)
    return layers_anchors
```
The arguments are:
```python
anchor_steps=[8, 16, 32, 64, 100, 300]
feat_shapes=[(38, 38), (19, 19), (10, 10), (5, 5), (3, 3), (1, 1)]
anchor_sizes=[(21., 45.), (45., 99.), (99., 153.), (153., 207.), (207., 261.), (261., 315.)]
anchor_ratios=[[2, .5], [2, .5, 3, 1./3], [2, .5, 3, 1./3], [2, .5, 3, 1./3], [2, .5], [2, .5]]
```
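Assuming the four lists above are assigned to plain variables with those names, a quick sketch of what the call returns: one (y, x, h, w) tuple per feature layer.

```python
anchors = ssd_anchors_all_layers((300, 300),
                                 feat_shapes,
                                 anchor_sizes,
                                 anchor_ratios,
                                 anchor_steps,
                                 offset=0.5)
for layer, (y, x, h, w) in zip([4, 7, 8, 9, 10, 11], anchors):
    print('block%d' % layer, y.shape, x.shape, h.shape, w.shape)
# block4: y and x are (38, 38, 1), h and w are (4,)
# block7: y and x are (19, 19, 1), h and w are (6,)
# ...
```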
The anchor generation logic for a single feature layer:
```python
def ssd_anchor_one_layer(img_shape,
                         feat_shape,
                         sizes,
                         ratios,
                         step,
                         offset=0.5,
                         dtype=np.float32):
    """Compute SSD default anchor boxes for one feature layer.

    Determine the relative position grid of the centers, and the relative
    width and height.

    Arguments:
      feat_shape: Feature shape, used for computing relative position grids;
      size: Absolute reference sizes;
      ratios: Ratios to use on these features;
      img_shape: Image shape, used for computing height, width relatively to the former;
      offset: Grid offset.

    Return:
      y, x, h, w: Relative x and y grids, and height and width.
    """
    # Compute the position grid: simple way.
    # y, x = np.mgrid[0:feat_shape[0], 0:feat_shape[1]]
    # y = (y.astype(dtype) + offset) / feat_shape[0]
    # x = (x.astype(dtype) + offset) / feat_shape[1]
    # Weird SSD-Caffe computation using steps values...
    # Generate the grid coordinates for the H, W of feat_shape.
    y, x = np.mgrid[0:feat_shape[0], 0:feat_shape[1]]
    # step * feat_shape is approximately img_shape, so the grid coordinates
    # lie in [0, 1]; rescaling recovers positions in the image.
    y = (y.astype(dtype) + offset) * step / img_shape[0]
    x = (x.astype(dtype) + offset) * step / img_shape[1]

    # Expand dims to support easy broadcasting.
    y = np.expand_dims(y, axis=-1)
    x = np.expand_dims(x, axis=-1)

    # Compute relative height and width.
    # Tries to follow the original implementation of SSD for the order.
    num_anchors = len(sizes) + len(ratios)
    h = np.zeros((num_anchors, ), dtype=dtype)
    w = np.zeros((num_anchors, ), dtype=dtype)
    # Add first anchor boxes with ratio=1.
    h[0] = sizes[0] / img_shape[0]
    w[0] = sizes[0] / img_shape[1]
    di = 1
    if len(sizes) > 1:
        h[1] = math.sqrt(sizes[0] * sizes[1]) / img_shape[0]
        w[1] = math.sqrt(sizes[0] * sizes[1]) / img_shape[1]
        di += 1
    for i, r in enumerate(ratios):
        h[i+di] = sizes[0] / img_shape[0] / math.sqrt(r)
        w[i+di] = sizes[0] / img_shape[1] * math.sqrt(r)
    return y, x, h, w
```
To make the logic clearer, we append the following test code to the end of ssd_vgg_300.py:
```python
if __name__ == '__main__':
    img = tf.placeholder(tf.float32, [1, 304, 304, 3])
    with slim.arg_scope(ssd_arg_scope()):
        ssd = SSDNet()
        r = ssd.net(img)
    ar = ssd_anchor_one_layer((300, 300), (38, 38), (21, 45), (2, 0.5), 8)

    import matplotlib.pyplot as plt
    plt.scatter(ar[0], ar[1], c='r', marker='.')
    plt.grid(True)
    plt.show()
```
This plots the center-point coordinates located on block4; the output figure is shown below.
```
ar[2]
Out[2]: array([ 0.07      ,  0.10246951,  0.04949747,  0.09899495], dtype=float32)
ar[3]
Out[3]: array([ 0.07      ,  0.10246951,  0.09899495,  0.04949747], dtype=float32)
```
We can see that all center points fall in the [0, 1] interval, while ar[2] and ar[3] are the anchor heights and widths.
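These four values follow directly from the h/w formulas in ssd_anchor_one_layer. For block4 we passed sizes=(21, 45), ratios=(2, 0.5) and img_shape=(300, 300), so:

```python
import math

size0, size1, img = 21., 45., 300.
h0 = w0 = size0 / img                            # 0.07        ratio-1 box
h1 = w1 = math.sqrt(size0 * size1) / img         # 0.10246951  extra sqrt(s_k * s_k') box
h2, w2 = h0 / math.sqrt(2), w0 * math.sqrt(2)    # 0.04949747, 0.09899495  ratio 2
h3, w3 = h0 / math.sqrt(.5), w0 * math.sqrt(.5)  # 0.09899495, 0.04949747  ratio 0.5
```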
Looking back at the function body, the center-point formula is y = (y.astype(dtype) + offset) * step / img_shape[0]. Since step * feat_shape is approximately img_shape, the grid coordinates end up between 0 and 1, and rescaling them recovers positions in the image. That is the purpose of the hyperparameter anchor_steps: it is used to scale the positions of the anchor centers.
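A quick look at step * feat_shape shows how approximate that relationship is, and it lines up with Note 2: blocks 4 and 7 multiply out to 304 rather than 300, which is presumably why a 304x304 input reproduces the listed feat_shapes exactly.

```python
anchor_steps = [8, 16, 32, 64, 100, 300]
feat_sides   = [38, 19, 10, 5, 3, 1]
print([step * side for step, side in zip(anchor_steps, feat_sides)])
# [304, 304, 320, 320, 300, 300]
```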
In addition, the extent of the boxes (their effective receptive field) is controlled by two parameters:
```python
anchor_sizes=[(21., 45.), (45., 99.), (99., 153.), (153., 207.), (207., 261.), (261., 315.)]
anchor_ratios=[[2, .5], [2, .5, 3, 1./3], [2, .5, 3, 1./3], [2, .5, 3, 1./3], [2, .5], [2, .5]]
```
With that, anchor generation is complete. In the next section we start from the data processing side of the detection task, to get a fuller picture of how SSD, and other object detection networks, work end to end.