Implementing YOLOv3 in PyTorch (2): Parsing the Configuration File and Building Each Layer

Configuration file

The configuration file yolov3.cfg defines the structure of the network.

....

[convolutional]
batch_normalize=1
filters=64
size=3
stride=2
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=32
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear

.....

Each block in the configuration file describes one layer of the model.

YOLOv3 layers

YOLOv3 is built from the following layer types:

  • Convolutional
  • Shortcut
  • Upsample
  • Route
  • YOLO

Convolutional

[convolutional]
batch_normalize=1  
filters=64  
size=3  
stride=1  
pad=1  
activation=leaky

Shortcut

[shortcut]
from=-3  
activation=linear

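Similar to ResNet, this is used to make the network deeper. The configuration above means that the output of the shortcut layer is the element-wise sum of the output of the previous layer and the output of the layer three positions before the shortcut layer.
For a detailed explanation of ResNet skip connections, see https://zhuanlan.zhihu.com/p/28124810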

Upsample

[upsample]
stride=2

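Bilinear interpolation turns an N*N feature map into a (stride*N) * (stride*N) feature map. This mimics a feature pyramid by producing multi-scale feature maps, which improves the detection of small objects.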

Route

[route]
layers = -4

[route]
layers = -1, 61

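Taking the configurations above as examples:
When layers has a single value, the route layer outputs the feature map of the layer 4 positions before the route layer.
When layers has 2 values, the output of the route layer is the feature map of the layer just before the route layer and the feature map of layer 61, concatenated along the depth dimension (for example, 3*3*100 and 3*3*200 concatenate into 3*3*300).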
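The depth arithmetic of that example can be checked with a small sketch. The helper name `concat_channels` is hypothetical and is not part of the original code; it only models shapes as (height, width, depth) tuples rather than real tensors.

```python
def concat_channels(*shapes):
    """Return the shape produced by concatenating feature maps along depth.

    Each shape is a (height, width, depth) tuple. The spatial sizes must
    match; only the depth values are summed. Hypothetical helper, for
    illustration only.
    """
    height, width, _ = shapes[0]
    for h, w, _ in shapes:
        if (h, w) != (height, width):
            raise ValueError("feature maps must share the same height and width")
    return (height, width, sum(depth for _, _, depth in shapes))


# The example from the text: 3*3*100 and 3*3*200 give 3*3*300.
print(concat_channels((3, 3, 100), (3, 3, 200)))  # (3, 3, 300)
```

Note that the spatial size is unchanged; this is what distinguishes a route (concatenation) from a shortcut (addition), where the depth stays the same as well.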

YOLO

[yolo]
mask = 0,1,2
anchors = 10,13,  16,30,  33,23,  30,61,  62,45,  59,119,  116,90,  156,198,  373,326
classes=80
num=9
jitter=.3
ignore_thresh = .5
truth_thresh = 1
random=1

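The yolo layer is responsible for prediction. anchors lists 9 anchors obtained beforehand by clustering; they represent the most likely anchor shapes.
mask specifies which anchors are used. For example, mask=0,1,2 means the anchors 10,13 16,30 33,23 are used. As explained in the article on principles, each cell predicts 3 bounding boxes; with three scales, that makes 9 anchors in total.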
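As a minimal sketch of how mask picks anchors, the snippet below parses the two strings from the [yolo] block shown above. The function name `select_anchors` is an assumption made for this illustration.

```python
def select_anchors(anchors_str, mask_str):
    """Parse the `anchors` and `mask` values of a [yolo] block.

    Returns the (width, height) pairs selected by the mask.
    Hypothetical helper, for illustration only.
    """
    values = [int(v) for v in anchors_str.split(",")]
    # Group the flat list into (width, height) pairs.
    pairs = [(values[i], values[i + 1]) for i in range(0, len(values), 2)]
    mask = [int(m) for m in mask_str.split(",")]
    return [pairs[m] for m in mask]


anchors = "10,13,  16,30,  33,23,  30,61,  62,45,  59,119,  116,90,  156,198,  373,326"
print(select_anchors(anchors, "0,1,2"))  # [(10, 13), (16, 30), (33, 23)]
print(select_anchors(anchors, "6,7,8"))  # [(116, 90), (156, 198), (373, 326)]
```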

Net

[net]
# Testing
batch=1
subdivisions=1
# Training
# batch=64
# subdivisions=16
width= 320
height = 320
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1

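This block defines the model's input size, the batch size, and so on.

Now let's start writing code.

Parsing the configuration file

In this step we parse the configuration file and store the contents of each block in a dict.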

def parse_cfg(cfgfile):
    """
    Takes a configuration file

    Returns a list of blocks. Each block describes a block in the neural
    network to be built. A block is represented as a dictionary in the list

    """
    with open(cfgfile, 'r') as file:
        # store the lines in a list
        lines = file.read().split('\n')
    # get rid of fringe whitespace first, so indented lines are handled too
    lines = [x.strip() for x in lines]
    # get rid of the empty lines
    lines = [x for x in lines if len(x) > 0]
    lines = [x for x in lines if x[0] != '#']              # get rid of comments

    block = {}
    blocks = []

    for line in lines:
        if line[0] == "[":               # This marks the start of a new block
            # If block is not empty, implies it is storing values of previous block.
            if len(block) != 0:
                blocks.append(block)     # add it to the blocks list
                block = {}               # re-init the block
            block["type"] = line[1:-1].rstrip()
        else:
            key, value = line.split("=")
            block[key.rstrip()] = value.lstrip()
    blocks.append(block)

    return blocks
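To see what the returned list looks like without needing a file on disk, here is a minimal sketch that applies the same parsing rules to a string. The name `parse_cfg_text` and the shortened sample config are assumptions made for this illustration.

```python
def parse_cfg_text(text):
    """Parse darknet-style config text into a list of dicts, one per block.

    Follows the same rules as parse_cfg, but reads from a string.
    Hypothetical variant, for illustration only.
    """
    blocks = []
    block = {}
    for raw in text.split("\n"):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                      # skip blank lines and comments
        if line.startswith("["):          # start of a new block
            if block:
                blocks.append(block)
                block = {}
            block["type"] = line[1:-1].strip()
        else:
            key, value = line.split("=", 1)
            block[key.strip()] = value.strip()
    if block:
        blocks.append(block)
    return blocks


sample = """
[net]
# Testing
batch=1
width= 320

[convolutional]
batch_normalize=1
filters=64
size=3
stride=2
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear
"""

blocks = parse_cfg_text(sample)
for b in blocks:
    print(b)
```

Every value is kept as a string, which is why the layer-building code converts fields such as filters and stride with int().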

Creating each layer with PyTorch

The layers are created one by one.

def create_modules(blocks):
    # Captures the information about the input and pre-processing
    net_info = blocks[0]
    module_list = nn.ModuleList()
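    prev_filters = 3     # a convolution needs to know the depth of its kernels. The kernel size is given in the config file; the depth equals the depth of the previous layer's output (3 for the RGB input).
    output_filters = []  # stores the output depth (number of filters) of every layer

    # index is the position of the current layer in the network
    for index, x in enumerate(blocks[1:]):
        # build the module for each layer (filled in below)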
        
        module_list.append(module)
        prev_filters = filters
        output_filters.append(filters)
    
    return(net_info,module_list)
  • Convolutional layer
[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=leaky

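Besides the convolution itself, this block also includes batch normalization and a leaky activation. Batch normalization has become more or less standard; it helps against vanishing gradients (gradients shrinking as they are multiplied together during backpropagation). leaky refers to the Leaky ReLU activation function.
That is why nn.Sequential() is used: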

module = nn.Sequential()
module.add_module("conv_{0}".format(index), conv)
module.add_module("batch_norm_{0}".format(index), bn)
module.add_module("leaky_{0}".format(index), activn)

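The full code for creating the convolutional layer follows.
It relies on Python's built-in enumerate, which pairs each element of an iterable with an index and yields (index, element) tuples.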

>>> seasons = ['Spring', 'Summer', 'Fall', 'Winter']
>>> list(enumerate(seasons))
[(0, 'Spring'), (1, 'Summer'), (2, 'Fall'), (3, 'Winter')]
>>> list(enumerate(seasons, start=1))       # index starts at 1
[(1, 'Spring'), (2, 'Summer'), (3, 'Fall'), (4, 'Winter')]

Creating the convolutional layer

# index is the position of the current layer in the network
    for index, x in enumerate(blocks[1:]):
        module = nn.Sequential()

        #check the type of block
        #create a new module for the block
        #append to module_list

        if (x["type"] == "convolutional"):
            #Get the info about the layer
            activation = x["activation"]
            try:
                batch_normalize = int(x["batch_normalize"])
                bias = False
            except:
                batch_normalize = 0
                bias = True

            filters= int(x["filters"])
            padding = int(x["pad"])
            kernel_size = int(x["size"])
            stride = int(x["stride"])

            if padding:
                pad = (kernel_size - 1) // 2
            else:
                pad = 0

            #Add the convolutional layer
            #prev_filters is the depth of the previous layer's output feature map. For example, if the previous layer has 64 kernels, its output is m*n*64
            conv = nn.Conv2d(prev_filters, filters, kernel_size, stride, pad, bias = bias)
            module.add_module("conv_{0}".format(index), conv)

            #Add the Batch Norm Layer
            if batch_normalize:
                bn = nn.BatchNorm2d(filters)
                module.add_module("batch_norm_{0}".format(index), bn)

            #Check the activation. 
            #It is either Linear or a Leaky ReLU for YOLO
            if activation == "leaky":
                activn = nn.LeakyReLU(0.1, inplace = True)
                module.add_module("leaky_{0}".format(index), activn)
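The padding rule above, pad = (kernel_size - 1) // 2, keeps the spatial size unchanged at stride 1 and halves it at stride 2. The sketch below checks this with the standard convolution output-size formula (dilation 1). The function name `conv_output_size` and the 320-pixel input are assumptions made for this illustration.

```python
def conv_output_size(n, kernel_size, stride, pad_flag):
    """Spatial output size of a convolution on an n*n input.

    pad_flag is the `pad` value from the config: when non-zero, the actual
    padding is (kernel_size - 1) // 2, as in the layer-building code.
    Hypothetical helper, for illustration only.
    """
    pad = (kernel_size - 1) // 2 if pad_flag else 0
    return (n + 2 * pad - kernel_size) // stride + 1


print(conv_output_size(320, 3, 1, 1))  # 320: 3x3 kernel, stride 1 keeps the size
print(conv_output_size(320, 3, 2, 1))  # 160: 3x3 kernel, stride 2 halves it
print(conv_output_size(320, 1, 1, 1))  # 320: 1x1 kernel gets pad 0 despite pad=1
```

The last case shows why the config value pad=1 is treated as a flag rather than as the padding amount: a 1x1 kernel with a literal padding of 1 would enlarge the feature map.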
  • Upsample layer
        #If it's an upsampling layer
        #We use Bilinear2dUpsampling
        elif (x["type"] == "upsample"):
            stride = int(x["stride"])
            upsample = nn.Upsample(scale_factor = stride, mode = "bilinear")
            module.add_module("upsample_{}".format(index), upsample)
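A quick shape check of the upsampling step, as a sketch that assumes PyTorch is installed. The tensor sizes are arbitrary values chosen for illustration, and align_corners=False is passed explicitly here only to make the interpolation convention visible.

```python
import torch
import torch.nn as nn

# A dummy feature map: batch 1, 8 channels, 13x13 spatial size (arbitrary values).
feature_map = torch.randn(1, 8, 13, 13)

# Bilinear upsampling with scale factor 2, as in the [upsample] block above.
upsample = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
upsampled = upsample(feature_map)

print(upsampled.shape)  # torch.Size([1, 8, 26, 26])
```

Only the spatial dimensions grow; the number of channels is unchanged, which is why the upsample branch does not need to update filters.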
  • Route layer
[route]
layers = -4

[route]
layers = -1, 61

First the configuration is parsed, then the feature maps of the referenced layers are concatenated to form the output.

#If it is a route layer
        elif (x["type"] == "route"):
            x["layers"] = x["layers"].split(',')
            #Start  of a route
            start = int(x["layers"][0]) 
            #end, if there exists one.
            try:
                end = int(x["layers"][1])
            except:
                end = 0
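            #Positive annotation: an absolute layer index
            if start > 0: 
                start = start - index   # convert start into an offset relative to the current layer
            if end > 0:
                end = end - index       # convert end into an offset relative to the current layer
            route = EmptyLayer()
            module.add_module("route_{0}".format(index), route)
            if end < 0:   # the route concatenates two layers that precede the current one, so after conversion a valid end offset is always negative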
                filters = output_filters[index + start] + output_filters[index + end]
            else:
                filters= output_filters[index + start]
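The index handling can be tried in isolation. The sketch below reproduces the same offset conversion on a plain list of channel counts; the function name `route_filters` and the sample numbers are assumptions made for this illustration, and it assumes absolute layer indices are positive.

```python
def route_filters(layers, index, output_filters):
    """Output depth of a route layer located at position `index`.

    `layers` is the raw config string, e.g. "-4" or "-1, 61".
    Positive entries are absolute layer indices and are converted to
    offsets relative to the current layer; negative entries already are
    relative offsets. Hypothetical helper, for illustration only.
    """
    offsets = []
    for value in (int(v) for v in layers.split(",")):
        offsets.append(value - index if value > 0 else value)
    # Concatenation along depth: the channel counts add up.
    return sum(output_filters[index + off] for off in offsets)


# Channel counts of layers 0..4; the route layer being built is layer 5.
output_filters = [32, 64, 32, 64, 128]

print(route_filters("-4", 5, output_filters))     # 64: layer 5 - 4 = layer 1
print(route_filters("-1, 3", 5, output_filters))  # 192: layer 4 (128) + layer 3 (64)
```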

Here we define a custom EmptyLayer:

class EmptyLayer(nn.Module):
    def __init__(self):
        super(EmptyLayer, self).__init__()

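EmptyLayer is defined to keep the code simple. To define a custom layer in PyTorch, you write a class that inherits from nn.Module and implement its forward method.
For how to define a custom layer, see the link below.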
https://pytorch.org/tutorials/beginner/examples_nn/two_layer_net_module.html

import torch


class TwoLayerNet(torch.nn.Module):
    def __init__(self, D_in, H, D_out):
        """
        In the constructor we instantiate two nn.Linear modules and assign them as
        member variables.
        """
        super(TwoLayerNet, self).__init__()
        self.linear1 = torch.nn.Linear(D_in, H)
        self.linear2 = torch.nn.Linear(H, D_out)

    def forward(self, x):
        """
        In the forward function we accept a Tensor of input data and we must return
        a Tensor of output data. We can use Modules defined in the constructor as
        well as arbitrary operators on Tensors.
        """
        h_relu = self.linear1(x).clamp(min=0)
        y_pred = self.linear2(h_relu)
        return y_pred


# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold inputs and outputs
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

# Construct our model by instantiating the class defined above
model = TwoLayerNet(D_in, H, D_out)

# Construct our loss function and an Optimizer. The call to model.parameters()
# in the SGD constructor will contain the learnable parameters of the two
# nn.Linear modules which are members of the model.
criterion = torch.nn.MSELoss(reduction='sum')
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)
for t in range(500):
    # Forward pass: Compute predicted y by passing x to the model
    y_pred = model(x)

    # Compute and print loss
    loss = criterion(y_pred, y)
    print(t, loss.item())

    # Zero gradients, perform a backward pass, and update the weights.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

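Here, what our route layer has to do is very simple: concatenate the feature maps of two layers, which is a single call to torch.cat. So there is no need to define a separate RouteLayer; the concatenation is done directly in the forward method of the nn.Module that represents darknet.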

  • Shortcut layer
#shortcut corresponds to skip connection
        elif x["type"] == "shortcut":
            shortcut = EmptyLayer()
            module.add_module("shortcut_{}".format(index), shortcut)

Like the route layer, this also uses an EmptyLayer as a placeholder. The shortcut operation is an element-wise addition of two feature maps.
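The two operations that the EmptyLayer placeholders stand for might look like this in a forward pass. This is only a sketch assuming PyTorch is installed; the `outputs` dict and the tensor sizes are hypothetical stand-ins for cached layer outputs, not the author's forward implementation.

```python
import torch

# Hypothetical cache of earlier layer outputs, keyed by layer index.
# Tensors are laid out as (batch, channels, height, width).
outputs = {
    0: torch.ones(1, 100, 3, 3),
    1: torch.ones(1, 200, 3, 3),
    2: torch.ones(1, 100, 3, 3),
}

# Route: concatenate along the channel dimension (dim 1).
routed = torch.cat((outputs[0], outputs[1]), 1)

# Shortcut: element-wise addition, so both shapes must match exactly.
summed = outputs[2] + outputs[0]

print(routed.shape)  # torch.Size([1, 300, 3, 3])
print(summed.shape)  # torch.Size([1, 100, 3, 3])
```

Concatenation changes the channel count while addition preserves it, which matches the bookkeeping in create_modules: the route branch recomputes filters, and the shortcut branch leaves it untouched.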

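  • YOLO layer
    The yolo layer makes predictions from the feature map.
    First the valid anchors are parsed out, then they are stored in a layer we define ourselves, and finally a module is generated.
    This involves Python's super.
    For details see http://www.runoob.com/python/python-func-super.html. In short, it is there to make inheritance safe by calling the parent class's initializer; remembering how to use it is enough.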
#Yolo is the detection layer
        elif x["type"] == "yolo":
            mask = x["mask"].split(",")
            mask = [int(m) for m in mask]   # use m so the loop variable x is not shadowed

            anchors = x["anchors"].split(",")
            anchors = [int(a) for a in anchors]
            anchors = [(anchors[i], anchors[i+1]) for i in range(0, len(anchors),2)]
            anchors = [anchors[i] for i in mask]

            detection = DetectionLayer(anchors)
            module.add_module("Detection_{}".format(index), detection)

#We define our own YOLO layer
class DetectionLayer(nn.Module):
    def __init__(self, anchors):
        super(DetectionLayer, self).__init__()
        self.anchors = anchors

Test code

blocks = parse_cfg("cfg/yolov3.cfg")
print(create_modules(blocks))

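This prints the returned tuple (net_info, module_list): the [net] settings followed by the generated nn.ModuleList.

The complete code is as follows: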

#coding=utf-8
    
from __future__ import division

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable
import numpy as np


def parse_cfg(cfgfile):
    """
    Takes a configuration file

    Returns a list of blocks. Each block describes a block in the neural
    network to be built. A block is represented as a dictionary in the list

    """
    with open(cfgfile, 'r') as file:
        # store the lines in a list
        lines = file.read().split('\n')
    # get rid of fringe whitespace first, so indented lines are handled too
    lines = [x.strip() for x in lines]
    # get rid of the empty lines
    lines = [x for x in lines if len(x) > 0]
    lines = [x for x in lines if x[0] != '#']              # get rid of comments

    block = {}
    blocks = []

    for line in lines:
        if line[0] == "[":               # This marks the start of a new block
            # If block is not empty, implies it is storing values of previous block.
            if len(block) != 0:
                blocks.append(block)     # add it to the blocks list
                block = {}               # re-init the block
            block["type"] = line[1:-1].rstrip()
        else:
            key, value = line.split("=")
            block[key.rstrip()] = value.lstrip()
    blocks.append(block)

    return blocks


class EmptyLayer(nn.Module):
    def __init__(self):
        super(EmptyLayer, self).__init__()
        

class DetectionLayer(nn.Module):
    def __init__(self, anchors):
        super(DetectionLayer, self).__init__()
        self.anchors = anchors



def create_modules(blocks):
    # Captures the information about the input and pre-processing
    net_info = blocks[0]
    module_list = nn.ModuleList()
    prev_filters = 3
    output_filters = []

    # index is the position of the current layer in the network
    for index, x in enumerate(blocks[1:]):
        module = nn.Sequential()

        #check the type of block
        #create a new module for the block
        #append to module_list

        if (x["type"] == "convolutional"):
            #Get the info about the layer
            activation = x["activation"]
            try:
                batch_normalize = int(x["batch_normalize"])
                bias = False
            except:
                batch_normalize = 0
                bias = True

            filters= int(x["filters"])
            padding = int(x["pad"])
            kernel_size = int(x["size"])
            stride = int(x["stride"])

            if padding:
                pad = (kernel_size - 1) // 2
            else:
                pad = 0

            #Add the convolutional layer
            #prev_filters is the depth of the previous layer's output feature map. For example, if the previous layer has 64 kernels, its output is m*n*64
            conv = nn.Conv2d(prev_filters, filters, kernel_size, stride, pad, bias = bias)
            module.add_module("conv_{0}".format(index), conv)

            #Add the Batch Norm Layer
            if batch_normalize:
                bn = nn.BatchNorm2d(filters)
                module.add_module("batch_norm_{0}".format(index), bn)

            #Check the activation. 
            #It is either Linear or a Leaky ReLU for YOLO
            if activation == "leaky":
                activn = nn.LeakyReLU(0.1, inplace = True)
                module.add_module("leaky_{0}".format(index), activn)

        #If it's an upsampling layer
        #We use Bilinear2dUpsampling
        elif (x["type"] == "upsample"):
            stride = int(x["stride"])
            upsample = nn.Upsample(scale_factor = stride, mode = "bilinear")
            module.add_module("upsample_{}".format(index), upsample)

        #If it is a route layer
        elif (x["type"] == "route"):
            x["layers"] = x["layers"].split(',')
            #Start  of a route
            start = int(x["layers"][0])
            #end, if there exists one.
            try:
                end = int(x["layers"][1])
            except:
                end = 0
            #Positive annotation: an absolute layer index, converted to a relative offset
            if start > 0: 
                start = start - index
            if end > 0:
                end = end - index
            route = EmptyLayer()
            module.add_module("route_{0}".format(index), route)
            if end < 0:
                filters = output_filters[index + start] + output_filters[index + end]
            else:
                filters= output_filters[index + start]

        #shortcut corresponds to skip connection
        elif x["type"] == "shortcut":
            shortcut = EmptyLayer()
            module.add_module("shortcut_{}".format(index), shortcut)
        
        #Yolo is the detection layer
        elif x["type"] == "yolo":
            mask = x["mask"].split(",")
            mask = [int(m) for m in mask]   # use m so the loop variable x is not shadowed
            
            anchors = x["anchors"].split(",")
            anchors = [int(a) for a in anchors]
            anchors = [(anchors[i], anchors[i+1]) for i in range(0, len(anchors),2)]
            anchors = [anchors[i] for i in mask]

            detection = DetectionLayer(anchors)
            module.add_module("Detection_{}".format(index), detection)  

        module_list.append(module)
        prev_filters = filters   # depth seen by the next convolutional layer
        output_filters.append(filters)
        
    return (net_info,module_list)

        
blocks = parse_cfg("/home/suchang/work_codes/keepgoing/yolov3-torch/cfg/yolov3.cfg")
print(create_modules(blocks))