The configuration file yolov3.cfg defines the structure of the network.
    ....
    [convolutional]
    batch_normalize=1
    filters=64
    size=3
    stride=2
    pad=1
    activation=leaky

    [convolutional]
    batch_normalize=1
    filters=32
    size=1
    stride=1
    pad=1
    activation=leaky

    [convolutional]
    batch_normalize=1
    filters=64
    size=3
    stride=1
    pad=1
    activation=leaky

    [shortcut]
    from=-3
    activation=linear
    .....
The configuration file describes the structure of the model.
yolov3 uses the following kinds of blocks:
    [convolutional]
    batch_normalize=1
    filters=64
    size=3
    stride=1
    pad=1
    activation=leaky

    [shortcut]
    from=-3
    activation=linear
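The shortcut block is similar to ResNet and is used to make the network deeper. The configuration above means that the output of the shortcut layer is the element-wise sum of the output of the previous layer and the output of the layer three positions before the shortcut layer.
For a detailed explanation of ResNet skip connections, see https://zhuanlan.zhihu.com/p/28124810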
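As a rough illustration of how `from=-3` is resolved, the sketch below uses plain Python lists as stand-ins for feature maps. The function name `shortcut_output` and the toy values are assumptions made for this example; they are not part of the original code.

```python
def shortcut_output(outputs, index, from_offset):
    """Element-wise sum of the previous layer's output and the output
    `from_offset` layers back, both relative to the shortcut at `index`."""
    prev = outputs[index - 1]
    skipped = outputs[index + from_offset]
    if len(prev) != len(skipped):
        raise ValueError("shortcut inputs must have the same shape")
    return [a + b for a, b in zip(prev, skipped)]


# Toy outputs of layers 0..3; the shortcut layer itself sits at index 4.
outputs = [[1, 1], [2, 2], [3, 3], [4, 4]]
result = shortcut_output(outputs, index=4, from_offset=-3)
print(result)  # [6, 6]: layer 3 ([4, 4]) plus layer 1 ([2, 2])
```

Because the operation is an addition, both inputs must have the same shape, which is why the sketch rejects mismatched lengths.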
    [upsample]
    stride=2
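Using bilinear interpolation, an N*N feature map is upsampled to a (stride*N) * (stride*N) feature map. This imitates a feature pyramid: it produces feature maps at multiple scales, which improves the detection of small objects.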
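The size arithmetic can be checked with a few lines of Python. This sketch assumes a 320x320 input and the fact that YOLOv3 detects at strides 32, 16 and 8; the helper name `upsampled_size` is hypothetical.

```python
def upsampled_size(n, stride):
    """Side length of an N*N feature map after an [upsample] layer."""
    return stride * n


input_size = 320
coarse = input_size // 32            # grid side at the coarsest scale
medium = upsampled_size(coarse, 2)   # after the first upsample
fine = upsampled_size(medium, 2)     # after the second upsample
print(coarse, medium, fine)          # 10 20 40
```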
    [route]
    layers = -4

    [route]
    layers = -1, 61
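Taking the configuration above as an example:
When layers has a single value, the route layer outputs the feature map of the layer 4 positions before the route layer.
When layers has 2 values, the output of the route layer is the feature map of the layer just before the route layer and the feature map of layer 61, concatenated along the depth dimension. (For example, 3*3*100 and 3*3*200 are concatenated into 3*3*300.)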
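The index handling can be sketched without PyTorch by tracking only the depth of each layer's output. The helper names `resolve_route` and `route_depth` and the toy depths below are assumptions for illustration; here negative entries are treated as offsets relative to the route layer and non-negative entries as absolute indices.

```python
def resolve_route(layers, index):
    """Turn the `layers` entries of a [route] block into absolute indices.

    Negative entries are offsets relative to the route layer at `index`;
    non-negative entries are taken as absolute layer indices.
    """
    return [index + l if l < 0 else l for l in layers]


def route_depth(layers, index, output_filters):
    """Depth of the route output: the sum of the routed layers' depths."""
    return sum(output_filters[i] for i in resolve_route(layers, index))


# Toy depths of layers 0..2; the route layer sits at index 3.
output_filters = [100, 64, 200]
print(resolve_route([-1, 0], index=3))                # [2, 0]
print(route_depth([-1, 0], 3, output_filters))        # 300 = 200 + 100
print(route_depth([-2], 3, output_filters))           # 64
```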
    [yolo]
    mask = 0,1,2
    anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
    classes=80
    num=9
    jitter=.3
    ignore_thresh = .5
    truth_thresh = 1
    random=1
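The yolo layer is responsible for prediction. anchors lists 9 anchors that were obtained in advance by clustering; they represent the most likely box shapes.
mask specifies which anchors are used. For example, mask=0,1,2 selects the anchors 10,13, 16,30 and 33,23. As explained in the theory post, each cell predicts 3 bounding boxes; with three scales, that makes 9 anchors in total.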
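The selection can be reproduced in a few lines of plain Python. This is a minimal sketch; `select_anchors` is a hypothetical helper that takes the raw cfg strings as input.

```python
def select_anchors(anchors_str, mask_str):
    """Pair up the flat anchor list and keep the pairs named by the mask."""
    values = [int(v) for v in anchors_str.split(",")]
    pairs = [(values[i], values[i + 1]) for i in range(0, len(values), 2)]
    return [pairs[int(m)] for m in mask_str.split(",")]


anchors = "10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326"
print(select_anchors(anchors, "0,1,2"))  # [(10, 13), (16, 30), (33, 23)]
print(select_anchors(anchors, "6,7,8"))  # [(116, 90), (156, 198), (373, 326)]
```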
    [net]
    # Testing
    batch=1
    subdivisions=1
    # Training
    # batch=64
    # subdivisions=16
    width= 320
    height = 320
    channels=3
    momentum=0.9
    decay=0.0005
    angle=0
    saturation = 1.5
    exposure = 1.5
    hue=.1
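This block defines the model input, the batch size, and so on.
Now let's start writing code:
In this step we parse the configuration file and store the contents of each block in a dict.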
    def parse_cfg(cfgfile):
        """
        Takes a configuration file

        Returns a list of blocks. Each block describes a block in the neural
        network to be built. A block is represented as a dictionary in the list.
        """
        with open(cfgfile, 'r') as file:
            lines = file.read().split('\n')          # store the lines in a list
        lines = [x.strip() for x in lines]           # get rid of fringe whitespace
        lines = [x for x in lines if len(x) > 0]     # get rid of the empty lines
        lines = [x for x in lines if x[0] != '#']    # get rid of comments

        block = {}
        blocks = []

        for line in lines:
            if line[0] == "[":               # this marks the start of a new block
                # A non-empty block holds the values of the previous block.
                if len(block) != 0:
                    blocks.append(block)     # add it to the blocks list
                    block = {}               # re-init the block
                block["type"] = line[1:-1].rstrip()
            else:
                key, value = line.split("=")
                block[key.rstrip()] = value.lstrip()
        blocks.append(block)

        return blocks
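To try the parsing logic without a cfg file on disk, the same idea can be applied to a string. The function `parse_cfg_text` and the sample text below are assumptions made for this sketch; it follows the same rules (skip blank lines and comments, start a new dict at each `[...]` header) but is not the code used in the rest of this post.

```python
def parse_cfg_text(text):
    """Parse darknet-style cfg text into a list of dicts, one per block."""
    blocks = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("["):
            blocks.append({"type": line[1:-1].strip()})
        else:
            key, value = line.split("=", 1)
            blocks[-1][key.strip()] = value.strip()
    return blocks


sample = """
[net]
# Testing
batch=1
width= 320

[convolutional]
batch_normalize=1
filters=32
size=3
"""
blocks = parse_cfg_text(sample)
print(blocks[0])                  # {'type': 'net', 'batch': '1', 'width': '320'}
print(int(blocks[1]["filters"]))  # 32
```

Every value is stored as a string, so the code that builds the layers has to convert it with int() before use.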
Create the layers one by one.
    def create_modules(blocks):
        # Captures the information about the input and pre-processing
        net_info = blocks[0]
        module_list = nn.ModuleList()
        # A convolution needs to know the depth of its kernels. The kernel size is
        # given in the cfg file; the depth is the depth of the previous layer's output.
        prev_filters = 3
        output_filters = []   # records the output depth (number of filters) of every layer

        # index is the position of the current layer in the network
        for index, x in enumerate(blocks[1:]):
            # create `module` for this block and set `filters` (details follow)
            module_list.append(module)
            prev_filters = filters
            output_filters.append(filters)

        return (net_info, module_list)
    [convolutional]
    batch_normalize=1
    filters=32
    size=3
    stride=1
    pad=1
    activation=leaky
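Besides the convolution itself, a convolutional block also includes batch normalization and a leaky activation. Batch normalization has become a standard component; it is used here to mitigate the vanishing-gradient problem (gradients shrinking as they are multiplied during backpropagation). leaky refers to the Leaky ReLU activation function.
That is why nn.Sequential() is used: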
    module = nn.Sequential()
    module.add_module("conv_{0}".format(index), conv)
    module.add_module("batch_norm_{0}".format(index), bn)
    module.add_module("leaky_{0}".format(index), activn)
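Full code for creating the convolutional layer
This uses Python's built-in enumerate, which pairs each element of an iterable with its index and yields (index, element) tuples.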
    >>> seasons = ['Spring', 'Summer', 'Fall', 'Winter']
    >>> list(enumerate(seasons))
    [(0, 'Spring'), (1, 'Summer'), (2, 'Fall'), (3, 'Winter')]
    >>> list(enumerate(seasons, start=1))       # indices start at 1
    [(1, 'Spring'), (2, 'Summer'), (3, 'Fall'), (4, 'Winter')]
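Applied to the parsed cfg, this means that iterating over `blocks[1:]` skips the [net] block, so index 0 refers to the first real layer. A small sketch with a made-up `blocks` list (the three entries are assumptions for illustration):

```python
# Stand-in for the list returned by the cfg parser; only "type" matters here.
blocks = [{"type": "net"}, {"type": "convolutional"}, {"type": "shortcut"}]

indexed = [(index, block["type"]) for index, block in enumerate(blocks[1:])]
print(indexed)  # [(0, 'convolutional'), (1, 'shortcut')]
```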
Creating the convolutional layer:
    # index is the position of the current layer in the network
    for index, x in enumerate(blocks[1:]):
        module = nn.Sequential()

        # check the type of block
        # create a new module for the block
        # append to module_list

        if (x["type"] == "convolutional"):
            # Get the info about the layer
            activation = x["activation"]
            try:
                batch_normalize = int(x["batch_normalize"])
                bias = False
            except KeyError:
                # no batch_normalize entry: the convolution keeps its bias
                batch_normalize = 0
                bias = True

            filters = int(x["filters"])
            padding = int(x["pad"])
            kernel_size = int(x["size"])
            stride = int(x["stride"])

            if padding:
                pad = (kernel_size - 1) // 2
            else:
                pad = 0

            # Add the convolutional layer
            # prev_filters is the depth of the previous layer's output feature map.
            # For example, if the previous layer has 64 kernels, its output is m*n*64.
            conv = nn.Conv2d(prev_filters, filters, kernel_size, stride, pad, bias=bias)
            module.add_module("conv_{0}".format(index), conv)

            # Add the Batch Norm Layer
            if batch_normalize:
                bn = nn.BatchNorm2d(filters)
                module.add_module("batch_norm_{0}".format(index), bn)

            # Check the activation.
            # It is either Linear or a Leaky ReLU for YOLO
            if activation == "leaky":
                activn = nn.LeakyReLU(0.1, inplace=True)
                module.add_module("leaky_{0}".format(index), activn)
        # If it's an upsampling layer
        # We use Bilinear2dUpsampling
        elif (x["type"] == "upsample"):
            stride = int(x["stride"])
            upsample = nn.Upsample(scale_factor=2, mode="bilinear")
            module.add_module("upsample_{}".format(index), upsample)
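The padding rule in the convolutional branch, pad = (kernel_size - 1) // 2, keeps the spatial size unchanged at stride 1 and halves it at stride 2. The sketch below checks this with the standard convolution output-size formula (dilation 1); `conv_output_size` is a hypothetical helper and the 320 input side is an assumption.

```python
def conv_output_size(n, kernel_size, stride, pad_flag):
    """Output side length of a convolution over an n*n input."""
    pad = (kernel_size - 1) // 2 if pad_flag else 0
    return (n + 2 * pad - kernel_size) // stride + 1


print(conv_output_size(320, 3, 1, 1))  # 320: 3x3, stride 1 keeps the size
print(conv_output_size(320, 3, 2, 1))  # 160: 3x3, stride 2 halves it
print(conv_output_size(320, 1, 1, 1))  # 320: 1x1 needs no padding
```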
    [route]
    layers = -4

    [route]
    layers = -1, 61
First parse the layers value from the configuration, then concatenate the feature maps of the referenced layers to form the output.
        # If it is a route layer
        elif (x["type"] == "route"):
            x["layers"] = x["layers"].split(',')
            # Start of a route
            start = int(x["layers"][0])
            # end, if there exists one.
            try:
                end = int(x["layers"][1])
            except IndexError:
                end = 0
            # Positive annotation: an absolute layer index
            if start > 0:
                start = start - index   # convert start to an offset relative to the current layer
            if end > 0:
                end = end - index       # convert end to an offset relative to the current layer
            route = EmptyLayer()
            module.add_module("route_{0}".format(index), route)
            # A route layer concatenates layers that come before it, so after the
            # conversion above a valid end is negative; end == 0 means there is no second layer.
            if end < 0:
                filters = output_filters[index + start] + output_filters[index + end]
            else:
                filters = output_filters[index + start]
Here we define a custom EmptyLayer:
    class EmptyLayer(nn.Module):
        def __init__(self):
            super(EmptyLayer, self).__init__()
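EmptyLayer exists to keep the code simple. Normally, a custom layer in PyTorch is written as a class that inherits from nn.Module and implements the forward method.
For how to define a custom layer, see the link below.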
https://pytorch.org/tutorials/beginner/examples_nn/two_layer_net_module.html
    import torch


    class TwoLayerNet(torch.nn.Module):
        def __init__(self, D_in, H, D_out):
            """
            In the constructor we instantiate two nn.Linear modules and assign them as
            member variables.
            """
            super(TwoLayerNet, self).__init__()
            self.linear1 = torch.nn.Linear(D_in, H)
            self.linear2 = torch.nn.Linear(H, D_out)

        def forward(self, x):
            """
            In the forward function we accept a Tensor of input data and we must return
            a Tensor of output data. We can use Modules defined in the constructor as
            well as arbitrary operators on Tensors.
            """
            h_relu = self.linear1(x).clamp(min=0)
            y_pred = self.linear2(h_relu)
            return y_pred


    # N is batch size; D_in is input dimension;
    # H is hidden dimension; D_out is output dimension.
    N, D_in, H, D_out = 64, 1000, 100, 10

    # Create random Tensors to hold inputs and outputs
    x = torch.randn(N, D_in)
    y = torch.randn(N, D_out)

    # Construct our model by instantiating the class defined above
    model = TwoLayerNet(D_in, H, D_out)

    # Construct our loss function and an Optimizer. The call to model.parameters()
    # in the SGD constructor will contain the learnable parameters of the two
    # nn.Linear modules which are members of the model.
    criterion = torch.nn.MSELoss(reduction='sum')
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)
    for t in range(500):
        # Forward pass: Compute predicted y by passing x to the model
        y_pred = model(x)

        # Compute and print loss
        loss = criterion(y_pred, y)
        print(t, loss.item())

        # Zero gradients, perform a backward pass, and update the weights.
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
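Our route layer does something very simple: it concatenates the feature maps of two layers, which is a single call to torch.cat. There is therefore no need to define a dedicated RouteLayer; we do the concatenation directly in the forward method of the nn.Module that represents darknet.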
        # shortcut corresponds to skip connection
        elif x["type"] == "shortcut":
            shortcut = EmptyLayer()
            module.add_module("shortcut_{}".format(index), shortcut)
As with the route layer, an EmptyLayer is used as a placeholder here. The shortcut operation is the element-wise addition of two feature maps.
        # Yolo is the detection layer
        elif x["type"] == "yolo":
            mask = x["mask"].split(",")
            # use m rather than x so the block dict x is never shadowed
            mask = [int(m) for m in mask]

            anchors = x["anchors"].split(",")
            anchors = [int(a) for a in anchors]
            anchors = [(anchors[i], anchors[i+1]) for i in range(0, len(anchors), 2)]
            anchors = [anchors[i] for i in mask]

            detection = DetectionLayer(anchors)
            module.add_module("Detection_{}".format(index), detection)

    # We define our own yolo layer
    class DetectionLayer(nn.Module):
        def __init__(self, anchors):
            super(DetectionLayer, self).__init__()
            self.anchors = anchors
    blocks = parse_cfg("cfg/yolov3.cfg")
    print(create_modules(blocks))
Running this prints net_info and the module list.
The complete code is as follows:
    #coding=utf-8
    from __future__ import division

    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    from torch.autograd import Variable
    import numpy as np


    def parse_cfg(cfgfile):
        """
        Takes a configuration file

        Returns a list of blocks. Each block describes a block in the neural
        network to be built. A block is represented as a dictionary in the list.
        """
        with open(cfgfile, 'r') as file:
            lines = file.read().split('\n')          # store the lines in a list
        lines = [x.strip() for x in lines]           # get rid of fringe whitespace
        lines = [x for x in lines if len(x) > 0]     # get rid of the empty lines
        lines = [x for x in lines if x[0] != '#']    # get rid of comments

        block = {}
        blocks = []

        for line in lines:
            if line[0] == "[":               # this marks the start of a new block
                # A non-empty block holds the values of the previous block.
                if len(block) != 0:
                    blocks.append(block)     # add it to the blocks list
                    block = {}               # re-init the block
                block["type"] = line[1:-1].rstrip()
            else:
                key, value = line.split("=")
                block[key.rstrip()] = value.lstrip()
        blocks.append(block)

        return blocks


    class EmptyLayer(nn.Module):
        def __init__(self):
            super(EmptyLayer, self).__init__()


    class DetectionLayer(nn.Module):
        def __init__(self, anchors):
            super(DetectionLayer, self).__init__()
            self.anchors = anchors


    def create_modules(blocks):
        # Captures the information about the input and pre-processing
        net_info = blocks[0]
        module_list = nn.ModuleList()
        prev_filters = 3
        output_filters = []

        # index is the position of the current layer in the network
        for index, x in enumerate(blocks[1:]):
            module = nn.Sequential()

            # check the type of block
            # create a new module for the block
            # append to module_list

            if (x["type"] == "convolutional"):
                # Get the info about the layer
                activation = x["activation"]
                try:
                    batch_normalize = int(x["batch_normalize"])
                    bias = False
                except KeyError:
                    batch_normalize = 0
                    bias = True

                filters = int(x["filters"])
                padding = int(x["pad"])
                kernel_size = int(x["size"])
                stride = int(x["stride"])

                if padding:
                    pad = (kernel_size - 1) // 2
                else:
                    pad = 0

                # Add the convolutional layer
                # prev_filters is the depth of the previous layer's output feature map.
                # For example, if the previous layer has 64 kernels, its output is m*n*64.
                conv = nn.Conv2d(prev_filters, filters, kernel_size, stride, pad, bias=bias)
                module.add_module("conv_{0}".format(index), conv)

                # Add the Batch Norm Layer
                if batch_normalize:
                    bn = nn.BatchNorm2d(filters)
                    module.add_module("batch_norm_{0}".format(index), bn)

                # Check the activation.
                # It is either Linear or a Leaky ReLU for YOLO
                if activation == "leaky":
                    activn = nn.LeakyReLU(0.1, inplace=True)
                    module.add_module("leaky_{0}".format(index), activn)

            # If it's an upsampling layer
            # We use Bilinear2dUpsampling
            elif (x["type"] == "upsample"):
                stride = int(x["stride"])
                upsample = nn.Upsample(scale_factor=2, mode="bilinear")
                module.add_module("upsample_{}".format(index), upsample)

            # If it is a route layer
            elif (x["type"] == "route"):
                x["layers"] = x["layers"].split(',')
                # Start of a route
                start = int(x["layers"][0])
                # end, if there exists one.
                try:
                    end = int(x["layers"][1])
                except IndexError:
                    end = 0
                # Positive annotation: an absolute layer index
                if start > 0:
                    start = start - index
                if end > 0:
                    end = end - index
                route = EmptyLayer()
                module.add_module("route_{0}".format(index), route)
                if end < 0:
                    filters = output_filters[index + start] + output_filters[index + end]
                else:
                    filters = output_filters[index + start]

            # shortcut corresponds to skip connection
            elif x["type"] == "shortcut":
                shortcut = EmptyLayer()
                module.add_module("shortcut_{}".format(index), shortcut)

            # Yolo is the detection layer
            elif x["type"] == "yolo":
                mask = x["mask"].split(",")
                # use m rather than x so the block dict x is never shadowed
                mask = [int(m) for m in mask]

                anchors = x["anchors"].split(",")
                anchors = [int(a) for a in anchors]
                anchors = [(anchors[i], anchors[i+1]) for i in range(0, len(anchors), 2)]
                anchors = [anchors[i] for i in mask]

                detection = DetectionLayer(anchors)
                module.add_module("Detection_{}".format(index), detection)

            module_list.append(module)
            # fixed: the original assigned to prev_filter, so prev_filters stayed at 3
            prev_filters = filters
            output_filters.append(filters)

        return (net_info, module_list)


    blocks = parse_cfg("/home/suchang/work_codes/keepgoing/yolov3-torch/cfg/yolov3.cfg")
    print(create_modules(blocks))