In the previous post, https://www.cnblogs.com/sdu20112013/p/11099244.html, we implemented each layer of the network.
This post implements the forward pass of the network.
class Darknet(nn.Module):
    def __init__(self, cfgfile):
        super(Darknet, self).__init__()
        self.blocks = parse_cfg(cfgfile)
        self.net_info, self.module_list = create_modules(self.blocks)
The forward function overrides the forward method inherited from nn.Module.
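All of the per-layer branches shown below live inside a single loop in forward. As a rough sketch (assuming the outputs cache and the write flag that the later snippets rely on), the overall structure looks like this:

def forward(self, x, CUDA):
    modules = self.blocks[1:]        # block 0 is the [net] section, skip it
    outputs = {}                     # cache every layer's output for route/shortcut lookups
    write = 0                        # becomes 1 once the first yolo output has been collected
    for i, module in enumerate(modules):
        module_type = module["type"]
        # ... the convolutional/upsample, route, shortcut and yolo branches
        #     shown in the rest of this post go here ...
        outputs[i] = x
    return detections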
if module_type == "convolutional" or module_type == "upsample":
    x = self.module_list[i](x)
As discussed in the previous post, the output of a route layer is either the output of a single earlier layer, or the concatenation of two earlier layers along the depth dimension. That is:
outputs[current_layer] = outputs[previous_layer]

or

map1 = outputs[i + layers[0]]
map2 = outputs[i + layers[1]]
outputs[current_layer] = torch.cat((map1, map2), 1)
So the route layer code is as follows:
elif module_type == "route":
    layers = module["layers"]
    layers = [int(a) for a in layers]

    if (layers[0]) > 0:
        layers[0] = layers[0] - i

    if len(layers) == 1:
        x = outputs[i + (layers[0])]
    else:
        if (layers[1]) > 0:
            layers[1] = layers[1] - i

        map1 = outputs[i + layers[0]]
        map2 = outputs[i + layers[1]]
        x = torch.cat((map1, map2), 1)
The output of a shortcut layer is the sum of the previous layer's output and the output of a layer some number of steps back, where the offset is given by the from field in the config file (e.g. from=-3 adds in the output of the layer three steps back).
elif module_type == "shortcut":
    from_ = int(module["from"])
    x = outputs[i-1] + outputs[i+from_]
The output of a yolo layer is an n*n*depth feature map. If you wanted to access, say, the 2nd bounding box of the cell at (5,6), you would have to index it as map[5,6,(5+C):2*(5+C)], which is clumsy to work with, so we introduce a predict_transform function to reshape the output.
In short, we want to convert the 4-D tensor of shape batch_size*(B*(5+C))*grid_size*grid_size into a 3-D tensor of shape batch_size*(grid_size*grid_size*B)*(5+C).
Viewed per image, each row of the resulting 2-D matrix describes one bounding box: tx, ty, tw, th, the objectness score, and then the C class scores.
batch_size = prediction.size(0)
stride = inp_dim // prediction.size(2)
grid_size = inp_dim // stride
bbox_attrs = 5 + num_classes
num_anchors = len(anchors)

prediction = prediction.view(batch_size, bbox_attrs*num_anchors, grid_size*grid_size)
prediction = prediction.transpose(1,2).contiguous()
prediction = prediction.view(batch_size, grid_size*grid_size*num_anchors, bbox_attrs)
The code above uses PyTorch's view, which behaves like numpy's reshape. contiguous is usually used together with transpose, permute and view: after a dimension permutation the tensor is no longer stored contiguously in memory, while view requires contiguous storage, hence the call to contiguous. In the end we get a tensor of shape batch_size*(grid_size*grid_size*num_anchors)*bbox_attrs.
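A minimal, self-contained illustration of why contiguous is needed (the shapes here are made up for the example):

import torch

t = torch.arange(24).view(2, 3, 4)   # contiguous tensor
p = t.transpose(1, 2)                # shape (2, 4, 3), but no longer contiguous
print(p.is_contiguous())             # False
# p.view(2, 12) would raise an error here; make the memory contiguous first
q = p.contiguous().view(2, 12)
print(q.shape)                       # torch.Size([2, 12])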
Next we transform the predicted bounding box coordinates.
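For reference, the decoding applied below is the standard YOLOv3 box transform, where (c_x, c_y) is the top-left corner of the grid cell the prediction falls in and (p_w, p_h) are the anchor dimensions scaled to the feature map:

b_x = sigmoid(t_x) + c_x
b_y = sigmoid(t_y) + c_y
b_w = p_w * exp(t_w)
b_h = p_h * exp(t_h)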
Note that prediction[:,:,0], prediction[:,:,1], prediction[:,:,2], prediction[:,:,3] and prediction[:,:,4] are tx, ty, tw, th and the objectness score respectively.
Next, the box centers are predicted as offsets relative to the top-left corner of the current cell:
# sigmoid squashes the values into the 0-1 range
# Sigmoid the centre_X, centre_Y and object confidence
prediction[:,:,0] = torch.sigmoid(prediction[:,:,0])
prediction[:,:,1] = torch.sigmoid(prediction[:,:,1])
prediction[:,:,4] = torch.sigmoid(prediction[:,:,4])

# Add the center offsets
grid = np.arange(grid_size)
a, b = np.meshgrid(grid, grid)

x_offset = torch.FloatTensor(a).view(-1,1)
y_offset = torch.FloatTensor(b).view(-1,1)

if CUDA:
    x_offset = x_offset.cuda()
    y_offset = y_offset.cuda()

x_y_offset = torch.cat((x_offset, y_offset), 1).repeat(1, num_anchors).view(-1,2).unsqueeze(0)

# prediction[:,:,0], prediction[:,:,1] become offsets relative to the current cell
prediction[:,:,:2] += x_y_offset
The effect of meshgrid is illustrated below:
import numpy as np
import torch

grid_size = 13
grid = np.arange(grid_size)
a, b = np.meshgrid(grid, grid)
print(a)
print(b)

x_offset = torch.FloatTensor(a).view(-1,1)
#print(x_offset)
y_offset = torch.FloatTensor(b).view(-1,1)
For grid_size = 13 this prints a as a 13*13 array whose every row is [0 1 2 ... 12] (the x index of each cell) and b as its transpose, whose every column is [0 1 2 ... 12] (the y index of each cell). Stacking the flattened a and b therefore gives the (x, y) offset of every cell's top-left corner.
Next we predict the bounding box width and height. Note that the anchor sizes have to be rescaled to match the current feature map; the sizes in the config file are given relative to the model's input dimensions.
anchors = [(a[0]/stride, a[1]/stride) for a in anchors]   # rescale anchors to the feature map size

# log space transform of height and width
anchors = torch.FloatTensor(anchors)

if CUDA:
    anchors = anchors.cuda()

anchors = anchors.repeat(grid_size*grid_size, 1).unsqueeze(0)
prediction[:,:,2:4] = torch.exp(prediction[:,:,2:4])*anchors

# map the coordinates back to the original input image
prediction[:,:,:4] *= stride
Finally, predict the class probabilities (YOLOv3 applies a sigmoid per class rather than a softmax, so the class scores are independent):
prediction[:,:,5:5 + num_classes] = torch.sigmoid(prediction[:,:,5:5 + num_classes])
The complete predict_transform code is as follows:
# the feature map produced by yolo's stacked convolutions has size batch_size*(B*(5+C))*grid_size*grid_size
def predict_transform(prediction, inp_dim, anchors, num_classes, CUDA = True):
    if CUDA:
        prediction = prediction.to(torch.device("cuda"))   # move to gpu; not needed on torch 0.4, needed on torch 1.0
    batch_size = prediction.size(0)
    stride = inp_dim // prediction.size(2)
    grid_size = inp_dim // stride
    bbox_attrs = 5 + num_classes
    num_anchors = len(anchors)

    print("prediction.shape=", prediction.shape)
    print("batch_size=", batch_size)
    print("inp_dim=", inp_dim)
    #print("anchors=", anchors)
    #print("num_classes=", num_classes)
    print("grid_size=", grid_size)
    print("bbox_attrs=", bbox_attrs)

    prediction = prediction.view(batch_size, bbox_attrs*num_anchors, grid_size*grid_size)
    prediction = prediction.transpose(1,2).contiguous()
    prediction = prediction.view(batch_size, grid_size*grid_size*num_anchors, bbox_attrs)

    # Sigmoid the centre_X, centre_Y and object confidence
    prediction[:,:,0] = torch.sigmoid(prediction[:,:,0])
    prediction[:,:,1] = torch.sigmoid(prediction[:,:,1])
    prediction[:,:,4] = torch.sigmoid(prediction[:,:,4])

    # Add the center offsets
    grid = np.arange(grid_size).astype(np.float32)
    a, b = np.meshgrid(grid, grid)

    x_offset = torch.FloatTensor(a).view(-1,1)
    y_offset = torch.FloatTensor(b).view(-1,1)

    if CUDA:
        x_offset = x_offset.cuda()
        y_offset = y_offset.cuda()

    x_y_offset = torch.cat((x_offset, y_offset), 1).repeat(1, num_anchors).view(-1,2).unsqueeze(0)

    print(type(x_y_offset), type(prediction[:,:,:2]))
    prediction[:,:,:2] += x_y_offset

    anchors = [(a[0]/stride, a[1]/stride) for a in anchors]   # rescale anchors to the feature map size

    # log space transform of height and width
    anchors = torch.FloatTensor(anchors)

    if CUDA:
        anchors = anchors.cuda()

    anchors = anchors.repeat(grid_size*grid_size, 1).unsqueeze(0)
    prediction[:,:,2:4] = torch.exp(prediction[:,:,2:4])*anchors

    prediction[:,:,5:5 + num_classes] = torch.sigmoid(prediction[:,:,5:5 + num_classes])

    prediction[:,:,:4] *= stride   # map boxes (x, y, w, h) back to original input image coordinates

    return prediction
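As a quick sanity check (the numbers below are purely illustrative; the anchors are the three YOLOv3 anchors used on the coarsest scale), feeding a dummy 13*13 feature map for the 80-class COCO setting through predict_transform gives the expected flattened shape:

# hypothetical smoke test for predict_transform
anchors = [(116, 90), (156, 198), (373, 326)]    # anchors for the 13x13 scale
dummy = torch.randn(1, 3*(5+80), 13, 13)         # batch_size=1, B=3, C=80
out = predict_transform(dummy, 416, anchors, 80, CUDA=False)
print(out.shape)                                 # torch.Size([1, 507, 85]) -> 13*13*3 boxes, 5+80 attrs each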
With the helper function in place, we can continue implementing the Darknet class's forward method.
elif module_type == "yolo":
    anchors = self.module_list[i][0].anchors
    inp_dim = int(self.net_info["height"])
    num_classes = int(module["classes"])

    x = x.data
    x = predict_transform(x, inp_dim, anchors, num_classes, CUDA)
    if not write:              # if no collector has been initialised
        detections = x
        write = 1
    else:
        detections = torch.cat((detections, x), 1)
Before predict_transform existed, the feature maps from the different scales, e.g. 13*13*N1, 26*26*N2 and 52*52*N3, could not be concatenated into a single tensor; now that each has been flattened into rows of length 5+C, they can be.
The write flag in the code above simply records whether detections is still empty: if it is, this is the first yolo layer to produce predictions and its output is assigned to detections; otherwise the current yolo layer's output is concatenated onto detections.
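To make the concatenation concrete, here is a small sketch with dummy tensors using the shapes a 608*608 COCO input produces at the three scales (19*19, 38*38 and 76*76, 3 anchors each):

import torch

# per-scale outputs after predict_transform: (batch, grid*grid*num_anchors, 5+C)
d19 = torch.randn(1, 19*19*3, 85)
d38 = torch.randn(1, 38*38*3, 85)
d76 = torch.randn(1, 76*76*3, 85)

detections = torch.cat((d19, d38, d76), 1)
print(detections.shape)   # torch.Size([1, 22743, 85])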
Download a test image: wget https://github.com/ayooshkathuria/pytorch-yolo-v3/raw/master/dog-cycle-car.png
def get_test_input():
    img = cv2.imread("dog-cycle-car.png")
    img = cv2.resize(img, (608,608))             # Resize to the input dimension
    img_ = img[:,:,::-1].transpose((2,0,1))      # BGR -> RGB | H x W x C -> C x H x W
    img_ = img_[np.newaxis,:,:,:]/255.0          # Add a batch dimension at 0 | Normalise
    img_ = torch.from_numpy(img_).float()        # Convert to float
    img_ = Variable(img_)                        # Convert to Variable
    return img_

model = Darknet("cfg/yolov3.cfg")
inp = get_test_input()
pred = model(inp, torch.cuda.is_available())
print(pred)
cv2.imread() loads images with channels in BGR order and in h*w*c layout, e.g. 416*416*3; we need to convert this to c*h*w, i.e. 3*416*416.
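A minimal numpy check of what the slicing and transpose in get_test_input do to the array layout (the array contents here are arbitrary):

import numpy as np

img = np.zeros((416, 416, 3), dtype=np.uint8)    # pretend BGR image, H x W x C
img[0, 0] = (255, 0, 0)                          # pixel (0,0): pure blue in BGR

rgb = img[:, :, ::-1]                            # reverse the channel axis: BGR -> RGB
chw = rgb.transpose((2, 0, 1))                   # H x W x C -> C x H x W
print(chw.shape)                                 # (3, 416, 416)
print(chw[2, 0, 0])                              # 255 -> the blue value now sits in the last (B) channel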
The final test run produces a prediction tensor of shape 1*22743*85 (85 = 5 + 80 COCO classes).
22743 bounding boxes are predicted: there are three feature map scales, 19*19, 38*38 and 76*76, with 3 boxes predicted per cell at each scale, giving a total of 3*(19*19 + 38*38 + 76*76) = 22743 boxes.