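In the previous post we implemented the forward function and obtained prediction. At this point the network has predicted a very large number of boxes together with their class probabilities, and we now need to filter them down to our final predicted boxes.
Once you understand the format of YOLOv3's output and what each position means, the source code is not hard to follow. My main difficulty while reading it was being unfamiliar with PyTorch, so in this post I call out the PyTorch functions involved and give links, test code and so on for each of them.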
We set an obj score threshold; only boxes whose objectness score exceeds it are considered valid.
    conf_mask = (prediction[:,:,4] > confidence).float().unsqueeze(2)
    prediction = prediction*conf_mask
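prediction has shape 1*boxnum*boxattr (the leading dimension is the batch size, 1 in this example).
prediction[:,:,4] has shape 1*boxnum; each element is the boxattr value at index 4, i.e. the obj score.
Tensor indexing in torch is similar to numpy; see the output of the code below.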
    import torch
    x = torch.Tensor(1,3,10)  # Create an uninitialized Tensor of size 1x3x10
    print(x)
    print(x.shape)            # Print the shape of the Tensor
    y = x[:,:,4]
    print(y)
    print(y.shape)
    z = x[:,:,4:6]
    print(z)
    print(z.shape)
    print((y>0.5).float().unsqueeze(2))

    #### Output:
    tensor([[[2.5226e-18, 1.6898e-04, 1.0413e-11, 7.7198e-10, 1.0549e-08,
              4.0516e-11, 1.0681e-05, 2.9575e-18, 6.7333e+22, 1.7591e+22],
             [1.7184e+25, 4.3222e+27, 6.1972e-04, 7.2443e+22, 1.7728e+28,
              7.0367e+22, 5.9018e-10, 2.6540e-09, 1.2972e-11, 5.3370e-08],
             [2.7001e-06, 2.6801e-09, 4.1292e-05, 2.1511e+23, 3.2770e-09,
              2.5125e-18, 7.7052e+31, 1.9447e+31, 5.0207e+28, 1.1492e-38]]])
    torch.Size([1, 3, 10])
    tensor([[1.0549e-08, 1.7728e+28, 3.2770e-09]])
    torch.Size([1, 3])
    tensor([[[1.0549e-08, 4.0516e-11],
             [1.7728e+28, 7.0367e+22],
             [3.2770e-09, 2.5125e-18]]])
    torch.Size([1, 3, 2])
    tensor([[[0.],
             [0.],
             [0.]]])

Note that x is uninitialized, so its values change from run to run. For the y printed above, the middle entry (1.7728e+28) is greater than 0.5, so the last mask would have a 1 in the middle row for those particular values.
squeeze and unsqueeze: squeeze removes dimensions of size 1, unsqueeze adds one.
    t = torch.ones(2,1,2,1)    # Size 2x1x2x1
    r = torch.squeeze(t)       # Size 2x2
    r = torch.squeeze(t, 1)    # Squeeze dimension 1: Size 2x2x1

    # Un-squeeze a dimension
    x = torch.Tensor([1, 2, 3])
    r = torch.unsqueeze(x, 0)  # Size: 1x3, a new dimension of size 1 is inserted at position 0
    r = torch.unsqueeze(x, 1)  # Size: 3x1, a new dimension of size 1 is inserted at position 1
After this step, every box in prediction whose objscore does not exceed the threshold has been set to 0.
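To make the masking step concrete, here is a small sketch on made-up numbers (the three boxes and the 0.5 threshold are illustrative assumptions, not values from the network). It shows how the (1, boxnum, 1) mask broadcasts over the attribute axis and zeroes whole rows.

```python
import torch

# Made-up predictions: batch of 1, three boxes, six attributes each
# (x, y, w, h, objectness, one class score).
prediction = torch.tensor([[[10., 10., 4., 4., 0.9, 0.8],
                            [20., 20., 6., 6., 0.2, 0.7],
                            [30., 30., 8., 8., 0.6, 0.5]]])
confidence = 0.5  # illustrative threshold

# (1, 3) bool -> float -> (1, 3, 1), so it broadcasts across all attributes.
conf_mask = (prediction[:, :, 4] > confidence).float().unsqueeze(2)
masked = prediction * conf_mask

print(conf_mask.shape)  # torch.Size([1, 3, 1])
print(masked[0, 1])     # the second box (objectness 0.2) is now all zeros
```

Rows that pass the threshold are multiplied by 1 and stay unchanged, which is why the later code can find the surviving boxes simply by looking for a non-zero objectness score.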
tensor.new() creates a new tensor with the same dtype as the existing tensor; see https://stackoverflow.com/questions/49263588/pytorch-beginner-tensor-new-method
    # Compute the box corners (top-left corner x, top-left corner y, right-bottom corner x, right-bottom corner y)
    box_corner = prediction.new(prediction.shape)
    box_corner[:,:,0] = (prediction[:,:,0] - prediction[:,:,2]/2)
    box_corner[:,:,1] = (prediction[:,:,1] - prediction[:,:,3]/2)
    box_corner[:,:,2] = (prediction[:,:,0] + prediction[:,:,2]/2)
    box_corner[:,:,3] = (prediction[:,:,1] + prediction[:,:,3]/2)
    prediction[:,:,:4] = box_corner[:,:,:4]
In the original prediction, boxattr stores x, y, w, h, ... (box center, width and height), which is inconvenient to work with, so we convert it to (top-left corner x, top-left corner y, right-bottom corner x, right-bottom corner y).
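A quick worked example of the conversion, on one made-up box. As an assumption for readability, this sketch uses clone() to get a tensor of the same shape and dtype rather than prediction.new(...); the arithmetic is the same.

```python
import torch

# One made-up box: center (50, 40), width 20, height 10, objectness 0.9.
prediction = torch.tensor([[[50., 40., 20., 10., 0.9]]])

box_corner = prediction.clone()  # same shape and dtype as prediction
box_corner[:, :, 0] = prediction[:, :, 0] - prediction[:, :, 2] / 2  # top-left x
box_corner[:, :, 1] = prediction[:, :, 1] - prediction[:, :, 3] / 2  # top-left y
box_corner[:, :, 2] = prediction[:, :, 0] + prediction[:, :, 2] / 2  # bottom-right x
box_corner[:, :, 3] = prediction[:, :, 1] + prediction[:, :, 3] / 2  # bottom-right y

print(box_corner[0, 0, :4])  # tensor([40., 35., 60., 45.])
```

The box spans x from 40 to 60 and y from 35 to 45, which is the form the IoU computation later expects.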
Next we process the predictions for each image in the batch, one image at a time.
    batch_size = prediction.size(0)
    write = False
    for ind in range(batch_size):
        # image_pred.shape = boxnum*boxattr
        image_pred = prediction[ind]          # image Tensor box_num*box_attr
        # confidence thresholding
        # NMS
        # Returns the maximum of each row and the column where that maximum occurs.
        # Here max_conf holds the best class probability and max_conf_score holds its class index.
        max_conf, max_conf_score = torch.max(image_pred[:,5:5+ num_classes], 1)
        # Give both the same number of dimensions as image_pred
        max_conf = max_conf.float().unsqueeze(1)
        max_conf_score = max_conf_score.float().unsqueeze(1)
        seq = (image_pred[:,:5], max_conf, max_conf_score)
        # Concatenate along the column direction; image_pred is now boxnum*7
        image_pred = torch.cat(seq, 1)
This uses torch.max; see https://blog.csdn.net/Z_lbj/article/details/79766690
torch.max(input, dim, keepdim=False, out=None) -> (Tensor, LongTensor)
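It returns the maximum along dimension dim. One way to remember it: the comparison runs along dimension dim. torch.max(input, 0) compares down the rows, i.e. it yields the maximum of each column.
Suppose input is a 2-D matrix, rows*columns: rows are dimension 0 and columns are dimension 1.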
    c=torch.Tensor([[1,2,3],[6,5,4]])
    print(c)
    a,b=torch.max(c,1)
    print(a)
    print(b)

    ## Output:
    tensor([[1., 2., 3.],
            [6., 5., 4.]])
    tensor([3., 6.])
    tensor([2, 0])
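For comparison, the same matrix reduced along dimension 0, plus the form of torch.max that takes no dim at all. This is just a small sketch to contrast with the dim=1 case.

```python
import torch

c = torch.tensor([[1., 2., 3.],
                  [6., 5., 4.]])

# dim=0: compare down the rows, one result per column.
values, indices = torch.max(c, 0)
print(values)        # tensor([6., 5., 4.])
print(indices)       # tensor([1, 1, 1])

# No dim: a single maximum over the whole tensor, and no indices.
print(torch.max(c))  # tensor(6.)
```

In write_results we want one result per box (per row), which is why the code passes dim=1.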
For torch.cat, see https://pytorch.org/docs/stable/torch.html
    torch.cat(tensors, dim=0, out=None) → Tensor

    >>> x = torch.randn(2, 3)
    >>> x
    tensor([[ 0.6580, -1.0969, -0.4614],
            [-0.1034, -0.5790,  0.1497]])
    >>> torch.cat((x, x, x), 0)
    tensor([[ 0.6580, -1.0969, -0.4614],
            [-0.1034, -0.5790,  0.1497],
            [ 0.6580, -1.0969, -0.4614],
            [-0.1034, -0.5790,  0.1497],
            [ 0.6580, -1.0969, -0.4614],
            [-0.1034, -0.5790,  0.1497]])
    >>> torch.cat((x, x, x), 1)
    tensor([[ 0.6580, -1.0969, -0.4614,  0.6580, -1.0969, -0.4614,  0.6580,
             -1.0969, -0.4614],
            [-0.1034, -0.5790,  0.1497, -0.1034, -0.5790,  0.1497, -0.1034,
             -0.5790,  0.1497]])
Next we keep only the rows whose obj_score is non-zero (rows with obj_score < obj_threshold were set to 0 earlier).
    # Excerpt from inside the per-image loop, hence the continue statements.
    non_zero_ind = (torch.nonzero(image_pred[:,4]))
    try:
        image_pred_ = image_pred[non_zero_ind.squeeze(),:].view(-1,7)
    except:
        continue

    # For PyTorch 0.4 compatibility
    # Since the above code will not raise an exception for no detection,
    # as scalars are supported in PyTorch 0.4
    if image_pred_.shape[0] == 0:
        continue
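The squeeze() followed by view(-1,7) is easy to misread, so here is a small sketch on made-up rows. It also shows a boolean-mask alternative that always keeps the (n, 7) shape; that alternative is my own suggestion, not something the original code uses.

```python
import torch

# Made-up image_pred with 7 columns; only the middle row survived thresholding.
image_pred = torch.tensor([[0., 0., 0., 0., 0.0, 0.0, 0.],
                           [1., 2., 3., 4., 0.9, 0.8, 2.],
                           [0., 0., 0., 0., 0.0, 0.0, 0.]])

non_zero_ind = torch.nonzero(image_pred[:, 4])  # shape (1, 1): one surviving row
row = image_pred[non_zero_ind.squeeze(), :]     # squeeze gives a 0-d index, so row is 1-D
kept = row.view(-1, 7)                          # view restores the (n, 7) layout
print(row.shape, kept.shape)                    # torch.Size([7]) torch.Size([1, 7])

# Alternative: a boolean mask keeps the result 2-D, even with zero or one match.
kept_mask = image_pred[image_pred[:, 4] != 0]
print(kept_mask.shape)                          # torch.Size([1, 7])

# With no detections at all the mask version simply yields zero rows.
empty = torch.zeros(3, 7)
print(empty[empty[:, 4] != 0].shape[0])         # 0
```

Either way, the check on shape[0] == 0 is what skips images without any detection.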
OK, next we run NMS separately for each class.
First, find out which classes are present in the image.
    # Get the various classes detected in the image
    # (unique is a helper function that is not shown in this section)
    img_classes = unique(image_pred_[:,-1])  # -1 index holds the class index
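The unique helper is not shown in this post. As a hypothetical stand-in, assuming all it needs to do is return the distinct class indices, torch.unique does the job:

```python
import torch

def unique(tensor):
    """Hypothetical stand-in for the unique() helper: sorted distinct values."""
    return torch.unique(tensor)

# Made-up class-index column (last column of image_pred_).
class_column = torch.tensor([2., 0., 2., 5., 0.])
img_classes = unique(class_column)
print(img_classes)  # tensor([0., 2., 5.])
```

The result keeps the dtype of the input, so the later comparison image_pred_[:,-1] == cls works without any conversion.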
Then we handle each class in turn.
    for cls in img_classes:
        # perform NMS

        # get the detections with one particular class
        # Select the rows whose class equals cls and whose class prob != 0
        cls_mask = image_pred_*(image_pred_[:,-1] == cls).float().unsqueeze(1)
        class_mask_ind = torch.nonzero(cls_mask[:,-2]).squeeze()
        image_pred_class = image_pred_[class_mask_ind].view(-1,7)

        # sort the detections such that the entry with the maximum objectness
        # confidence is at the top (obj score from high to low)
        conf_sort_index = torch.sort(image_pred_class[:,4], descending = True )[1]
        image_pred_class = image_pred_class[conf_sort_index]
        idx = image_pred_class.size(0)   # Number of detections

        for i in range(idx):
            # Get the IOUs of all boxes that come after the one we are looking at
            # in the loop
            try:
                # IoU between row i and every row after it
                ious = bbox_iou(image_pred_class[i].unsqueeze(0), image_pred_class[i+1:])
            except ValueError:
                break
            except IndexError:
                break

            # Zero out all the detections that have IoU > threshold:
            # boxes whose IoU with row i is above nms_conf are treated as the same object
            iou_mask = (ious < nms_conf).float().unsqueeze(1)
            image_pred_class[i+1:] *= iou_mask

            # Remove the rows that were zeroed out
            non_zero_ind = torch.nonzero(image_pred_class[:,4]).squeeze()
            image_pred_class = image_pred_class[non_zero_ind].view(-1,7)

        # Repeat the batch_id for as many detections of the class cls in the image
        batch_ind = image_pred_class.new(image_pred_class.size(0), 1).fill_(ind)
        seq = batch_ind, image_pred_class
The IoU code is shown below and needs little explanation: IoU = intersection area / union area.
    def bbox_iou(box1, box2):
        """
        Returns the IoU of two bounding boxes
        """
        # Get the coordinates of bounding boxes
        b1_x1, b1_y1, b1_x2, b1_y2 = box1[:,0], box1[:,1], box1[:,2], box1[:,3]
        b2_x1, b2_y1, b2_x2, b2_y2 = box2[:,0], box2[:,1], box2[:,2], box2[:,3]

        # get the coordinates of the intersection rectangle
        inter_rect_x1 = torch.max(b1_x1, b2_x1)
        inter_rect_y1 = torch.max(b1_y1, b2_y1)
        inter_rect_x2 = torch.min(b1_x2, b2_x2)
        inter_rect_y2 = torch.min(b1_y2, b2_y2)

        # Intersection area (clamped at 0 so disjoint boxes give no overlap)
        inter_area = torch.clamp(inter_rect_x2 - inter_rect_x1 + 1, min=0) * torch.clamp(inter_rect_y2 - inter_rect_y1 + 1, min=0)

        # Union Area
        b1_area = (b1_x2 - b1_x1 + 1)*(b1_y2 - b1_y1 + 1)
        b2_area = (b2_x2 - b2_x1 + 1)*(b2_y2 - b2_y1 + 1)

        iou = inter_area / (b1_area + b2_area - inter_area)

        return iou
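A worked example helps to check the arithmetic, in particular the +1 pixel convention. The sketch below repeats bbox_iou in compact form so that it runs on its own; the boxes are made up.

```python
import torch

def bbox_iou(box1, box2):
    """IoU of box1 (1, 4+) against each row of box2 (n, 4+), corner format."""
    b1_x1, b1_y1, b1_x2, b1_y2 = box1[:, 0], box1[:, 1], box1[:, 2], box1[:, 3]
    b2_x1, b2_y1, b2_x2, b2_y2 = box2[:, 0], box2[:, 1], box2[:, 2], box2[:, 3]
    inter_w = torch.clamp(torch.min(b1_x2, b2_x2) - torch.max(b1_x1, b2_x1) + 1, min=0)
    inter_h = torch.clamp(torch.min(b1_y2, b2_y2) - torch.max(b1_y1, b2_y1) + 1, min=0)
    inter_area = inter_w * inter_h
    b1_area = (b1_x2 - b1_x1 + 1) * (b1_y2 - b1_y1 + 1)
    b2_area = (b2_x2 - b2_x1 + 1) * (b2_y2 - b2_y1 + 1)
    return inter_area / (b1_area + b2_area - inter_area)

a = torch.tensor([[0., 0., 9., 9.]])          # 10x10 pixels, area 100
others = torch.tensor([[5., 5., 14., 14.],    # overlaps a in a 5x5 patch
                       [20., 20., 29., 29.],  # disjoint from a
                       [0., 0., 9., 9.]])     # identical to a

ious = bbox_iou(a, others)
# Overlap: 25 / (100 + 100 - 25) = 25/175, about 0.143; disjoint: 0; identical: 1.
print(ious)
```

Without the clamp, the disjoint pair would produce a product of two negative side lengths, i.e. a positive "intersection", so that clamp matters.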
For more on NMS, see https://blog.csdn.net/shuzfan/article/details/52711706
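To see the greedy NMS idea in isolation, here is a minimal single-class sketch. It is a simplification rather than the loop used in write_results: it collects the kept boxes in a list and filters with a boolean mask instead of zeroing rows in place. The names nms_single_class and iou_one_to_many are hypothetical, and the boxes are made up.

```python
import torch

def iou_one_to_many(box, boxes):
    """IoU of one box (4,) against many (n, 4), with the same +1 convention."""
    x1 = torch.max(box[0], boxes[:, 0])
    y1 = torch.max(box[1], boxes[:, 1])
    x2 = torch.min(box[2], boxes[:, 2])
    y2 = torch.min(box[3], boxes[:, 3])
    inter = torch.clamp(x2 - x1 + 1, min=0) * torch.clamp(y2 - y1 + 1, min=0)
    area = (box[2] - box[0] + 1) * (box[3] - box[1] + 1)
    areas = (boxes[:, 2] - boxes[:, 0] + 1) * (boxes[:, 3] - boxes[:, 1] + 1)
    return inter / (area + areas - inter)

def nms_single_class(dets, nms_conf=0.4):
    """dets: (n, 5) rows of x1, y1, x2, y2, score. Returns the kept rows."""
    if dets.size(0) == 0:
        return dets
    dets = dets[torch.sort(dets[:, 4], descending=True)[1]]  # best score first
    keep = []
    while dets.size(0) > 0:
        keep.append(dets[0])                  # the best remaining box is kept
        if dets.size(0) == 1:
            break
        ious = iou_one_to_many(dets[0, :4], dets[1:, :4])
        dets = dets[1:][ious < nms_conf]      # drop boxes that overlap it too much
    return torch.stack(keep)

dets = torch.tensor([[1., 1., 10., 10., 0.8],    # overlaps the 0.9 box (IoU 81/119)
                     [50., 50., 59., 59., 0.7],  # far away from both
                     [0., 0., 9., 9., 0.9]])
kept = nms_single_class(dets)
print(kept)  # the 0.9 box and the 0.7 box survive
```

The 0.8 box is suppressed because its IoU with the 0.9 box (81/119, about 0.68) is above nms_conf.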
Tensor indexing works as follows:
    image_pred_ = torch.Tensor([[1,2,3,4,9],[5,6,7,8,9]])
    #print(image_pred_[:,-1] == 9)
    has_9 = (image_pred_[:,-1] == 9)
    print(has_9)

    ### Evaluation order: (image_pred_[:,-1] == 9).float().unsqueeze(1) first, then the tensor multiplication
    cls_mask = image_pred_*(image_pred_[:,-1] == 9).float().unsqueeze(1)
    print(cls_mask)
    class_mask_ind = torch.nonzero(cls_mask[:,-2]).squeeze()
    image_pred_class = image_pred_[class_mask_ind]

    # Output (newer PyTorch versions print the first line as tensor([True, True])):
    tensor([1, 1], dtype=torch.uint8)
    tensor([[1., 2., 3., 4., 9.],
            [5., 6., 7., 8., 9.]])
torch.sort works as follows:
    d=torch.Tensor([[1,2,3],[6,5,4]])
    e=d[:,2]
    print(e)
    print(torch.sort(e))

    # Output:
    tensor([3., 4.])
    torch.return_types.sort(
    values=tensor([3., 4.]),
    indices=tensor([0, 1]))
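In write_results only the indices (element [1] of the result) are used, together with descending=True, to reorder whole rows by obj score. A small sketch on made-up detections:

```python
import torch

# Made-up detections: x1, y1, x2, y2, obj score.
dets = torch.tensor([[0., 0., 9., 9., 0.3],
                     [1., 1., 8., 8., 0.9],
                     [2., 2., 7., 7., 0.6]])

values, order = torch.sort(dets[:, 4], descending=True)
sorted_dets = dets[order]  # index the rows with the sort order

print(order)               # tensor([1, 2, 0])
print(sorted_dets[:, 0])   # tensor([1., 2., 0.])
```

After this reordering, row 0 is always the most confident remaining box, which is what the NMS loop relies on.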
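To summarise the NMS workflow:
For each image, N detections are predicted, each carrying 4+1+C values (4 coordinates, 1 obj score and C class probabilities).
The final return value of write_results is an n*8 tensor, where the 8 columns are (batch_index, 4 coordinates, 1 objscore, 1 class prob, 1 class index).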
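As an illustration of that layout, the sketch below unpacks a made-up n*8 tensor row by row. The numbers are invented for the example and do not come from a real run.

```python
import torch

# Made-up write_results output: two detections, both from image 0 of the batch.
output = torch.tensor([[0., 40., 35., 60., 45., 0.9, 0.8, 16.],
                       [0., 10., 12., 30., 50., 0.7, 0.6, 0.]])

parsed = []
for row in output:
    detection = {
        "batch_index": int(row[0]),
        "box": row[1:5].tolist(),      # x1, y1, x2, y2
        "obj_score": row[5].item(),
        "class_prob": row[6].item(),
        "class_index": int(row[7]),
    }
    parsed.append(detection)
    print(detection)
```

Keeping batch_index in the first column is what allows detections from several images to live in a single tensor.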
    def write_results(prediction, confidence, num_classes, nms_conf = 0.4):
        print("prediction.shape=",prediction.shape)
        # Zero out the rows whose obj_score does not exceed confidence
        conf_mask = (prediction[:,:,4] > confidence).float().unsqueeze(2)
        prediction = prediction*conf_mask

        # Compute the box corners (top-left corner x, top-left corner y, right-bottom corner x, right-bottom corner y)
        box_corner = prediction.new(prediction.shape)
        box_corner[:,:,0] = (prediction[:,:,0] - prediction[:,:,2]/2)
        box_corner[:,:,1] = (prediction[:,:,1] - prediction[:,:,3]/2)
        box_corner[:,:,2] = (prediction[:,:,0] + prediction[:,:,2]/2)
        box_corner[:,:,3] = (prediction[:,:,1] + prediction[:,:,3]/2)
        # Overwrite the first four columns of prediction's third dimension
        prediction[:,:,:4] = box_corner[:,:,:4]

        batch_size = prediction.size(0)

        write = False

        for ind in range(batch_size):
            # image_pred.shape = boxnum*boxattr
            image_pred = prediction[ind]          # image Tensor
            # confidence thresholding
            # NMS

            # For each row, take the largest class score and its index
            max_conf_score,max_conf = torch.max(image_pred[:,5:5+ num_classes], 1)
            max_conf = max_conf.float().unsqueeze(1)
            max_conf_score = max_conf_score.float().unsqueeze(1)
            seq = (image_pred[:,:5], max_conf_score, max_conf)
            image_pred = torch.cat(seq, 1)
            # Now 7 columns: top-left x, top-left y, bottom-right x, bottom-right y,
            # obj score, highest class probability, corresponding class index
            print(image_pred.shape)

            non_zero_ind = (torch.nonzero(image_pred[:,4]))
            try:
                image_pred_ = image_pred[non_zero_ind.squeeze(),:].view(-1,7)
            except:
                continue

            # For PyTorch 0.4 compatibility
            # Since the above code will not raise an exception for no detection,
            # as scalars are supported in PyTorch 0.4
            if image_pred_.shape[0] == 0:
                continue

            # Get the various classes detected in the image
            img_classes = unique(image_pred_[:,-1])  # -1 index holds the class index

            for cls in img_classes:
                # perform NMS

                # get the detections with one particular class
                # Select the rows whose class equals cls and whose class prob != 0
                cls_mask = image_pred_*(image_pred_[:,-1] == cls).float().unsqueeze(1)
                class_mask_ind = torch.nonzero(cls_mask[:,-2]).squeeze()
                image_pred_class = image_pred_[class_mask_ind].view(-1,7)

                # sort the detections such that the entry with the maximum objectness
                # confidence is at the top (obj score from high to low)
                conf_sort_index = torch.sort(image_pred_class[:,4], descending = True )[1]
                image_pred_class = image_pred_class[conf_sort_index]
                idx = image_pred_class.size(0)   # Number of detections

                for i in range(idx):
                    # Get the IOUs of all boxes that come after the one we are looking at
                    # in the loop
                    try:
                        # IoU between row i and every row after it
                        ious = bbox_iou(image_pred_class[i].unsqueeze(0), image_pred_class[i+1:])
                    except ValueError:
                        break
                    except IndexError:
                        break

                    # Zero out all the detections that have IoU > threshold:
                    # boxes whose IoU with row i is above nms_conf are treated as the same object
                    iou_mask = (ious < nms_conf).float().unsqueeze(1)
                    image_pred_class[i+1:] *= iou_mask

                    # Remove the rows that were zeroed out
                    non_zero_ind = torch.nonzero(image_pred_class[:,4]).squeeze()
                    image_pred_class = image_pred_class[non_zero_ind].view(-1,7)

                # Repeat the batch_id for as many detections of the class cls in the image
                batch_ind = image_pred_class.new(image_pred_class.size(0), 1).fill_(ind)
                seq = batch_ind, image_pred_class

                if not write:
                    output = torch.cat(seq,1)  # concatenate along columns: (detections of this class)*8
                    write = True
                else:
                    out = torch.cat(seq,1)
                    output = torch.cat((output,out))  # stack along rows: n*8

        # output is never assigned when nothing was detected; return 0 in that case
        try:
            return output
        except:
            return 0