When yolov3 predicts bounding boxes, it uses anchor boxes. These anchors encode the most likely object widths and heights, obtained in advance by clustering. For any given grid cell we want to predict an object around it, and there are infinitely many possible object shapes; the prediction is not arbitrary, but is guided by the anchor box sizes, i.e. the most likely object shapes found by clustering the annotated data.
The configuration in the .cfg file looks like this:
```
[yolo]
mask = 3,4,5
anchors = 10,14, 23,27, 37,58, 81,82, 135,169, 344,319
```
When training on our own data, we first need to modify the anchors to match that data; the anchor sizes are obtained by clustering.

Loosely speaking, clustering means grouping nearby data points together. The idea behind the k-means algorithm is simple.
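As a minimal illustration of that idea, here is a sketch of plain k-means with Euclidean distance (the function and variable names are my own, not from the darknet scripts):

```python
import numpy as np

def simple_kmeans(points, k, n_iter=100, seed=0):
    """Plain k-means with Euclidean distance (illustration only)."""
    rng = np.random.default_rng(seed)
    # pick k distinct random points as the initial centroids
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # distance of every point to every centroid, shape (N, k)
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        assignments = np.argmin(dists, axis=1)  # nearest centroid per point
        # each new centroid is the mean of the points assigned to it
        new_centroids = np.array([points[assignments == j].mean(axis=0)
                                  for j in range(k)])
        if np.allclose(new_centroids, centroids):  # converged
            break
        centroids = new_centroids
    return centroids, assignments

points = np.array([[0., 0.], [0., 1.], [10., 10.], [10., 11.]])
centroids, assignments = simple_kmeans(points, k=2)
```

The anchor-generation script below follows exactly this loop, only with a different distance measure.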
```
<object-class> <x_center> <y_center> <width> <height>
```

Where:

- <object-class> — integer object number, from 0 to (classes-1)
- <x_center> <y_center> <width> <height> — float values relative to the width and height of the image, in the range (0.0, 1.0]; for example: <x> = <absolute_x> / <image_width> or <height> = <absolute_height> / <image_height>

Attention: <x_center> <y_center> are the center of the rectangle (not the top-left corner).
Example:
1 0.716797 0.395833 0.216406 0.147222
All values are ratios: (center x, center y, object width, object height).
通常來講,計算樣本點到質心的距離的時候直接算的是兩點之間的距離,而後將樣本點劃歸爲與之距離最近的一個質心.
在yolov3中樣本點的數據是有具體的業務上的含義的,咱們其實最終目的是想知道最有可能的object對應的bounding box的形狀是什麼樣子的. 因此這個距離的計算咱們並非直接算兩點之間的距離,咱們計算兩個box的iou,即2個box的類似程度.d=1-iou(box1,box_cluster). 這樣d越小,說明box1與box_cluster越相似.將box劃歸爲box_cluster.app
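Because only widths and heights are clustered, both boxes can be imagined as sharing the same corner, so the intersection area is simply min(w1,w2)*min(h1,h2). A minimal sketch of this distance (the helper names here are my own):

```python
def wh_iou(box, cluster):
    """IOU of two boxes given as (w, h), assuming they share a corner."""
    w1, h1 = box
    w2, h2 = cluster
    inter = min(w1, w2) * min(h1, h2)   # overlap area
    union = w1 * h1 + w2 * h2 - inter   # total covered area
    return inter / union

def dist(box, cluster):
    return 1 - wh_iou(box, cluster)

print(dist((0.2, 0.3), (0.2, 0.3)))  # identical shapes -> 0.0
print(dist((0.2, 0.3), (0.4, 0.1)))  # dissimilar shapes -> close to 1
```

The IOU function in the script below computes the same quantity, just written out case by case.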
```python
import numpy as np

# args.filelist comes from argparse in gen_anchors.py:
# a text file listing one image path per line
f = open(args.filelist)
lines = [line.rstrip('\n') for line in f.readlines()]

annotation_dims = []

size = np.zeros((1,1,3))
for line in lines:
    #line = line.replace('images','labels')
    #line = line.replace('img1','labels')
    line = line.replace('JPEGImages','labels')
    line = line.replace('.jpg','.txt')
    line = line.replace('.png','.txt')
    print(line)
    f2 = open(line)
    for line in f2.readlines():
        line = line.rstrip('\n')
        w,h = line.split(' ')[3:]
        #print(w,h)
        annotation_dims.append(tuple(map(float,(w,h))))
annotation_dims = np.array(annotation_dims)
```
It looks like a big chunk of code, but the key point is really one line:
```python
w,h = line.split(' ')[3:]
annotation_dims.append(tuple(map(float,(w,h))))
```
This uses Python's built-in map function; see https://www.runoob.com/python/python-func-map.html

This yields an N*2 matrix, where N is the number of samples.
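A small self-contained sketch of that parsing step, using two hypothetical label lines in the darknet format shown above:

```python
import numpy as np

# hypothetical label lines: <class> <x_center> <y_center> <width> <height>
label_lines = [
    "1 0.716797 0.395833 0.216406 0.147222",
    "0 0.500000 0.500000 0.400000 0.300000",
]

annotation_dims = []
for line in label_lines:
    w, h = line.split(' ')[3:]                         # keep only width and height
    annotation_dims.append(tuple(map(float, (w, h))))  # map converts strings to floats
annotation_dims = np.array(annotation_dims)

print(annotation_dims.shape)  # (2, 2) -- an N*2 matrix
```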
```python
def IOU(x,centroids):
    similarities = []
    k = len(centroids)
    for centroid in centroids:
        c_w,c_h = centroid
        w,h = x
        if c_w>=w and c_h>=h:    # box(c_w,c_h) fully contains box(w,h)
            similarity = w*h/(c_w*c_h)
        elif c_w>=w and c_h<=h:  # box(c_w,c_h) is wider but shorter
            similarity = w*c_h/(w*h + (c_w-w)*c_h)
        elif c_w<=w and c_h>=h:
            similarity = c_w*h/(w*h + c_w*(c_h-h))
        else:                    # both w and h are bigger than c_w and c_h respectively
            similarity = (c_w*c_h)/(w*h)
        similarities.append(similarity)  # will become (k,) shape
    return np.array(similarities)
```
```python
def kmeans(X,centroids,eps,anchor_file):
    N = X.shape[0]
    iterations = 0
    k,dim = centroids.shape
    prev_assignments = np.ones(N)*(-1)
    iter = 0
    old_D = np.zeros((N,k))  # distance matrix: N points x k centroids

    while True:
        D = []
        iter+=1
        for i in range(N):
            d = 1 - IOU(X[i],centroids)  # d is k-dimensional
            D.append(d)
        D = np.array(D)  # D.shape = (N,k)

        print("iter {}: dists = {}".format(iter,np.sum(np.abs(old_D-D))))

        # assign samples to centroids
        assignments = np.argmin(D,axis=1)  # index of the per-row minimum,
                                           # i.e. which centroid each sample belongs to

        if (assignments == prev_assignments).all():  # centroids no longer change
            print("Centroids = ",centroids)
            write_anchors_to_file(centroids,X,anchor_file)
            return

        # calculate new centroids
        centroid_sums = np.zeros((k,dim),float)  # (k,2)
        for i in range(N):
            centroid_sums[assignments[i]] += X[i]  # accumulate each sample into its centroid
        for j in range(k):
            centroids[j] = centroid_sums[j]/(np.sum(assignments==j))  # update centroids

        prev_assignments = assignments.copy()
        old_D = D.copy()
```
The core of the update step is just these lines:

```python
for i in range(N):
    centroid_sums[assignments[i]] += X[i]  # accumulate each sample into its centroid
for j in range(k):
    centroids[j] = centroid_sums[j]/(np.sum(assignments==j))  # update centroids
```
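The assign-then-update step can be traced on a tiny hand-made distance matrix (the distances below are made up for illustration):

```python
import numpy as np

# hypothetical data: 4 samples (w, h), k = 2 centroids
X = np.array([[0.1, 0.1], [0.12, 0.1], [0.5, 0.6], [0.52, 0.58]])
D = np.array([[0.05, 0.80],   # made-up 1-IOU distances, shape (N, k)
              [0.06, 0.78],
              [0.90, 0.04],
              [0.88, 0.05]])

assignments = np.argmin(D, axis=1)  # nearest centroid per sample -> [0 0 1 1]

k, dim = 2, 2
centroid_sums = np.zeros((k, dim))
for i in range(len(X)):
    centroid_sums[assignments[i]] += X[i]  # sum samples per cluster
centroids = np.array([centroid_sums[j] / np.sum(assignments == j) for j in range(k)])
print(centroids)  # mean (w, h) of each cluster
```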
```python
def write_anchors_to_file(centroids,X,anchor_file):
    f = open(anchor_file,'w')

    anchors = centroids.copy()
    print(anchors.shape)

    for i in range(anchors.shape[0]):
        anchors[i][0] *= width_in_cfg_file/32.
        anchors[i][1] *= height_in_cfg_file/32.

    widths = anchors[:,0]
    sorted_indices = np.argsort(widths)

    print('Anchors = ', anchors[sorted_indices])

    for i in sorted_indices[:-1]:
        f.write('%0.2f,%0.2f, '%(anchors[i,0],anchors[i,1]))
    # there should not be a comma after the last anchor, that's why:
    f.write('%0.2f,%0.2f\n'%(anchors[sorted_indices[-1:],0],anchors[sorted_indices[-1:],1]))

    f.write('%f\n'%(avg_IOU(X,centroids)))
    print()
```
Because the label files yolo requires store values as ratios of the image width and height, the resulting anchor box sizes must be multiplied by the model's input image size.
In the code above,
```python
anchors[i][0] *= width_in_cfg_file/32.
anchors[i][1] *= height_in_cfg_file/32.
```
the division by 32 is a yolov2 requirement; yolov3 does not need it. See https://github.com/pjreddie/darknet/issues/911
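Putting that together: for yolov3 a clustered (w, h) ratio is converted to anchor values by multiplying by the network input size, while the yolov2-era script additionally divides by 32 (the stride of the final feature map). A sketch, with a hypothetical helper name:

```python
def scale_anchor(w_rel, h_rel, input_w, input_h, yolov2=False):
    """Convert a clustered (w, h) ratio into anchor units.

    yolov3 anchors are in input-image pixels; yolov2 anchors are in
    units of the 32-pixel final-feature-map stride.
    """
    w = w_rel * input_w
    h = h_rel * input_h
    if yolov2:
        w, h = w / 32., h / 32.
    return w, h

print(scale_anchor(0.2, 0.3, 704, 576))               # pixels, for yolov3
print(scale_anchor(0.2, 0.3, 704, 576, yolov2=True))  # grid units, for yolov2
```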
for Yolo v2: width=704 height=576 in cfg-file
```
./darknet detector calc_anchors data/hand.data -num_of_clusters 5 -width 22 -height 18 -show
```
for Yolo v3: width=704 height=576 in cfg-file
```
./darknet detector calc_anchors data/hand.data -num_of_clusters 9 -width 704 -height 576 -show
```
And you can use any images with any sizes.
The full script is at https://github.com/AlexeyAB/darknet/tree/master/scripts. Usage:

```
python3 gen_anchors.py -filelist ../build/darknet/x64/data/park_train.txt
```