Author: 高雨茁
The task of object detection is to find all objects of interest in an image and determine their categories and locations.
Image recognition in computer vision covers four broad tasks:
1. Classification: answers "what is it?", i.e. given an image or a video, decide which categories of objects it contains.
2. Localization: answers "where is it?", i.e. locate the position of the target.
3. Detection: answers "what is it, and where is it?", i.e. locate the target and identify its class.
4. Segmentation: split into instance-level and scene-level segmentation, it answers "which object or scene does each pixel belong to?"
1. Two-stage object detection algorithms
First generate region proposals (RP), candidate boxes that may contain the object to be detected, then classify each proposal with a convolutional neural network.
Pipeline: feature extraction -> RP generation -> classification / bounding-box regression.
Common two-stage detectors include R-CNN, SPP-Net, Fast R-CNN, Faster R-CNN and R-FCN.
2. One-stage object detection algorithms
No region proposal step: the network extracts features directly and predicts both the object class and its location.
Pipeline: feature extraction -> classification / bounding-box regression.
Common one-stage detectors include OverFeat, YOLOv1, YOLOv2, YOLOv3, SSD and RetinaNet.
The rest of this article introduces R-CNN, a classic algorithm of the first family, and gives a corresponding code implementation.
R-CNN (Regions with CNN features) is a milestone in applying CNNs to object detection. Leveraging a CNN's strong feature extraction and classification performance, it reduces object detection to a classification problem over region proposals.
The algorithm consists of four steps:
1. Generate candidate regions (region proposals) for the input image.
2. Warp each proposal to a fixed size and extract features with a ConvNet.
3. Feed each proposal's features to per-class SVMs for classification.
4. Refine the boxes of proposals classified as positive with a bounding-box regressor.
The forward flow of the algorithm is charted below (the numbered marks in the figure correspond to the four steps above):
In what follows we build the model in the order of these four steps, and afterwards explain how to train it.
But before getting into the details, let's take a quick look at the datasets we will use for training.
The original paper uses two datasets:
1. ImageNet ILSVRC (a large recognition dataset): about 10 million images, 1000 classes.
2. PASCAL VOC 2007 (a smaller detection dataset): about 10,000 images, 20 classes.
Training first pre-trains on the recognition dataset, then fine-tunes the parameters on the detection dataset, where the model is also evaluated.
Because the original datasets are large, training on them could take tens of hours. To keep training manageable, we substitute smaller datasets.
As in the original paper, our data comes in two parts:
1. flower images covering 17 classes;
2. flower images covering 2 classes.
We will pre-train on the 17-class data, fine-tune on the 2-class data to obtain the final detection model, and evaluate on the 2-class images.
This step implements the part of the pipeline marked by the corresponding number in the flow chart (step 1: generating region proposals):
R-CNN uses the selective search algorithm for region proposal. The algorithm first initializes regions with graph-based image segmentation, i.e. it over-segments the image into a large number of small patches. It then proceeds greedily: it computes the similarity between every pair of adjacent regions and repeatedly merges the two most similar ones, until only a single region covering the whole image remains. Every region produced along the way, including the merged ones, is kept, and together they form the final set of RoIs (Regions of Interest).
Region merging relies on a diversified set of strategies: a single strategy easily merges dissimilar regions by mistake; for example, considering texture alone, regions of different colors are readily merged in error. Selective search therefore uses three diversification strategies to enlarge the candidate set and protect recall: complementary color spaces, complementary similarity measures (color, texture, size and fill), and complementary starting regions (varying the initial segmentation). A sketch of the core grouping loop follows.
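To make the greedy merging concrete, here is a minimal, self-contained sketch of the grouping loop. It is illustrative only: regions are modeled as sets of pixel ids, and a toy size-based similarity stands in for the real multi-cue similarity measures described above.

def toy_sim(a, b):
    # selective search's size cue: prefer merging small regions first
    return 1.0 / (len(a) + len(b))

def hierarchical_grouping(initial_regions, neighbor_pairs, sim=toy_sim):
    regions = {i: set(r) for i, r in initial_regions.items()}  # id -> pixel-id set
    proposals = [set(r) for r in regions.values()]             # every region ever seen is a RoI
    neighbors = {i: set() for i in regions}
    for i, j in neighbor_pairs:
        neighbors[i].add(j)
        neighbors[j].add(i)
    sims = {(min(i, j), max(i, j)): sim(regions[i], regions[j])
            for i, j in neighbor_pairs}
    next_id = max(regions) + 1
    while sims:
        i, j = max(sims, key=sims.get)        # the most similar neighboring pair
        merged = regions[i] | regions[j]
        proposals.append(merged)              # keep every merged region as a proposal
        new_neighbors = (neighbors[i] | neighbors[j]) - {i, j}
        for k in (i, j):                      # retire i and j and their similarities
            for n in neighbors[k]:
                sims.pop((min(k, n), max(k, n)), None)
                neighbors[n].discard(k)
            del regions[k], neighbors[k]
        regions[next_id] = merged             # register the merged region
        neighbors[next_id] = new_neighbors
        for n in new_neighbors:
            neighbors[n].add(next_id)
            sims[(min(next_id, n), max(next_id, n))] = sim(merged, regions[n])
        next_id += 1
    return proposals

# toy usage: four tiny regions in a row, neighbors chained left to right
rois = hierarchical_grouping({0: {0}, 1: {1}, 2: {2, 3}, 3: {4, 5, 6}},
                             [(0, 1), (1, 2), (2, 3)])
print(len(rois))  # 4 initial + 3 merged = 7

Note how every intermediate merge is recorded as a proposal; this is what makes selective search produce boxes at all scales.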
Many machine-learning frameworks and libraries ship a ready-made selective search implementation.
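For example, the selective_search calls in the code below match the interface of the open-source selectivesearch package on PyPI (an assumption on our part; any implementation that returns regions with 'rect' and 'size' fields will work):

# pip install selectivesearch
import cv2
from selectivesearch import selective_search

img = cv2.imread('./17flowers/jpg/7/image_0591.jpg')
# img_lbl carries an extra channel of region labels;
# regions is a list of dicts: {'rect': (x, y, w, h), 'size': ..., 'labels': [...]}
img_lbl, regions = selective_search(img, scale=500, sigma=0.9, min_size=10)
print(len(regions), regions[0]['rect'])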
This step implements the next part of the pipeline (step 2: warping the proposals and extracting features with the ConvNet):
In step 1 we obtained the region proposals generated by selective search, but their sizes vary. Since the proposals will be fed into the ConvNet for feature extraction, we must resize them all to the fixed input size the ConvNet expects. The relevant code:
import cv2
import numpy as np
import skimage.io
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
from selectivesearch import selective_search  # assumed: the PyPI `selectivesearch` package

# Clip Image
def clip_pic(img, rect):
    x = rect[0]
    y = rect[1]
    w = rect[2]
    h = rect[3]
    x_1 = x + w
    y_1 = y + h
    return img[y:y_1, x:x_1, :], [x, y, x_1, y_1, w, h]

# Resize Image
def resize_image(in_image, new_width, new_height, out_image=None,
                 resize_mode=cv2.INTER_CUBIC):
    # pass the flag as `interpolation`; the original listing passed it
    # positionally, where cv2.resize expects the `dst` argument
    img = cv2.resize(in_image, (new_width, new_height), interpolation=resize_mode)
    if out_image:
        cv2.imwrite(out_image, img)
    return img

def image_proposal(img_path):
    img = cv2.imread(img_path)
    img_lbl, regions = selective_search(img, scale=500, sigma=0.9, min_size=10)
    candidates = set()
    images = []
    vertices = []
    for r in regions:
        # excluding same rectangle (with different segments)
        if r['rect'] in candidates:
            continue
        # excluding small regions
        if r['size'] < 220:
            continue
        if (r['rect'][2] * r['rect'][3]) < 500:
            continue
        # clip the proposal region out of the image
        proposal_img, proposal_vertice = clip_pic(img, r['rect'])
        # delete empty arrays
        if len(proposal_img) == 0:
            continue
        # ignore proposals with zero width or height
        x, y, w, h = r['rect']
        if w == 0 or h == 0:
            continue
        # check if any 0-dimension exists
        [a, b, c] = np.shape(proposal_img)
        if a == 0 or b == 0 or c == 0:
            continue
        # resize to 224 x 224 for the ConvNet input
        resized_proposal_img = resize_image(proposal_img, 224, 224)
        candidates.add(r['rect'])
        img_float = np.asarray(resized_proposal_img, dtype="float32")
        images.append(img_float)
        vertices.append(r['rect'])
    return images, vertices
Let's pick an image and check what selective search produces:
img_path = './17flowers/jpg/7/image_0591.jpg'
imgs, verts = image_proposal(img_path)

fig, ax = plt.subplots(ncols=1, nrows=1, figsize=(6, 6))
img = skimage.io.imread(img_path)
ax.imshow(img)
for x, y, w, h in verts:
    rect = mpatches.Rectangle((x, y), w, h, fill=False,
                              edgecolor='red', linewidth=1)
    ax.add_patch(rect)
plt.show()
Once the proposals are resized to a uniform shape, they can be fed into the ConvNet for feature extraction. Here we use AlexNet as the ConvNet architecture; its construction is as follows:
import tflearn
from tflearn.layers.core import input_data, dropout, fully_connected
from tflearn.layers.conv import conv_2d, max_pool_2d
from tflearn.layers.normalization import local_response_normalization
from tflearn.layers.estimator import regression

# Building 'AlexNet'
def create_alexnet(num_classes, restore=True):
    network = input_data(shape=[None, 224, 224, 3])
    network = conv_2d(network, 96, 11, strides=4, activation='relu')
    network = max_pool_2d(network, 3, strides=2)
    network = local_response_normalization(network)
    network = conv_2d(network, 256, 5, activation='relu')
    network = max_pool_2d(network, 3, strides=2)
    network = local_response_normalization(network)
    network = conv_2d(network, 384, 3, activation='relu')
    network = conv_2d(network, 384, 3, activation='relu')
    network = conv_2d(network, 256, 3, activation='relu')
    network = max_pool_2d(network, 3, strides=2)
    network = local_response_normalization(network)
    network = fully_connected(network, 4096, activation='tanh')
    network = dropout(network, 0.5)
    network = fully_connected(network, 4096, activation='tanh')
    network = dropout(network, 0.5)
    network = fully_connected(network, num_classes, activation='softmax',
                              restore=restore)
    network = regression(network, optimizer='momentum',
                         loss='categorical_crossentropy',
                         learning_rate=0.001)
    return network
This completes the ConvNet part of the architecture; with it we can extract features from each proposal.
This step implements the remaining parts of the pipeline (steps 3 and 4: SVM classification and bounding-box regression):
Having extracted features from each proposal, we feed them into the SVMs for classification. Note that the number of SVM classifiers is not fixed: we train one SVM per target class. For our dataset the final classification covers two flower classes, so we need 2 SVMs.
Proposals judged positive (i.e. not background) are then passed to the bounding-box regressor, which fine-tunes the boxes and outputs the final box predictions.
Now that the whole pipeline is clear, let's move on to training.
R-CNN is trained in two steps.
First, pre-train on the large dataset: the input X is the original image and the label Y is its class. The code:
import os
import codecs
import pickle

def load_data(datafile, num_class, save=False, save_path='dataset.pkl'):
    fr = codecs.open(datafile, 'r', 'utf-8')
    train_list = fr.readlines()
    labels = []
    images = []
    for line in train_list:
        tmp = line.strip().split(' ')
        fpath = tmp[0]
        img = cv2.imread(fpath)
        img = resize_image(img, 224, 224)
        np_img = np.asarray(img, dtype="float32")
        images.append(np_img)
        index = int(tmp[1])
        label = np.zeros(num_class)
        label[index] = 1
        labels.append(label)
    if save:
        pickle.dump((images, labels), open(save_path, 'wb'))
    fr.close()
    return images, labels

def train(network, X, Y, save_model_path):
    # Training
    model = tflearn.DNN(network, checkpoint_path='model_alexnet',
                        max_checkpoints=1, tensorboard_verbose=2,
                        tensorboard_dir='output')
    if os.path.isfile(save_model_path + '.index'):
        model.load(save_model_path)
        print('load model...')
    for _ in range(5):
        model.fit(X, Y, n_epoch=1, validation_set=0.1, shuffle=True,
                  show_metric=True, batch_size=64, snapshot_step=200,
                  snapshot_epoch=False, run_id='alexnet_oxflowers17')
    # Save the model
    model.save(save_model_path)
    print('save model...')

X, Y = load_data('./train_list.txt', 17)
net = create_alexnet(17)
train(net, X, Y, './pre_train_model/model_save.model')
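For reference, load_data above parses a plain-text list with one image per line, and the fine-tuning list read later by load_train_proposals additionally carries the ground-truth box as x,y,w,h. The paths and values below are hypothetical, only meant to show the expected shape:

# train_list.txt: <image path> <class index>
./17flowers/jpg/0/image_0001.jpg 0
./17flowers/jpg/1/image_0081.jpg 1

# fine_tune_list.txt: <image path> <class index> <x,y,w,h ground-truth box>
# (index 0 is reserved for background, so the flower classes are 1 and 2)
./2flowers/jpg/0/image_0561.jpg 1 90,126,350,434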
Next we fine-tune the pre-trained model on the small dataset. This differs from the previous training in two ways:
1. The input is the RoIs generated by region proposal rather than the whole image.
2. The label Y of each RoI is determined by computing its IoU (Intersection over Union) with the ground truth (the annotated object box of the original image).
IoU is the area of the intersection of the two boxes divided by the area of their union: IoU = area(A ∩ B) / area(A ∪ B), where A is the RoI and B the ground-truth box.
Hence IoU ∈ [0, 1], and the larger the value, the closer the RoI is to the ground truth. We define candidate regions with IoU above 0.5 as positive samples and the rest as negatives.
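As a quick sanity check of the definition: a 100x100 proposal at (0, 0) and a 100x100 ground-truth box at (50, 50) intersect in a 50x50 = 2,500-pixel area; their union is 10,000 + 10,000 - 2,500 = 17,500, so IoU is about 0.14 and the proposal would be labeled background.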
The IoU code:
# IOU Part 1
def if_intersection(xmin_a, xmax_a, ymin_a, ymax_a, xmin_b, xmax_b, ymin_b, ymax_b):
    if_intersect = False
    if xmin_a < xmax_b <= xmax_a and (ymin_a < ymax_b <= ymax_a or ymin_a <= ymin_b < ymax_a):
        if_intersect = True
    elif xmin_a <= xmin_b < xmax_a and (ymin_a < ymax_b <= ymax_a or ymin_a <= ymin_b < ymax_a):
        if_intersect = True
    elif xmin_b < xmax_a <= xmax_b and (ymin_b < ymax_a <= ymax_b or ymin_b <= ymin_a < ymax_b):
        if_intersect = True
    elif xmin_b <= xmin_a < xmax_b and (ymin_b < ymax_a <= ymax_b or ymin_b <= ymin_a < ymax_b):
        if_intersect = True
    else:
        return if_intersect
    if if_intersect:
        x_sorted_list = sorted([xmin_a, xmax_a, xmin_b, xmax_b])
        y_sorted_list = sorted([ymin_a, ymax_a, ymin_b, ymax_b])
        x_intersect_w = x_sorted_list[2] - x_sorted_list[1]
        y_intersect_h = y_sorted_list[2] - y_sorted_list[1]
        area_inter = x_intersect_w * y_intersect_h
        return area_inter

# IOU Part 2
def IOU(ver1, vertice2):
    # vertices in four points
    vertice1 = [ver1[0], ver1[1], ver1[0] + ver1[2], ver1[1] + ver1[3]]
    area_inter = if_intersection(vertice1[0], vertice1[2], vertice1[1], vertice1[3],
                                 vertice2[0], vertice2[2], vertice2[1], vertice2[3])
    if area_inter:
        area_1 = ver1[2] * ver1[3]
        area_2 = vertice2[4] * vertice2[5]
        iou = float(area_inter) / (area_1 + area_2 - area_inter)
        return iou
    return False
Before fine-tuning on the small dataset, let's implement loading of the training data (RoI labels, the corresponding images, box annotations, and so on). The code below also reads and saves, along the way, the data later used for SVM training and bounding-box regression.
import math
import sys

# Progress bar
def view_bar(message, num, total):
    rate = num / total
    rate_num = int(rate * 40)
    rate_nums = math.ceil(rate * 100)
    r = '\r%s:[%s%s]%d%%\t%d/%d' % (message, ">" * rate_num, " " * (40 - rate_num),
                                    rate_nums, num, total)
    sys.stdout.write(r)
    sys.stdout.flush()

# Read in data and save data for Alexnet
def load_train_proposals(datafile, num_clss, save_path, threshold=0.5,
                         is_svm=False, save=False):
    fr = open(datafile, 'r')
    train_list = fr.readlines()
    for num, line in enumerate(train_list):
        labels = []
        images = []
        rects = []
        tmp = line.strip().split(' ')
        # tmp[0] = image address, tmp[1] = label, tmp[2] = rectangle vertices
        img = cv2.imread(tmp[0])
        # generate candidate boxes with selective search
        img_lbl, regions = selective_search(img, scale=500, sigma=0.9, min_size=10)
        candidates = set()
        ref_rect = tmp[2].split(',')
        ref_rect_int = [int(i) for i in ref_rect]
        Gx = ref_rect_int[0]
        Gy = ref_rect_int[1]
        Gw = ref_rect_int[2]
        Gh = ref_rect_int[3]
        for r in regions:
            # excluding same rectangle (with different segments)
            if r['rect'] in candidates:
                continue
            # excluding small regions
            if r['size'] < 220:
                continue
            if (r['rect'][2] * r['rect'][3]) < 500:
                continue
            # clip out the proposal region
            proposal_img, proposal_vertice = clip_pic(img, r['rect'])
            # delete empty arrays
            if len(proposal_img) == 0:
                continue
            # ignore proposals with zero width or height
            x, y, w, h = r['rect']
            if w == 0 or h == 0:
                continue
            # check if any 0-dimension exists
            [a, b, c] = np.shape(proposal_img)
            if a == 0 or b == 0 or c == 0:
                continue
            resized_proposal_img = resize_image(proposal_img, 224, 224)
            candidates.add(r['rect'])
            img_float = np.asarray(resized_proposal_img, dtype="float32")
            images.append(img_float)
            # IoU against the ground-truth box
            iou_val = IOU(ref_rect_int, proposal_vertice)
            # x, y, w, h offsets, used as bounding-box regression targets
            rects.append([(Gx - x) / w, (Gy - y) / h,
                          math.log(Gw / w), math.log(Gh / h)])
            # labels: 0 represents the default class, i.e. background
            index = int(tmp[1])
            if is_svm:
                # IoU below the threshold counts as background (label 0)
                if iou_val < threshold:
                    labels.append(0)
                else:
                    labels.append(index)
            else:
                label = np.zeros(num_clss + 1)
                if iou_val < threshold:
                    label[0] = 1
                else:
                    label[index] = 1
                labels.append(label)
        if is_svm:
            # also add the ground-truth box itself as a positive sample
            ref_img, ref_vertice = clip_pic(img, ref_rect_int)
            resized_ref_img = resize_image(ref_img, 224, 224)
            img_float = np.asarray(resized_ref_img, dtype="float32")
            images.append(img_float)
            rects.append([0, 0, 0, 0])
            labels.append(index)
        view_bar("processing image of %s" % datafile.split('\\')[-1].strip(),
                 num + 1, len(train_list))
        if save:
            # one .npy file per image, named after the image file
            if is_svm:
                np.save((os.path.join(save_path,
                        tmp[0].split('/')[-1].split('.')[0].strip()) + '_data.npy'),
                        [images, labels, rects])
            else:
                np.save((os.path.join(save_path,
                        tmp[0].split('/')[-1].split('.')[0].strip()) + '_data.npy'),
                        [images, labels])
    print(' ')
    fr.close()

# load data
def load_from_npy(data_set):
    images, labels = [], []
    data_list = os.listdir(data_set)
    for ind, d in enumerate(data_list):
        i, l = np.load(os.path.join(data_set, d), allow_pickle=True)
        images.extend(i)
        labels.extend(l)
        view_bar("load data of %s" % d, ind + 1, len(data_list))
    print(' ')
    return images, labels
With this in place we can run the fine-tuning stage of training:
def fine_tune_Alexnet(network, X, Y, save_model_path, fine_tune_model_path):
    # Training
    model = tflearn.DNN(network, checkpoint_path='rcnn_model_alexnet',
                        max_checkpoints=1, tensorboard_verbose=2,
                        tensorboard_dir='output_RCNN')
    if os.path.isfile(fine_tune_model_path + '.index'):
        print("Loading the fine tuned model")
        model.load(fine_tune_model_path)
    elif os.path.isfile(save_model_path + '.index'):
        print("Loading the alexnet")
        model.load(save_model_path)
    else:
        print("No file to load, error")
        return False
    model.fit(X, Y, n_epoch=1, validation_set=0.1, shuffle=True,
              show_metric=True, batch_size=64, snapshot_step=200,
              snapshot_epoch=False, run_id='alexnet_rcnnflowers2')
    # Save the model
    model.save(fine_tune_model_path)

data_set = './data_set'
if len(os.listdir('./data_set')) == 0:
    print("Reading Data")
    load_train_proposals('./fine_tune_list.txt', 2, save=True, save_path=data_set)
print("Loading Data")
X, Y = load_from_npy(data_set)

restore = False
if os.path.isfile('./fine_tune_model/fine_tune_model_save.model' + '.index'):
    restore = True
    print("Continue fine-tune")
# three classes include background
net = create_alexnet(3, restore=restore)
fine_tune_Alexnet(net, X, Y, './pre_train_model/model_save.model',
                  './fine_tune_model/fine_tune_model_save.model')
In this step we train the SVMs and the bounding-box regressor (the parts marked by the corresponding numbers in the flow chart):
First we extract features with the CNN trained above. Note that the ConvNet used here drops the final softmax layer relative to the one used during training: at this point we only need the features extracted from each RoI, whereas training needed the softmax layer for classification. The code:
def create_alexnet():
    # Building 'AlexNet' up to the feature layer (no final softmax)
    network = input_data(shape=[None, 224, 224, 3])
    network = conv_2d(network, 96, 11, strides=4, activation='relu')
    network = max_pool_2d(network, 3, strides=2)
    network = local_response_normalization(network)
    network = conv_2d(network, 256, 5, activation='relu')
    network = max_pool_2d(network, 3, strides=2)
    network = local_response_normalization(network)
    network = conv_2d(network, 384, 3, activation='relu')
    network = conv_2d(network, 384, 3, activation='relu')
    network = conv_2d(network, 256, 3, activation='relu')
    network = max_pool_2d(network, 3, strides=2)
    network = local_response_normalization(network)
    network = fully_connected(network, 4096, activation='tanh')
    network = dropout(network, 0.5)
    network = fully_connected(network, 4096, activation='tanh')
    network = regression(network, optimizer='momentum',
                         loss='categorical_crossentropy',
                         learning_rate=0.001)
    return network
We train one SVM per class; since our final task distinguishes two flower classes, we need 2 SVMs.
The SVM input is the feature vector extracted from a RoI; the labels cover n+1 classes (the +1 being background), which for our dataset means three label values in total.
The code:
from sklearn import svm
from sklearn.externals import joblib  # on newer scikit-learn, use `import joblib` instead

# Construct per-class svms
def train_svms(train_file_folder, model):
    files = os.listdir(train_file_folder)
    svms = []
    train_features = []
    bbox_train_features = []
    rects = []
    for train_file in files:
        if train_file.split('.')[-1] == 'txt':
            X, Y, R = generate_single_svm_train(os.path.join(train_file_folder, train_file))
            Y1 = []
            features1 = []
            for ind, i in enumerate(X):
                # extract features with the ConvNet
                feats = model.predict([i])
                train_features.append(feats[0])
                # all positive and negative samples go into features1 / Y1
                if Y[ind] >= 0:
                    Y1.append(Y[ind])
                    features1.append(feats[0])
                # positives (IoU above the threshold against the ground truth)
                # go into the bounding-box training set together with their
                # regression targets; the rects.append line was missing from
                # the original listing, without it train_bbox has no targets
                if Y[ind] > 0:
                    bbox_train_features.append(feats[0])
                    rects.append(R[ind])
                view_bar("extract features of %s" % train_file, ind + 1, len(X))
            clf = svm.SVC(probability=True)
            clf.fit(features1, Y1)
            print(' ')
            print("feature dimension")
            print(np.shape(features1))
            svms.append(clf)
            # serialize and save the svm classifier
            joblib.dump(clf, os.path.join(train_file_folder,
                                          str(train_file.split('.')[0]) + '_svm.pkl'))
    # save the bounding-box regression training set
    np.save((os.path.join(train_file_folder, 'bbox_train.npy')),
            [bbox_train_features, rects])
    return svms

# Load training images
def generate_single_svm_train(train_file):
    save_path = train_file.rsplit('.', 1)[0].strip()
    if len(os.listdir(save_path)) == 0:
        print("reading %s's svm dataset" % train_file.split('\\')[-1])
        load_train_proposals(train_file, 2, save_path, threshold=0.3,
                             is_svm=True, save=True)
    print("restoring svm dataset")
    images, labels, rects = load_from_npy_(save_path)
    return images, labels, rects

# load data
def load_from_npy_(data_set):
    images, labels, rects = [], [], []
    data_list = os.listdir(data_set)
    for ind, d in enumerate(data_list):
        i, l, r = np.load(os.path.join(data_set, d), allow_pickle=True)
        images.extend(i)
        labels.extend(l)
        rects.extend(r)
        view_bar("load data of %s" % d, ind + 1, len(data_list))
    print(' ')
    return images, labels, rects
The regressor is linear. Its input is N pairs {(P_i, G_i)}, i = 1, 2, ..., N, the box coordinates of a proposal and of its ground truth respectively.
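Concretely, writing P = (P_x, P_y, P_w, P_h) and G = (G_x, G_y, G_w, G_h), the targets already computed in load_train_proposals follow the standard R-CNN parameterization:

t_x = (G_x - P_x) / P_w
t_y = (G_y - P_y) / P_h
t_w = log(G_w / P_w)
t_h = log(G_h / P_h)

A ridge regressor maps a RoI's CNN features to these four values; at prediction time the inverse transform (seen later in the detection code) turns them back into a box. The training code: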
from sklearn.linear_model import Ridge

# draw bounding boxes on the image
def show_rect(img_path, regions):
    fig, ax = plt.subplots(ncols=1, nrows=1, figsize=(6, 6))
    img = skimage.io.imread(img_path)
    ax.imshow(img)
    for x, y, w, h in regions:
        rect = mpatches.Rectangle((x, y), w, h, fill=False,
                                  edgecolor='red', linewidth=1)
        ax.add_patch(rect)
    plt.show()

# train the bounding-box regressor
def train_bbox(npy_path):
    features, rects = np.load((os.path.join(npy_path, 'bbox_train.npy')),
                              allow_pickle=True)
    # features and rects were built with append, so copy the elements out
    # one by one before converting them into matrices
    X = []
    Y = []
    for ind, i in enumerate(features):
        X.append(i)
    X_train = np.array(X)
    for ind, i in enumerate(rects):
        Y.append(i)
    Y_train = np.array(Y)
    # fit the linear (ridge) regression model
    clf = Ridge(alpha=1.0)
    clf.fit(X_train, Y_train)
    # serialize and save the bbox regressor
    joblib.dump(clf, os.path.join(npy_path, 'bbox_train.pkl'))
    return clf
Now train the SVM classifiers and the box regressor:
train_file_folder = './svm_train'
# build the feature-extraction network
net = create_alexnet()
model = tflearn.DNN(net)
# load the fine-tuned alexnet parameters
model.load('./fine_tune_model/fine_tune_model_save.model')

# load or train the svm classifiers and the bounding-box regressor
svms = []
bbox_fit = []
# whether a saved bounding-box regressor exists
bbox_fit_exit = 0
# load svm classifiers and bounding-box regressor from disk if present
for file in os.listdir(train_file_folder):
    if file.split('_')[-1] == 'svm.pkl':
        svms.append(joblib.load(os.path.join(train_file_folder, file)))
    if file == 'bbox_train.pkl':
        bbox_fit = joblib.load(os.path.join(train_file_folder, file))
        bbox_fit_exit = 1
if len(svms) == 0:
    svms = train_svms(train_file_folder, model)
if bbox_fit_exit == 0:
    bbox_fit = train_bbox(train_file_folder)
print("Done fitting svms")
Training is now complete.
Let's pick an image and walk it through the forward pass of the model. First, the RoIs produced by region proposal:
img_path = './2flowers/jpg/1/image_1282.jpg'
image = cv2.imread(img_path)
im_width = image.shape[1]
im_height = image.shape[0]
# extract region proposals
imgs, verts = image_proposal(img_path)
show_rect(img_path, verts)
Next, feed the RoIs into the ConvNet to obtain features, pass those to the SVMs and the regressor, and apply box regression to the samples the SVMs classify as positive:
# extract RoI features from the CNN
features = model.predict(imgs)
print("predict image:")
print(len(features))

results = []
results_label = []
results_score = []
count = 0
for f in features:
    for svm in svms:
        pred = svm.predict([f.tolist()])
        # not background
        if pred[0] != 0:
            # bounding-box regression
            bbox = bbox_fit.predict([f.tolist()])
            tx, ty, tw, th = bbox[0][0], bbox[0][1], bbox[0][2], bbox[0][3]
            px, py, pw, ph = verts[count]
            # decode the predicted offsets back into a box
            gx = tx * pw + px
            gy = ty * ph + py
            gw = math.exp(tw) * pw
            gh = math.exp(th) * ph
            # clip the box to the image boundary
            if gx < 0:
                gw = gw - (0 - gx)
                gx = 0
            if gx + gw > im_width:
                gw = im_width - gx
            if gy < 0:
                gh = gh - (0 - gy)  # fixed: the original listing subtracted gh here
                gy = 0
            if gy + gh > im_height:
                gh = im_height - gy
            results.append([gx, gy, gw, gh])
            results_label.append(pred[0])
            results_score.append(svm.predict_proba([f.tolist()])[0][1])
    count += 1

print(results)
print(results_label)
print(results_score)
show_rect(img_path, results)
As you can see, more than one box may survive, so we apply NMS (Non-Maximum Suppression) to keep the relatively best results.
The code:
results_final = []
results_final_label = []

# non-maximum suppression
# first drop candidate boxes with score below 0.5
delete_index1 = []
for ind in range(len(results_score)):
    if results_score[ind] < 0.5:
        delete_index1.append(ind)
num1 = 0
for idx in delete_index1:
    results.pop(idx - num1)
    results_score.pop(idx - num1)
    results_label.pop(idx - num1)
    num1 += 1

while len(results) > 0:
    # find the highest-scoring box in the list
    max_index = results_score.index(max(results_score))
    max_x, max_y, max_w, max_h = results[max_index]
    max_vertice = [max_x, max_y, max_x + max_w, max_y + max_h, max_w, max_h]
    # keep it in the final results
    results_final.append(results[max_index])
    results_final_label.append(results_label[max_index])
    # remove it from the candidate list
    results.pop(max_index)
    results_label.pop(max_index)
    results_score.pop(max_index)
    # remove all remaining boxes with IoU > 0.5 against the kept box
    delete_index = []
    for ind, i in enumerate(results):
        iou_val = IOU(i, max_vertice)
        if iou_val > 0.5:
            delete_index.append(ind)
    num = 0
    for idx in delete_index:
        results.pop(idx - num)
        results_score.pop(idx - num)
        results_label.pop(idx - num)
        num += 1

print("result:", results_final)
print("result label:", results_final_label)
show_rect(img_path, results_final)
We now have a rough but complete R-CNN model.
R-CNN made flexible use of the most advanced tools and techniques of its time, absorbed them thoroughly, and reshaped them to fit its own logic, achieving a great leap forward. But it also has clear drawbacks: every proposal (roughly 2,000 per image) is run through the ConvNet separately, making detection slow; training is a multi-stage pipeline (ConvNet fine-tuning, then SVMs, then the box regressor) whose intermediate features must be cached to disk; and selective search itself is time-consuming.
Fortunately, these problems are substantially improved in the later Fast R-CNN and Faster R-CNN.
https://momodel.cn/workspace/5f1ec0505607a4070d65203b?type=app