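After a little over two weeks of work, I finally finished my RCNN code. The code is quite interesting, and writing it let me review several practical TensorFlow topics along the way, so I am summarizing it here to share the experience. On the theory side, there are plenty of RCNN tutorials, so I will not go into detail; interested readers can read this blog post for an overview.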
System overview
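RCNN is built on the AlexNet model. To improve the object recognition rate, before the image is processed by the CNN, a traditional algorithm (Selective Search in the paper) first produces roughly 2000 candidate object boxes. These candidate boxes are then fed into the CNN to obtain the features of the layer just before the output layer, and trained SVMs classify the objects. The more interesting parts are fine-tuning an AlexNet that was pretrained on ImageNet, extracting the features of the last layer before the output layer of the fine-tuned network, and training the SVM classifiers. Let's see how to implement this model.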
Code walkthrough
For convenience, I use the tflearn library as a wrapper around TensorFlow to write AlexNet. For details on tflearn, see its official website.
Let's first go through the system flow:
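Step one is to train AlexNet. Here I use the tensorflow-alexnet project on GitHub. The project applies AlexNet to the flower17 dataset; in other words, it classifies different kinds of flowers. The author of that repository implemented every function carefully, but did not spell out the main routine or whether the model supports resuming training from a checkpoint, so here is my code: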
import os
import tflearn

def train(network, X, Y):
    # Training
    model = tflearn.DNN(network, checkpoint_path='model_alexnet',
                        max_checkpoints=1, tensorboard_verbose=2,
                        tensorboard_dir='output')
    # Resume support: if a saved model already exists, load it and
    # continue training from there.
    if os.path.isfile('model_save.model'):
        model.load('model_save.model')
    model.fit(X, Y, n_epoch=100, validation_set=0.1, shuffle=True,
              show_metric=True, batch_size=64, snapshot_step=200,
              snapshot_epoch=False, run_id='alexnet_oxflowers17')  # epoch = 1000
    # Save the trained model
    model.save('model_save.model')
We also want to check that the model works correctly. The following code tests AlexNet:
import numpy as np
import tflearn
from PIL import Image
from tflearn.layers.core import input_data, dropout, fully_connected
from tflearn.layers.conv import conv_2d, max_pool_2d
from tflearn.layers.normalization import local_response_normalization
from tflearn.layers.estimator import regression

# Image preprocessing functions:
# ------------------------------------------------------------------------------------------------
# First, read the image into an Image object
def load_image(img_path):
    img = Image.open(img_path)
    return img

# Resize the Image to 224 * 224 (the three RGB channels stay unchanged).
# Note: Image.ANTIALIAS was removed in Pillow 10; Image.LANCZOS is the same filter.
def resize_image(in_image, new_width, new_height, out_image=None,
                 resize_mode=Image.ANTIALIAS):
    img = in_image.resize((new_width, new_height), resize_mode)
    if out_image:
        img.save(out_image)
    return img

# Load the Image and convert it to a float32 array
def pil_to_nparray(pil_image):
    pil_image.load()
    return np.asarray(pil_image, dtype="float32")

# Network definition:
# ------------------------------------------------------------------------------------------------
def create_alexnet(num_classes):
    # Building 'AlexNet'
    network = input_data(shape=[None, 224, 224, 3])
    network = conv_2d(network, 96, 11, strides=4, activation='relu')
    network = max_pool_2d(network, 3, strides=2)
    network = local_response_normalization(network)
    network = conv_2d(network, 256, 5, activation='relu')
    network = max_pool_2d(network, 3, strides=2)
    network = local_response_normalization(network)
    network = conv_2d(network, 384, 3, activation='relu')
    network = conv_2d(network, 384, 3, activation='relu')
    network = conv_2d(network, 256, 3, activation='relu')
    network = max_pool_2d(network, 3, strides=2)
    network = local_response_normalization(network)
    network = fully_connected(network, 4096, activation='tanh')
    network = dropout(network, 0.5)
    network = fully_connected(network, 4096, activation='tanh')
    network = dropout(network, 0.5)
    network = fully_connected(network, num_classes, activation='softmax')
    network = regression(network, optimizer='momentum',
                         loss='categorical_crossentropy',
                         learning_rate=0.001)
    return network

# This is the function used to predict the class of the input images
def predict(network, modelfile, images):
    model = tflearn.DNN(network)
    model.load(modelfile)
    return model.predict(images)

if __name__ == '__main__':
    img_path = 'testimg7.jpg'
    imgs = []
    img = load_image(img_path)
    img = resize_image(img, 224, 224)
    imgs.append(pil_to_nparray(img))
    net = create_alexnet(17)
    predicted = predict(net, 'model_save.model', imgs)
    print(predicted)
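The test script prints the raw list of class probabilities. A small helper can make that output easier to read by listing the most likely classes first. This is only a sketch and not part of the original code: the name `top_k_classes` and the example probabilities are made up for illustration, and numbering the classes from 1 is an assumption.

```python
def top_k_classes(probabilities, k=3):
    """Return the k most likely (class_number, probability) pairs.

    `probabilities` is one row of the model output, i.e. one softmax
    vector. Class numbers start at 1 (an assumption for readability).
    """
    ranked = sorted(enumerate(probabilities, start=1),
                    key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Made-up probabilities for a 4-class problem, for illustration only.
example_row = [0.10, 0.55, 0.05, 0.30]
print(top_k_classes(example_row, k=2))
```

With the real model, each row of the value returned by `predict` could be passed to the helper in the same way.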
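Up to this point, nothing has been specific to RCNN. Note, however, that the model_save.model file we saved earlier is our pretrained AlexNet. Now we start building the RCNN system proper, beginning with the traditional region proposal code.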
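The paper uses selective search, an algorithm I have little hands-on experience with, so writing it from scratch would be very time consuming. I took a shortcut and used the existing Python library selectivesearch. The focus of the preprocessing code therefore shifts to another concept: IOU, intersection over union. This concept is useful here because when we annotate an image by hand, we usually label only one object in it and treat everything else as background. Given that, if the computer proposes many candidate boxes at once, how do we decide which boxes correspond to the object? Boxes that do not overlap the annotation at all are naturally treated as background, but how should overlapping boxes be classified? This is where IOU comes in: if the overlap ratio exceeds a threshold, we label the box with the object's class; in all other cases we label the box as background. For a more detailed explanation, click here.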
So how do we implement IOU in code?
# IOU Part 1
def if_intersection(xmin_a, xmax_a, ymin_a, ymax_a, xmin_b, xmax_b, ymin_b, ymax_b):
    # Two boxes intersect exactly when their extents overlap on both the x
    # axis and the y axis. This also covers boxes that cross like a plus
    # sign, where neither box has a corner inside the other.
    if_intersect = (xmin_a < xmax_b and xmin_b < xmax_a and
                    ymin_a < ymax_b and ymin_b < ymax_a)
    if not if_intersect:
        return False
    # When the boxes intersect, sort the coordinates of both boxes on each
    # axis; the two middle values bound the intersection.
    x_sorted_list = sorted([xmin_a, xmax_a, xmin_b, xmax_b])
    y_sorted_list = sorted([ymin_a, ymax_a, ymin_b, ymax_b])
    x_intersect_w = x_sorted_list[2] - x_sorted_list[1]
    y_intersect_h = y_sorted_list[2] - y_sorted_list[1]
    area_inter = x_intersect_w * y_intersect_h
    return area_inter

# IOU Part 2
def IOU(ver1, vertice2):
    # ver1 is [x, y, w, h]; vertice2 is [xmin, ymin, xmax, ymax, w, h]
    # Convert the first box to corner form
    vertice1 = [ver1[0], ver1[1], ver1[0] + ver1[2], ver1[1] + ver1[3]]
    area_inter = if_intersection(vertice1[0], vertice1[2], vertice1[1], vertice1[3],
                                 vertice2[0], vertice2[2], vertice2[1], vertice2[3])
    # If the boxes intersect, compute the IOU
    if area_inter:
        area_1 = ver1[2] * ver1[3]
        area_2 = vertice2[4] * vertice2[5]
        iou = float(area_inter) / (area_1 + area_2 - area_inter)
        return iou
    return False
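As a sanity check for the functions above, here is a minimal IOU sketch that takes both boxes in the (x, y, w, h) form reported by selectivesearch. The function name `iou_xywh` and the sample boxes are illustrative assumptions, not part of the original code.

```python
def iou_xywh(box_a, box_b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Width and height of the overlap; a value <= 0 means the boxes
    # do not overlap on that axis.
    inter_w = min(ax + aw, bx + bw) - max(ax, bx)
    inter_h = min(ay + ah, by + bh) - max(ay, by)
    if inter_w <= 0 or inter_h <= 0:
        return 0.0
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / float(union)


# Two 10x10 boxes offset by 5 pixels in x: intersection 50, union 150.
print(iou_xywh((0, 0, 10, 10), (5, 0, 10, 10)))
# A wide box and a tall box crossing like a plus sign share a 2x2 patch.
print(iou_xywh((0, 4, 10, 2), (4, 0, 2, 10)))
```

Returning 0.0 instead of False for disjoint boxes keeps the return type uniform, which makes the later threshold comparison easier to read.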
After that, we use 0.5 as the IOU threshold when fine-tuning AlexNet and 0.3 as the threshold when training the SVMs. The function that implements this is as follows:
import pickle
import numpy as np
import skimage.io
import selectivesearch
from PIL import Image

# Read in data and save data for Alexnet
def load_train_proposals(datafile, num_clss, threshold=0.5, svm=False,
                         save=False, save_path='dataset.pkl'):
    labels = []
    images = []
    with open(datafile, 'r') as train_list:
        for line in train_list:
            tmp = line.strip().split(' ')
            # tmp[0] = image address
            # tmp[1] = label
            # tmp[2] = rectangle vertices
            img = skimage.io.imread(tmp[0])
            # Selective search from the Python selectivesearch library
            img_lbl, regions = selectivesearch.selective_search(
                img, scale=500, sigma=0.9, min_size=10)
            candidates = set()
            for r in regions:
                # Skip duplicate rectangles (same box, different segments)
                if r['rect'] in candidates:
                    continue
                # Skip boxes that are too small
                if r['size'] < 220:
                    continue
                # Crop the proposal out of the image
                proposal_img, proposal_vertice = clip_pic(img, r['rect'])
                # Skip the proposal if the cropped image is empty
                if len(proposal_img) == 0:
                    continue
                # Skip boxes whose width or height is 0
                x, y, w, h = r['rect']
                if w == 0 or h == 0:
                    continue
                # Skip image arrays that have a zero-sized dimension
                [a, b, c] = np.shape(proposal_img)
                if a == 0 or b == 0 or c == 0:
                    continue
                # Resize to 224 * 224 for the network input
                im = Image.fromarray(proposal_img)
                resized_proposal_img = resize_image(im, 224, 224)
                candidates.add(r['rect'])
                img_float = pil_to_nparray(resized_proposal_img)
                images.append(img_float)
                # Compute the IOU against the annotated rectangle
                ref_rect = tmp[2].split(',')
                ref_rect_int = [int(i) for i in ref_rect]
                iou_val = IOU(ref_rect_int, proposal_vertice)
                # Labels: 0 is the default class, which is background
                index = int(tmp[1])
                if svm == False:
                    label = np.zeros(num_clss + 1)
                    if iou_val < threshold:
                        label[0] = 1
                    else:
                        label[index] = 1
                    labels.append(label)
                else:
                    if iou_val < threshold:
                        labels.append(0)
                    else:
                        labels.append(index)
    if save:
        with open(save_path, 'wb') as f:
            pickle.dump((images, labels), f)
    return images, labels
Note that when the svm argument is True, the labels no longer need to be one-hot encoded.
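To make the two label formats concrete, the sketch below isolates the labelling rule from load_train_proposals: class 0 is background, and a proposal receives the object's class only when its IOU reaches the threshold. The helper name `make_label` is hypothetical, and plain lists stand in for the NumPy arrays used in the original function.

```python
def make_label(iou_val, class_index, num_classes, threshold=0.5, svm=False):
    """Label one proposal; class 0 is reserved for background."""
    target = class_index if iou_val >= threshold else 0
    if svm:
        # SVM training uses the integer class directly.
        return target
    # CNN fine-tuning uses a one-hot vector of length num_classes + 1.
    label = [0.0] * (num_classes + 1)
    label[target] = 1.0
    return label


# Fine-tuning label for a proposal that overlaps the object well.
print(make_label(0.7, 2, num_classes=2))
# Fine-tuning label for a proposal that counts as background.
print(make_label(0.2, 2, num_classes=2))
# SVM label with the looser 0.3 threshold.
print(make_label(0.4, 2, num_classes=2, threshold=0.3, svm=True))
```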
After preprocessing the input images, we use the preprocessed image set to fine-tune AlexNet.
# Use an already trained AlexNet with the last layer redesigned.
# This defines the fine-tuning network. Following the paper, we discard
# AlexNet's last layer (the softmax) and replace it with a new softmax sized
# for the number of new classes + 1 (the extra one is the background class).
# This is done by setting restore=False, so that no values are restored for
# the final softmax layer.
def create_alexnet(num_classes, restore=False):
    # Building 'AlexNet'
    network = input_data(shape=[None, 224, 224, 3])
    network = conv_2d(network, 96, 11, strides=4, activation='relu')
    network = max_pool_2d(network, 3, strides=2)
    network = local_response_normalization(network)
    network = conv_2d(network, 256, 5, activation='relu')
    network = max_pool_2d(network, 3, strides=2)
    network = local_response_normalization(network)
    network = conv_2d(network, 384, 3, activation='relu')
    network = conv_2d(network, 384, 3, activation='relu')
    network = conv_2d(network, 256, 3, activation='relu')
    network = max_pool_2d(network, 3, strides=2)
    network = local_response_normalization(network)
    network = fully_connected(network, 4096, activation='tanh')
    network = dropout(network, 0.5)
    network = fully_connected(network, 4096, activation='tanh')
    network = dropout(network, 0.5)
    network = fully_connected(network, num_classes, activation='softmax', restore=restore)
    network = regression(network, optimizer='momentum',
                         loss='categorical_crossentropy',
                         learning_rate=0.001)
    return network

# Training starts from the already trained AlexNet, i.e. from
# model_save.model. After training, the result is saved to
# fine_tune_model_save.model.
def fine_tune_Alexnet(network, X, Y):
    # Training
    model = tflearn.DNN(network, checkpoint_path='rcnn_model_alexnet',
                        max_checkpoints=1, tensorboard_verbose=2,
                        tensorboard_dir='output_RCNN')
    if os.path.isfile('fine_tune_model_save.model'):
        print("Loading the fine tuned model")
        model.load('fine_tune_model_save.model')
    elif os.path.isfile('model_save.model'):
        print("Loading the alexnet")
        model.load('model_save.model')
    else:
        print("No file to load, error")
        return False
    model.fit(X, Y, n_epoch=10, validation_set=0.1, shuffle=True,
              show_metric=True, batch_size=64, snapshot_step=200,
              snapshot_epoch=False, run_id='alexnet_rcnnflowers2')  # epoch = 1000
    # Save the model
    model.save('fine_tune_model_save.model')
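These two functions complete the fine-tuning of AlexNet, and with that our direct use of AlexNet is done. Next, we need to read the features of AlexNet's last layer before the output and use them to train the SVMs. How do we obtain an image's features? It is simple: we remove the output layer. The code is as follows: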
# Use an already trained AlexNet with the output layer removed
def create_alexnet(num_classes, restore=False):
    # Building 'AlexNet'
    network = input_data(shape=[None, 224, 224, 3])
    network = conv_2d(network, 96, 11, strides=4, activation='relu')
    network = max_pool_2d(network, 3, strides=2)
    network = local_response_normalization(network)
    network = conv_2d(network, 256, 5, activation='relu')
    network = max_pool_2d(network, 3, strides=2)
    network = local_response_normalization(network)
    network = conv_2d(network, 384, 3, activation='relu')
    network = conv_2d(network, 384, 3, activation='relu')
    network = conv_2d(network, 256, 3, activation='relu')
    network = max_pool_2d(network, 3, strides=2)
    network = local_response_normalization(network)
    network = fully_connected(network, 4096, activation='tanh')
    network = dropout(network, 0.5)
    # The network now ends at the second 4096-unit layer, so predict()
    # returns a 4096-dimensional feature vector per image.
    network = fully_connected(network, 4096, activation='tanh')
    network = regression(network, optimizer='momentum',
                         loss='categorical_crossentropy',
                         learning_rate=0.001)
    return network
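After obtaining the features, we need to train the SVMs. Why train SVMs at all? Wouldn't the CNN's softmax be enough? This question is addressed in the blog post mentioned earlier. In short, SVMs are well suited to training on small samples, so using them here can improve accuracy. The code for training the SVMs is as follows: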
import os
import numpy as np
from sklearn import svm

# Construct cascade svms
def train_svms(train_file_folder, model):
    # The training sets are split across separate txt files,
    # each containing a single class
    listings = os.listdir(train_file_folder)
    svms = []
    for train_file in listings:
        if "pkl" in train_file:
            continue
        # Get the data for training the SVM of a single class
        X, Y = generate_single_svm_train(os.path.join(train_file_folder, train_file))
        train_features = []
        for i in X:
            feats = model.predict([i])
            train_features.append(feats[0])
        print("feature dimension")
        print(np.shape(train_features))
        # Build one linear SVM per class; together they form the
        # cascade used to distinguish all objects
        clf = svm.LinearSVC()
        print("fit svm")
        clf.fit(train_features, Y)
        svms.append(clf)
    return svms
What do we do at recognition time? First, we use the following function to obtain the candidate object boxes for the input image:
# prep refers to the module that holds the preprocessing helpers
# (its import is not shown in the original post)
def image_proposal(img_path):
    img = skimage.io.imread(img_path)
    img_lbl, regions = selectivesearch.selective_search(
        img, scale=500, sigma=0.9, min_size=10)
    candidates = set()
    images = []
    vertices = []
    for r in regions:
        # Skip duplicate rectangles (same box, different segments)
        if r['rect'] in candidates:
            continue
        # Skip boxes that are too small
        if r['size'] < 220:
            continue
        # Crop the proposal out of the image
        proposal_img, proposal_vertice = prep.clip_pic(img, r['rect'])
        # Skip empty arrays
        if len(proposal_img) == 0:
            continue
        # Skip boxes whose width or height is 0
        x, y, w, h = r['rect']
        if w == 0 or h == 0:
            continue
        # Skip image arrays that have a zero-sized dimension
        [a, b, c] = np.shape(proposal_img)
        if a == 0 or b == 0 or c == 0:
            continue
        # Resize to 224 * 224 for the network input
        im = Image.fromarray(proposal_img)
        resized_proposal_img = resize_image(im, 224, 224)
        candidates.add(r['rect'])
        img_float = pil_to_nparray(resized_proposal_img)
        images.append(img_float)
        vertices.append(r['rect'])
    return images, vertices
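Both load_train_proposals and image_proposal call clip_pic, which is not listed in this post. Judging from how IOU indexes its second argument, it presumably returns the cropped pixels together with a vertex list laid out as [xmin, ymin, xmax, ymax, w, h]. The sketch below is written under that assumption. It uses a nested list as the image (rows indexed by y, then columns by x) so that it runs without NumPy; with a NumPy array the equivalent crop would be img[y:y + h, x:x + w]. The name `clip_pic_sketch` is hypothetical.

```python
def clip_pic_sketch(img, rect):
    """Crop rect = (x, y, w, h) out of img, given as a list of pixel rows."""
    x, y, w, h = rect
    x_max, y_max = x + w, y + h
    # Rows are selected by y, columns within each row by x.
    cropped = [row[x:x_max] for row in img[y:y_max]]
    return cropped, [x, y, x_max, y_max, w, h]


# A toy single-channel image with 4 rows and 5 columns; each pixel
# value encodes its position as 10 * row + column.
toy_img = [[10 * r + c for c in range(5)] for r in range(4)]
patch, vertices = clip_pic_sketch(toy_img, (1, 2, 3, 2))
print(patch)
print(vertices)
```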
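This process is similar to the preprocessing function but simpler, because we no longer need to consider the corresponding labels. After that, we feed these images into the network one at a time to get the corresponding outputs (they could be processed together, but my computer kept killing the process, possibly because of memory or some other problem). Finally, applying the cascaded SVMs gives the prediction.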
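The final step can be sketched as follows. This is an assumption about how the pieces fit together, not the author's code: each entry of `svms` is expected to expose a `predict` method like scikit-learn's LinearSVC, `features` holds one feature vector per proposal, and a prediction of 0 means background. The helper name `detect_objects` and the stub classifier are hypothetical and exist only so the example runs without the trained model.

```python
def detect_objects(features, vertices, svms):
    """Return (rect, class_label) for every proposal that some SVM accepts."""
    detections = []
    for feat, rect in zip(features, vertices):
        for clf in svms:
            pred = clf.predict([feat])[0]
            if pred != 0:  # 0 is the background label
                detections.append((rect, int(pred)))
    return detections


class StubSVM:
    """Stand-in for a trained per-class SVM: fires when feature[0] > 0."""

    def __init__(self, class_label):
        self.class_label = class_label

    def predict(self, rows):
        return [self.class_label if row[0] > 0 else 0 for row in rows]


# Two toy proposals: only the first one has a positive first feature.
feats = [[0.9, 0.1], [-0.5, 0.3]]
rects = [(10, 20, 50, 60), (0, 0, 30, 30)]
print(detect_objects(feats, rects, [StubSVM(1)]))
```

In the real pipeline, `features` would come from calling the feature network's predict on each proposal image, one image at a time as described above.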
You are probably curious about the experimental results. The following compares the output of AlexNet with that of RCNN.
First, let's look at the results for the following image:
The analysis of this image is as follows. With AlexNet, we obtained the following data:
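AlexNet classifies the image as the fourth flower class. The true class in the flower17 dataset is the last one, the 17th. Here the 17th class has the second-highest probability after the fourth, at 34%. So how does RCNN do? See the figure below: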
Clearly, RCNN's accuracy (class 1) is very high. If you are interested, click here to view the code.