Want to learn object detection? Every ML learner would love to draw a nice box around the objects in an image. In this article we'll study a fundamental concept in object detection: bounding box regression. Bounding box regression is not complicated, yet even state-of-the-art detectors such as YOLO rely on it!
We'll implement a bounding box regression model with TensorFlow's Keras API. Let's get started! If you have access to Google Colab, you can run the code there as well.
We'll use this image localization dataset from Kaggle.com, which contains 373 images in 3 classes (cucumber, eggplant, and mushroom), each annotated with an object bounding box. Our goal is to parse and normalize the images, and to extract the four bounding-box coordinates (xmin, ymin, xmax, ymax) of each object from the XML annotation files:
If you'd like to build your own annotated dataset, no problem! You can use LabelImg: it lets you quickly draw bounding boxes around objects and save the annotations in PASCAL-VOC format:
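If you open one of these annotations in a text editor, you'll see plain XML along these lines (a trimmed, hypothetical example; the file name and coordinates are made up, and real files also carry size and path fields):

```xml
<annotation>
    <filename>cucumber_1.jpg</filename>
    <object>
        <name>cucumber</name>
        <bndbox>
            <xmin>34</xmin>
            <ymin>67</ymin>
            <xmax>190</xmax>
            <ymax>212</ymax>
        </bndbox>
    </object>
</annotation>
```

This is exactly the structure the parsing code in this article reads via `x['annotation']['object']['bndbox']`.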
First we need to process the images. With the glob package we can list all files ending in .jpg and process them one by one:
```python
import os
import glob
import numpy as np
from PIL import Image, ImageDraw

input_dim = 228

images = []
image_paths = glob.glob('training_images/*.jpg')
for imagefile in image_paths:
    # Resize to a fixed input size and scale pixel values to [0, 1]
    image = Image.open(imagefile).resize((input_dim, input_dim))
    image = np.asarray(image) / 255.0
    images.append(image)
```
Next we need to process the XML annotations, which are in PASCAL-VOC format. We use the xmltodict package to convert each XML file into a Python dictionary:
```python
import glob
import xmltodict
import numpy as np

bboxes = []
classes_raw = []
annotations_paths = glob.glob('training_images/*.xml')
for xmlfile in annotations_paths:
    x = xmltodict.parse(open(xmlfile, 'rb'))
    bndbox = x['annotation']['object']['bndbox']
    bndbox = np.array([int(bndbox['xmin']), int(bndbox['ymin']),
                       int(bndbox['xmax']), int(bndbox['ymax'])])
    # Normalize the box coordinates to [0, 1] by the input dimension
    bboxes.append(bndbox / input_dim)
    classes_raw.append(x['annotation']['object']['name'])
```
Now we prepare the training and test sets:
```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer
from sklearn.model_selection import train_test_split

boxes = np.array(bboxes)

# One-hot encode the class names
encoder = LabelBinarizer()
classes_onehot = encoder.fit_transform(classes_raw)

# Each target vector is [xmin, ymin, xmax, ymax, class one-hot...]
Y = np.concatenate([boxes, classes_onehot], axis=1)
X = np.array(images)
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.1)
```
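To make the target layout concrete, here is a minimal sketch of how one sample's 7-dimensional target vector is assembled (the box values and one-hot label are hypothetical, not from the dataset):

```python
import numpy as np

input_dim = 228

# Hypothetical sample: box corners in pixels, scaled to [0, 1]
box = np.array([57, 38, 190, 171]) / input_dim   # [xmin, ymin, xmax, ymax]
onehot = np.array([0, 1, 0])                     # e.g. "eggplant" among 3 classes

target = np.concatenate([box, onehot])
print(target.shape)  # (7,) -> 4 box coordinates + 3 class scores
```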
First we define a loss function and a metric for the model. The loss combines mean squared error (MSE) with Intersection over Union (IoU); the metric reports the IoU score so we can track how accurate the predicted boxes are:
IoU is the ratio of the area of intersection of two boxes to the area of their union:
The Python implementation is as follows:
```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import backend as K

input_shape = (input_dim, input_dim, 3)
dropout_rate = 0.5
alpha = 0.2

def calculate_iou(target_boxes, pred_boxes):
    # Corners of the intersection rectangle
    xA = K.maximum(target_boxes[..., 0], pred_boxes[..., 0])
    yA = K.maximum(target_boxes[..., 1], pred_boxes[..., 1])
    xB = K.minimum(target_boxes[..., 2], pred_boxes[..., 2])
    yB = K.minimum(target_boxes[..., 3], pred_boxes[..., 3])
    interArea = K.maximum(0.0, xB - xA) * K.maximum(0.0, yB - yA)
    boxAArea = (target_boxes[..., 2] - target_boxes[..., 0]) * (target_boxes[..., 3] - target_boxes[..., 1])
    boxBArea = (pred_boxes[..., 2] - pred_boxes[..., 0]) * (pred_boxes[..., 3] - pred_boxes[..., 1])
    iou = interArea / (boxAArea + boxBArea - interArea)
    return iou

def custom_loss(y_true, y_pred):
    # Penalize both coordinate error and poor box overlap
    mse = tf.losses.mean_squared_error(y_true, y_pred)
    iou = calculate_iou(y_true, y_pred)
    return mse + (1 - iou)

def iou_metric(y_true, y_pred):
    return calculate_iou(y_true, y_pred)
```
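To see the IoU formula in action on concrete numbers, here is a plain-Python version of the same computation run on two hypothetical boxes (the coordinates are made up for illustration):

```python
def iou(a, b):
    # a, b: [xmin, ymin, xmax, ymax]
    xA, yA = max(a[0], b[0]), max(a[1], b[1])
    xB, yB = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, xB - xA) * max(0.0, yB - yA)
    areaA = (a[2] - a[0]) * (a[3] - a[1])
    areaB = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (areaA + areaB - inter)

# Two 2x2 boxes overlapping in a 1x1 square:
# intersection = 1, union = 4 + 4 - 1 = 7
print(iou([1, 1, 3, 3], [2, 2, 4, 4]))  # ~0.1429
```

A perfect prediction gives IoU = 1, and disjoint boxes give 0, which is why `1 - iou` works as a loss term.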
Next we build the CNN model. We stack several Conv2D layers, flatten their output, and feed it into fully connected layers. To reduce overfitting we use Dropout on the dense layers, and we use LeakyReLU activations:
```python
num_classes = 3
pred_vector_length = 4 + num_classes  # 4 box coordinates + class scores

model_layers = [
    keras.layers.Conv2D(16, kernel_size=(3, 3), strides=1, input_shape=input_shape),
    keras.layers.LeakyReLU(alpha=alpha),
    keras.layers.Conv2D(16, kernel_size=(3, 3), strides=1),
    keras.layers.LeakyReLU(alpha=alpha),
    keras.layers.MaxPooling2D(pool_size=(2, 2)),

    keras.layers.Conv2D(32, kernel_size=(3, 3), strides=1),
    keras.layers.LeakyReLU(alpha=alpha),
    keras.layers.Conv2D(32, kernel_size=(3, 3), strides=1),
    keras.layers.LeakyReLU(alpha=alpha),
    keras.layers.MaxPooling2D(pool_size=(2, 2)),

    keras.layers.Conv2D(64, kernel_size=(3, 3), strides=1),
    keras.layers.LeakyReLU(alpha=alpha),
    keras.layers.Conv2D(64, kernel_size=(3, 3), strides=1),
    keras.layers.LeakyReLU(alpha=alpha),
    keras.layers.MaxPooling2D(pool_size=(2, 2)),

    keras.layers.Conv2D(128, kernel_size=(3, 3), strides=1),
    keras.layers.LeakyReLU(alpha=alpha),
    keras.layers.Conv2D(128, kernel_size=(3, 3), strides=1),
    keras.layers.LeakyReLU(alpha=alpha),
    keras.layers.MaxPooling2D(pool_size=(2, 2)),

    keras.layers.Conv2D(256, kernel_size=(3, 3), strides=1),
    keras.layers.LeakyReLU(alpha=alpha),
    keras.layers.Conv2D(256, kernel_size=(3, 3), strides=1),
    keras.layers.LeakyReLU(alpha=alpha),
    keras.layers.MaxPooling2D(pool_size=(2, 2)),

    keras.layers.Flatten(),
    keras.layers.Dense(1240),
    keras.layers.LeakyReLU(alpha=alpha),
    keras.layers.Dense(640),
    keras.layers.LeakyReLU(alpha=alpha),
    keras.layers.Dense(480),
    keras.layers.LeakyReLU(alpha=alpha),
    keras.layers.Dense(120),
    keras.layers.LeakyReLU(alpha=alpha),
    keras.layers.Dense(62),
    keras.layers.LeakyReLU(alpha=alpha),
    keras.layers.Dense(pred_vector_length),
    keras.layers.LeakyReLU(alpha=alpha),
]
model = keras.Sequential(model_layers)
model.compile(
    optimizer=keras.optimizers.Adam(lr=0.0001),
    loss=custom_loss,
    metrics=[iou_metric]
)
```
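As a quick sanity check on the architecture, the spatial size of the feature maps can be traced by hand: each 3x3 "valid" convolution trims 2 pixels from each side-length, and each 2x2 max-pool halves it (floor division). A pure-Python sketch, assuming the 228-pixel input defined earlier:

```python
dim = 228
for _ in range(5):   # five conv-conv-pool stages
    dim = dim - 2    # first 3x3 convolution
    dim = dim - 2    # second 3x3 convolution
    dim = dim // 2   # 2x2 max-pooling
print(dim)  # 3 -> Flatten sees a 3 x 3 x 256 volume
```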
Now we can start training:
```python
model.fit(
    x_train, y_train,
    validation_data=(x_test, y_test),
    epochs=100,
    batch_size=3
)
model.save('model.h5')
```
Now that the model is trained, we can run it on some test images, draw the predicted bounding boxes on them, and save the resulting images:
```python
!mkdir -v inference_images

boxes = model.predict(x_test)
for i in range(boxes.shape[0]):
    # Scale normalized coordinates back to pixel space
    b = boxes[i, 0:4] * input_dim
    img = x_test[i] * 255
    source_img = Image.fromarray(img.astype(np.uint8), 'RGB')
    draw = ImageDraw.Draw(source_img)
    draw.rectangle(b, outline="black")
    source_img.save('inference_images/image_{}.png'.format(i + 1), 'png')
```
Here are some example detection results:
To measure the IoU score on the test set and compute the classification accuracy at the same time, we use the following code:
```python
import numpy as np

def calculate_avg_iou(target_boxes, pred_boxes):
    # NumPy version of the IoU computation, evaluated over the whole test set
    xA = np.maximum(target_boxes[..., 0], pred_boxes[..., 0])
    yA = np.maximum(target_boxes[..., 1], pred_boxes[..., 1])
    xB = np.minimum(target_boxes[..., 2], pred_boxes[..., 2])
    yB = np.minimum(target_boxes[..., 3], pred_boxes[..., 3])
    interArea = np.maximum(0.0, xB - xA) * np.maximum(0.0, yB - yA)
    boxAArea = (target_boxes[..., 2] - target_boxes[..., 0]) * (target_boxes[..., 3] - target_boxes[..., 1])
    boxBArea = (pred_boxes[..., 2] - pred_boxes[..., 0]) * (pred_boxes[..., 3] - pred_boxes[..., 1])
    iou = interArea / (boxAArea + boxBArea - interArea)
    return iou

def class_accuracy(target_classes, pred_classes):
    target_classes = np.argmax(target_classes, axis=1)
    pred_classes = np.argmax(pred_classes, axis=1)
    return (target_classes == pred_classes).mean()

target_boxes = y_test * input_dim
pred = model.predict(x_test)
pred_boxes = pred[..., 0:4] * input_dim
pred_classes = pred[..., 4:]

iou_scores = calculate_avg_iou(target_boxes, pred_boxes)
print('Mean IOU score {}'.format(iou_scores.mean()))
print('Class Accuracy is {} %'.format(class_accuracy(y_test[..., 4:], pred_classes) * 100))
```
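The argmax-based accuracy helper is easy to verify on toy data. The sketch below reproduces the `class_accuracy` logic on hypothetical one-hot targets and raw class scores (the values are made up, not model outputs):

```python
import numpy as np

def class_accuracy(target_classes, pred_classes):
    # Compare the index of the highest score per row
    target_classes = np.argmax(target_classes, axis=1)
    pred_classes = np.argmax(pred_classes, axis=1)
    return (target_classes == pred_classes).mean()

y_true = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]])
y_pred = np.array([[0.9, 0.1, 0.0],
                   [0.2, 0.7, 0.1],
                   [0.6, 0.3, 0.1],   # wrong: argmax 0, true class 2
                   [0.8, 0.1, 0.1]])
print(class_accuracy(y_true, y_pred))  # 3 of 4 rows match -> 0.75
```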