Part 1: Model Implementation
1. We use `build_model` to construct the model.
def build_model(image_size,
                n_classes,
                mode='training',
                l2_regularization=0.0,
                min_scale=0.1,
                max_scale=0.9,
                scales=None,
                aspect_ratios_global=[0.5, 1.0, 2.0],
                aspect_ratios_per_layer=None,
                two_boxes_for_ar1=True,
                steps=None,
                offsets=None,
                clip_boxes=False,
                variances=[1.0, 1.0, 1.0, 1.0],
                coords='centroids',
                normalize_coords=False,
                subtract_mean=None,
                divide_by_stddev=None,
                swap_channels=False,
                confidence_thresh=0.01,
                iou_threshold=0.45,
                top_k=200,
                nms_max_output_size=400,
                return_predictor_sizes=False):
Let's go over this function's parameters.
Arguments:
    image_size (tuple): The input image size in the format `(height, width, channels)`.
    n_classes (int): The number of positive classes, e.g. 5.
    mode (str, optional): One of 'training', 'inference' and 'inference_fast'. In 'training' mode, the model outputs the raw prediction tensor, while in 'inference' and 'inference_fast' modes, the raw predictions are decoded into absolute coordinates and filtered via confidence thresholding, non-maximum suppression, and top-k filtering. The difference between the latter two modes is that 'inference' follows the exact procedure of the original Caffe implementation, while 'inference_fast' uses a faster prediction decoding procedure.
    l2_regularization (float, optional): The L2-regularization rate. Applies to all convolutional layers.
    min_scale (float, optional): The smallest scaling factor for the size of the anchor boxes as a fraction of the shorter side of the input images.
    max_scale (float, optional): The largest scaling factor for the size of the anchor boxes as a fraction of the shorter side of the input images. All scaling factors between the smallest and the largest will be linearly interpolated. Note that the second to last of the linearly interpolated scaling factors will actually be the scaling factor for the last predictor layer, while the last scaling factor is used for the second box for aspect ratio 1 in the last predictor layer if `two_boxes_for_ar1` is `True`.
    scales (list, optional): A list of floats containing scaling factors per convolutional predictor layer. This list must be one element longer than the number of predictor layers. The first `k` elements are the scaling factors for the `k` predictor layers, while the last element is used for the second box for aspect ratio 1 in the last predictor layer if `two_boxes_for_ar1` is `True`. This additional last scaling factor must be passed either way, even if it is not being used. If a list is passed, this argument overrides `min_scale` and `max_scale`. All scaling factors must be greater than zero.
    aspect_ratios_global (list, optional): The list of aspect ratios for which anchor boxes are to be generated. This list is valid for all predictor layers. The original implementation uses more aspect ratios for some predictor layers and fewer for others. If you want to do that, too, then use the next argument instead.
    aspect_ratios_per_layer (list, optional): A list containing one aspect ratio list for each predictor layer. This allows you to set the aspect ratios for each predictor layer individually. If a list is passed, it overrides `aspect_ratios_global`.
    two_boxes_for_ar1 (bool, optional): Only relevant for aspect ratio lists that contain 1. Will be ignored otherwise. If `True`, two anchor boxes will be generated for aspect ratio 1. The first will be generated using the scaling factor for the respective layer, the second one will be generated using the geometric mean of said scaling factor and the next bigger scaling factor.
    steps (list, optional): `None` or a list with as many elements as there are predictor layers. The elements can be either ints/floats or tuples of two ints/floats. These numbers represent for each predictor layer how many pixels apart the anchor box center points should be vertically and horizontally along the spatial grid over the image. If the list contains ints/floats, then that value will be used for both spatial dimensions. If the list contains tuples of two ints/floats, then they represent `(step_height, step_width)`. If no steps are provided, then they will be computed such that the anchor box center points will form an equidistant grid within the image dimensions.
    offsets (list, optional): `None` or a list with as many elements as there are predictor layers. The elements can be either floats or tuples of two floats. These numbers represent for each predictor layer how many pixels from the top and left borders of the image the top-most and left-most anchor box center points should be as a fraction of `steps`. The last bit is important: the offsets are not absolute pixel values, but fractions of the step size specified in the `steps` argument. If the list contains floats, then that value will be used for both spatial dimensions. If the list contains tuples of two floats, then they represent `(vertical_offset, horizontal_offset)`. If no offsets are provided, then they will default to 0.5 of the step size, which is also the recommended setting.
    clip_boxes (bool, optional): If `True`, clips the anchor box coordinates to stay within image boundaries.
    variances (list, optional): A list of 4 floats > 0. The anchor box offset for each coordinate will be divided by its respective variance value.
    coords (str, optional): The box coordinate format to be used internally by the model (i.e. this is not the input format of the ground truth labels). Can be either 'centroids' for the format `(cx, cy, w, h)` (box center coordinates, width, and height), 'minmax' for the format `(xmin, xmax, ymin, ymax)`, or 'corners' for the format `(xmin, ymin, xmax, ymax)`.
    normalize_coords (bool, optional): Set to `True` if the model is supposed to use relative instead of absolute coordinates, i.e. if the model predicts box coordinates within [0,1] instead of absolute coordinates.
    subtract_mean (array-like, optional): `None` or an array-like object of integers or floating point values of any shape that is broadcast-compatible with the image shape. The elements of this array will be subtracted from the image pixel intensity values. For example, pass a list of three integers to perform per-channel mean normalization for color images.
    divide_by_stddev (array-like, optional): `None` or an array-like object of non-zero integers or floating point values of any shape that is broadcast-compatible with the image shape. The image pixel intensity values will be divided by the elements of this array. For example, pass a list of three integers to perform per-channel standard deviation normalization for color images.
    swap_channels (list, optional): Either `False` or a list of integers representing the desired order in which the input image channels should be swapped.
image_size: the input shape of the model.
n_classes: the total number of classes to detect.
mode: one of 'training', 'inference' and 'inference_fast'. In 'training' mode the model outputs the raw prediction tensor with relative coordinates; in 'inference' mode the predictions are decoded into absolute coordinates. Relative coordinates are fractions of the image size; absolute coordinates are actual pixel values.
l2_regularization: the L2 regularization coefficient.
min_scale (float, optional) / max_scale (float, optional): the minS and maxS of the paper, i.e. the anchor sizes as fractions of the input image's shorter side. The minimum is the scale of the first predictor layer, the maximum is the scale of the last predictor layer, and the scales in between are linearly interpolated, as shown in the sketch below.
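When `scales` is not given explicitly, the per-layer factors are obtained by linear interpolation between `min_scale` and `max_scale`. A minimal sketch of that interpolation (assuming NumPy; the layer count of 4 matches SSD7's four predictor layers):

import numpy as np

min_scale, max_scale = 0.1, 0.9
n_predictor_layers = 4  # SSD7 has four predictor layers

# One scaling factor per predictor layer, plus one extra factor that is only
# used for the second aspect-ratio-1 box of the last layer:
scales = np.linspace(min_scale, max_scale, n_predictor_layers + 1)
print(scales)  # [0.1 0.3 0.5 0.7 0.9]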
scales (list, optional): a list containing the anchor scale fraction for each predictor layer.
aspect_ratios_global (list, optional): the global aspect ratio list, used for all predictor layers.
aspect_ratios_per_layer (list, optional): per-layer aspect ratio settings.
two_boxes_for_ar1 (bool, optional): whether to use two boxes when the aspect ratio is 1. The second box's scale is the geometric mean of the current layer's scale and the next layer's scale; e.g. with s = 0.2 for layer 1 and s = 0.34 for layer 2, the second box of layer 1 gets sqrt(0.2 × 0.34) ≈ 0.26.
steps (list, optional): the stride, in pixels, from one anchor box center point to the next. Defaults to the input image size (e.g. 300×300) divided by the size of the corresponding feature map (e.g. 10×10).
offsets (list, optional): the center position of the first (top-left) anchor; can be left as `None`.
clip_boxes (bool, optional): whether to clip anchors that extend beyond the image boundaries. Defaults to `False`; clipping does not work particularly well.
variances (list, optional): defaults to 1.0, the values chosen by the SSD authors.
coords (str, optional): the format used to represent the boxes: 'centroids' for `(cx, cy, w, h)`, 'minmax' for `(xmin, xmax, ymin, ymax)`, or 'corners' for `(xmin, ymin, xmax, ymax)`.
normalize_coords (bool, optional): whether pixel coordinates are represented in normalized (relative) form.
subtract_mean (array-like, optional): mean normalization; shifts the image pixel values into roughly [-127, +127].
divide_by_stddev (array-like, optional): normalization; after mean subtraction, scales the pixel values into [0, 1] or [-0.5, +0.5].
swap_channels (list, optional): reorders the input image channels; defaults to `False`.
confidence_thresh (float, optional): the probability threshold above which a detection is considered a positive detection of the target. A float in [0,1), the minimum classification confidence in a specific positive class in order to be considered for the non-maximum suppression stage for the respective class. A lower value will result in a larger part of the selection process being done by the non-maximum suppression stage, while a larger value will result in a larger part of the selection process happening in the confidence thresholding stage.
iou_threshold (float, optional): the IoU in non-maximum suppression measures the overlapping area of two anchors; if two boxes exceed this threshold, we consider them detections of the same object. A float in [0,1]. All boxes that have a Jaccard similarity of greater than `iou_threshold` with a locally maximal box will be removed from the set of predictions for a given class, where 'maximal' refers to the box's confidence score.
top_k (int, optional): keep the k bounding boxes with the highest probability; e.g. if exactly 3 objects are expected, this could be set to 3. The number of highest scoring predictions to be kept for each batch item after the non-maximum suppression stage.
nms_max_output_size (int, optional): the maximal number of predictions that will be left over after the NMS stage.
return_predictor_sizes (bool, optional): returns the feature map size of each predictor layer, which is useful for debugging. If `True`, this function not only returns the model, but also a list containing the spatial dimensions of the predictor layers. This isn't strictly necessary since you can always get their sizes easily via the Keras API, but it's convenient and less error-prone to get them this way. They are only relevant for training anyway (SSDBoxEncoder needs to know the spatial dimensions of the predictor layers); for inference you don't need them.
Part 2: Implementing the Model
# Inside build_model: x1 is the preprocessed input tensor and l2_reg = l2_regularization.
from keras.layers import Conv2D, BatchNormalization, ELU, MaxPooling2D
from keras.regularizers import l2

conv1 = Conv2D(32, (5, 5), strides=(1, 1), padding="same", kernel_initializer='he_normal', kernel_regularizer=l2(l2_reg), name='conv1')(x1)
conv1 = BatchNormalization(axis=3, momentum=0.99, name='bn1')(conv1)
conv1 = ELU(name='elu1')(conv1)  # Exponential Linear Unit
pool1 = MaxPooling2D(pool_size=(2, 2), name='pool1')(conv1)

conv2 = Conv2D(48, (3, 3), strides=(1, 1), padding="same", kernel_initializer='he_normal', kernel_regularizer=l2(l2_reg), name='conv2')(pool1)
conv2 = BatchNormalization(axis=3, momentum=0.99, name='bn2')(conv2)
conv2 = ELU(name='elu2')(conv2)
pool2 = MaxPooling2D(pool_size=(2, 2), name='pool2')(conv2)

conv3 = Conv2D(64, (3, 3), strides=(1, 1), padding="same", kernel_initializer='he_normal', kernel_regularizer=l2(l2_reg), name='conv3')(pool2)
conv3 = BatchNormalization(axis=3, momentum=0.99, name='bn3')(conv3)
conv3 = ELU(name='elu3')(conv3)
pool3 = MaxPooling2D(pool_size=(2, 2), name='pool3')(conv3)

conv4 = Conv2D(64, (3, 3), strides=(1, 1), padding="same", kernel_initializer='he_normal', kernel_regularizer=l2(l2_reg), name='conv4')(pool3)
conv4 = BatchNormalization(axis=3, momentum=0.99, name='bn4')(conv4)
conv4 = ELU(name='elu4')(conv4)
pool4 = MaxPooling2D(pool_size=(2, 2), name='pool4')(conv4)

conv5 = Conv2D(48, (3, 3), strides=(1, 1), padding="same", kernel_initializer='he_normal', kernel_regularizer=l2(l2_reg), name='conv5')(pool4)
conv5 = BatchNormalization(axis=3, momentum=0.99, name='bn5')(conv5)
conv5 = ELU(name='elu5')(conv5)
pool5 = MaxPooling2D(pool_size=(2, 2), name='pool5')(conv5)

conv6 = Conv2D(48, (3, 3), strides=(1, 1), padding="same", kernel_initializer='he_normal', kernel_regularizer=l2(l2_reg), name='conv6')(pool5)
conv6 = BatchNormalization(axis=3, momentum=0.99, name='bn6')(conv6)
conv6 = ELU(name='elu6')(conv6)
pool6 = MaxPooling2D(pool_size=(2, 2), name='pool6')(conv6)

conv7 = Conv2D(32, (3, 3), strides=(1, 1), padding="same", kernel_initializer='he_normal', kernel_regularizer=l2(l2_reg), name='conv7')(pool6)
conv7 = BatchNormalization(axis=3, momentum=0.99, name='bn7')(conv7)
conv7 = ELU(name='elu7')(conv7)
pool7 = MaxPooling2D(pool_size=(2, 2), name='pool7')(conv7)
In total there are 7 convolutional layers for feature extraction.
Layers 4, 5, 6 and 7 feed the SSD detection heads. With the 3 aspect ratios plus the extra box for aspect ratio 1, each cell of these feature maps predicts n_boxes = 4 anchors.
# Classification heads: one confidence value per class for each anchor box.
classes4 = Conv2D(n_boxes[0] * n_classes, (3, 3), strides=(1, 1), padding='same', kernel_initializer='he_normal', kernel_regularizer=l2(l2_reg), name='classes4')(conv4)
classes5 = Conv2D(n_boxes[1] * n_classes, (3, 3), strides=(1, 1), padding='same', kernel_initializer='he_normal', kernel_regularizer=l2(l2_reg), name='classes5')(conv5)
classes6 = Conv2D(n_boxes[2] * n_classes, (3, 3), strides=(1, 1), padding='same', kernel_initializer='he_normal', kernel_regularizer=l2(l2_reg), name='classes6')(conv6)
classes7 = Conv2D(n_boxes[3] * n_classes, (3, 3), strides=(1, 1), padding='same', kernel_initializer='he_normal', kernel_regularizer=l2(l2_reg), name='classes7')(conv7)

# Localization heads: 4 coordinate offsets for each anchor box.
boxes4 = Conv2D(n_boxes[0] * 4, (3, 3), strides=(1, 1), padding='same', kernel_initializer='he_normal', kernel_regularizer=l2(l2_reg), name='boxes4')(conv4)
boxes5 = Conv2D(n_boxes[1] * 4, (3, 3), strides=(1, 1), padding='same', kernel_initializer='he_normal', kernel_regularizer=l2(l2_reg), name='boxes5')(conv5)
boxes6 = Conv2D(n_boxes[2] * 4, (3, 3), strides=(1, 1), padding='same', kernel_initializer='he_normal', kernel_regularizer=l2(l2_reg), name='boxes6')(conv6)
boxes7 = Conv2D(n_boxes[3] * 4, (3, 3), strides=(1, 1), padding='same', kernel_initializer='he_normal', kernel_regularizer=l2(l2_reg), name='boxes7')(conv7)

# Generate the anchor boxes for each predictor layer.
anchors4 = AnchorBoxes(img_height, img_width, this_scale=scales[0], next_scale=scales[1], aspect_ratios=aspect_ratios[0], two_boxes_for_ar1=two_boxes_for_ar1, this_steps=steps[0], this_offsets=offsets[0], clip_boxes=clip_boxes, variances=variances, coords=coords, normalize_coords=normalize_coords, name='anchors4')(boxes4)
anchors5 = AnchorBoxes(img_height, img_width, this_scale=scales[1], next_scale=scales[2], aspect_ratios=aspect_ratios[1], two_boxes_for_ar1=two_boxes_for_ar1, this_steps=steps[1], this_offsets=offsets[1], clip_boxes=clip_boxes, variances=variances, coords=coords, normalize_coords=normalize_coords, name='anchors5')(boxes5)
anchors6 = AnchorBoxes(img_height, img_width, this_scale=scales[2], next_scale=scales[3], aspect_ratios=aspect_ratios[2], two_boxes_for_ar1=two_boxes_for_ar1, this_steps=steps[2], this_offsets=offsets[2], clip_boxes=clip_boxes, variances=variances, coords=coords, normalize_coords=normalize_coords, name='anchors6')(boxes6)
anchors7 = AnchorBoxes(img_height, img_width, this_scale=scales[3], next_scale=scales[4], aspect_ratios=aspect_ratios[3], two_boxes_for_ar1=two_boxes_for_ar1, this_steps=steps[3], this_offsets=offsets[3], clip_boxes=clip_boxes, variances=variances, coords=coords, normalize_coords=normalize_coords, name='anchors7')(boxes7)
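Note that `classes_concat`, `boxes_concat` and `anchors_concat` used in the next snippet are not defined in the excerpt above. In the reference SSD7 implementation, each head is first reshaped so that every row is one box, and the four layers are then concatenated along the box axis; a sketch of that step (layer names assumed to follow the excerpt):

from keras.layers import Reshape, Concatenate

# Reshape to (batch, n_boxes_total, ...) so all predictor layers can be joined:
classes4_reshaped = Reshape((-1, n_classes), name='classes4_reshape')(classes4)
classes5_reshaped = Reshape((-1, n_classes), name='classes5_reshape')(classes5)
classes6_reshaped = Reshape((-1, n_classes), name='classes6_reshape')(classes6)
classes7_reshaped = Reshape((-1, n_classes), name='classes7_reshape')(classes7)
boxes4_reshaped = Reshape((-1, 4), name='boxes4_reshape')(boxes4)
boxes5_reshaped = Reshape((-1, 4), name='boxes5_reshape')(boxes5)
boxes6_reshaped = Reshape((-1, 4), name='boxes6_reshape')(boxes6)
boxes7_reshaped = Reshape((-1, 4), name='boxes7_reshape')(boxes7)
anchors4_reshaped = Reshape((-1, 8), name='anchors4_reshape')(anchors4)  # 4 coordinates + 4 variances
anchors5_reshaped = Reshape((-1, 8), name='anchors5_reshape')(anchors5)
anchors6_reshaped = Reshape((-1, 8), name='anchors6_reshape')(anchors6)
anchors7_reshaped = Reshape((-1, 8), name='anchors7_reshape')(anchors7)

# Concatenate along the box dimension (axis 1):
classes_concat = Concatenate(axis=1, name='classes_concat')([classes4_reshaped, classes5_reshaped, classes6_reshaped, classes7_reshaped])
boxes_concat = Concatenate(axis=1, name='boxes_concat')([boxes4_reshaped, boxes5_reshaped, boxes6_reshaped, boxes7_reshaped])
anchors_concat = Concatenate(axis=1, name='anchors_concat')([anchors4_reshaped, anchors5_reshaped, anchors6_reshaped, anchors7_reshaped])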
# Softmax over the class predictions:
classes_softmax = Activation('softmax', name='classes_softmax')(classes_concat)
# Final output: class confidences, box offsets, and anchor coordinates concatenated along the last axis.
predictions = Concatenate(axis=2, name='predictions')([classes_softmax, boxes_concat, anchors_concat])
2. Training
Model parameter settings
img_height = 300  # Image height
img_width = 480  # Image width
img_channels = 3  # Number of image channels
intensity_mean = 127.5  # For image normalization: maps pixel values to [-1, 1]
intensity_range = 127.5  # For image normalization: maps pixel values to [-1, 1]
n_classes = 5  # Number of positive classes (excluding background)
scales = [0.08, 0.16, 0.32, 0.64, 0.96]  # Anchor scaling factors; if set, `min_scale` and `max_scale` are ignored
aspect_ratios = [0.5, 1.0, 2.0]  # Aspect ratios for the anchors
two_boxes_for_ar1 = True  # Whether to generate two anchors for aspect ratio 1
steps = None  # Anchor step sizes can be set manually; not recommended
offsets = None  # The offsets of the top-left anchors can be set manually; not recommended
clip_boxes = False  # Whether to clip anchors to the image boundaries
variances = [1.0, 1.0, 1.0, 1.0]  # Scaling parameters for the target coordinates; keeping 1.0 is recommended
normalize_coords = True  # Whether to use coordinates relative to the image size
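With these settings in place, the model would be built along the following lines (a sketch; the L2 rate of 0.0005 is an assumed value, and `intensity_mean`/`intensity_range` are passed as `subtract_mean`/`divide_by_stddev`):

model = build_model(image_size=(img_height, img_width, img_channels),
                    n_classes=n_classes,
                    mode='training',
                    l2_regularization=0.0005,  # assumed value
                    scales=scales,
                    aspect_ratios_global=aspect_ratios,
                    aspect_ratios_per_layer=None,
                    two_boxes_for_ar1=two_boxes_for_ar1,
                    steps=steps,
                    offsets=offsets,
                    clip_boxes=clip_boxes,
                    variances=variances,
                    normalize_coords=normalize_coords,
                    subtract_mean=intensity_mean,
                    divide_by_stddev=intensity_range)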
Loss setup
`SSDLoss` combines a confidence (classification) loss with a localization loss: `neg_pos_ratio=3` limits hard negative mining to at most three negative boxes per positive box, and `alpha` weights the localization term.
adam = Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0)
ssd_loss = SSDLoss(neg_pos_ratio=3, alpha=1.0)
model.compile(optimizer=adam, loss=ssd_loss.compute_loss)
Image augmentation
data_augmentation_chain = DataAugmentationConstantInputSize(random_brightness=(-48, 48, 0.5),
                                                            random_contrast=(0.5, 1.8, 0.5),
                                                            random_saturation=(0.5, 1.8, 0.5),
                                                            random_hue=(18, 0.5),
                                                            random_flip=0.5,
                                                            random_translate=((0.03, 0.5), (0.03, 0.5), 0.5),
                                                            random_scale=(0.5, 2.0, 0.5),
                                                            n_trials_max=3,
                                                            clip_boxes=True,
                                                            overlap_criterion='area',
                                                            bounds_box_filter=(0.3, 1.0),
                                                            bounds_validator=(0.5, 1.0),
                                                            n_boxes_min=1,
                                                            background=(0, 0, 0))
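The augmentation chain is callable on a single image and its labels, so a quick sanity check could look like this (a sketch assuming the chain's `(image, labels)` call signature):

# Apply the chain to one image and its ground-truth boxes
# (labels: one (class_id, xmin, ymin, xmax, ymax) row per box):
augmented_image, augmented_labels = data_augmentation_chain(image, labels)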
encorder 操做
# Spatial dimensions of the four predictor layers; the encoder needs these
# to lay out the same anchor grid as the model:
predictor_sizes = [model.get_layer('classes4').output_shape[1:3],
                   model.get_layer('classes5').output_shape[1:3],
                   model.get_layer('classes6').output_shape[1:3],
                   model.get_layer('classes7').output_shape[1:3]]

# Anchors with IoU >= pos_iou_threshold against a ground-truth box become
# positives; anchors below neg_iou_limit for all boxes become negatives:
ssd_input_encoder = SSDInputEncoder(img_height=img_height,
                                    img_width=img_width,
                                    n_classes=n_classes,
                                    predictor_sizes=predictor_sizes,
                                    scales=scales,
                                    aspect_ratios_global=aspect_ratios,
                                    two_boxes_for_ar1=two_boxes_for_ar1,
                                    steps=steps,
                                    offsets=offsets,
                                    clip_boxes=clip_boxes,
                                    variances=variances,
                                    matching_type='multi',
                                    pos_iou_threshold=0.5,
                                    neg_iou_limit=0.3,
                                    normalize_coords=normalize_coords)
Generators
train_generator = train_dataset.generate(batch_size=batch_size,
                                         shuffle=True,
                                         transformations=[data_augmentation_chain],
                                         label_encoder=ssd_input_encoder,
                                         returns={'processed_images', 'encoded_labels'},
                                         keep_images_without_gt=False)

val_generator = val_dataset.generate(batch_size=batch_size,
                                     shuffle=False,
                                     transformations=[],
                                     label_encoder=ssd_input_encoder,
                                     returns={'processed_images', 'encoded_labels'},
                                     keep_images_without_gt=False)
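Each generator yields ready-to-train batches indefinitely; with the `returns` set above, every item is a tuple of preprocessed images and encoded labels. A sketch of pulling one batch:

batch_images, batch_labels = next(train_generator)
print(batch_images.shape)  # (batch_size, img_height, img_width, img_channels)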
Settings used during training
model_checkpoint = ModelCheckpoint(filepath='ssd7_epoch-{epoch:02d}_loss-{loss:.4f}_val_loss-{val_loss:.4f}.h5',
                                   monitor='val_loss',
                                   verbose=1,
                                   save_best_only=True,
                                   save_weights_only=False,
                                   mode='auto',
                                   period=1)

csv_logger = CSVLogger(filename='ssd7_training_log.csv',
                       separator=',',
                       append=True)

early_stopping = EarlyStopping(monitor='val_loss',
                               min_delta=0.0,
                               patience=10,
                               verbose=1)

reduce_learning_rate = ReduceLROnPlateau(monitor='val_loss',
                                         factor=0.2,
                                         patience=8,
                                         verbose=1,
                                         epsilon=0.001,
                                         cooldown=0,
                                         min_lr=0.00001)

callbacks = [model_checkpoint, csv_logger, early_stopping, reduce_learning_rate]
Training configuration
batch_size = 16
initial_epoch = 0
final_epoch = 50
steps_per_epoch = 500
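The training call itself is not shown in the excerpt; with the Keras 2-era API it would look roughly like this (a sketch; `get_dataset_size()` is assumed to be available on the dataset object as in the reference data generator):

from math import ceil

history = model.fit_generator(generator=train_generator,
                              steps_per_epoch=steps_per_epoch,
                              epochs=final_epoch,
                              callbacks=callbacks,
                              validation_data=val_generator,
                              validation_steps=ceil(val_dataset.get_dataset_size() / batch_size),
                              initial_epoch=initial_epoch)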
Training results
The loss recorded during training:
You can see that the model at epoch 46 was no more accurate than the one at epoch 45, so the program did not save the weights for epoch 46 (and likewise for the other skipped epochs).
The best weights are those from epoch 48.
Overall the loss still shows a downward trend, so training for more epochs should make the model more accurate.
These are the weight files we saved.
Let's test the model.
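Because the model was built in 'training' mode, its raw predictions are relative coordinates and must be decoded before they can be compared with the ground truth. A sketch of predicting on a batch and decoding (assuming the `decode_detections` helper of the reference SSD implementation; the 0.5 confidence threshold is an assumed choice):

y_pred = model.predict(batch_images)
y_pred_decoded = decode_detections(y_pred,
                                   confidence_thresh=0.5,
                                   iou_threshold=0.45,
                                   top_k=200,
                                   normalize_coords=normalize_coords,
                                   img_height=img_height,
                                   img_width=img_width)
# Each decoded row has the format [class_id, confidence, xmin, ymin, xmax, ymax]:
print(y_pred_decoded[0])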
Image 1
Image: ./driving_datasets/1478899079843657225.jpg

Ground truth (class, xmin, ymin, xmax, ymax):
[[  1  83 144 105 158]
 [  1 124 142 138 153]
 [  1 174 138 194 156]
 [  1 183 139 235 178]
 [  1 189 138 197 149]
 [  1 209 137 219 147]
 [  2 139 134 160 151]]

Predictions (class, confidence, xmin, ymin, xmax, ymax):
[[  1.    0.98 175.51 138.55 195.6  157.32]
 [  1.    0.72 183.55 134.15 242.   175.66]
 [  1.    0.59  80.93 142.11 108.24 158.76]]
The ground truth contains 7 annotated objects, of which our model detected 3. You can see that the model is fairly accurate on nearby objects. Of the objects it missed, two cars are far away and partially occluded by trees, and another is far away, small, and somewhat blurry.
Let's try some other images.
You can see that the model is not very accurate yet, so let's train for more epochs.
Training more epochs to improve the model's accuracy:
initial_epoch = 0
final_epoch = 100
steps_per_epoch = 1000
Checking the training results:
You can see that the loss has dropped noticeably compared with before.
Testing
Summary:
1. The model's loss is still trending downward, so we could train for more epochs.
2. The model has difficulty with very small distant objects, blurry objects, occluded objects, and objects in shadow.
3. The data annotations have some problems: some objects were left unannotated, which adds noise during training.