Paper Notes: Survey of Deep Learning Object Detection, 2014 to January 2019 (Deep Learning for Generic Object Detection: A Survey)

Date: 2019-11-07

Tags: paper notes, deep learning, object detection, survey, generic object detection



Preface

paper: https://arxiv.org/abs/1809.02165
github: https://github.com/hoya012/deep_learning_object_detection (a paper list of object detection using deep learning)

This survey summarizes the progress made in deep-learning-based object detection from 2014 through January 2019. In the paper's own words:

More than 250 key contributions are included in this survey, covering many aspects of generic object detection research: leading detection frameworks and fundamental subproblems including object feature representation, object proposal generation, context information modeling and training strategies; evaluation issues, specifically benchmark datasets, evaluation metrics, and state of the art performance.

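The main purpose of this post is to excerpt some of the important figures and conclusions from the paper, as an index for systematic study. It does not go into detail.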

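The two figures below come from the GitHub repository: the paper list and the performance table. Papers marked in red are the ones the repository author considers must-reads.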


Object Detection: Task and Challenges

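The input to the object detection task is an image, and the output is the location and category of each object in the image, as shown in the figure below. A location can be described by a bounding box or as a set of pixels.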
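The two ways of describing a location are related: a pixel set can always be reduced to a bounding box, but not the other way round. Below is a minimal sketch of that reduction. The function name `mask_to_box` and the `(x_min, y_min, x_max, y_max)` box convention are assumptions made for illustration; they are not taken from the paper.

```python
from typing import Iterable, Tuple

Pixel = Tuple[int, int]           # (row, col)
Box = Tuple[int, int, int, int]   # (x_min, y_min, x_max, y_max), inclusive (assumed convention)


def mask_to_box(pixels: Iterable[Pixel]) -> Box:
    """Return the tightest axis-aligned box that contains every pixel."""
    pixels = list(pixels)
    if not pixels:
        raise ValueError("an object must cover at least one pixel")
    rows = [r for r, _ in pixels]
    cols = [c for _, c in pixels]
    return (min(cols), min(rows), max(cols), max(rows))


# An L-shaped object given as a set of (row, col) pixels.
object_pixels = {(2, 3), (3, 3), (4, 3), (4, 4), (4, 5)}
print(mask_to_box(object_pixels))  # (3, 2, 5, 4)
```

The box covers 9 pixels while the object covers only 5, which is why the pixel-set description is the more precise of the two.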

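Determining the location and category of the objects in an image involves many challenges. A good detector must localize accurately, classify accurately, and run efficiently. It must be robust to illumination, deformation, scale, viewpoint, size, pose, occlusion, blur, and noise, and it must tolerate potentially large intra-class variation while still separating small inter-class differences.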


Summary of Object Detection Methods

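Before 2012, object detection methods were mainly hand-crafted feature engineering plus a classifier. After 2012, they are mainly DCNN-based, as shown in the figure below: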


Detection frameworks can be divided into two categories:

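Two stage detection framework: includes a region proposal step. Regions of interest (RoIs) are obtained first, then each RoI is classified and its bounding box is regressed. The R-CNN family is representative.
One stage detection framework: has no region proposal step. The whole image is divided into a grid, and classification and regression are performed for each grid cell. The YOLO family is representative.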
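To make the grid idea in the one stage framework concrete, here is a small sketch of how an object can be assigned to a grid cell. It assumes the YOLO-style convention that the cell containing the object's center is responsible for predicting it. The function name `responsible_cell`, the 7x7 grid, and the 448x448 image size are illustrative choices, not values taken from this post.

```python
from typing import Tuple


def responsible_cell(box: Tuple[float, float, float, float],
                     image_size: Tuple[int, int],
                     grid: int) -> Tuple[int, int]:
    """Return (row, col) of the grid cell that contains the box center.

    box is (x_min, y_min, x_max, y_max) in pixels; image_size is (width, height).
    """
    x_min, y_min, x_max, y_max = box
    width, height = image_size
    cx = (x_min + x_max) / 2.0
    cy = (y_min + y_max) / 2.0
    # Clamp so a center lying exactly on the right or bottom edge stays in range.
    col = min(int(cx / width * grid), grid - 1)
    row = min(int(cy / height * grid), grid - 1)
    return row, col


print(responsible_cell((100, 200, 300, 400), image_size=(448, 448), grid=7))  # (4, 3)
```

A two stage detector replaces this fixed assignment with a learned proposal step, which is the main structural difference between the two families.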

A comparison of the pipelines and their evolution is shown below:

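The backbone network, the design of the detection framework, and large-scale high-quality datasets are the three most important factors determining detection performance. Together they determine how good the learned features are and how well those features are used.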

Fundamental Subproblems

This section focuses on DCNN-based feature representation, proposal generation, context information, and training strategies.

DCNN-Based Feature Representation

Backbone network

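ILSVRC (ImageNet Large Scale Visual Recognition Challenge) greatly advanced DCNN architectures. Across computer vision tasks, these classic networks are commonly used as the backbone, with task-specific designs built on top. The DCNN architectures commonly used in object detection are as follows: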

Methods For Improving Object Representation

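The size of an object in an image is unknown in advance, and different objects in the same image may have different sizes. Deeper DCNN layers have larger receptive fields, so predicting from a single layer is unlikely to be optimal. A natural idea is to predict using information extracted from different layers, known as multiscale object detection, which falls into three categories: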

Detecting with combined features of multiple CNN layers
Detecting at multiple CNN layers
Combinations of the above two methods
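The motivation for all three categories is that the receptive field grows with depth. A rough sketch of that growth uses the standard recurrence for stacked convolution and pooling layers: the field widens by (kernel - 1) times the current spacing between adjacent outputs, and that spacing is multiplied by the stride. The layer stack below is a hypothetical toy example, not a backbone described in the survey, and padding and dilation are ignored.

```python
from typing import List, Tuple


def receptive_fields(layers: List[Tuple[int, int]]) -> List[int]:
    """Receptive field size (in input pixels) after each layer.

    layers is a list of (kernel_size, stride) pairs, applied in order.
    """
    rf, jump = 1, 1  # an input pixel sees itself; adjacent outputs start 1 px apart
    sizes = []
    for kernel, stride in layers:
        rf += (kernel - 1) * jump  # widen by (k - 1) steps measured in input pixels
        jump *= stride             # striding spreads adjacent outputs further apart
        sizes.append(rf)
    return sizes


# Hypothetical toy stack: two 3x3 convs, one 2x2 stride-2 pool, two more 3x3 convs.
toy_stack = [(3, 1), (3, 1), (2, 2), (3, 1), (3, 1)]
print(receptive_fields(toy_stack))  # [3, 5, 6, 10, 14]
```

Shallow layers see only a few pixels and suit small objects, while deep layers see a wide area and suit large ones, which is why a single prediction layer is a compromise.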

The figure makes this easier to see:

Modeling geometric deformation is another direction for improving object representation. Methods include those that incorporate Deformable Part based Models (DPMs) and Deformable Convolutional Networks (DCN).

Context Modeling

Context information can be divided into three categories:

Semantic context: The likelihood of an object to be found in some scenes but not in others;
Spatial context: The likelihood of finding an object in some position and not others with respect to other objects in the scene;
Scale context: Objects have a limited set of sizes relative to other objects in the scene.

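By learning features at different levels of abstraction, DCNNs may already use contextual information implicitly, so current state-of-the-art detectors do not exploit contextual information explicitly. Recently, however, some DCNN methods do use it explicitly. They fall into two categories: global context and local context.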

To some extent, this feels like ensemble learning at the data level.
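One simple way to bring in local context, assumed here purely for illustration, is to enlarge a candidate region so that it also covers the object's surroundings, then extract features from both the original and the enlarged region. The sketch below shows only the geometric step. The name `expand_box` and the default scale of 1.5 are hypothetical and do not come from the survey.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def expand_box(box: Box, image_size: Tuple[int, int], scale: float = 1.5) -> Box:
    """Scale a box about its center and clip the result to the image.

    image_size is (width, height); scale > 1 adds surrounding context.
    """
    x_min, y_min, x_max, y_max = box
    width, height = image_size
    cx = (x_min + x_max) / 2.0
    cy = (y_min + y_max) / 2.0
    half_w = (x_max - x_min) * scale / 2.0
    half_h = (y_max - y_min) * scale / 2.0
    return (max(0.0, cx - half_w), max(0.0, cy - half_h),
            min(float(width), cx + half_w), min(float(height), cy + half_h))


print(expand_box((100, 100, 200, 200), image_size=(640, 480)))  # (75.0, 75.0, 225.0, 225.0)
```

Global context would instead use features pooled from the whole image, so no box arithmetic is needed for it.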