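An outstanding, very comprehensive, and continuously updated reading list for object detection. Reposted from: https://handong1587.github.io/deep_learning/2015/10/09/object-detection.html. Please contact me for removal in case of infringement.
I will follow the original author's blog and keep this post updated, adding my own notes on new research and paper readings in the object detection field. Use keyword search on this page as needed; for example, searching for 2018 finds the most recent papers.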
| Method | Backbone | Test size | VOC2007 | VOC2010 | VOC2012 | ILSVRC 2013 | MSCOCO 2015 | Speed |
|---|---|---|---|---|---|---|---|---|
| OverFeat | | | | | | 24.3% | | |
| R-CNN | AlexNet | | 58.5% | 53.7% | 53.3% | 31.4% | | |
| R-CNN | VGG16 | | 66.0% | | | | | |
| SPP_net | ZF-5 | | 54.2% | | | 31.84% | | |
| DeepID-Net | | | 64.1% | | | 50.3% | | |
| NoC | | | 73.3% | | 68.8% | | | |
| Fast-RCNN | VGG16 | | 70.0% | 68.8% | 68.4% | | 19.7%(@[0.5-0.95]), 35.9%(@0.5) | |
| MR-CNN | | | 78.2% | | 73.9% | | | |
| Faster-RCNN | VGG16 | | 78.8% | | 75.9% | | 21.9%(@[0.5-0.95]), 42.7%(@0.5) | 198ms |
| Faster-RCNN | ResNet101 | | 85.6% | | 83.8% | | 37.4%(@[0.5-0.95]), 59.0%(@0.5) | |
| YOLO | | | 63.4% | | 57.9% | | | 45 fps |
| YOLO | VGG-16 | | 66.4% | | | | | 21 fps |
| YOLOv2 | | 448x448 | 78.6% | | 73.4% | | 21.6%(@[0.5-0.95]), 44.0%(@0.5) | 40 fps |
| SSD | VGG16 | 300x300 | 77.2% | | 75.8% | | 25.1%(@[0.5-0.95]), 43.1%(@0.5) | 46 fps |
| SSD | VGG16 | 512x512 | 79.8% | | 78.5% | | 28.8%(@[0.5-0.95]), 48.5%(@0.5) | 19 fps |
| SSD | ResNet101 | 300x300 | | | | | 28.0%(@[0.5-0.95]) | 16 fps |
| SSD | ResNet101 | 512x512 | | | | | 31.2%(@[0.5-0.95]) | 8 fps |
| DSSD | ResNet101 | 300x300 | | | | | 28.0%(@[0.5-0.95]) | 8 fps |
| DSSD | ResNet101 | 500x500 | | | | | 33.2%(@[0.5-0.95]) | 6 fps |
| ION | | | 79.2% | | 76.4% | | | |
| CRAFT | | | 75.7% | | 71.3% | 48.5% | | |
| OHEM | | | 78.9% | | 76.3% | | 25.5%(@[0.5-0.95]), 45.9%(@0.5) | |
| R-FCN | ResNet50 | | 77.4% | | | | | 0.12sec(K40), 0.09sec(TitanX) |
| R-FCN | ResNet101 | | 79.5% | | | | | 0.17sec(K40), 0.12sec(TitanX) |
| R-FCN(ms train) | ResNet101 | | 83.6% | | 82.0% | | 31.5%(@[0.5-0.95]), 53.2%(@0.5) | |
| PVANet 9.0 | | | 84.9% | | 84.2% | | | 750ms(CPU), 46ms(TitanX) |
| RetinaNet | ResNet101-FPN | | | | | | | |
| Light-Head R-CNN | Xception* | 800/1200 | | | | | 31.5%@[0.5:0.95] | 95 fps |
| Light-Head R-CNN | Xception* | 700/1100 | | | | | 30.7%@[0.5:0.95] | 102 fps |
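The MSCOCO column above reports average precision at IoU thresholds: `@0.5` counts a detection as correct when its IoU with a ground-truth box is at least 0.5, while `@[0.5-0.95]` averages over ten thresholds from 0.50 to 0.95 in steps of 0.05. The Speed column mixes per-image latency (ms, sec) with throughput (fps). The sketch below is only an illustration of those two conventions, not the official evaluation code; the function names and the `(x1, y1, x2, y2)` box layout are assumptions made for this example.

```python
# Illustrative sketch only: names and box layout (x1, y1, x2, y2) are assumptions.

def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp at zero so disjoint boxes give an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# The ten IoU thresholds behind "@[0.5-0.95]": 0.50, 0.55, ..., 0.95.
COCO_IOU_THRESHOLDS = [t / 100 for t in range(50, 100, 5)]


def thresholds_met(pred_box, gt_box):
    """IoU thresholds at which this prediction would count as a match."""
    overlap = iou(pred_box, gt_box)
    return [t for t in COCO_IOU_THRESHOLDS if overlap >= t]


def latency_ms_to_fps(latency_ms):
    """Convert per-image latency in milliseconds to frames per second."""
    return 1000.0 / latency_ms


pred, gt = (0, 0, 10, 10), (0, 0, 10, 8)
print(iou(pred, gt))                  # 0.8
print(len(thresholds_met(pred, gt)))  # 7 of the 10 thresholds
print(round(latency_ms_to_fps(198), 2))  # 198 ms per image is about 5.05 fps
```

A box with IoU 0.8 counts as correct at `@0.5` but fails the three strictest thresholds, which is why the `@[0.5-0.95]` numbers in the table are always lower than the `@0.5` numbers for the same model.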
Papers
———————————————————————————————————-
Deep Neural Networks for Object Detection
OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks
R-CNN
Rich feature hierarchies for accurate object detection and semantic segmentation
Fast R-CNN
Fast R-CNN
A-Fast-RCNN: Hard Positive Generation via Adversary for Object Detection
Faster R-CNN
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
R-CNN minus R
Faster R-CNN in MXNet with distributed implementation and data parallelization
Contextual Priming and Feedback for Faster R-CNN
An Implementation of Faster RCNN with Study for Region Sampling
Interpretable R-CNN
Light-Head R-CNN
Light-Head R-CNN: In Defense of Two-Stage Object Detector
Cascade R-CNN
Cascade R-CNN: Delving into High Quality Object Detection
MultiBox
Scalable Object Detection using Deep Neural Networks
Scalable, High-Quality Object Detection
SPP-Net
Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition
Object Detectors Emerge in Deep Scene CNNs
segDeepM: Exploiting Segmentation and Context in Deep Neural Networks for Object Detection
Object Detection Networks on Convolutional Feature Maps
Improving Object Detection with Deep Convolutional Networks via Bayesian Optimization and Structured Prediction
DeepBox: Learning Objectness with Convolutional Networks
MR-CNN
Object detection via a multi-region & semantic segmentation-aware CNN model
YOLO
You Only Look Once: Unified, Real-Time Object Detection
- arxiv: http://arxiv.org/abs/1506.02640
- code: http://pjreddie.com/darknet/yolo/
- github: https://github.com/pjreddie/darknet
- blog: https://pjreddie.com/publications/yolo/
- slides: https://docs.google.com/presentation/d/1aeRvtKG21KHdD5lg6Hgyhx5rPq_ZOsGjG5rJ1HP7BbA/pub?start=false&loop=false&delayms=3000&slide=id.p
- reddit: https://www.reddit.com/r/MachineLearning/comments/3a3m0o/realtime_object_detection_with_yolo/
- github: https://github.com/gliese581gg/YOLO_tensorflow
- github: https://github.com/xingwangsfu/caffe-yolo
- github: https://github.com/frankzhangrui/Darknet-Yolo
- github: https://github.com/BriSkyHekun/py-darknet-yolo
- github: https://github.com/tommy-qichang/yolo.torch
- github: https://github.com/frischzenger/yolo-windows
- github: https://github.com/AlexeyAB/yolo-windows
- github: https://github.com/nilboy/tensorflow-yolo
darkflow - translate darknet to tensorflow. Load trained weights, retrain/fine-tune them using tensorflow, export constant graph def to C++
Start Training YOLO with Our Own Data
- intro: train with customized data and class numbers/labels. Linux / Windows version for darknet.
- blog: http://guanghan.info/blog/en/my-works/train-yolo/
- github: https://github.com/Guanghan/darknet
YOLO: Core ML versus MPSNNGraph
TensorFlow YOLO object detection on Android
Computer Vision in iOS – Object Detection
YOLOv2
YOLO9000: Better, Faster, Stronger
darknet_scripts
Yolo_mark: GUI for marking bounded boxes of objects in images for training Yolo v2
LightNet: Bringing pjreddie’s DarkNet out of the shadows
YOLOv3
YOLOv3: An Incremental Improvement
AttentionNet: Aggregating Weak Directions for Accurate Object Detection
DenseBox
DenseBox: Unifying Landmark Localization with End to End Object Detection
SSD
SSD: Single Shot MultiBox Detector
- intro: ECCV 2016 Oral
- arxiv: http://arxiv.org/abs/1512.02325
- paper: http://www.cs.unc.edu/~wliu/papers/ssd.pdf
- slides: http://www.cs.unc.edu/%7Ewliu/papers/ssd_eccv2016_slide.pdf
- github(Official): https://github.com/weiliu89/caffe/tree/ssd
- video: http://weibo.com/p/2304447a2326da963254c963c97fb05dd3a973
- github: https://github.com/zhreshold/mxnet-ssd
- github: https://github.com/zhreshold/mxnet-ssd.cpp
- github: https://github.com/rykov8/ssd_keras
- github: https://github.com/balancap/SSD-Tensorflow
- github: https://github.com/amdegroot/ssd.pytorch
- github(Caffe): https://github.com/chuanqi305/MobileNet-SSD
What’s the diffience in performance between this new code you pushed and the previous code? #327
https://github.com/weiliu89/caffe/issues/327
DSSD
DSSD : Deconvolutional Single Shot Detector
Enhancement of SSD by concatenating feature maps for object detection
Context-aware Single-Shot Detector
Feature-Fused SSD: Fast Detection for Small Objects
https://arxiv.org/abs/1709.05054
FSSD
FSSD: Feature Fusion Single Shot Multibox Detector
https://arxiv.org/abs/1712.00960
Weaving Multi-scale Context for Single Shot Detector
ESSD
Extend the shallow part of Single Shot MultiBox Detector via Convolutional Neural Network
Tiny SSD: A Tiny Single-shot Detection Deep Convolutional Neural Network for Real-time Embedded Object Detection
MDSSD: Multi-scale Deconvolutional Single Shot Detector for small objects
Inside-Outside Net (ION)
Inside-Outside Net: Detecting Objects in Context with Skip Pooling and Recurrent Neural Networks
Adaptive Object Detection Using Adjacency and Zoom Prediction
G-CNN: an Iterative Grid Based Object Detector
Factors in Finetuning Deep Model for object detection
Factors in Finetuning Deep Model for Object Detection with Long-tail Distribution
We don’t need no bounding-boxes: Training object class detectors using only human verification
HyperNet: Towards Accurate Region Proposal Generation and Joint Object Detection
A MultiPath Network for Object Detection
CRAFT
CRAFT Objects from Images
OHEM
Training Region-based Object Detectors with Online Hard Example Mining
S-OHEM: Stratified Online Hard Example Mining for Object Detection
Exploit All the Layers: Fast and Accurate CNN Object Detector with Scale Dependent Pooling and Cascaded Rejection Classifiers
R-FCN
R-FCN: Object Detection via Region-based Fully Convolutional Networks
- arxiv: http://arxiv.org/abs/1605.06409
- github: https://github.com/daijifeng001/R-FCN
- github(MXNet): https://github.com/msracver/Deformable-ConvNets/tree/master/rfcn
- github: https://github.com/Orpine/py-R-FCN
- github: https://github.com/PureDiors/pytorch_RFCN
- github: https://github.com/bharatsingh430/py-R-FCN-multiGPU
- github: https://github.com/xdever/RFCN-tensorflow
R-FCN-3000 at 30fps: Decoupling Detection and Classification
Recycle deep features for better object detection
MS-CNN
A Unified Multi-scale Deep Convolutional Neural Network for Fast Object Detection
Multi-stage Object Detection with Group Recursive Learning
Subcategory-aware Convolutional Neural Networks for Object Proposals and Detection
PVANET
PVANet: Lightweight Deep Neural Networks for Real-time Object Detection
GBD-Net
Gated Bi-directional CNN for Object Detection
Crafting GBD-Net for Object Detection
StuffNet: Using ‘Stuff’ to Improve Object Detection
Generalized Haar Filter based Deep Networks for Real-Time Object Detection in Traffic Scene
Hierarchical Object Detection with Deep Reinforcement Learning
Learning to detect and localize many objects from few examples
Speed/accuracy trade-offs for modern convolutional object detectors
SqueezeDet: Unified, Small, Low Power Fully Convolutional Neural Networks for Real-Time Object Detection for Autonomous Driving
Feature Pyramid Network (FPN)
Feature Pyramid Networks for Object Detection
Action-Driven Object Detection with Top-Down Visual Attentions
Beyond Skip Connections: Top-Down Modulation for Object Detection
Wide-Residual-Inception Networks for Real-time Object Detection
Attentional Network for Visual Object Detection
Learning Chained Deep Features and Classifiers for Cascade in Object Detection
DeNet: Scalable Real-time Object Detection with Directed Sparse Sampling
Discriminative Bimodal Networks for Visual Localization and Detection with Natural Language Queries
Spatial Memory for Context Reasoning in Object Detection
Accurate Single Stage Detector Using Recurrent Rolling Convolution
Deep Occlusion Reasoning for Multi-Camera Multi-Target Detection
LCDet: Low-Complexity Fully-Convolutional Neural Networks for Object Detection in Embedded Systems
Point Linking Network for Object Detection
Perceptual Generative Adversarial Networks for Small Object Detection
Few-shot Object Detection
SMC Faster R-CNN: Toward a scene-specialized multi-object detector
Towards lightweight convolutional neural networks for object detection
RON: Reverse Connection with Objectness Prior Networks for Object Detection
Mimicking Very Efficient Network for Object Detection
Residual Features and Unified Prediction Network for Single Stage Detection
https://arxiv.org/abs/1707.05031
Adaptive Feeding: Achieving Fast and Accurate Detections by Adaptively Combining Object Detectors
Recurrent Scale Approximation for Object Detection in CNN
DSOD
DSOD: Learning Deeply Supervised Object Detectors from Scratch
- intro: ICCV 2017. Fudan University & Tsinghua University & Intel Labs China
- arxiv: https://arxiv.org/abs/1708.01241
- github: https://github.com/szq0214/DSOD
RetinaNet
Focal Loss for Dense Object Detection
Focal Loss Dense Detector for Vehicle Surveillance
CoupleNet: Coupling Global Structure with Local Parts for Object Detection
Incremental Learning of Object Detectors without Catastrophic Forgetting
Zoom Out-and-In Network with Map Attention Decision for Region Proposal and Object Detection
StairNet: Top-Down Semantic Aggregation for Accurate One Shot Detection
Dynamic Zoom-in Network for Fast Object Detection in Large Images
https://arxiv.org/abs/1711.05187
Zero-Annotation Object Detection with Web Knowledge Transfer
MegDet
MegDet: A Large Mini-Batch Object Detector
Single-Shot Refinement Neural Network for Object Detection
Receptive Field Block Net for Accurate and Fast Object Detection
An Analysis of Scale Invariance in Object Detection - SNIP
Feature Selective Networks for Object Detection
Learning a Rotation Invariant Detector with Rotatable Bounding Box
Scalable Object Detection for Stylized Objects
Learning Object Detectors from Scratch with Gated Recurrent Feature Pyramids
Deep Regionlets for Object Detection
Training and Testing Object Detectors with Virtual Images
Large-Scale Object Discovery and Detector Adaptation from Unlabeled Video
- keywords: object mining, object tracking, unsupervised object discovery by appearance-based clustering, self-supervised detector adaptation
- arxiv: https://arxiv.org/abs/1712.08832
Spot the Difference by Object Detection
Localization-Aware Active Learning for Object Detection
Object Detection with Mask-based Feature Encoding
LSTD: A Low-Shot Transfer Detector for Object Detection
Domain Adaptive Faster R-CNN for Object Detection in the Wild
Pseudo Mask Augmented Object Detection
Revisiting RCNN: On Awakening the Classification Power of Faster RCNN
Learning Region Features for Object Detection
Single-Shot Bidirectional Pyramid Networks for High-Quality Object Detection
Object Detection for Comics using Manga109 Annotations
Task-Driven Super Resolution: Object Detection in Low-resolution Images
Transferring Common-Sense Knowledge for Object Detection
Multi-scale Location-aware Kernel Representation for Object Detection
Loss Rank Mining: A General Hard Example Mining Method for Real-time Detectors
DetNet: A Backbone network for Object Detection
Robust Physical Adversarial Attack on Faster R-CNN Object Detector
AdvDetPatch: Attacking Object Detectors with Adversarial Patches
Physical Adversarial Examples for Object Detectors
Quantization Mimic: Towards Very Tiny CNN for Object Detection
Object detection at 200 Frames Per Second
Object Detection using Domain Randomization and Generative Adversarial Refinement of Synthetic Images
SNIPER: Efficient Multi-Scale Training
Soft Sampling for Robust Object Detection
Auto-Context R-CNN
Pooling Pyramid Network for Object Detection
Modeling Visual Context is Key to Augmenting Object Detection Datasets
Dual Refinement Network for Single-Shot Object Detection
Acquisition of Localization Confidence for Accurate Object Detection
CornerNet: Detecting Objects as Paired Keypoints
Unsupervised Hard Example Mining from Videos for Improved Object Detection
SAN: Learning Relationship between Convolutional Features for Multi-Scale Object Detection
A Survey of Modern Object Detection Literature using Deep Learning
Tiny-DSOD: Lightweight Object Detection for Resource-Restricted Usages
Deep Feature Pyramid Reconfiguration for Object Detection
Non-Maximum Suppression (NMS)
A convnet for non-maximum suppression
Soft-NMS – Improving Object Detection With One Line of Code
Learning non-maximum suppression
Relation Networks for Object Detection
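For readers new to this subsection, the following is a minimal sketch of what non-maximum suppression does: greedy NMS discards any box that overlaps a higher-scoring kept box too much, while a Soft-NMS-style variant with linear decay lowers the score of overlapping boxes instead of removing them. This is an illustration under stated assumptions, not code from any of the papers listed above; the function names, the `(x1, y1, x2, y2)` box layout, and the default thresholds are all chosen for this example.

```python
# Illustrative sketch only: names, box layout, and default thresholds are assumptions.

def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold=0.5):
    """Keep boxes in score order, dropping any that overlap a kept box too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


def soft_nms_linear(boxes, scores, iou_threshold=0.3, score_threshold=0.001):
    """Decay scores of overlapping boxes linearly instead of discarding them."""
    scores = list(scores)  # work on a copy so the caller's list is untouched
    remaining = list(range(len(boxes)))
    keep = []
    while remaining:
        best = max(remaining, key=lambda i: scores[i])
        remaining.remove(best)
        keep.append(best)
        for i in remaining:
            overlap = iou(boxes[best], boxes[i])
            if overlap > iou_threshold:
                scores[i] *= 1.0 - overlap  # linear decay
        # Drop boxes whose score has decayed below the floor.
        remaining = [i for i in remaining if scores[i] >= score_threshold]
    return keep, scores


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(greedy_nms(boxes, scores))          # [0, 2]
print(soft_nms_linear(boxes, scores)[0])  # [0, 2, 1]
```

In this toy input the second box overlaps the first with IoU 81/119 (about 0.68). Greedy NMS removes it outright, while the soft variant keeps it with a reduced score, so it is ranked last rather than lost.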
Adversarial Examples
Adversarial Examples that Fool Detectors
Adversarial Examples Are Not Easily Detected: Bypassing Ten Detection Methods
Weakly Supervised Object Detection
Track and Transfer: Watching Videos to Simulate Strong Human Supervision for Weakly-Supervised Object Detection
Weakly supervised object detection using pseudo-strong labels
Saliency Guided End-to-End Learning for Weakly Supervised Object Detection
Visual and Semantic Knowledge Transfer for Large Scale Semi-supervised Object Detection
Video Object Detection
Learning Object Class Detectors from Weakly Annotated Video
Analysing domain shift factors between videos and images for object detection
Video Object Recognition
Deep Learning for Saliency Prediction in Natural Video
T-CNN: Tubelets with Convolutional Neural Networks for Object Detection from Videos
Object Detection from Video Tubelets with Convolutional Neural Networks
Object Detection in Videos with Tubelets and Multi-context Cues
Context Matters: Refining Object Detection in Video with Recurrent Neural Networks
CNN Based Object Detection in Large Video Images
Object Detection in Videos with Tubelet Proposal Networks
Flow-Guided Feature Aggregation for Video Object Detection
Video Object Detection using Faster R-CNN
Improving Context Modeling for Video Object Detection and Tracking
http://image-net.org/challenges/talks_2017/ilsvrc2017_short(poster).pdf
Temporal Dynamic Graph LSTM for Action-driven Video Object Detection
Mobile Video Object Detection with Temporally-Aware Feature Maps
Impression Network for Video Object Detection
Spatial-Temporal Memory Networks for Video Object Detection
3D-DETNet: a Single Stage Video-Based Vehicle Detector
Object Detection in Videos by Short and Long Range Object Linking
Object Detection in Video with Spatiotemporal Sampling Networks
Optimizing Video Object Detection via a Scale-Time Lattice
Object Detection on Mobile Devices
Pelee: A Real-Time Object Detection System on Mobile Devices
Object Detection in 3D
Vote3Deep: Fast Object Detection in 3D Point Clouds Using Efficient Convolutional Neural Networks
Complex-YOLO: Real-time 3D Object Detection on Point Clouds
Object Detection on RGB-D
Learning Rich Features from RGB-D Images for Object Detection and Segmentation
Differential Geometry Boosts Convolutional Neural Networks for Object Detection
A Self-supervised Learning System for Object Detection using Physics Simulation and Multi-view Pose Estimation
Zero-Shot Object Detection
Zero-Shot Detection
Zero-Shot Object Detection
Zero-Shot Object Detection: Learning to Simultaneously Recognize and Localize Novel Concepts
Zero-Shot Object Detection by Hybrid Region Embedding
Salient Object Detection
This task involves predicting the salient regions of an image given by human eye fixations.
Best Deep Saliency Detection Models (CVPR 2016 & 2015)
Large-scale optimization of hierarchical features for saliency prediction in natural images
Predicting Eye Fixations using Convolutional Neural Networks
Saliency Detection by Multi-Context Deep Learning
DeepSaliency: Multi-Task Deep Neural Network Model for Salient Object Detection
SuperCNN: A Superpixelwise Convolutional Neural Network for Salient Object Detection
- paper: www.shengfenghe.com/supercnn-a-superpixelwise-convolutional-neural-network-for-salient-object-detection.html
Shallow and Deep Convolutional Networks for Saliency Prediction
Recurrent Attentional Networks for Saliency Detection
Two-Stream Convolutional Networks for Dynamic Saliency Prediction
Unconstrained Salient Object Detection
Unconstrained Salient Object Detection via Proposal Subset Optimization
- intro: CVPR 2016
- project page: http://cs-people.bu.edu/jmzhang/sod.html
- paper: http://cs-people.bu.edu/jmzhang/SOD/CVPR16SOD_camera_ready.pdf
- github: https://github.com/jimmie33/SOD
- caffe model zoo: https://github.com/BVLC/caffe/wiki/Model-Zoo#cnn-object-proposal-models-for-salient-object-detection
DHSNet: Deep Hierarchical Saliency Network for Salient Object Detection
Salient Object Subitizing
- intro: CVPR 2015
- intro: predicting the existence and the number of salient objects in an image using holistic cues
- project page: http://cs-people.bu.edu/jmzhang/sos.html
- arxiv: http://arxiv.org/abs/1607.07525
- paper: http://cs-people.bu.edu/jmzhang/SOS/SOS_preprint.pdf
- caffe model zoo: https://github.com/BVLC/caffe/wiki/Model-Zoo#cnn-models-for-salient-object-subitizing
Deeply-Supervised Recurrent Convolutional Neural Network for Saliency Detection
Saliency Detection via Combining Region-Level and Pixel-Level Predictions with CNNs
Edge Preserving and Multi-Scale Contextual Neural Network for Salient Object Detection
A Deep Multi-Level Network for Saliency Prediction
Visual Saliency Detection Based on Multiscale Deep CNN Features
A Deep Spatial Contextual Long-term Recurrent Convolutional Network for Saliency Detection
Deeply supervised salient object detection with short connections
Weakly Supervised Top-down Salient Object Detection
SalGAN: Visual Saliency Prediction with Generative Adversarial Networks
Visual Saliency Prediction Using a Mixture of Deep Neural Networks
A Fast and Compact Salient Score Regression Network Based on Fully Convolutional Network
Saliency Detection by Forward and Backward Cues in Deep-CNNs
Supervised Adversarial Networks for Image Saliency Detection
Group-wise Deep Co-saliency Detection
Towards the Success Rate of One: Real-time Unconstrained Salient Object Detection