[Arxiv1706] Few-Example Object Detection with Model Communication: Paper Notes

https://arxiv.org/pdf/1706.08249.pdf

Few-Example Object Detection with Model Communication, by Xuanyi Dong, Liang Zheng, Fan Ma, Yi Yang, Deyu Meng

 

Highlights

  • With only 3–4 annotated bounding boxes per class, the method achieves detection performance comparable to approaches trained on large numbers of examples
  • Core approach: multi-model learning (training multiple models jointly) + self-paced learning (a form of curriculum learning)

Related Work

This section clarifies a few easily confused problem settings and the methods associated with each.

  • Weakly supervised object detection: the dataset labels are unreliable. For a pair (x, y), the label y may be incorrect, one of several plausible labels, insufficient, or only partial.
    • Labels are image-level class labels [7][8][9][10][11][18][30][31][32][33][34]
  • Semi-supervised object detection: uses a large amount of unlabeled data together with labeled data for recognition.
    • Some training samples have only class labels, while others have full bounding-box and class annotations [4][5][6]
      • Requires a large amount of annotation (e.g., 50% of the full annotations)
    • Only a few bounding-box annotations per class (the setting of this paper) [12][35]
      • Difference from few-shot learning: whether unlabeled data is used during training
    • Mining location annotations from video; these methods mainly target objects that move [2][3][29][1]
  • Webly supervised learning for object detection: reduce the annotation cost by leveraging web data

Method

 

 

Basic detector: Faster R-CNN & R-FCN

Object proposal methods: selective search & edge boxes

Annotations: when we randomly annotate approximately four images per class, an image may contain several objects, and we annotate all of its object bounding boxes.

 

Parameter updates
Updating v_j: differentiating the loss function above yields a closed-form solution for v_j.

For the same image i and the same model j, if multiple samples have v_j = 1, only the sample with the smallest L_c is kept at 1; the others are set to 0. The γ term encourages models to share information: when another model sets its v to 1, the threshold grows, so the image is more likely to be selected.
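This selection rule can be sketched in a few lines of Python. This is an illustrative reading of the note, not the authors' code: `lam` and `gamma` stand for the λ and γ terms of the loss, and `selected_by_others` stands in for the information shared by the other detectors (how many of them currently select this image).

```python
def update_v(losses, selected_by_others, lam, gamma):
    """Self-paced selection for one image and one detector.

    losses: classification loss L_c for each candidate pseudo box.
    selected_by_others: how many of the *other* detectors currently
        select this image (their v = 1); this relaxes the threshold,
        which is how the models share information.
    Returns a 0/1 indicator per candidate: at most one box is kept,
    the one with the smallest loss, and only if it beats the threshold.
    """
    threshold = lam + gamma * selected_by_others  # larger when others agree
    v = [0] * len(losses)
    if losses:
        best = min(range(len(losses)), key=lambda k: losses[k])
        if losses[best] < threshold:
            v[best] = 1  # keep only the lowest-loss sample
    return v
```

Note how a larger `selected_by_others` raises the threshold, making it easier for an image that other models trust to be selected by this one.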

Updating w_j: same as in other works (standard detector training).

Updating y_u^j: to update y_u^j we need to find, among a set of bounding boxes, a solution satisfying the following conditions:

It is hard to find the optimal solution directly. The paper's approach: feed the predictions of all models into NMS, keep only the high-scoring results via a threshold, and let the remaining boxes form y_u^j.
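A minimal sketch of this pseudo-label fusion, assuming standard greedy NMS; `score_thresh` and `iou_thresh` are placeholder values for illustration, not the paper's settings:

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / float(area_a + area_b - inter) if inter else 0.0

def generate_pseudo_labels(predictions, score_thresh=0.8, iou_thresh=0.3):
    """Fuse the boxes predicted by all detectors into pseudo labels y_u^j.

    predictions: list of (box, score) pairs pooled over every detector.
    Greedy NMS on the pooled set, then a score threshold keeps only the
    confident survivors.
    """
    boxes = sorted(predictions, key=lambda p: p[1], reverse=True)
    keep = []
    for box, score in boxes:
        if score < score_thresh:
            break  # sorted by score, so everything after is lower too
        if all(iou(box, b) < iou_thresh for b, _ in keep):
            keep.append((box, score))
    return keep
```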

Hard-example removal: we employ a modified NMS (intersection/max(area1, area2)) to filter out nested boxes, which usually occur when there are multiple overlapping objects. If there are too many boxes (≥ 4) for one specific class or too many classes (≥ 4) in the image, the image is removed. Images in which no reliable pseudo objects are found are also filtered out.
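The nested-box filter and the per-image rejection rules can be sketched as follows. The intersection / max(area1, area2) criterion is taken verbatim from the note; `nest_thresh` and the assumption that boxes arrive sorted by confidence are illustrative choices:

```python
def filter_image(class_boxes, max_boxes=4, max_classes=4, nest_thresh=0.9):
    """Apply the hard-example rules to one image's pseudo labels.

    class_boxes: dict mapping class name -> list of (x1, y1, x2, y2),
    assumed sorted by confidence. First suppress nested/duplicate boxes
    with the modified criterion intersection / max(area1, area2), then
    drop the whole image if any class keeps too many boxes or too many
    classes remain. Returns the filtered dict, or None if dropped.
    """
    def inter_over_max(a, b):
        ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
        iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
        return (ix * iy) / float(max((a[2] - a[0]) * (a[3] - a[1]),
                                     (b[2] - b[0]) * (b[3] - b[1])))

    filtered = {}
    for cls, boxes in class_boxes.items():
        keep = []
        for box in boxes:
            if all(inter_over_max(box, k) < nest_thresh for k in keep):
                keep.append(box)
        if keep:
            filtered[cls] = keep
    if not filtered:
        return None  # no reliable pseudo objects found
    if len(filtered) >= max_classes:
        return None  # too many classes in the image
    if any(len(b) >= max_boxes for b in filtered.values()):
        return None  # too many boxes for one class
    return filtered
```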

Experiments

Comparison with the state of the art (on average 4.2 images annotated per class)

  • VOC 2007: −1.1 mAP vs. [21]; correct localization +0.9%
  • VOC 2012: −2.5 mAP vs. [21]; correct localization +9.8%
  • ILSVRC 2013: −2.4 mAP vs. [21]
  • COCO 2014: +1.3 mAP vs. [22]

[20] V. Kantorov, M. Oquab, M. Cho, and I. Laptev, "Contextlocnet: Context-aware deep network models for weakly supervised localization," in European Conference on Computer Vision, 2016.
[21] A. Diba, V. Sharma, A. Pazandeh, H. Pirsiavash, and L. Van Gool, "Weakly supervised cascaded convolutional networks," in Computer Vision and Pattern Recognition, 2017.
[22] Y. Zhu, Y. Zhou, Q. Ye, Q. Qiu, and J. Jiao, "Soft proposal networks for weakly supervised object localization," in International Conference on Computer Vision, 2017.

Ablation study

  • VOC 2007: +4.1 mAP compared with a plain model ensemble
  • k: number of labeled images per class; "w/ image labels": image-level supervision incorporated

  

 

Limitations

Although localization reaches a reasonable accuracy, many objects in hard images are missed; in other words, few-example classification does not work well.
