[CVPR2017] Weakly Supervised Cascaded Convolutional Networks: Paper Notes

https://www.csee.umbc.edu/~hpirsiav/papers/cascade_cvpr17.pdf

Weakly Supervised Cascaded Convolutional Networks, Ali Diba, Vivek Sharma, Ali Pazandeh, Hamed Pirsiavash and Luc Van Gool

  

Highlights

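  • Cascading multiple tasks (classification, segmentation) improves the accuracy of weakly supervised detection of multiple objects.
  • Using segmentation to filter out a clean set of proposals gives more robust results.
  • A relatively robust loss is designed for the weakly supervised segmentation task:
    • it only considers the global classification result and the high-confidence regions;
    • loss weights make it focus on the regions that matter most.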

Related Work

 

One of the most common approaches [7] consists of the following steps:

  • generate object proposals;
  • extract features from the proposals;
  • apply multiple instance learning (MIL) to the features to infer box-level labels from the weak bag-level (image-level) labels.
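The MIL step above can be illustrated with a minimal sketch. It assumes the proposals and per-class proposal scores have already been computed (e.g. by some detector head) and uses the simplest MIL rule: an image (bag) is positive for a class if at least one proposal (instance) is, so the bag score is the max over instance scores and the top-scoring proposal becomes the pseudo box label. The names `mil_select` and `bag_score` are hypothetical, and this is not the exact procedure of [7].

```python
from typing import Dict, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2)


def bag_score(instance_scores: Sequence[float]) -> float:
    """Standard MIL assumption: the image-level (bag) score is the max
    over proposal-level (instance) scores."""
    return max(instance_scores)


def mil_select(
    proposals: Sequence[Box],
    scores: Dict[str, Sequence[float]],
    image_labels: Sequence[str],
) -> Dict[str, Box]:
    """For every class in the weak image label (a positive bag), pick the
    highest-scoring proposal as that class's pseudo box label."""
    selected: Dict[str, Box] = {}
    for cls in image_labels:
        cls_scores = scores[cls]
        best = max(range(len(proposals)), key=lambda i: cls_scores[i])
        selected[cls] = proposals[best]
    return selected


if __name__ == "__main__":
    props = [(0, 0, 10, 10), (5, 5, 50, 50), (20, 20, 30, 30)]
    scs = {"dog": [0.1, 0.8, 0.3], "cat": [0.6, 0.2, 0.1]}
    print(mil_select(props, scs, ["dog"]))  # {'dog': (5, 5, 50, 50)}
    print(bag_score(scs["dog"]))            # 0.8
```

Real MIL detectors usually re-train on the selected boxes and iterate. That loop is exactly where the initialization sensitivity discussed next comes from.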

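Difficulty of weakly supervised object detection: it is highly sensitive to initialization, and a poor initialization can trap the network in a local optimum. The main remedies are: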

  • improving the initialization [31, 9, 28, 29];
  • regularizing the optimization strategy [4, 5, 7];
  • [17] employs an iterative self-learning strategy that progressively adds harder samples to a small set of initial samples;
  • [15] uses a convex relaxation of the soft-max loss.

Most previous works [25, 32] use a large collection of noisy object proposals to train their object detectors. In contrast, our method focuses only on a small, clean set of object proposals, which is far more reliable and robust, is computationally efficient, and gives better performance.

Method

Two-stage: stage 1 performs proposal generation and image classification (conv1 to conv5, global pooling); stage 2 performs multiple instance learning (2 fc layers, score layer).

 

 

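1. Image classification: a CNN with global average pooling (GAP), as introduced in [36]. The weights of the classification fc layer serve as per-channel weights for the output of the last convolutional layer, and the weighted sum over all channels is taken as the class activation map (CAM). This step also produces a classification loss, L_GAP.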
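The CAM construction described above can be sketched in plain Python. The sketch assumes the last-layer feature maps arrive as nested lists (one H×W map per channel) and that `weights` holds the fc weights of a single class, with the bias omitted. All helper names are hypothetical, and L_GAP itself is not computed here. Because GAP and the fc layer are both linear, the class score equals the spatial mean of the CAM, and the example checks exactly that.

```python
from typing import List

Map = List[List[float]]  # one H x W feature map


def global_average_pool(features: List[Map]) -> List[float]:
    """GAP: reduce each channel's H x W map to its mean value."""
    return [sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0]))
            for fmap in features]


def class_score(features: List[Map], weights: List[float]) -> float:
    """Score of one class: fc weights applied to the GAP vector (no bias)."""
    pooled = global_average_pool(features)
    return sum(w * g for w, g in zip(weights, pooled))


def class_activation_map(features: List[Map], weights: List[float]) -> Map:
    """CAM: weight every channel by the class's fc weight, sum over channels."""
    h, w = len(features[0]), len(features[0][0])
    return [[sum(wk * fmap[y][x] for wk, fmap in zip(weights, features))
             for x in range(w)]
            for y in range(h)]


if __name__ == "__main__":
    feats = [[[1.0, 0.0], [0.0, 0.0]],   # channel 0
             [[0.0, 0.0], [0.0, 2.0]]]   # channel 1
    w = [2.0, 0.5]
    cam = class_activation_map(feats, w)
    print(cam)                     # [[2.0, 0.0], [0.0, 1.0]]
    print(class_score(feats, w))   # 0.75, equal to the mean of the CAM
```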

[36] B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and A. Torralba. Learning deep features for discriminative localization. In CVPR, 2016.

 

 

2. Multiple instance learning

Proposals: EdgeBoxes [37] generates an initial set of object proposals. The class activation map [36] is then thresholded to obtain a mask, and the initial boxes with the largest overlap with the mask are kept.
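The proposal filtering just described can be sketched as follows. This is a minimal version under several assumptions. The proposals are taken as given, so EdgeBoxes itself is not implemented. The CAM is binarized at 20% of its peak value, a threshold borrowed for illustration. "Overlap" is measured as IoU between the box and the mask's foreground pixels. The function names are hypothetical, and the paper's exact threshold and overlap measure may differ.

```python
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2); x2 and y2 are exclusive
Mask = List[List[bool]]          # H x W, True = foreground


def cam_to_mask(cam: List[List[float]], rel_threshold: float = 0.2) -> Mask:
    """Binarize a CAM: keep pixels >= rel_threshold * peak activation."""
    peak = max(max(row) for row in cam)
    return [[v >= rel_threshold * peak for v in row] for row in cam]


def box_mask_iou(box: Box, mask: Mask) -> float:
    """IoU between the pixels inside the box and the mask's foreground."""
    x1, y1, x2, y2 = box
    inter = sum(mask[y][x] for y in range(y1, y2) for x in range(x1, x2))
    box_area = (x2 - x1) * (y2 - y1)
    mask_area = sum(sum(row) for row in mask)
    union = box_area + mask_area - inter
    return inter / union if union else 0.0


def select_clean_proposals(proposals: Sequence[Box], cam: List[List[float]],
                           top_k: int = 1,
                           rel_threshold: float = 0.2) -> List[Box]:
    """Rank proposals by overlap with the thresholded CAM; keep the top_k."""
    mask = cam_to_mask(cam, rel_threshold)
    ranked = sorted(proposals, key=lambda b: box_mask_iou(b, mask),
                    reverse=True)
    return ranked[:top_k]


if __name__ == "__main__":
    cam = [[0.9, 0.8, 0.0, 0.0],
           [0.7, 1.0, 0.1, 0.0],
           [0.0, 0.1, 0.0, 0.0],
           [0.0, 0.0, 0.0, 0.0]]
    props = [(0, 0, 4, 4), (0, 0, 2, 2), (2, 2, 4, 4)]
    print(select_clean_proposals(props, cam))  # [(0, 0, 2, 2)]
```

Keeping only `top_k` boxes reflects the note's point about training on a few clean proposals rather than many noisy ones.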

 

 

Three-stage: more information about the objects' boundaries, learned in a segmentation task, can lead to a better appearance model and therefore better object localization.

  • Main idea: the segmentation supervision signal helps improve localization accuracy.
  • Weak segmentation supervision: the mask produced by the previous stage.
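To make the weak segmentation supervision concrete, here is a generic sketch. It is not the paper's loss. It assumes the previous stage's CAM is turned into pseudo labels: pixels with a high relative activation become foreground, low ones become background, and the uncertain band in between is ignored. This loosely mirrors the highlight about considering only high-confidence regions, and per-class weights stand in for the loss weighting. The thresholds `fg_thresh=0.6` and `bg_thresh=0.2`, the weights, and the function names are all hypothetical choices.

```python
import math
from typing import List, Optional

Labels = List[List[Optional[int]]]  # 1 = foreground, 0 = background, None = ignored


def pseudo_labels(cam: List[List[float]], fg_thresh: float = 0.6,
                  bg_thresh: float = 0.2) -> Labels:
    """Turn a CAM into pixel labels relative to its peak; the band between
    the two thresholds is left unlabeled (None) and ignored in the loss."""
    peak = max(max(row) for row in cam)
    labels: Labels = []
    for row in cam:
        out: List[Optional[int]] = []
        for v in row:
            r = v / peak if peak > 0 else 0.0
            out.append(1 if r >= fg_thresh else 0 if r <= bg_thresh else None)
        labels.append(out)
    return labels


def weighted_seg_loss(probs: List[List[float]], labels: Labels,
                      fg_weight: float = 1.0, bg_weight: float = 1.0) -> float:
    """Weighted mean binary cross-entropy over confidently labeled pixels."""
    eps = 1e-7
    total, norm = 0.0, 0.0
    for prow, lrow in zip(probs, labels):
        for p, y in zip(prow, lrow):
            if y is None:          # uncertain pixel: contributes nothing
                continue
            p = min(max(p, eps), 1.0 - eps)
            w = fg_weight if y == 1 else bg_weight
            total += -w * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
            norm += w
    return total / norm if norm else 0.0


if __name__ == "__main__":
    cam = [[1.0, 0.5], [0.1, 0.0]]
    labels = pseudo_labels(cam)
    print(labels)  # [[1, None], [0, 0]]
    print(weighted_seg_loss([[0.5, 0.5], [0.5, 0.5]], labels))  # ln 2
```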

 

Experimental Results

 

PASCAL VOC 2007

  • +3.3% classification compared with [18]
  • +1.6% correct localization compared with [27]
  • +0.6% compared with [6]

PASCAL VOC 2010

  • +3.3% compared with [6]

PASCAL VOC 2012

  • +8.8% compared with [18]

ILSVRC 2013

  • +5.5% compared with [18]

Object detection training

  • PASCAL VOC 2007 test set: a Faster R-CNN trained on the pseudo ground-truth (GT) bounding boxes generated by our cascaded networks performs slightly better (+0.3%) than our transferred model.

[6] H. Bilen and A. Vedaldi. Weakly supervised deep detection networks. In CVPR, 2016.

[18] D. Li, J.-B. Huang, Y. Li, S. Wang, and M.-H. Yang. Weakly supervised object localization with progressive domain adaptation. In CVPR, 2016.

[27] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. In ICLR, 2015.
