Weakly Supervised Deep Detection Networks, Hakan Bilen, Andrea Vedaldi
https://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Bilen_Weakly_Supervised_Deep_CVPR_2016_paper.pdf
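Highlights
- Frames weakly supervised detection as a proposal-ranking problem: comparing the class scores of all proposals against each other yields a fairly accurate ranking. This mirrors how the detection evaluation metric is computed, since AP is also derived from a ranking of detections.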
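To make the ranking analogy concrete: detection AP depends only on the order of the scored detections, not on their absolute values. The following is a minimal sketch. It assumes each detection has already been matched against ground truth and marked as a hit or a miss. `average_precision` is a hypothetical helper that computes non-interpolated AP, whereas the VOC2007 devkit uses 11-point interpolation.

```python
def average_precision(ranked_hits, num_positives):
    """Non-interpolated AP of a ranked list of detections.

    ranked_hits: booleans ordered by descending score; True marks a true positive.
    num_positives: number of ground-truth objects (missed objects lower recall).
    """
    if num_positives == 0:
        return 0.0
    true_positives = 0
    precision_sum = 0.0
    for rank, hit in enumerate(ranked_hits, start=1):
        if hit:
            true_positives += 1
            # Precision at the rank where this object is recalled.
            precision_sum += true_positives / rank
    return precision_sum / num_positives


# The same four detections in two orders: only the ranking changes AP.
print(average_precision([True, True, False, False], 2))  # 1.0
print(average_precision([False, False, True, True], 2))
```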
Related work
The MIL strategy results in a non-convex optimization problem; in practice, solvers tend to get stuck in local optima, so the quality of the solution strongly depends on the initialization.
- Developing various initialization strategies [19, 5, 32, 4]:
  - [19] propose a self-paced learning strategy.
  - [5] initialize object locations based on the objectness score.
  - [4] propose a multi-fold split of the training data to escape local optima.
- Regularizing the optimization problem [31, 1]:
  - [31] apply Nesterov’s smoothing technique to the latent SVM formulation.
  - [1] propose a smoothed version of MIL that softly labels object instances instead of choosing the highest-scoring ones.
- Another line of research in weakly supervised detection (WSD) is based on identifying the similarity between image parts:
  - [31] propose a discriminative graph-based algorithm that selects a subset of windows such that each window is connected to its nearest neighbors in positive images.
  - [32] extend this method to discover multiple co-occurring part configurations.
  - [36] propose an iterative technique that applies latent semantic clustering via probabilistic Latent Semantic Analysis (pLSA).
  - [2] propose a formulation that jointly learns a discriminative model and enforces the similarity of the selected object regions via a discriminative convex clustering algorithm.
Method
The method is simple and easy to follow. It has three main components, whose outputs are then combined:
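- Feed the convolutional feature map and the region proposals into a spatial pyramid pooling (SPP) layer to extract a feature vector for each region, then pass it through two fc layers.
- Classification stream: the fc output goes through a stream-specific fc layer and a softmax over classes, giving class scores for each region.
- Detection stream: the fc output also goes through its own fc layer and a softmax. Unlike the classification stream, however, the normalization runs over all regions rather than over classes, so comparing regions against each other picks out the region that carries the most information about each class.
- The score of region r for class c is the product of the outputs of these two streams.
- The image-level score for class c is the sum of the scores of all regions for that class.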
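The two-stream scoring can be sketched as follows. This minimal version uses plain Python lists in place of network outputs. The logits are made-up numbers standing in for the two stream-specific fc layers, and `wsddn_scores` is a hypothetical helper name.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def wsddn_scores(cls_logits, det_logits):
    """Combine the two streams for one image.

    cls_logits, det_logits: R x C nested lists (regions x classes), standing in
    for the outputs of the classification and detection fc layers.
    Returns (region_scores, image_scores).
    """
    num_regions, num_classes = len(cls_logits), len(cls_logits[0])

    # Classification stream: softmax over classes, independently per region.
    cls_prob = [softmax(row) for row in cls_logits]

    # Detection stream: softmax over regions, independently per class.
    det_cols = [softmax([det_logits[r][c] for r in range(num_regions)])
                for c in range(num_classes)]

    # Region score = element-wise product of the two streams.
    region_scores = [[cls_prob[r][c] * det_cols[c][r] for c in range(num_classes)]
                     for r in range(num_regions)]

    # Image-level score = sum over regions. It stays in (0, 1): the detection
    # weights for a class sum to 1 and every class probability is below 1.
    image_scores = [sum(region_scores[r][c] for r in range(num_regions))
                    for c in range(num_classes)]
    return region_scores, image_scores


# Toy input: 3 regions, 2 classes (made-up numbers).
cls_logits = [[2.0, 0.0], [0.0, 2.0], [0.0, 0.0]]
det_logits = [[3.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
region_scores, image_scores = wsddn_scores(cls_logits, det_logits)
# Ranking regions by their class-0 score gives the detections for class 0.
ranking = sorted(range(3), key=lambda r: region_scores[r][0], reverse=True)
print(image_scores, ranking)
```

Because each image-level score lies in (0, 1), it can be trained directly against image-level labels with a binary log loss, which is all the supervision weak detection has.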
The training loss function is as follows:
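The last term is a regulariser (the spatial regulariser; I modified it slightly based on my own understanding, since the notation in the paper seems a bit off). It enforces smoothness of the solution by pulling features closer together, i.e. proposals that are close to the correct one should also receive high scores.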
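A minimal sketch of the two main terms follows, based on my reading of the paper rather than on the exact formulation above. It assumes image labels in {+1, -1}, a binary log loss on the image-level scores, and a spatial regulariser with three properties. First, for each class present in the image, it takes the top-scoring region. Second, it penalises the squared feature distance to every region overlapping that top region with IoU > 0.6. Third, it weights each penalty by the top region's score. The 0.6 threshold, the weighting, the shared per-region features and the weight `lam` are assumptions. Weight decay and any normalisation constant are omitted, and all helper names are hypothetical.

```python
import math


def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def image_log_loss(image_scores, labels, eps=1e-7):
    """Binary log loss on image-level class scores; labels are +1 or -1."""
    loss = 0.0
    for s, y in zip(image_scores, labels):
        s = min(max(s, eps), 1.0 - eps)  # keep log() finite
        loss -= math.log(s) if y == 1 else math.log(1.0 - s)
    return loss


def spatial_regulariser(region_scores, boxes, feats, labels, iou_thr=0.6):
    """For each positive class, pull the features of regions overlapping the
    top-scoring region (IoU > iou_thr) towards the top region's features,
    weighted by the top region's score. Threshold and weighting are assumed."""
    total = 0.0
    for c, y in enumerate(labels):
        if y != 1:
            continue  # only classes present in the image are regularised
        top = max(range(len(boxes)), key=lambda r: region_scores[r][c])
        for r in range(len(boxes)):
            if r != top and iou(boxes[r], boxes[top]) > iou_thr:
                sq_dist = sum((p - q) ** 2 for p, q in zip(feats[top], feats[r]))
                total += 0.5 * region_scores[top][c] * sq_dist
    return total


# Toy image: 3 regions, 2 classes, 2-d region features (made-up numbers).
boxes = [(0, 0, 10, 10), (1, 0, 11, 10), (50, 50, 60, 60)]
feats = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
region_scores = [[0.6, 0.01], [0.2, 0.01], [0.05, 0.3]]
labels = [1, -1]
image_scores = [sum(row[c] for row in region_scores) for c in range(len(labels))]
lam = 1.0  # hypothetical regulariser weight
loss = image_log_loss(image_scores, labels) + lam * spatial_regulariser(
    region_scores, boxes, feats, labels)
print(loss)
```

Only the single top region per class anchors the penalty, which is exactly why the regulariser says nothing about a second instance of the same class.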
Experimental results
The paper reports four models, depending on the base network: S (VGG-F), M (VGG-M-1024), L (VGG-VD16), and Ens. (an ensemble of the first three).
- Ablation:
  - Object proposals:
    - Baseline mAP with Selective Search: S 31.1%, M 30.9%, L 24.3%, Ens. 33.3%
    - Edge Boxes: +0 to +1.2%
    - Edge Boxes + Edge Boxes score: +1.8 to +5.9%
  - Spatial regulariser (compared with Edge Boxes + Edge Boxes score): mAP +1.2 to +4.4%
- VOC2007:
  - mAP on test: S +2.9%, M +3.3%, L +3.2%, Ens. +7.7% compared with [36] + context
  - CorLoc on trainval: S +5.7%, M +7.6%, L +5%, Ens. +9.5% compared with [36]
  - Classification AP on test: S +7.9% compared with VGG-F, M +6.5% compared with VGG-M-1024, L +0.4% compared with VGG-VD16, Ens. -0.3% compared with VGG-VD16
- VOC2010:
  - mAP on test: +8.8% compared with [4]
  - CorLoc on trainval: +4.5% compared with [4]
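CorLoc, reported on trainval above, measures localization on the training images themselves. For each image and each class present in it, it checks whether the top-scoring box hits a ground-truth object of that class. The following is a minimal sketch. It assumes boxes are `(x1, y1, x2, y2)` tuples and uses the PASCAL 50% IoU criterion; the `>=` comparison is my assumption. `iou` and `corloc` are hypothetical helpers.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def corloc(top_boxes, gt_boxes, thr=0.5):
    """Fraction of (image, class) pairs whose top-scoring box overlaps some
    ground-truth box of that class with IoU >= thr.

    top_boxes: one predicted box per positive (image, class) pair.
    gt_boxes:  matching list of ground-truth box lists for the same pairs.
    """
    if not top_boxes:
        return 0.0
    hits = sum(1 for top, gts in zip(top_boxes, gt_boxes)
               if any(iou(top, g) >= thr for g in gts))
    return hits / len(top_boxes)


# Two positive (image, class) pairs: the first is localized, the second is not.
print(corloc([(0, 0, 10, 10), (0, 0, 10, 10)],
             [[(1, 1, 10, 10)], [(20, 20, 30, 30)]]))  # 0.5
```

Unlike detection mAP, CorLoc only looks at the single best box per positive pair, so it rewards finding one instance and ignores the others.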
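Drawbacks
An obvious drawback is that the method only handles the case where an object of a given class appears once per image: the regulariser only constrains the highest-scoring box and the boxes around it. This is also visible in the failure cases shown in the paper.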