Excellent! HKU, Tongji, and UC Berkeley Propose Sparse R-CNN: A New Paradigm for Object Detection

Author: Peize Sun

Reposted from Zhihu with the author's permission; please do not repost it further.

https://zhuanlan.zhihu.com/p/310058362

This post introduces our recent work:

Sparse R-CNN: End-to-End Object Detection with Learnable Proposals

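Following the Dense and Dense-to-Sparse frameworks in object detection, Sparse R-CNN establishes a purely sparse framework. It does away with concepts such as anchor boxes, reference points, and the Region Proposal Network (RPN), and needs no Non-Maximum Suppression (NMS) post-processing. On the standard COCO benchmark, a single ResNet-50 FPN model trained with the standard 3x schedule reaches 44.5 AP at 22 FPS.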

Paper: https://msc.berkeley.edu/research/autonomous-vehicle/sparse_rcnn.pdf
Code: https://github.com/PeizeSun/SparseR-CNN

01

Motivation

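Let us first briefly review the two mainstream families of object detection methods.

The first family is the dense detector, widely used since before the deep-learning era, e.g. DPM, YOLO, RetinaNet, and FCOS. In a dense detector, a large number of object candidates, such as sliding windows, anchor boxes, or reference points, are preset on the image grid or the feature-map grid, and the network directly predicts the scaling/offset from each candidate to the ground-truth (gt) box together with the object class.
The second family is the dense-to-sparse detector, e.g. the R-CNN family. These methods predict regression and classification for a sparse set of candidates, but that sparse set is itself produced by a dense detector (such as an RPN).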

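Both frameworks have driven academic research and industrial applications across the field. Object detection may seem saturated, yet several inherent limitations of the dense property remain unsatisfying:

NMS post-processing
many-to-one assignment of positive and negative samples
the design of prior candidates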
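
To make the first two limitations concrete, here is a toy Python sketch that is not taken from Sparse R-CNN: with IoU-threshold assignment, several anchors become positives for the same ground-truth box (many-to-one), so a dense detector learns to fire several near-duplicate boxes per object, which greedy NMS then has to prune. The boxes, scores, and both 0.5 thresholds are made-up values for illustration.

```python
# Toy example (not from the paper): many-to-one assignment produces
# duplicate positives, which greedy NMS must prune at inference time.

def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def assign_many_to_one(anchors, gt_box, pos_thresh=0.5):
    """Every anchor with IoU >= pos_thresh becomes a positive for gt_box."""
    return [i for i, a in enumerate(anchors) if iou(a, gt_box) >= pos_thresh]


def greedy_nms(boxes, scores, iou_thresh=0.5):
    """Keep the best-scoring box, drop boxes overlapping it too much, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_thresh]
    return keep


gt_box = (10, 10, 50, 50)
anchors = [(8, 8, 48, 48), (12, 12, 52, 52), (10, 6, 50, 46), (60, 60, 90, 90)]
positives = assign_many_to_one(anchors, gt_box)
print(positives)  # [0, 1, 2]: three anchors share one object

# Pretend each anchor yields a prediction at its own location with these scores.
scores = [0.9, 0.8, 0.7, 0.3]
print(greedy_nms(anchors, scores))  # [0, 3]: duplicates of the object are suppressed
```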

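A natural question, then, is whether a purely sparse framework can be designed. Recently, DETR offered one such sparse design.

Its candidates are a sparse set of learnable object queries, positive/negative assignment is a one-to-one optimal bipartite matching, and it outputs the final detections directly without NMS.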
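
A one-to-one assignment can be illustrated with a brute-force search over a small cost matrix. This is a minimal sketch that assumes the matching costs are already computed (how DETR builds them from class and box terms is not shown). Exhaustive search is only viable for tiny inputs; practical code would use the Hungarian algorithm, e.g. scipy.optimize.linear_sum_assignment.

```python
from itertools import permutations


def optimal_one_to_one(cost):
    """Minimum-cost one-to-one matching by exhaustive search.

    cost[i][j] is the cost of assigning candidate i to ground-truth object j,
    with at least as many candidates as objects. Returns {object: candidate}.
    Candidates left unmatched are treated as "no object" (negatives).
    """
    num_candidates, num_objects = len(cost), len(cost[0])
    best, best_total = None, float("inf")
    # Try every ordered choice of distinct candidates for the objects.
    for chosen in permutations(range(num_candidates), num_objects):
        total = sum(cost[c][j] for j, c in enumerate(chosen))
        if total < best_total:
            best, best_total = chosen, total
    return {j: c for j, c in enumerate(best)}


# 4 candidates x 2 ground-truth objects (made-up costs)
cost = [
    [0.2, 0.9],
    [0.3, 0.8],
    [0.9, 0.1],
    [0.7, 0.6],
]
print(optimal_one_to_one(cost))  # {0: 0, 1: 2}; candidates 1 and 3 become negatives
```

Candidate 1 is almost as close to object 0 as candidate 0 is, yet under one-to-one matching it is trained as a negative, so there is no duplicate left for NMS to remove.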

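However, every object query in DETR interacts through attention with the entire global feature map, which is still dense in essence.

We believe a sparse detection framework should be sparse in two respects: sparse candidates and sparse feature interaction. With this in mind, we propose Sparse R-CNN.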
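
A rough count shows what sparse feature interaction saves per interaction layer. The figures rest on assumptions not stated in this post: an 800×1333 input, a stride-32 feature map for global attention, N = 100 candidates, and a 7×7 RoI feature per proposal.

```python
import math


def grid_positions(img_h, img_w, stride):
    """Number of feature-map cells for an image at the given stride."""
    return math.ceil(img_h / stride) * math.ceil(img_w / stride)


num_candidates = 100          # assumed N
img_h, img_w = 800, 1333      # assumed input size

# Global attention: every candidate attends to every cell of a stride-32 map.
global_pairs = num_candidates * grid_positions(img_h, img_w, 32)
# Sparse interaction: every candidate only sees its own 7x7 RoI feature (assumed size).
roi_pairs = num_candidates * 7 * 7

print(global_pairs, roi_pairs)  # 105000 4900
```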

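Sparse R-CNN abandons dense concepts such as anchor boxes and reference points, starts directly from a sparse set of learnable proposals, and has no NMS post-processing. The whole network is remarkably clean and simple, and can be viewed as a brand-new detection paradigm.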

02

Sparse R-CNN

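The object candidates of Sparse R-CNN are a set of learnable parameters of size N×4, where N is the number of object candidates (typically 100 to 300) and 4 stands for the four boundaries of the box. These parameters are optimized together with all other parameters of the network during training.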
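
In code, the proposal boxes are nothing more than an N×4 parameter table shared by every image. The sketch below assumes normalized (cx, cy, w, h) coordinates initialized to cover the whole image; both the parameterization and the initialization are assumptions here. A framework implementation would keep the table in a trainable tensor (for example a PyTorch nn.Embedding(N, 4)) and let the optimizer update it, rather than the hand-written sgd_step shown for illustration.

```python
class LearnableProposalBoxes:
    """N x 4 learnable parameters, one (cx, cy, w, h) row per proposal,
    normalized to [0, 1]. Parameterization and init are assumptions."""

    def __init__(self, num_proposals=100):
        # Assumed initialization: every proposal covers the whole image.
        self.params = [[0.5, 0.5, 1.0, 1.0] for _ in range(num_proposals)]

    def to_image_boxes(self, img_h, img_w):
        """Scale the shared parameters to absolute (x1, y1, x2, y2) for one image.
        The parameters do not depend on image content."""
        boxes = []
        for cx, cy, w, h in self.params:
            boxes.append(((cx - w / 2) * img_w, (cy - h / 2) * img_h,
                          (cx + w / 2) * img_w, (cy + h / 2) * img_h))
        return boxes

    def sgd_step(self, grads, lr=0.1):
        """Plain gradient descent; grads has the same N x 4 shape as params."""
        for row, grad_row in zip(self.params, grads):
            for k in range(4):
                row[k] -= lr * grad_row[k]


proposals = LearnableProposalBoxes(num_proposals=3)
print(proposals.to_image_boxes(800, 1333)[0])  # (0.0, 0.0, 1333.0, 800.0)
```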

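That's it: none of the thousands of enumerated candidates found in dense detectors. This sparse set of object candidates serves as the proposal boxes from which Regions of Interest (RoI) are extracted and on which regression and classification are predicted.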
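
For scale, the hypothetical count_dense_anchors helper below tallies the candidates of a RetinaNet-style dense detector, assuming an 800×1333 input, FPN strides 8 to 128, and 9 anchors per location. These are illustrative settings, not numbers from this post.

```python
import math


def count_dense_anchors(img_h, img_w, strides, anchors_per_location):
    """Hypothetical helper: anchors tiled over every FPN level's grid."""
    total = 0
    for stride in strides:
        cells = math.ceil(img_h / stride) * math.ceil(img_w / stride)
        total += cells * anchors_per_location
    return total


# Assumed RetinaNet-style setting: P3-P7 (strides 8..128), 3 scales x 3 ratios.
dense = count_dense_anchors(800, 1333, strides=(8, 16, 32, 64, 128), anchors_per_location=9)
print(dense)        # 200700
print(dense / 300)  # 669.0 dense candidates per learnable proposal at N = 300
```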

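These learned proposal boxes can be understood as statistics of where objects are likely to appear in images. RoI features extracted from such a coarse representation are clearly not enough to localize and classify objects precisely.

We therefore introduce feature-level candidates, the proposal features. These are also learnable parameters, of size N×d, where N is the number of object candidates (one proposal feature per proposal box) and d is the feature dimension, typically 256.

Each proposal feature interacts one-to-one with the RoI feature extracted by its own proposal box, making that RoI feature better suited for localizing and classifying the object.

In contrast to the original 2-fc head, we call this design the Dynamic Instance Interactive Head.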
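
The core idea of the Dynamic Instance Interactive Head, that each proposal feature generates the parameters applied to its own RoI feature, can be sketched in plain Python. This is a simplified illustration under stated assumptions: a single generated 1×1 convolution (written as a matrix-vector product at each RoI position), a hypothetical linear generator, ReLU, and mean pooling. The actual head's layer count, normalization, and output layers are not reproduced here.

```python
def matvec(matrix, vector):
    """Matrix (list of rows) times vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def relu(vector):
    return [max(0.0, x) for x in vector]


def generate_kernel(proposal_feature, generator):
    """Hypothetical linear generator: maps a length-d proposal feature to a
    d x d kernel (generator has d*d rows of length d)."""
    d = len(proposal_feature)
    flat = matvec(generator, proposal_feature)
    return [flat[r * d:(r + 1) * d] for r in range(d)]


def dynamic_interaction(roi_feature, proposal_feature, generator):
    """One-to-one interaction between a proposal feature and the RoI feature
    pooled from its own box. roi_feature is a list of S*S vectors of length d.
    A 1x1 convolution is the same matvec applied at every RoI position."""
    kernel = generate_kernel(proposal_feature, generator)
    activated = [relu(matvec(kernel, position)) for position in roi_feature]
    d = len(proposal_feature)
    # Mean pooling to one object feature is a simplification for this sketch.
    return [sum(pos[k] for pos in activated) / len(activated) for k in range(d)]


# d = 2, a 2-position "RoI"; this generator turns feature f into diag(f0, f1).
generator = [[1, 0], [0, 0], [0, 0], [0, 1]]
roi = [[1.0, 2.0], [3.0, 4.0]]
print(dynamic_interaction(roi, [1.0, 0.0], generator))  # [2.0, 0.0]
print(dynamic_interaction(roi, [0.0, 1.0], generator))  # [0.0, 3.0]
```

The same RoI feature yields different outputs for different proposal features, whereas a 2-fc head applies the same fixed weights to every RoI.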

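The two defining characteristics of Sparse R-CNN are sparse object candidates and sparse feature interaction: there are neither thousands of dense candidates nor dense global feature interaction. Sparse R-CNN can be seen as extending object detection frameworks along the direction from dense, to dense-to-sparse, to sparse.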

Architecture Design

The network design of Sparse R-CNN is modeled on the R-CNN family.

Backbone: a ResNet-based FPN.
Head: a sequence of iterative Dynamic Instance Interactive Heads; the output features and output boxes of each head serve as the proposal features and proposal boxes of the next. Before interacting with the RoI features, the proposal features go through self-attention.
Loss: a set prediction loss based on optimal bipartite matching.
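
The iterative structure can be sketched as a loop in which each stage first applies self-attention across the N proposal features, then runs a per-proposal head, and hands its output boxes and features to the next stage. The self-attention below has no learned projections, and toy_head is a hypothetical stand-in that simply moves each box halfway toward a fixed target; neither is the actual Sparse R-CNN head.

```python
import math


def self_attention(features):
    """Single-head scaled dot-product self-attention over N proposal features.
    No learned Q/K/V projections: a simplification for this sketch."""
    d = len(features[0])
    out = []
    for query in features:
        logits = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in features]
        peak = max(logits)
        weights = [math.exp(x - peak) for x in logits]  # numerically stable softmax
        norm = sum(weights)
        out.append([sum(w * v[j] for w, v in zip(weights, features)) / norm for j in range(d)])
    return out


def run_iterative_heads(boxes, features, head, num_stages):
    """Each stage's output boxes/features are the next stage's proposals."""
    for _ in range(num_stages):
        features = self_attention(features)
        outputs = [head(box, feat) for box, feat in zip(boxes, features)]
        boxes = [box for box, _ in outputs]
        features = [feat for _, feat in outputs]
    return boxes, features


def toy_head(box, feature, target=(20.0, 20.0, 60.0, 60.0)):
    """Hypothetical stand-in for a real head: move the box halfway to a fixed target."""
    new_box = tuple((b + t) / 2 for b, t in zip(box, target))
    return new_box, feature


boxes, feats = run_iterative_heads([(0.0, 0.0, 100.0, 100.0)], [[1.0, 0.0]], toy_head, num_stages=3)
print(boxes)  # [(17.5, 17.5, 65.0, 65.0)]
```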

Starting from Faster R-CNN (40.2 AP), directly replacing the RPN with a sparse set of learnable proposal boxes drops AP to 18.5; adding the iterative structure raises it to 32.2; and adding dynamic instance interaction finally brings it to 42.3 AP.
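
The ablation path above can be tabulated to show what each change contributes. The AP values are the ones quoted in this post; the script only computes the step-by-step differences.

```python
# AP values as quoted in this post's ablation.
steps = [
    ("Faster R-CNN (RPN)", 40.2),
    ("learnable proposal boxes instead of RPN", 18.5),
    ("+ iterative heads", 32.2),
    ("+ dynamic instance interaction", 42.3),
]


def step_deltas(steps):
    """AP change contributed by each step relative to the previous one."""
    return [round(ap - prev_ap, 1) for (_, prev_ap), (_, ap) in zip(steps, steps[1:])]


for (name, ap), delta in zip(steps[1:], step_deltas(steps)):
    print(f"{name}: {ap:.1f} AP ({delta:+.1f})")
print(f"net vs Faster R-CNN: {steps[-1][1] - steps[0][1]:+.1f}")  # +2.1
```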

Performance

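We follow Detectron2's 3x training schedule and therefore compare Sparse R-CNN with the detectors in Detectron2 (many methods do not report 3x results, so they are not listed).

We also list the performance of DETR and Deformable DETR, which likewise need no NMS post-processing. Sparse R-CNN shows highly competitive performance in detection accuracy, inference time, and training convergence speed.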

Conclusion

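For a period after R-CNN and Fast R-CNN appeared, an important research direction in object detection was designing more efficient region proposal generators. Faster R-CNN and its RPN, the standout among them, have had broad and lasting influence.

Sparse R-CNN shows for the first time that a simple set of learnable parameters used as proposal boxes can achieve comparable performance. We hope our work offers some inspiration for end-to-end object detection.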

This article is shared from the WeChat official account 我愛計算機視覺 (aicvml).