Moving Object Detection Paper (1)

Date: 2019-11-06

Here is the paper I want to introduce:

Zhou D, Frémont V, Quost B, et al. Moving Object Detection and Segmentation in Urban Environments from a Moving Platform [J]. Image & Vision Computing, 2017, 68.

This is a 2017 paper archived on HAL. HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents.

Abstract:

This paper proposes an effective approach to detect and segment moving objects from two time-consecutive stereo frames, which leverages the uncertainties in camera motion estimation and in disparity computation. First, the relative camera motion and its uncertainty are computed by tracking and matching sparse features in four images (i.e., from a stereo camera). Then, the motion likelihood at each pixel is estimated by taking into account the ego-motion uncertainty and disparity in computation procedure. Finally, the motion likelihood, color and depth cues are combined in the graph-cut framework for moving object segmentation. The efficiency of the proposed method is evaluated on the KITTI benchmarking datasets, and our experiments show that the proposed approach is robust against both global (camera motion) and local (optical flow) noise. Moreover, the approach is dense as it applies to all pixels in an image, and even partially occluded moving objects can be detected successfully. Without dedicated tracking strategy, our approach achieves high recall and comparable precision on the KITTI benchmarking sequences.

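In my own words: the paper proposes an effective method for detecting and segmenting moving objects from two time-consecutive stereo frames, exploiting the uncertainties in camera motion estimation and in disparity computation.

First, the relative camera motion and its uncertainty are computed by tracking and matching sparse features across the four images. Then, the motion likelihood at each pixel is estimated while accounting for the ego-motion uncertainty and the disparity computation. Finally, the motion likelihood, color, and depth cues are combined in a graph-cut framework to segment the moving objects. The method is evaluated on the KITTI benchmark datasets, and the experiments show that it is robust to both global (camera motion) and local (optical flow) noise. In addition, the method is dense, since it applies to every pixel in the image, and it can even detect partially occluded moving objects. Without any dedicated tracking strategy, it achieves high recall and comparable precision on the KITTI benchmark sequences.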

Introduction

Making the vehicles to automatically perceive and understand their 3D environment is a challenging and important task. Due to the improvement of the sensor technologies, processing techniques and researchers' contributions, several Advanced Driver Assistance Systems (ADASs) have been developed for various purposes such as forward collision warning systems, parking assist systems, blind spot detection systems and adaptive cruise control systems.

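The paper notes that getting vehicles to perceive and understand their 3D environment is a long-standing challenge for researchers. Thanks to steady progress in sensor technology and the contributions of researchers, ADAS has advanced considerably; the examples given are forward collision warning, parking assist, blind spot detection, and adaptive cruise control.

Currently popular systems such as SLAM and SFM are widely applied in ADAS and autonomous driving, for example the commonly used ORB-SLAM: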

R. Mur-Artal, J. Montiel, and J. D. Tardos, "ORB-SLAM: a versatile and accurate monocular SLAM system," Robotics, IEEE Transactions on, vol. 31, no. 5, pp. 1147-1163, 2015.

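However, these systems all assume a static environment, whereas in practice they have to cope with complex urban scenes and dynamic objects. Efficiently and effectively detecting moving objects is therefore crucial for the accuracy of such systems.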

moving objects are considered as outliers and RANSAC strategy is applied to get rid of them efficiently. However, this strategy will fail when the moving objects are the dominant part of the image. Thus, efficiently and effectively detecting moving objects turns out to be a crucial issue for the accuracy of such systems.

In this article, we focus on the specific problem of moving object detection. We propose a detection and segmentation system based on two time-consecutive stereo images. The key idea is to detect the moving pixels by compensating the image changes caused by the global camera motion. The uncertainty of the camera motion is also considered to obtain reliable detection results. Furthermore, color and depth information is also employed to remove some false detection

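This paper focuses on the specific problem of moving object detection. It proposes a detection and segmentation system based on two time-consecutive stereo frames. The key idea is to detect moving pixels by compensating for the image changes caused by the global camera motion. The uncertainty of the camera motion is also taken into account to obtain reliable detections. In addition, color and depth information is used to remove some false detections! (What exactly does it mean to detect moving pixels by compensating for the image changes caused by the global camera motion?)

Moving object detection has long been an active research topic, and background subtraction is one of the most commonly used detection methods. The paper reviews some monocular moving object detection methods, which are mainly the ones introduced above.

This paper, however, uses a stereo setup. Compared with a monocular camera, a stereo vision system (SVS) provides depth and disparity information.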

Dense or sparse depth/disparity maps computed by global [10] or semi-global [11] matching approaches can be used to build 3D information on the environment. Theoretically, by obtaining the 3D information, any kind of motion can be detected, even the case of degenerate motion mentioned above. In [12], 3D point clouds are reconstructed from linear stereo vision systems first and then objects are detected based on a spectral clustering technique from the 3D points. Common used methods for Moving Object Detection (MOD) in stereo rig can be divided into sparse feature based [13, 14] and dense scene flow-based approaches [15, 16, 17]

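In other words, dense or sparse depth/disparity maps computed by global [10] or semi-global [11] matching can be used to reconstruct 3D information about the environment. In theory, once the 3D information is available, any kind of motion can be detected, even the degenerate motion case. In [12], 3D point clouds are first reconstructed from a linear stereo vision system, and objects are then detected from the 3D points with a spectral clustering technique. Commonly used methods for moving object detection (MOD) with a stereo rig can be divided into sparse feature-based approaches [13, 14] and dense scene flow-based approaches [15, 16, 17].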

[10] L. Wang and R. Yang, "Global stereo matching leveraged by sparse ground control points," in Computer Vision and Pattern Recognition (CVPR), Conference on, pp. 3033-3040, IEEE, 2011.

[11] H. Hirschmuller, "Accurate and efficient stereo processing by semi-global matching and mutual information," in Computer Vision and Pattern Recognition, IEEE Computer Society Conference on, vol. 2, pp. 807-814, 2005.

[12] S. Moqqaddem, Y. Ruichek, R. Touahni, and A. Sbihi, "Objects detection and tracking using points cloud reconstructed from linear stereo vision," Current Advancements in Stereo Vision, p. 161, 2012.

[13] B. Kitt, B. Ranft, and H. Lategahn, "Detection and tracking of independently moving objects in urban environments," in Intelligent Transportation Systems, 13th International IEEE Conference on, pp. 1396-1401, IEEE, 2010.

[14] P. Lenz, J. Ziegler, A. Geiger, and M. Roser, "Sparse scene flow segmentation for moving object detection in urban environments," in Intelligent Vehicles Symposium (IV), IEEE, pp. 926-932, 2011.

[15] A. Talukder and L. Matthies, "Real-time detection of moving objects from moving vehicles using dense stereo and optical flow," in Intelligent Robots and Systems, Proceedings. International Conference on, vol. 4, pp. 3718-3725, IEEE, 2004.

[16] V. Romero-Cano and J. I. Nieto, "Stereo-based motion detection and tracking from a moving platform," in Intelligent Vehicles Symposium, IEEE, pp. 499, IEEE, 2013.

[17] C. Rabe, T. Müller, A. Wedel, and U. Franke, "Dense, robust, and accurate motion field estimation from stereo image sequences in real-time," in European conference on computer vision, pp. 582-595, Springer, 2010.

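Sparse feature-based methods fail when very few features are detected on the moving object. In that case, dense flow-based methods can be used. In [15], the optical flow between two consecutive frames is predicted from the current scene depth and the ego-motion. Based on the difference between the predicted and the measured flow fields, large non-zero regions are classified as potential moving objects. Although this scheme gives dense results, the noise involved in the perception task can make it prone to a large number of false detections. Improved methods, [18] and [16], limit false detections by considering the uncertainty of the 3D scene flow [18] or of the 2D real optical flow [16]. However, these methods only roughly model the uncertainty of the ego-motion obtained from other sensors (GPS or IMU).

Epipolar geometry from a monocular camera cannot detect a moving object when its motion is degenerate. (Degenerate motion: the 3D point moves within the epipolar plane formed by the two camera centers and the point itself, so its 2D projection moves along the epipolar line.)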
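To make the degenerate case concrete, here is a minimal NumPy sketch. It is my own illustration, not code from the paper: the helper names, the intrinsics, and the convention X2 = R X1 + t are all assumptions. It measures how far a pixel in the second image lies from the epipolar line of its match in the first image.

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t]_x such that skew(t) @ x == np.cross(t, x)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def fundamental(K, R, t):
    """Fundamental matrix for the assumed convention X2 = R @ X1 + t."""
    K_inv = np.linalg.inv(K)
    return K_inv.T @ skew(t) @ R @ K_inv

def project(K, X):
    """Pinhole projection of a 3D point to homogeneous pixel coordinates."""
    q = K @ X
    return np.array([q[0] / q[2], q[1] / q[2], 1.0])

def epipolar_distance(F, p1, p2):
    """Distance (in pixels) from p2 to the epipolar line F @ p1."""
    line = F @ p1
    return abs(p2 @ line) / np.hypot(line[0], line[1])

# Hypothetical setup: the camera drives 1 m straight ahead between frames.
K = np.array([[700.0, 0.0, 600.0], [0.0, 700.0, 180.0], [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.array([0.0, 0.0, -1.0])
F = fundamental(K, R, t)
p1 = project(K, np.array([1.0, 0.0, 10.0]))                   # point at time t-1
p2_in_plane = project(K, R @ np.array([1.5, 0.0, 9.0]) + t)   # moved inside the epipolar plane
p2_off_plane = project(K, R @ np.array([1.0, 0.5, 10.0]) + t) # moved out of the epipolar plane
print(epipolar_distance(F, p1, p2_in_plane), epipolar_distance(F, p1, p2_off_plane))
```

In this toy setup the point that moves inside the epipolar plane stays on its epipolar line (distance close to zero) even though it is moving, so a purely epipolar test would miss it, while the point that moves out of the plane gives a clearly non-zero distance.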

Assume the stereo camera has already been calibrated. We denote b as the calibrated baseline for the stereo head.

Additionally, the left and right rectified images have identical focal length f and principal point coordinates p0 = (u0, v0)^T.
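With a rectified pair, the baseline b, focal length f and principal point (u0, v0) are enough to turn a disparity into depth through the standard relation Z = f b / d. The sketch below is only an illustration: the function names and the numeric values in the usage note are my own assumptions, not taken from the paper.

```python
import numpy as np

def disparity_to_depth(disparity, f, b, min_disp=1e-6):
    """Depth Z = f * b / d for a rectified stereo pair.

    Pixels whose disparity is not larger than min_disp are treated as
    invalid and get an infinite depth.
    """
    d = np.asarray(disparity, dtype=float)
    depth = np.full(d.shape, np.inf)
    valid = d > min_disp
    depth[valid] = f * b / d[valid]
    return depth

def backproject(u, v, disparity, f, b, u0, v0):
    """3D point (X, Y, Z) in the left camera frame for pixel (u, v)."""
    Z = f * b / disparity
    return np.array([(u - u0) * Z / f, (v - v0) * Z / f, Z])
```

For example, with the hypothetical values f = 700 px and b = 0.5 m, a disparity of 10 px corresponds to a depth of 35 m.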

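The figure below shows the stereo images of two consecutive frames, from time t-1 to time t. The origin of the world coordinate frame is assumed to coincide with the local coordinate frame of the left camera at time t-1.

The X-axis points to the right and the Y-axis points downwards.

At time t-1, the position of a pixel extracted from a static background point is

and its position at time t is

where K is the camera intrinsic matrix, R and tr are the relative camera rotation and translation (pose), and Zt-1 is the depth of the 3D point X in frame t-1.

To detect moving objects in the image, a straightforward idea is to compensate for the camera motion according to equation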

（1）
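The equation itself does not appear in this post. As a hedged reconstruction, the standard motion-compensation relation that is consistent with the symbols defined above (K, R, tr, Zt-1) can be derived as follows, assuming the convention that a static point satisfies X_t = R X_{t-1} + t_r. Treat it as a sketch under that assumption, not as a verbatim copy of the paper's equation (1).

```latex
\begin{aligned}
p_{t-1} &= (u_{t-1},\, v_{t-1},\, 1)^{T}, \qquad
X_{t-1} = Z_{t-1}\, K^{-1} p_{t-1}, \\
Z_{t}\, p_{t} &= K \left( R\, X_{t-1} + t_r \right)
             = Z_{t-1}\, K R K^{-1} p_{t-1} + K t_r, \\
p_{t} &\simeq K R K^{-1} p_{t-1} + \frac{K t_r}{Z_{t-1}} .
\end{aligned}
```

Here \simeq means equality up to scale: the right-hand side is divided by its third component to obtain pixel coordinates.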

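The residual image is then computed as the difference between the current image and the motion-compensated previous image. It highlights the pixels that belong to moving objects as well as the pixels affected by motion estimation errors. For clarity, we first define three flow-based terms:

Global Image Motion Flow (GIMF): the predicted image change caused by the camera motion alone; it can be computed with equation (1).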
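As a rough illustration of the GIMF for a single pixel: a static point is back-projected with its depth, moved by the relative camera motion, and re-projected, and the GIMF is the resulting pixel displacement. The function names and the convention X_t = R X_{t-1} + t are my assumptions, and the paper's own implementation may differ.

```python
import numpy as np

def predict_pixel(p_prev, depth_prev, K, R, t):
    """Predicted pixel at time t of a static point seen at p_prev=(u, v) at t-1."""
    p_h = np.array([p_prev[0], p_prev[1], 1.0])
    X_prev = depth_prev * np.linalg.solve(K, p_h)  # back-project to 3D
    X_cur = R @ X_prev + t                         # assumed convention: X_t = R X_{t-1} + t
    q = K @ X_cur                                  # re-project into the current image
    return q[:2] / q[2]

def gimf(p_prev, depth_prev, K, R, t):
    """Global image motion flow: displacement caused by camera motion alone."""
    return predict_pixel(p_prev, depth_prev, K, R, t) - np.asarray(p_prev, dtype=float)
```

With hypothetical intrinsics (f = 700 px, principal point (600, 180)) and a camera that moves 1 m forward, a static point at pixel (670, 180) and depth 10 m gets a purely horizontal flow of 70/9 ≈ 7.8 px away from the image center, while the pixel at the principal point does not move at all.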

Measured Optical Flow (MOF): the actual dense optical flow estimated with image processing techniques [23].

Residual Image Motion Flow (RIMF): measures the difference between the MOF and the GIMF.
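A minimal sketch of the RIMF on dense flow fields follows. Note the simplification: a fixed pixel threshold is my own stand-in, whereas the paper weighs the residual by the ego-motion and disparity uncertainties to obtain a motion likelihood. The function names and the default threshold are hypothetical.

```python
import numpy as np

def rimf(mof, gimf):
    """Residual flow: measured optical flow minus camera-induced flow.

    Both inputs are arrays of shape (H, W, 2) holding (du, dv) per pixel.
    """
    return np.asarray(mof, dtype=float) - np.asarray(gimf, dtype=float)

def moving_mask(mof, gimf, threshold=1.0):
    """Boolean (H, W) mask of pixels whose residual magnitude exceeds threshold (px)."""
    magnitude = np.linalg.norm(rimf(mof, gimf), axis=-1)
    return magnitude > threshold
```

A fixed threshold like this is sensitive to noise in both the flow and the ego-motion estimate, which is why the uncertainty handling described in the paper matters.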

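The RIMF can be used to tell whether a pixel belongs to a moving or a non-moving object. To compute the RIMF, the MOF and the GIMF must be computed first. Note that computing the latter requires the camera motion (ego-motion) and the per-pixel depth. The paper does not address how the dense optical flow [23] and the disparity map [24] are computed: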

[23] C. Liu, Beyond pixels: exploring new representations and applications for motion analysis. PhD thesis, Massachusetts Institute of Technology, 2009.

[24] A. Geiger, M. Roser, and R. Urtasun, "Efficient large-scale stereo matching," in Asian Conference on Computer Vision, pp. 25-38, Springer, 2010.

[25] C. Vogel, K. Schindler, and S. Roth, "3d scene flow estimation with a piecewise rigid scene model," International Journal of Computer Vision, vol. 115, no. 1, pp. 1-28, 2015.

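More precisely, the authors use the method proposed in [25] to compute the dense optical flow and the dense disparity map, and feed them directly into the system as inputs. The whole system can be summarized in three steps:

1. Moving pixel detection. In this step, moving pixels are detected by compensating for the image changes caused by the camera motion. The uncertainty of the camera motion is taken into account to improve the detection results.

2. Moving object segmentation. After moving pixel detection, a graph-cut based algorithm removes false detections by considering color and disparity information.

3. Bounding box generation. Finally, a bounding box is generated for each moving object through U-V disparity map analysis.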
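Step 3 relies on U-V disparity maps. The sketch below only shows how such maps are commonly built, as per-column and per-row histograms of disparity. It says nothing about how the paper extracts boxes from them, and the function name and binning choices are my own assumptions.

```python
import numpy as np

def uv_disparity(disparity, max_disp):
    """Build U- and V-disparity histograms from an (H, W) disparity map.

    Disparities are rounded to integers and clipped to [0, max_disp].
    u_disp[d, u] counts pixels in column u with disparity d;
    v_disp[v, d] counts pixels in row v with disparity d.
    """
    d = np.clip(np.rint(np.asarray(disparity, dtype=float)).astype(int), 0, max_disp)
    height, width = d.shape
    u_disp = np.zeros((max_disp + 1, width), dtype=int)
    v_disp = np.zeros((height, max_disp + 1), dtype=int)
    rows, cols = np.indices(d.shape)
    np.add.at(u_disp, (d, cols), 1)  # accumulate per-column histogram
    np.add.at(v_disp, (rows, d), 1)  # accumulate per-row histogram
    return u_disp, v_disp
```

In such maps an upright obstacle at roughly constant depth tends to show up as a horizontal segment in the U-disparity map and a vertical segment in the V-disparity map, which is what makes them convenient for delimiting object extents.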

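Figure 1: Coordinate frames of the stereo vision system

Figure 2: Framework of the moving object detection and segmentation system

The red part computes the motion likelihood of each pixel.

The blue part is the graph-cut based moving object segmentation.

The green part is the post-processing that generates a bounding box for each moving object.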

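First, moving pixel detection.

Figure 1 shows the four images of two consecutive stereo frames, taken at time t-1 and time t. The left image at time t-1, I_(t-1,L), is used as the reference image. The definitions are as follows.

Next comes ego-motion estimation and uncertainty computation.

Given a set of corresponding points across the four images of two consecutive frames, the relative camera pose can be estimated by minimizing the sum of reprojection errors with a non-linear minimization method.

First, the 3D feature points of the previous frame are reconstructed by triangulation using the camera intrinsic parameters. These 3D points are then reprojected onto the images of the current frame using the camera motion, as follows: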

（2）

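where

the predicted point is the pixel position in the current frame computed from the 3D point reconstructed from the previous frame;

the input point is the corresponding pixel in the previous frame;

the vector represents the six-degree-of-freedom relative pose (the relative pose relating the same point across the two frames);

P rl and P rr give the pixel coordinates (non-homogeneous coordinates) of the 3D point projected onto the left and right cameras.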
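The triangulate-then-reproject step can be sketched as below for a rectified stereo rig. Everything beyond the symbols already named in the text is an assumption on my part: the function names, the axis-angle plus translation parameterization of the 6-DoF pose, the convention X_t = R X_{t-1} + t, and the right camera sitting at +b along the X-axis of the left camera.

```python
import numpy as np

def triangulate(u_l, v_l, u_r, f, b, u0, v0):
    """3D point in the left camera frame from a rectified left/right match."""
    Z = f * b / (u_l - u_r)  # depth from disparity
    return np.array([(u_l - u0) * Z / f, (v_l - v0) * Z / f, Z])

def rodrigues(rvec):
    """Rotation matrix from an axis-angle vector (Rodrigues' formula)."""
    theta = np.linalg.norm(rvec)
    if theta < 1e-12:
        return np.eye(3)
    k = rvec / theta
    Kx = np.array([[0.0, -k[2], k[1]], [k[2], 0.0, -k[0]], [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * Kx + (1.0 - np.cos(theta)) * (Kx @ Kx)

def reproject_stereo(X_prev, pose6, f, b, u0, v0):
    """Predicted (u_l, v_l, u_r, v_r) in the current frame.

    pose6 = (rx, ry, rz, tx, ty, tz): assumed axis-angle rotation and translation.
    """
    pose6 = np.asarray(pose6, dtype=float)
    X = rodrigues(pose6[:3]) @ X_prev + pose6[3:]
    u_l = f * X[0] / X[2] + u0
    v = f * X[1] / X[2] + v0
    u_r = f * (X[0] - b) / X[2] + u0  # right camera shifted by b along X
    return np.array([u_l, v, u_r, v])
```

With a zero pose vector the reprojection returns the original stereo measurement, which is a convenient sanity check before plugging the function into a non-linear minimizer.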

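In general, the optimal camera motion vector Θ^ can be obtained by minimizing the weighted squared error between the measurements and the predictions: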

（3）

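the measurement is the matched point in the current frame, obtained with the tracking and matching strategy;

the norm denotes the Mahalanobis distance with respect to the covariance matrix Σ.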
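The covariance-weighted error used in this kind of objective is the squared Mahalanobis distance r^T Σ^{-1} r of each residual r (measurement minus prediction). Below is a small sketch of the cost only. The function names are hypothetical, and the optimizer that searches over the motion vector is deliberately left out.

```python
import numpy as np

def mahalanobis_sq(residual, cov):
    """Squared Mahalanobis distance r^T cov^{-1} r of one residual vector."""
    r = np.asarray(residual, dtype=float)
    return float(r @ np.linalg.solve(np.asarray(cov, dtype=float), r))

def total_cost(measured, predicted, covs):
    """Sum of squared Mahalanobis distances over all matched points."""
    return sum(
        mahalanobis_sq(np.asarray(m, dtype=float) - np.asarray(p, dtype=float), S)
        for m, p, S in zip(measured, predicted, covs)
    )
```

Compared with a plain sum of squared pixel errors, the weighting lets uncertain matches (large Σ) contribute less to the estimate than precise ones.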

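From the above, the optimal motion vector Θ^ can be obtained from equation (3), but the result depends on the accuracy of the matching and tracking between images.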

（4）

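Because of space limits, the rest is covered in "Moving Object Detection Paper (2)". Based on the framework figure, the idea of the paper can be summarized as follows.

With a stereo camera we can recover the depth of each feature point, so the equations above give the position in the current frame of a feature point from the previous frame. The resulting displacement is the Global Image Motion Flow (GIMF) described above. The KLT optical flow method then gives the Measured Optical Flow (MOF). For static objects these two flows are equal, whereas for moving objects they differ; this difference is the Residual Image Motion Flow (RIMF).

The main idea is to use this difference to decide whether a feature point is static or dynamic. The paper uses further techniques to improve detection accuracy, but this is the core idea. The next post covers the remaining technical details of the paper.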

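If you are interested, follow the WeChat official account or join the QQ or WeChat group to discuss and share with everyone (the group is mainly for exchange on point clouds and 3D vision; everyone is welcome to join and share).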
