This blog is just the author's personal notes; apologies for any mistakes, and criticism and corrections are welcome.
For more related posts, see: http://blog.csdn.net/weixin_39779106;
To reprint, please include a link to this article: http://blog.csdn.net/weixin_39779106/article/details/79689208
Original abstract | Translation |
---|---|
In this work, we propose VLocNet, a new convolutional neural network architecture for 6-DoF global pose regression and odometry estimation from consecutive monocular images. | This paper proposes VLocNet, a CNN architecture for 6-DoF global pose regression and odometry estimation from consecutive monocular images. |
Our multitask model incorporates hard parameter sharing, thus being compact and enabling real-time inference, in addition to being end-to-end trainable. | Our multitask model adopts hard parameter sharing, so in addition to being end-to-end trainable it is also very compact and capable of real-time inference. |
We propose a novel loss function that utilizes auxiliary learning to leverage relative pose information during training, thereby constraining the search space to obtain consistent pose estimates. | We propose a novel loss function that uses auxiliary learning to exploit relative pose information during training, thereby constraining the search space to obtain consistent pose estimates. |
even our single task model exceeds the performance of state-of-the-art deep architectures for global localization, while achieving competitive performance for visual odometry estimation. | Even our single-task model already exceeds the performance of the best existing deep-learning-based global localization systems, while also achieving quite competitive performance on visual odometry estimation. |
Furthermore, we present extensive experimental evaluations utilizing our proposed Geometric Consistency Loss that show the effectiveness of multitask learning and demonstrate that our model is the first deep learning technique to be on par with, and in some cases outperforms state-of-the-art SIFT-based approaches. | Furthermore, through extensive experimental evaluations using the proposed Geometric Consistency Loss, we show the effectiveness of multitask learning. The results also show that our model is the first deep-learning-based approach whose performance is on par with SIFT-based methods, and that in some cases it even outperforms them. |
Original excerpt | Translation |
---|---|
From a robot’s learning perspective, it is unlucrative and unscalable to have multiple specialized single-task models as they inhibit both intertask and auxiliary learning. This has led to a recent surge in research targeted towards frameworks for learning unified models for a range of tasks across different domains. | From a robot learning perspective, maintaining multiple models each trained for a single task is unprofitable and unscalable, because doing so inhibits both inter-task learning and auxiliary learning. As a result, the recent research surge focuses more on training unified models for a range of different tasks. |
An evident advantage is the resulting compact model size in comparison to having multiple task-specific models. Auxiliary learning approaches on the other hand, aim at maximizing the prediction of a primary task by supervising the model to additionally learn a secondary task. | Compared with training multiple task-specific models, the most evident advantage of a unified model is its more compact structure (smaller model size). Auxiliary learning, on the other hand, aims to maximize the prediction accuracy of a primary task by supervising the model to additionally learn a secondary task. |
For instance, in the context of localization, humans often describe their location to each other with respect to some reference landmark in the scene and giving their position relative to it. Here, the primary task is to localize and the auxiliary task is to be able to identify landmarks. | For example, in the context of localization, people often describe their location to each other with respect to some reference landmark in the scene, e.g. giving their position relative to that landmark. Here, the primary task is localization and the auxiliary task is landmark recognition. |
Similarly, we can leverage the complementary relative motion information from odometry to constrict the search space while training the global localization model. | Similarly, we can leverage the complementary relative-motion information from odometry to constrain the search space while training the global localization model. |
Personal note 1: This paragraph explains why auxiliary learning is used. Existing models are usually trained for a single task, whereas a robot is inherently a multi-task system, and running several single-task models in parallel performs worse than a multitask model obtained through auxiliary learning (one model handling several tasks). The authors give two examples: localization and landmark recognition are complementary, so each can improve the other's accuracy; and odometry can use relative-motion information to constrain the global localization search space.
Personal note 2: This paragraph also raises the difficulties of multitask models: 1. how to design a network architecture that enables multitask learning; 2. since the networks for different tasks have different properties and different convergence rates, how to optimize them jointly. The authors then propose solutions to both problems.
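The hard parameter sharing idea above can be illustrated with a minimal sketch: one shared trunk is reused by every task, with a small task-specific head on top for each task. This is only an illustration under assumed dimensions — plain numpy linear layers standing in for the paper's ResNet-based trunk; all sizes and names here are hypothetical, not VLocNet's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions, chosen only for illustration.
FEAT_IN, FEAT_SHARED = 8, 16

# Hard parameter sharing: ONE set of trunk weights used by both tasks,
# plus small task-specific heads on top of it.
W_shared = rng.standard_normal((FEAT_IN, FEAT_SHARED))
W_loc = rng.standard_normal((FEAT_SHARED, 6))        # head 1: 6-DoF global pose
W_odom = rng.standard_normal((FEAT_SHARED * 2, 6))   # head 2: 6-DoF relative pose

def trunk(x):
    """Shared feature extractor (linear layer + ReLU)."""
    return np.maximum(x @ W_shared, 0.0)

def global_pose(img_feat):
    """Primary task: regress the 6-DoF global pose of one frame."""
    return trunk(img_feat) @ W_loc

def relative_pose(img_feat_prev, img_feat_cur):
    """Auxiliary task, Siamese-style: both frames pass through the SAME
    shared trunk, and their features are fused for relative pose."""
    f = np.concatenate([trunk(img_feat_prev), trunk(img_feat_cur)], axis=-1)
    return f @ W_odom

x_prev, x_cur = rng.standard_normal((2, FEAT_IN))
assert global_pose(x_cur).shape == (6,)
assert relative_pose(x_prev, x_cur).shape == (6,)
```

Because `W_shared` appears in both task paths, gradients from either task update the same trunk, which is exactly what allows the tasks to learn from each other.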
Original excerpt | Translation |
---|---|
In this work, we address the problem of global pose regression by simultaneously learning to estimate visual odometry as an auxiliary task. We propose the VLocNet architecture consisting of a global pose regression sub-network and a Siamese-type relative pose estimation sub-network. Our network based on the residual learning framework, takes two consecutive monocular images as input and jointly regresses the 6-DoF global pose as well as the 6-DoF relative pose between the images. We incorporate a hard parameter sharing scheme to learn inter-task correlations within the network and present a multitask alternating optimization strategy for learning shared features across the network. Furthermore, we devise a new loss function for global pose regression that incorporates the relative motion information during training and enforces the predicted poses to be geometrically consistent with respect to the true motion model. | This paper addresses global pose regression while simultaneously learning visual odometry, where global relocalization is the primary task and visual odometry the auxiliary task. The proposed VLocNet architecture consists of a global pose regression sub-network and a Siamese-type relative pose estimation sub-network. Built on the residual learning framework, the network takes two consecutive monocular images as input and jointly regresses the 6-DoF global pose as well as the 6-DoF relative pose between the images. We incorporate a hard parameter sharing scheme to learn inter-task correlations within the network, and present a multitask alternating optimization strategy for learning the features shared across the network. Furthermore, we devise a new loss function for global pose regression that incorporates relative motion information during training and enforces the predicted poses to be geometrically consistent with the true motion model. |
We present extensive experimental evaluations on both indoor and outdoor datasets comparing the proposed method to state-of-the-art approaches for global pose regression and visual odometry estimation. We empirically show that our proposed VLocNet architecture achieves state-of-the-art performance compared to existing CNN-based techniques. To the best of our knowledge, our presented approach is the first deep learning-based localization method to perform on par with local feature-based techniques. Moreover, our work is the first attempt to show that a joint multitask model can precisely and efficiently outperform its task-specific counterparts for global pose regression and visual odometry estimation. | Through extensive experiments on both indoor and outdoor datasets, this paper compares the proposed method with the best existing approaches for global pose regression and visual odometry estimation, and empirically shows that the proposed VLocNet architecture outperforms existing CNN-based methods. To our knowledge, the presented approach is the first deep-learning-based localization method to perform on par with methods based on local visual features. Moreover, by jointly performing global pose regression and visual odometry estimation, our work is the first attempt to show that a joint multitask model can precisely and efficiently outperform its single-task counterparts run in parallel. |
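The idea behind the Geometric Consistency Loss described above can be sketched as follows: besides the usual absolute-pose error, the relative motion implied by two consecutive global-pose predictions is compared against the ground-truth relative motion from odometry. This is a heavily simplified illustration, not the paper's actual formulation — poses are treated as plain 6-vectors and the relative motion as a vector difference, whereas the paper handles rotation with its own parameterization and weighting.

```python
import numpy as np

def geometric_consistency_loss(pred_t, pred_tm1, gt_t, gt_rel):
    """Simplified sketch of a geometric-consistency-style loss.

    pred_t, pred_tm1 : predicted global poses at times t and t-1
    gt_t             : ground-truth global pose at time t
    gt_rel           : ground-truth relative motion between t-1 and t
    All poses are plain 6-vectors here (a simplification: the paper
    treats translation and rotation separately).
    """
    # Absolute term: the predicted global pose should match ground truth.
    l_abs = np.linalg.norm(pred_t - gt_t)
    # Consistency term: the relative motion implied by two consecutive
    # predictions should match the true relative motion (odometry).
    # This is what constrains the search space during training.
    implied_rel = pred_t - pred_tm1
    l_rel = np.linalg.norm(implied_rel - gt_rel)
    return l_abs + l_rel

p = np.ones(6)
# A perfect, self-consistent prediction incurs zero loss.
assert geometric_consistency_loss(2 * p, p, 2 * p, p) == 0.0
```

A prediction that matches the ground-truth pose but disagrees with the odometry (or vice versa) is still penalized, which is exactly how the auxiliary relative-pose information shapes the global-pose estimate.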