Paper Notes: Learning Dynamic Memory Networks for Object Tracking

Date: 2019-11-25

ECCV 2018
Updated on 2018-08-05 16:36:30

Code: https://github.com/skyoung/MemTrack (TensorFlow implementation)

【Note】This paper builds on the Siamese network and the DNC (Nature 2016); reading these two papers first will make this one much easier to understand.

DNC: http://www.javashuo.com/article/p-shqbueiu-be.html Paper: http://www.nature.com/nature/journal/vaop/ncurrent/pdf/nature20101.pdf

Siamese-network-based tracker: http://www.javashuo.com/article/p-cswwzple-s.html Paper: Fully-Convolutional Siamese Networks for Object Tracking

Another tracking paper that also uses a memory network: MAVOT: Memory-Augmented Video Object Tracking, arXiv

=================================

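Motivation: The goal is to use a Dynamic Memory Network to update the target template dynamically, so that a Siamese-network-based tracker can better capture the target's features and learn a better appearance model, which in turn gives more accurate localization.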

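Method: Template updating is built on a Dynamic Memory Network. The tracker dynamically stores, reads, and writes tracking results (target templates) in memory, combines them with the original object patch, and then tracks with a Siamese-network tracker. It runs at 50 FPS.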

Approach Details:

Dynamic Memory Networks for Tracking:

1. Feature Extraction:

For feature extraction, the paper follows SiamFC; the details are not repeated here.
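To make the SiamFC matching step concrete, here is a minimal plain-Python sketch of cross-correlation: the template feature map is slid over the search feature map and, at each offset, the elementwise products are summed over all channels. It assumes feature maps stored as nested lists shaped [channels][height][width], stride 1, no padding, and no bias term. The function name is our own, and this is not code from the SiamFC or MemTrack repositories.

```python
# Sketch of SiamFC-style cross-correlation (stride 1, no padding, no bias).
# Assumption: feature maps are nested lists shaped [channels][height][width].


def cross_correlate(template, search):
    """Slide `template` over `search` and return the 2-D response map.

    Each entry is the sum, over all channels and template pixels, of the
    elementwise product between the template and the search window.
    """
    if len(template) != len(search):
        raise ValueError("template and search must have the same number of channels")
    th, tw = len(template[0]), len(template[0][0])
    sh, sw = len(search[0]), len(search[0][0])
    response = []
    for y in range(sh - th + 1):
        row = []
        for x in range(sw - tw + 1):
            score = 0.0
            for t_ch, s_ch in zip(template, search):
                for i in range(th):
                    for j in range(tw):
                        score += t_ch[i][j] * s_ch[y + i][x + j]
            row.append(score)
        response.append(row)
    return response


if __name__ == "__main__":
    template = [[[1.0, 1.0],
                 [1.0, 1.0]]]          # 1 channel, 2x2
    search = [[[0.0, 1.0, 0.0],
               [0.0, 2.0, 3.0],
               [0.0, 0.0, 0.0]]]       # 1 channel, 3x3
    print(cross_correlate(template, search))  # [[3.0, 6.0], [2.0, 5.0]]
```

With stride 1 the response map has (H_s − H_t + 1) × (W_s − W_t + 1) entries, so a 6×6 template over a 22×22 search map would give a 17×17 map; the highest-scoring position indicates the estimated target location.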

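2. Attention Scheme:

The paper motivates the attention mechanism as follows: "Since the object information in the search image is needed to retrieve the related template for matching, but the object location is unknown at first, we apply an attention mechanism to make the input of LSTM concentrate more on the target." In short, because the target's location in the search image is not known in advance, attention is used to focus on the likely target region, so that the feature fed to the LSTM is dominated by the target rather than the background.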

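The authors divide the feature map of the search image into L square patches of size 6×6×256 using a sliding window. To further reduce the size of each square patch, average pooling is applied to it:

$$f^*_{t,i} = \mathrm{AvgPooling}_{n \times n}(f_{t,i})$$

where $f_{t,i}$ is the $i$-th square patch at frame $t$ and $n \times n$ is its spatial size (6×6 here), so each patch is reduced to a 256-dimensional vector.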
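The sliding-window division and average pooling can be sketched as below. This assumes a stride of 1 (the notes only say "sliding window") and feature maps stored as [channels][height][width] nested lists; `pooled_patches` is a hypothetical helper name. Each n×n window is averaged per channel, so every patch becomes one vector with one entry per channel.

```python
# Sketch: divide a feature map into n x n patches with a sliding window
# (stride 1 is an assumption) and average-pool each patch to a vector.


def pooled_patches(feature, n):
    """Return a list of pooled vectors, one per n x n window position.

    `feature` is a nested list shaped [channels][height][width]; each pooled
    vector has one entry per channel: the mean of that channel in the window.
    """
    height, width = len(feature[0]), len(feature[0][0])
    vectors = []
    for y in range(height - n + 1):
        for x in range(width - n + 1):
            vec = []
            for channel in feature:
                window_sum = sum(channel[y + i][x + j]
                                 for i in range(n) for j in range(n))
                vec.append(window_sum / (n * n))
            vectors.append(vec)
    return vectors


if __name__ == "__main__":
    feature = [[[0.0, 1.0, 2.0],
                [3.0, 4.0, 5.0],
                [6.0, 7.0, 8.0]]]   # 1 channel, 3x3
    print(pooled_patches(feature, 2))  # [[2.0], [3.0], [5.0], [6.0]]
```

With stride 1 an H×W map yields (H − n + 1)(W − n + 1) patches, which is the patch count L used by the attention step.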

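The attended feature vector can then be viewed as the weighted sum of these pooled feature vectors $f^*_{t,i}$:

$$a_t = \sum_{i=1}^{L} \alpha_{t,i}\, f^*_{t,i}$$

where $L$ is the number of patches. The weights are computed with a softmax:

$$\alpha_{t,i} = \frac{\exp(r_{t,i})}{\sum_{k=1}^{L} \exp(r_{t,k})}$$

where $r_{t,i}$ is the output of the attention network, whose inputs are the LSTM hidden state $h_{t-1}$ and the pooled square patch $f^*_{t,i}$:

$$r_{t,i} = W^a \tanh\left(W^h h_{t-1} + W^f f^*_{t,i} + b\right)$$

Here $W^a$, $W^h$, $W^f$, and $b$ are learnable weights and biases.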
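The attention step (score each pooled patch using the previous hidden state, normalize the scores with a softmax, take the weighted sum) can be sketched in plain Python as follows. The scoring uses the additive tanh form; the parameter names `W_h`, `W_f`, `b`, and `w_a` are hypothetical stand-ins for the learnable weights and bias, which the real model learns and evaluates on TensorFlow tensors rather than Python lists.

```python
import math

# Sketch of soft attention over pooled patches.
# Assumptions: plain Python lists as vectors/matrices; parameter names are
# illustrative and mirror W^h, W^f, b, W^a.


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(pooled, h_prev, W_h, W_f, b, w_a):
    """Return (alpha, attended): attention weights over the pooled
    vectors and their weighted sum."""
    h_part = matvec(W_h, h_prev)
    scores = []
    for f in pooled:
        hidden = [math.tanh(x + y + z)
                  for x, y, z in zip(h_part, matvec(W_f, f), b)]
        scores.append(sum(a * u for a, u in zip(w_a, hidden)))
    alpha = softmax(scores)
    dim = len(pooled[0])
    attended = [sum(alpha[i] * pooled[i][k] for i in range(len(pooled)))
                for k in range(dim)]
    return alpha, attended


if __name__ == "__main__":
    pooled = [[1.0, 0.0], [3.0, 2.0]]   # 2 pooled patches, 2 channels
    # With all-zero attention parameters every score is 0, so the weights
    # are uniform and the attended vector is the plain average.
    alpha, attended = attend(pooled, [0.0], [[0.0]], [[0.0, 0.0]], [0.0], [1.0])
    print(alpha, attended)  # [0.5, 0.5] [2.0, 1.0]
```

Zeroing the parameters makes every score equal, so the weights fall back to a uniform average; learned parameters instead push most of the weight onto patches that look like the target.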

[Figure: visualization of the attention results]

3. LSTM Memory Controller

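The memory is also controlled by an LSTM. Its inputs are the hidden state from the previous time step and the attended feature vector $a_t$ passed from the attention module at the current step; it outputs a new hidden state $h_t$,

$$h_t, c_t = \mathrm{LSTM}(a_t, h_{t-1}, c_{t-1})$$

from which the memory control signals are computed: the read key, read strength, bias gates, and decay rate.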
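The controller step can be illustrated with a textbook LSTM cell in plain Python. This is a generic LSTM (input, forget, and output gates plus a tanh candidate) with a parameter layout of our own choosing; `lstm_step` and `params` are hypothetical names, and the notes do not specify MemTrack's exact cell, so treat it as a sketch. In the tracker, the input x would be the attended feature vector, and the new hidden state h would then go through small linear layers to produce the control signals.

```python
import math

# Sketch of one step of a standard LSTM cell used as a memory controller.
# Assumption: a textbook LSTM; MemTrack's exact cell variant is not
# specified in these notes.


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def lstm_step(x, h_prev, c_prev, params):
    """Return (h, c) for one time step.

    `params` maps each gate name "i", "f", "o", "g" to a pair (W, b), where
    W has one row per hidden unit and len(x) + len(h_prev) columns.
    """
    xh = list(x) + list(h_prev)
    pre = {name: [s + bias for s, bias in zip(matvec(W, xh), b)]
           for name, (W, b) in params.items()}
    i = [sigmoid(v) for v in pre["i"]]    # input gate
    f = [sigmoid(v) for v in pre["f"]]    # forget gate
    o = [sigmoid(v) for v in pre["o"]]    # output gate
    g = [math.tanh(v) for v in pre["g"]]  # candidate cell update
    c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c_prev, i, g)]
    h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
    return h, c


if __name__ == "__main__":
    # Hidden size 1, input size 2 (a toy attended vector).
    zero = ([[0.0, 0.0, 0.0]], [0.0])
    params = {"i": zero, "f": zero, "o": zero, "g": zero}
    h, c = lstm_step([0.3, -0.7], [0.0], [2.0], params)
    # All gates are sigmoid(0) = 0.5 and the candidate is tanh(0) = 0,
    # so c = 0.5 * 2.0 = 1.0 and h = 0.5 * tanh(1.0).
    print(h, c)
```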

4. Memory Reading && Memory Writing && Residual Template Learning:

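==>> Reading and writing can be viewed from the following two angles:

For reading: given the LSTM hidden state, we obtain the read key and its read strength, and then compute the read weights from this key and the templates stored in the memory slots. These soft weights determine how much each stored template contributes to the retrieved template.

Specifically: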

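(1) The read key $k_t$ and read strength $\beta_t$ are computed from the LSTM hidden state $h_t$:

$$k_t = W^k h_t + b^k, \qquad \beta_t = 1 + \log\left(1 + \exp(W^\beta h_t + b^\beta)\right)$$

(2) Read weight: a softmax over the cosine similarities between the read key and each memory slot, sharpened by the read strength:

$$w^r_t(i) = \frac{\exp\{C(M_t(i), k_t)\, \beta_t\}}{\sum_{j=1}^{N} \exp\{C(M_t(j), k_t)\, \beta_t\}}$$

where $M_t(i)$ is the $i$-th of the $N$ memory slots and $C(\cdot,\cdot)$ is cosine similarity.

(3) The template is retrieved from memory as the weighted sum of the slots:

$$T^{\mathrm{retr}}_t = \sum_{i=1}^{N} w^r_t(i)\, M_t(i)$$

(4) The final template is learned as a residual on top of the initial template $T_0$:

$$T^{\mathrm{final}}_t = T_0 + r_t \odot T^{\mathrm{retr}}_t, \qquad r_t = \sigma(W^r h_t + b^r)$$

where $r_t$ is a channel-wise residual gate produced by the LSTM controller and $\odot$ denotes channel-wise multiplication.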
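The reading path can be sketched end to end: content-based read weights (cosine similarity scaled by a read strength of the form 1 + softplus, then a softmax), a weighted sum of the memory slots, and a channel-wise gated residual added to the initial template. Assumptions: templates and memory slots are nested lists shaped [channels][positions], each slot is flattened before being compared with the read key, and the key, strength, and gate logits are passed in directly rather than computed by the LSTM. All function names are hypothetical.

```python
import math

# Sketch of memory reading and residual template learning.
# Assumption: templates/slots are nested lists shaped [channels][positions].


def oneplus(x):
    """1 + softplus(x), computed stably; always greater than 1."""
    return 1.0 + max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def cosine(u, v, eps=1e-8):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + eps)


def flatten(template):
    return [x for channel in template for x in channel]


def read_weights(memory, key, beta):
    """Softmax over beta-scaled cosine similarities between key and slots."""
    scores = [beta * cosine(flatten(slot), key) for slot in memory]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def retrieve(memory, weights):
    """Weighted sum of memory slots, channel by channel."""
    return [[sum(w * slot[k][p] for w, slot in zip(weights, memory))
             for p in range(len(memory[0][k]))]
            for k in range(len(memory[0]))]


def residual_template(t0, retrieved, gate_logits):
    """T0 + r * T_retr, with one sigmoid gate r per channel."""
    gates = [1.0 / (1.0 + math.exp(-z)) for z in gate_logits]
    return [[a + r * b for a, b in zip(t0_ch, re_ch)]
            for t0_ch, re_ch, r in zip(t0, retrieved, gates)]


if __name__ == "__main__":
    memory = [[[1.0, 0.0]], [[0.0, 1.0]]]   # 2 slots, 1 channel, 2 positions
    w = read_weights(memory, key=[1.0, 0.0], beta=oneplus(5.0))
    t_retr = retrieve(memory, w)
    t_final = residual_template([[0.5, 0.5]], t_retr, gate_logits=[0.0])
    print(w, t_retr, t_final)
```

Raising the read strength makes the read weights sharper (closer to one-hot on the best-matching slot), while a residual gate near zero falls back to the initial template.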

For writing: from the LSTM hidden state we compute the three bias gates and the decay rate, which together give the erase factor. The resulting write weight controls whether the new template is written into memory, and how much of it is written.

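(1) The write weight covers three cases: update the slot that was just read, write into a newly allocated slot, or skip writing altogether:

$$w^w_t = g^w\, w^r_t + g^a\, w^a_t + g^s\, \mathbf{0}$$

where $w^r_t$ is the read weight, $w^a_t$ the allocation weight, and $\mathbf{0}$ the all-zero weighting.

(2) The write gate $g^w$, allocation gate $g^a$, and skip gate $g^s$ are produced by the LSTM controller with a softmax:

$$[g^w, g^a, g^s] = \mathrm{softmax}(W^g h_t + b^g)$$

(3) The allocation weight is a one-hot vector that selects the least-used memory slot:

$$w^a_t(i) = \begin{cases} 1, & \text{if } i = \arg\min_j u_t(j) \\ 0, & \text{otherwise} \end{cases}$$

where $u_t$ is an access (usage) vector that accumulates past read and write weights with a decay factor.

(4) Finally, the memory slots are erased and the new template is written, with the write weight controlling where and how much is written:

$$M_{t+1}(i) = \left(1 - w^w_t(i)\, e_t\right) M_t(i) + w^w_t(i)\, T^{\mathrm{new}}_t$$

$$e_t = d_r\, g^w + g^a, \qquad d_r = \sigma(W^d h_t + b^d)$$

where $T^{\mathrm{new}}_t$ is the template extracted at the newly estimated target location, $e_t$ is the erase factor, and $d_r$ is the decay rate.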
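The writing path can be sketched the same way. Assumptions: memory slots and templates are flattened lists; the gate, decay, and usage inputs are passed in directly instead of being produced by the LSTM; and the usage update `decay * u + r + w` is a DNC-style choice of ours that the notes do not spell out. The function names are hypothetical.

```python
import math

# Sketch of memory writing. Assumptions: flattened templates (plain lists);
# gate/decay logits supplied directly rather than produced by the LSTM.


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def allocation_weight(usage):
    """One-hot weight on the least-used slot (ties go to the lowest index)."""
    idx = min(range(len(usage)), key=lambda i: usage[i])
    return [1.0 if i == idx else 0.0 for i in range(len(usage))]


def update_usage(usage, read_w, write_w, decay):
    """Assumed DNC-style usage: decayed old usage plus recent accesses."""
    return [decay * u + r + w for u, r, w in zip(usage, read_w, write_w)]


def write_weight(gate_logits, read_w, alloc_w):
    """w_w = g_w * w_r + g_a * w_a + g_s * 0; returns (w_w, gates)."""
    g_w, g_a, g_s = softmax(gate_logits)
    return [g_w * r + g_a * a for r, a in zip(read_w, alloc_w)], (g_w, g_a, g_s)


def write_memory(memory, write_w, new_template, g_w, g_a, decay_logit):
    """Erase-then-add update with erase factor e = d_r * g_w + g_a."""
    d_r = 1.0 / (1.0 + math.exp(-decay_logit))   # decay rate in (0, 1)
    e = d_r * g_w + g_a
    return [[(1.0 - w * e) * m + w * t for m, t in zip(slot, new_template)]
            for slot, w in zip(memory, write_w)]


if __name__ == "__main__":
    memory = [[1.0, 1.0], [2.0, 2.0], [0.0, 0.0]]
    usage = [0.9, 0.5, 0.1]
    read_w = [1.0, 0.0, 0.0]
    alloc_w = allocation_weight(usage)              # slot 2 is least used
    # Strongly favour allocation: most of the write goes to the new slot.
    w_w, (g_w, g_a, _) = write_weight([0.0, 5.0, 0.0], read_w, alloc_w)
    memory = write_memory(memory, w_w, [5.0, 7.0], g_w, g_a, decay_logit=0.0)
    usage = update_usage(usage, read_w, w_w, decay=0.99)
    print(alloc_w, [round(x, 3) for x in w_w])
    print(memory)
```

Since the decay rate lies in (0, 1) and the write and allocation gates sum to at most 1, the erase factor e stays between 0 and 1: a write that goes mostly through allocation nearly overwrites the chosen slot, whereas an update of the slot that was just read only partly erases it.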

==>> Experimental Results: