Paper link: https://arxiv.org/pdf/1502.03044.pdf
Code links: https://github.com/kelvinxu/arctic-captions & https://github.com/yunjey/show-attend-and-tell & https://github.com/jazzsaxmafia/show_attend_and_tell.tensorflow
Main Contributions
In this paper, the authors introduce the attention mechanism into neural image captioning and propose two different attention mechanisms: a "soft" deterministic attention mechanism and a "hard" stochastic attention mechanism. The figure below shows the overall framework of the "Show, Attend and Tell" model.
The key question for the attention mechanism is how to compute the context vector z_t from the image feature vectors a_i. For each location i, the attention model produces a score e_{t,i}, which is normalized by a softmax into the weight α_{t,i}. In the hard attention mechanism, the weight α_{t,i} is interpreted as the probability that the image region vector a_i is selected at time t as the decoder's input; exactly one region is selected, and to express this the paper introduces the variable s_{t,i}, which is 1 when region i is selected and 0 otherwise. In the soft attention mechanism, the weight α_{t,i} is interpreted as the proportion that the image region vector a_i contributes to the decoder's input at time t. (See the posts "Attention機制論文閱讀——Soft和Hard Attention" and "Multimodal——看圖說話(Image Caption)任務的論文筆記(二)引入attention機制".)
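As a concrete illustration, here is a minimal PyTorch sketch of both variants. It assumes L = 196 annotation vectors of dimension D = 512 and a single decoder hidden state; the additive scoring function and all layer sizes are illustrative assumptions, not the paper's exact parameterization.

```python
# Minimal sketch of soft vs. hard attention, assuming annotation vectors
# a (L x D) and a previous decoder hidden state h_prev. The projections
# W_a, W_h, v are hypothetical; the paper's f_att may differ in detail.
import torch
import torch.nn.functional as F

L, D, H = 196, 512, 1024           # feature locations, feature dim, LSTM dim
a = torch.randn(L, D)              # annotation vectors a_i from the encoder
h_prev = torch.randn(H)            # decoder hidden state h_{t-1}

W_a = torch.nn.Linear(D, H, bias=False)
W_h = torch.nn.Linear(H, H, bias=False)
v = torch.nn.Linear(H, 1, bias=False)

# e_{t,i} = f_att(a_i, h_{t-1}): one unnormalized score per location i
e = v(torch.tanh(W_a(a) + W_h(h_prev))).squeeze(-1)   # shape (L,)
alpha = F.softmax(e, dim=0)                            # weights alpha_{t,i}

# Soft attention: z_t is the expectation, a weighted sum over all locations.
z_soft = (alpha.unsqueeze(-1) * a).sum(dim=0)          # shape (D,)

# Hard attention: sample one location s_t from alpha_t; z_t = a_{s_t}.
s_t = torch.multinomial(alpha, num_samples=1)          # index of chosen region
z_hard = a[s_t.item()]                                 # shape (D,)
```

Note that the sampling step in the hard variant is not differentiable, so the paper trains it by maximizing a variational lower bound with a REINFORCE-style Monte Carlo estimator, which this sketch omits.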
Experimental Details
To create the annotations a_i used by our decoder, we used the Oxford VGGnet pretrained on ImageNet without finetuning.
In our experiments we use the 14×14×512 feature map of the fourth convolutional layer before max pooling. This means our decoder operates on the flattened 196×512 (i.e., L × D) encoding.
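A sketch of this feature extraction with torchvision's VGG-19 might look as follows; the paper used the original Oxford VGGnet in a different framework, so the exact layer slice below (dropping only the final max-pooling layer to keep the 14×14 maps) is an assumption.

```python
# Extract a 196 x 512 grid of annotation vectors from a pretrained VGG-19.
import torch
from torchvision import models

vgg = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1)
encoder = vgg.features[:-1]        # drop the last MaxPool2d -> 512x14x14 maps
encoder.eval()                     # no finetuning, as in the paper

images = torch.randn(1, 3, 224, 224)           # a normalized input batch
with torch.no_grad():
    fmap = encoder(images)                      # (B, 512, 14, 14)
annotations = fmap.flatten(2).permute(0, 2, 1)  # (B, 196, 512): L x D vectors
```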
The initial memory state and hidden state of the LSTM are predicted by an average of the annotation vectors fed through two separate MLPs (f_init,c and f_init,h).
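A minimal sketch of this initialization, assuming single-layer MLPs and the hypothetical module names f_init_c and f_init_h:

```python
# Predict the LSTM's initial cell and hidden states from the mean annotation.
import torch
import torch.nn as nn

D, H = 512, 1024                   # annotation dim, LSTM hidden dim
f_init_c = nn.Linear(D, H)         # MLP for the initial memory cell c_0
f_init_h = nn.Linear(D, H)         # MLP for the initial hidden state h_0

annotations = torch.randn(4, 196, D)       # (B, L, D) from the encoder
a_mean = annotations.mean(dim=1)           # average over the L locations: (B, D)
c0 = torch.tanh(f_init_c(a_mean))          # initial cell state
h0 = torch.tanh(f_init_h(a_mean))          # initial hidden state
```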