Paper link: https://arxiv.org/pdf/1411.4555.pdf
Code links: https://github.com/karpathy/neuraltalk & https://github.com/karpathy/neuraltalk2 & https://github.com/zsdonghao/Image-Captioning
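Main contributions
In this paper, the authors borrow from neural machine translation (NMT) and bring the encoder-decoder model to neural image captioning (NIC), proposing an end-to-end model for the image captioning problem. Two figures taken from the paper illustrate this: the first gives an overview of the NIC model and the second shows the network in detail. The NIC network uses a convolutional neural network (CNN) as the encoder and a long short-term memory network (LSTM) as the decoder.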
Experimental details
Hence, it is natural to use a CNN as an image "encoder", by first pre-training it for an image classification task and using the last hidden layer as an input to the RNN decoder that generates sentences.
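The data flow described in this quote can be sketched in a few lines of plain Python. This is only a toy illustration, not the paper's implementation: `toy_cnn_encoder`, `rnn_step`, the five-word vocabulary, and the random weights are all hypothetical stand-ins, the recurrent step is a plain tanh RNN rather than an LSTM, and decoding is greedy for brevity. What it does show is the structure: the image feature vector is fed to the decoder once, before any word, and the decoder then emits one word per step until it produces the end token.

```python
import math
import random

# Hypothetical toy vocabulary and sizes, chosen only for illustration.
VOCAB = ["<start>", "<end>", "a", "dog", "runs"]
HIDDEN = 4

# Random stand-ins for learned word embeddings and output weights.
rng = random.Random(0)
EMBED = {w: [rng.uniform(-1, 1) for _ in range(HIDDEN)] for w in VOCAB}
W_OUT = {w: [rng.uniform(-1, 1) for _ in range(HIDDEN)] for w in VOCAB}


def toy_cnn_encoder(image):
    """Stand-in for a pre-trained CNN: image -> fixed-length feature vector."""
    mean = sum(image) / len(image)
    return [math.tanh(mean * (i + 1)) for i in range(HIDDEN)]


def rnn_step(x, h):
    """Stand-in for one decoder step (plain tanh RNN, not an LSTM)."""
    return [math.tanh(xi + 0.5 * hi) for xi, hi in zip(x, h)]


def next_word(h):
    """Pick the highest-scoring word; "<start>" is never emitted."""
    candidates = VOCAB[1:]
    scores = {w: sum(a * b for a, b in zip(W_OUT[w], h)) for w in candidates}
    return max(scores, key=scores.get)


def greedy_caption(image, max_len=10):
    h = [0.0] * HIDDEN
    # The image features are shown to the decoder once, before any word.
    h = rnn_step(toy_cnn_encoder(image), h)
    word, caption = "<start>", []
    for _ in range(max_len):
        h = rnn_step(EMBED[word], h)
        word = next_word(h)
        if word == "<end>":
            break
        caption.append(word)
    return caption


if __name__ == "__main__":
    print(greedy_caption([0.1, 0.5, 0.9]))
```

With untrained random weights the output is meaningless; the point is only the order of operations between encoder and decoder.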
An "encoder" RNN reads the source sentence and transforms it into a rich fixed-length vector representation, which in turn is used as the initial hidden state of a "decoder" RNN that generates the target sentence.
It is a neural net which is fully trainable using stochastic gradient descent.
The model is trained to maximize the likelihood of the target description sentence given the training image.
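Maximizing the likelihood of the target sentence is equivalent to minimizing the sum of negative log-probabilities of the correct word at each step, i.e. a loss of the form -sum_t log p_t(S_t). Below is a minimal sketch of that loss in plain Python. The function names `softmax` and `caption_nll` and the list-of-logits input format are assumptions made for illustration; they are not taken from the paper or the linked repositories.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def caption_nll(step_logits, target_ids):
    """Negative log-likelihood of one caption: -sum_t log p_t(S_t).

    step_logits: one list of vocabulary scores per time step.
    target_ids:  index of the ground-truth word at each time step.
    """
    if len(step_logits) != len(target_ids):
        raise ValueError("need one target word per time step")
    loss = 0.0
    for logits, target in zip(step_logits, target_ids):
        probs = softmax(logits)
        loss -= math.log(probs[target])
    return loss


if __name__ == "__main__":
    # Three steps over a four-word vocabulary, with uniform scores.
    print(caption_nll([[0.0, 0.0, 0.0, 0.0]] * 3, [0, 1, 2]))
```

In training, this quantity would be averaged over a batch of image-caption pairs and minimized with stochastic gradient descent; the gradient computation is left out here.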
Copyright notice: This is an original article by the blogger. Reposting is welcome; please credit the author and the original source.