Code: https://github.com/karpathy/neuraltalk & https://github.com/karpathy/neuraltalk2 & https://github.com/zsdonghao/Image-Captioning
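In this paper, the authors borrow from Neural Machine Translation and bring the encoder-decoder model into Neural Image Captioning, proposing an end-to-end model for the image captioning problem. Below are two figures taken from the paper: the first is an overview of the NIC model and the second describes the details of the network. The NIC network uses a convolutional neural network (CNN) as the encoder and a long short-term memory network (LSTM) as the decoder.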
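To make the decoder side concrete, here is a minimal pure-Python sketch of one step of a standard LSTM cell. It is not the paper's code or code from the linked repositories: biases are omitted, the weights are small random numbers, and the names `lstm_step` and `init_params` are hypothetical. It uses the common formulation m_t = o_t * tanh(c_t); the paper's exact equations may differ in such details.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def init_params(input_dim, hidden_dim, seed=0):
    """Hypothetical initialization: small random weights for the four gates."""
    rng = random.Random(seed)
    params = {}
    for g in "ifoc":  # input, forget, output gates and cell candidate
        params["W_%sx" % g] = [[rng.uniform(-0.1, 0.1) for _ in range(input_dim)]
                               for _ in range(hidden_dim)]
        params["W_%sm" % g] = [[rng.uniform(-0.1, 0.1) for _ in range(hidden_dim)]
                               for _ in range(hidden_dim)]
    return params


def lstm_step(x, m_prev, c_prev, params):
    """One standard LSTM step (no biases); returns the new (m, c)."""
    def pre(gate):
        # W_gx x + W_gm m_prev for the given gate
        return [a + b for a, b in zip(matvec(params["W_%sx" % gate], x),
                                      matvec(params["W_%sm" % gate], m_prev))]

    i = [sigmoid(z) for z in pre("i")]       # input gate
    f = [sigmoid(z) for z in pre("f")]       # forget gate
    o = [sigmoid(z) for z in pre("o")]       # output gate
    cand = [math.tanh(z) for z in pre("c")]  # candidate cell content
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, cand)]
    m = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return m, c


# Usage: run the cell over two inputs (e.g. an image embedding, then a word embedding).
params = init_params(input_dim=4, hidden_dim=3)
m, c = [0.0] * 3, [0.0] * 3
for x in ([1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]):
    m, c = lstm_step(x, m, c, params)
print(m)
```

The forget gate is what lets the cell carry information about the image across many word steps, which is the usual reason an LSTM is preferred over a plain RNN as the decoder.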
Hence, it is natural to use a CNN as an image "encoder", by first pre-training it for an image classification task and using the last hidden layer as an input to the RNN decoder that generates sentences.
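A minimal sketch of what "using the last hidden layer as an input to the RNN decoder" can look like: the CNN feature vector is mapped by a learned linear layer into the word-embedding space and placed in front of the caption's word embeddings, so in this sketch the decoder sees the image once, as its first input. The feature vector is a hand-written stand-in for real CNN activations, and `project_image`, `build_decoder_inputs`, `W_img`, `b_img` and `W_emb` are hypothetical names chosen for illustration.

```python
def project_image(feature, W_img, b_img):
    """Linear map of a CNN feature vector into the embedding space: W_img @ feature + b_img."""
    return [sum(w * f for w, f in zip(row, feature)) + b
            for row, b in zip(W_img, b_img)]


def build_decoder_inputs(feature, word_ids, W_img, b_img, W_emb):
    """Decoder input sequence: the projected image first, then the caption's word embeddings.

    W_emb is a lookup table: W_emb[k] is the embedding vector of word id k.
    """
    inputs = [project_image(feature, W_img, b_img)]
    inputs.extend(W_emb[k] for k in word_ids)
    return inputs


# Toy numbers (assumed): a 3-d stand-in for the CNN's last hidden layer, 2-d embeddings.
feature = [1.0, 2.0, 3.0]
W_img = [[1.0, 0.0, 0.0],
         [0.0, 1.0, 1.0]]
b_img = [0.5, -1.0]
W_emb = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]  # vocabulary of 3 words

inputs = build_decoder_inputs(feature, [0, 2], W_img, b_img, W_emb)
print(inputs)  # [[1.5, 4.0], [0.1, 0.2], [0.5, 0.6]]
```

Projecting the image into the same space as the words means the decoder needs only one input size, regardless of the CNN's feature dimension.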
An "encoder" RNN reads the source sentence and transforms it into a rich fixed-length vector representation, which in turn is used as the initial hidden state of a "decoder" RNN that generates the target sentence.
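The machine-translation idea being borrowed can be sketched with a vanilla tanh RNN: the encoder reads a source sequence of any length and returns its final hidden state, a fixed-length vector, which then becomes the decoder's initial hidden state. This is an illustrative sketch under simplifying assumptions (hypothetical names `rnn_step`, `encode`, `decode`; tiny hand-picked weights; no vocabulary or output layer), not an NMT implementation. NIC keeps this decoder idea but replaces the encoder RNN with a CNN.

```python
import math


def rnn_step(x, h, W_x, W_h):
    """Vanilla RNN step: h_new = tanh(W_x x + W_h h)."""
    return [math.tanh(sum(a * b for a, b in zip(rx, x)) +
                      sum(a * b for a, b in zip(rh, h)))
            for rx, rh in zip(W_x, W_h)]


def encode(source, W_x, W_h, hidden_dim):
    """Read the whole source sequence; the last hidden state is a fixed-length summary."""
    h = [0.0] * hidden_dim
    for x in source:
        h = rnn_step(x, h, W_x, W_h)
    return h


def decode(h0, target_inputs, W_x, W_h):
    """Run the decoder from initial state h0 over (teacher-forced) target inputs.

    Returns one hidden state per step; a real decoder would map each to word probabilities.
    """
    states, h = [], h0
    for x in target_inputs:
        h = rnn_step(x, h, W_x, W_h)
        states.append(h)
    return states


# Hand-picked toy weights: inputs are 3-d, hidden states are 2-d.
enc_Wx = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]
enc_Wh = [[0.1, 0.2], [-0.3, 0.4]]
dec_Wx = [[0.2, 0.1, 0.0], [-0.1, 0.3, 0.2]]
dec_Wh = [[0.6, -0.1], [0.2, 0.5]]

short_src = [[1.0, 0.0, 0.0]]
long_src = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 1.0]]
summary = encode(long_src, enc_Wx, enc_Wh, hidden_dim=2)
states = decode(summary, [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]], dec_Wx, dec_Wh)
print(len(encode(short_src, enc_Wx, enc_Wh, 2)), len(summary), len(states))  # 2 2 2
```

Sources of length 1 and 4 both produce a 2-d summary, which is the "fixed-length vector" the quoted sentence refers to.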
It is a neural net which is fully trainable using stochastic gradient descent.
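Because every component is differentiable, one loss can be minimized with SGD end to end. The sketch below shows only the mechanics, on a toy softmax output layer that maps a decoder hidden state m_t to next-word probabilities: compute the negative log-likelihood and its gradient for one example, then apply W <- W - lr * grad. The data, vocabulary size and learning rate are made up, and the helper names are hypothetical; in a real model the gradient would also flow back through the LSTM (and through the CNN, if it is fine-tuned).

```python
import math


def softmax(z):
    top = max(z)                      # subtract the max for numerical stability
    e = [math.exp(v - top) for v in z]
    s = sum(e)
    return [v / s for v in e]


def nll_and_grad(W, m, target):
    """Loss -log softmax(W m)[target] and its gradient with respect to W."""
    logits = [sum(w * x for w, x in zip(row, m)) for row in W]
    p = softmax(logits)
    loss = -math.log(p[target])
    # dL/dlogit_k = p_k - [k == target], so dL/dW[k][j] = (p_k - [k == target]) * m_j
    grad = [[(p[k] - (1.0 if k == target else 0.0)) * mj for mj in m]
            for k in range(len(W))]
    return loss, grad


def sgd_step(W, grad, lr):
    """Plain SGD update W <- W - lr * grad."""
    return [[w - lr * g for w, g in zip(row, grow)] for row, grow in zip(W, grad)]


def total_loss(W, data):
    return sum(nll_and_grad(W, m, y)[0] for m, y in data)


# Toy (hidden state m_t, next-word id) pairs standing in for decoder outputs.
data = [([1.0, 0.0], 0), ([0.0, 1.0], 2), ([1.0, 1.0], 1)]
W = [[0.0, 0.0] for _ in range(3)]   # 3-word vocabulary, hidden size 2
before = total_loss(W, data)
for epoch in range(50):
    for m, y in data:                # one example per update: stochastic
        _, grad = nll_and_grad(W, m, y)
        W = sgd_step(W, grad, lr=0.5)
after = total_loss(W, data)
print(before, after)
```

In practice updates are usually averaged over a mini-batch of image-caption pairs rather than applied one example at a time.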
The model is trained to maximize the likelihood of the target description sentence given the training image.
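The likelihood objective is usually written as θ* = argmax_θ Σ_(I,S) log p(S | I; θ), where the probability of a sentence S = (S_0, ..., S_N) is factored with the chain rule: log p(S | I) = Σ_{t=0}^{N} log p(S_t | I, S_0, ..., S_{t-1}). The hypothetical helper below takes the decoder's per-step probability distributions and the ground-truth word ids and sums the log-probabilities; training minimizes the negative of this sum.

```python
import math


def caption_log_likelihood(step_probs, target_ids):
    """log p(S | I) = sum over t of log p(S_t | I, S_0..S_{t-1}).

    step_probs[t] is the decoder's distribution over the vocabulary at step t
    (already conditioned on the image and the previous ground-truth words);
    target_ids[t] is the ground-truth word id at step t.
    """
    if len(step_probs) != len(target_ids):
        raise ValueError("need one distribution per target word")
    return sum(math.log(p[w]) for p, w in zip(step_probs, target_ids))


def caption_loss(step_probs, target_ids):
    """Training minimizes the negative log-likelihood."""
    return -caption_log_likelihood(step_probs, target_ids)


# Toy 3-word vocabulary and a 2-step caption.
step_probs = [[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1]]
ll = caption_log_likelihood(step_probs, [0, 1])
print(math.exp(ll))  # approximately 0.56 (= 0.7 * 0.8)
```

Summing log-probabilities instead of multiplying probabilities avoids numerical underflow on long captions and turns the objective into a per-word sum that is easy to differentiate.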