Watch, Listen, and Describe: Globally and Locally Aligned Cross-Modal Attentions for Video Captioning

Date: 2021-01-02

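This is a NAACL 2018 paper on video captioning (a task that combines CV and NLP). The paper is at https://arxiv.org/abs/1804.05448. The first author is a PhD student at the University of California, Santa Barbara (UCSB), and the author's homepage is http://www.cs.ucsb.edu/~xwang/. The code has not been released (the author is not in the habit of releasing code). A personal aside: looking at this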
