LXMERT: Learning Cross-Modality Encoder Representations from Transformers

Contents
1. Paper Abstract
2. Network Architecture
3. Experimental Analysis
4. Conclusion

1. Paper Abstract
Vision-and-language reasoning requires an understanding of visual concepts, language semantics, and, most importantly, the alignment and relationships between