LXMERT: Learning Cross-Modality Encoder Representations from Transformers

Date: 2021-01-13

Contents
1. Abstract
2. Network Architecture
3. Experimental Analysis
4. Conclusion

1. Abstract

Vision-and-language reasoning requires an understanding of visual concepts, language semantics, and, most importantly, the alignment and relationships between
