A Long-Form Tour of the Latest Transformer Advances at ICLR 2020

Contents: 1. Self-attention variants: Long-Short Range Attention; Tree-Structured Attention with Subtree Masking; Hashed Attention; eXtra Hop Attention. 2. Training objectives: Discriminative Replacement Task; Word and Sentence Struc