Synthesizer: Rethinking Self-Attention in Transformer Models

Date: 2021-01-12

Tags: NLP, machine learning, deep learning


The paper Synthesizer: Rethinking Self-Attention in Transformer Models replaces the $Q \times K^{T}$ attention matrix and finds that query-key-value dot-product attention is not indispensable to self-attention. The authors propose several variants, including the Dense Synthesizer
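To make the idea of replacing the $Q \times K^{T}$ matrix concrete, here is a minimal sketch that contrasts standard dot-product attention with a Dense-Synthesizer-style layer, in which each token predicts its own row of attention scores through a small feed-forward map instead of interacting with other tokens. This is an illustration under stated assumptions, not the authors' implementation: the helper names (`matmul`, `softmax_rows`, `rand_matrix`), the weight names, and the toy dimensions are all hypothetical, and biases, multiple heads, and training are omitted.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax_rows(m):
    """Apply a numerically stable softmax to each row."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def rand_matrix(rows, cols, rng):
    """Small random weights, standing in for learned parameters."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def dot_product_attention(x, w_q, w_k, w_v):
    """Standard attention: the (l x l) scores come from Q K^T."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    k_t = [list(col) for col in zip(*k)]
    scale = math.sqrt(len(q[0]))
    scores = [[s / scale for s in row] for row in matmul(q, k_t)]
    return matmul(softmax_rows(scores), v)


def dense_synthesizer(x, w_1, w_2, w_v):
    """Synthesized attention: each token maps itself to a row of l scores."""
    hidden = [[max(0.0, h) for h in row] for row in matmul(x, w_1)]  # ReLU
    scores = matmul(hidden, w_2)  # shape (l, l); no Q K^T product is formed
    return matmul(softmax_rows(scores), matmul(x, w_v))


rng = random.Random(0)
seq_len, d_model = 4, 8  # toy sizes, chosen only for illustration
x = rand_matrix(seq_len, d_model, rng)

baseline = dot_product_attention(
    x,
    rand_matrix(d_model, d_model, rng),
    rand_matrix(d_model, d_model, rng),
    rand_matrix(d_model, d_model, rng),
)
synthesized = dense_synthesizer(
    x,
    rand_matrix(d_model, d_model, rng),
    rand_matrix(d_model, seq_len, rng),  # projects each token to seq_len scores
    rand_matrix(d_model, d_model, rng),
)
```

Both functions return an output of shape (sequence length, model dimension), so the synthesized layer can stand in for the dot-product layer. One consequence visible in the sketch is that the second projection has an output width equal to the sequence length, so the score matrix is tied to a fixed maximum length.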
