Synthesizer: Rethinking Self-Attention in Transformer Models

The paper Synthesizer: Rethinking Self-Attention in Transformer Models replaces the $Q \times K^{T}$ attention matrix and finds that the query-key-value dot-product attention in Self-Attention is not indispensable. The authors propose two variants: the Dense Synthesizer and the Random Synthesizer.
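As a rough illustration of the idea, below is a minimal single-head sketch of the Dense Synthesizer variant: instead of computing $\mathrm{softmax}(QK^{T})$, a small per-token MLP predicts the attention logits directly from the input. Names such as `max_len` and the module layout are illustrative assumptions, not the paper's reference implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DenseSynthesizerAttention(nn.Module):
    """Dense Synthesizer (single head, sketch): attention weights are
    synthesized from each token's own representation, with no
    query-key dot product. `max_len` caps the sequence length that
    the synthesized logits can cover (an assumption of this sketch)."""

    def __init__(self, d_model: int, max_len: int):
        super().__init__()
        # Two-layer MLP F(X) = W2 * ReLU(W1 * X) maps each token to a
        # row of attention logits over all positions.
        self.w1 = nn.Linear(d_model, d_model)
        self.w2 = nn.Linear(d_model, max_len)
        # Value projection G(X); the output is a weighted sum of these.
        self.value = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        seq_len = x.size(1)
        # Synthesized logits, truncated to the actual sequence length.
        logits = self.w2(F.relu(self.w1(x)))[:, :, :seq_len]  # (B, N, N)
        attn = torch.softmax(logits, dim=-1)
        return attn @ self.value(x)  # (B, N, d_model)

# Hypothetical usage
x = torch.randn(2, 16, 64)
out = DenseSynthesizerAttention(d_model=64, max_len=128)(x)
print(out.shape)  # torch.Size([2, 16, 64])
```

The Random Synthesizer goes one step further: the N×N logit matrix is simply a learned (optionally frozen) parameter shared across inputs, so the attention pattern does not depend on the tokens at all.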