From Attention to Transformer to BERT: An Understanding

Date: 2020-12-30

Tags: algorithm study summary, nlp

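1. The most basic attention: [1] Lin, Zhouhan, et al. "A structured self-attentive sentence embedding." arXiv preprint arXiv:1703.03130 (2017).

2. Attention Is All You Need (QKV): this is also a form of attention. What it produces is still a set of weights; only the way those weights are computed differs.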
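To make the comparison concrete, here is a minimal sketch in plain Python of both ways of computing the weights. It is an illustration under stated assumptions, not the implementation from either paper: the toy matrix `H`, the projection matrices `W_s1` and `W_s2`, and all helper names are hypothetical, and the QKV version skips the learned linear projections by passing the same matrix as Q, K, and V.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def transpose(a):
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    # a is (n x k), b is (k x m); returns (n x m).
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def structured_self_attention(H, W_s1, W_s2):
    # Lin et al. style: A = softmax(W_s2 * tanh(W_s1 * H^T)), M = A * H.
    # H is (n x d), W_s1 is (d_a x d), W_s2 is (r x d_a).
    hidden = [[math.tanh(x) for x in row]
              for row in matmul(W_s1, transpose(H))]      # (d_a x n)
    A = [softmax(row) for row in matmul(W_s2, hidden)]    # (r x n), each row sums to 1
    return matmul(A, H), A

def scaled_dot_product_attention(Q, K, V):
    # QKV style: weights = softmax(Q * K^T / sqrt(d_k)), output = weights * V.
    d_k = len(K[0])
    scores = matmul(Q, transpose(K))                      # (n_q x n_k)
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, V), weights

# Hypothetical toy input: 3 tokens, hidden size 2.
H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Hypothetical parameters (random or learned in practice).
W_s1 = [[0.5, -0.5], [0.3, 0.8]]   # d_a = 2
W_s2 = [[1.0, -1.0]]               # r = 1 attention row

M, A = structured_self_attention(H, W_s1, W_s2)
out, w = scaled_dot_product_attention(H, H, H)  # identity projections, an assumption

print("structured self-attention weights:", A)
print("scaled dot-product weights:", w)
```

In both functions the result is a weighted sum of the input rows, and each row of weights sums to 1. The difference is only where the scores come from: a small feed-forward scorer over the hidden states in the first case, and query-key dot products scaled by the square root of `d_k` in the second.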