Attention is all you need
2020-05-15

Abstract: The Transformer uses no recurrence and no convolutions; it is based entirely on attention.

Introduction: Recurrent models are seq2seq models that compute a sequence of hidden states $h_t = f(h_{t-1}, x_t)$, where $x_t$ is the input at position $t$. This sequential dependence prevents parallelization within a training example, and RNNs forget long-range context. The Transformer instead averages attention-weighted positions, an effect the paper counteracts with multi-head attention.
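A minimal NumPy sketch of the contrast described above, not taken from the paper: a recurrent update $h_t = f(h_{t-1}, x_t)$ that must run as a sequential loop, versus scaled dot-product self-attention, which computes every output as a weighted average over all positions in one shot. The tanh cell, weight shapes, and toy sizes are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 5, 4                      # toy sequence length and model width
x = rng.normal(size=(T, d))      # input sequence x_1 .. x_T

# --- Recurrence: each h_t depends on h_{t-1}, forcing a sequential loop ---
W_h = rng.normal(size=(d, d)) / np.sqrt(d)
W_x = rng.normal(size=(d, d)) / np.sqrt(d)
h = np.zeros(d)
for t in range(T):               # cannot be parallelized across t
    h = np.tanh(h @ W_h + x[t] @ W_x)   # h_t = f(h_{t-1}, x_t)

# --- Attention: every output is a weighted average over all positions ---
def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

Q = K = V = x                    # self-attention over the same sequence
scores = Q @ K.T / np.sqrt(d)    # scaled dot-product similarities
out = softmax(scores) @ V        # attention-weighted average of positions
print(out.shape)                 # (T, d): all T outputs computed at once
```

Because the attention output has no step-to-step dependency, all positions can be computed in parallel; this is the parallelism the notes contrast with the RNN's sequential bottleneck.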