Notes on "Attention Is All You Need"

Date: 2021-07-14

Tags: paper notes, attention notes

Source: link to the original post

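Traditional approach

Encoder. Input: a symbol sequence x1,…,xn. Output: a continuous representation z1,…,zn.
Decoder. Input: the continuous representation z1,…,zn. Output: a symbol sequence y1,…,ym.

Authors' contribution

The Transformer uses stacked self-attention and point-wise, fully connected layers (encoder on the left, decoder on the right).

[Figure: Transformer architecture]

Encoder

The encoder is a stack of 6 identical layers. Each layer has 2 sub-layers. First, …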
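
To make the encoder description above more concrete, here is a minimal NumPy sketch of a stack of 6 structurally identical layers, each applying self-attention followed by a point-wise fully connected network. It is only an illustration under simplifying assumptions, not the paper's implementation: it uses a single attention head, random untrained weights, and leaves out anything the notes above do not mention. The helper names (`init_layer`, `encoder`) and the sizes `d_model=8`, `d_ff=32` are hypothetical, chosen just for this example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    # Single-head scaled dot-product self-attention over a sequence x of shape (n, d_model).
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])   # (n, n) pairwise similarities
    return softmax(scores, axis=-1) @ v       # weighted sum of values per position

def feed_forward(x, w1, b1, w2, b2):
    # Point-wise fully connected network: the same two linear maps applied at every position.
    return np.maximum(0.0, x @ w1 + b1) @ w2 + b2

def init_layer(rng, d_model, d_ff):
    # Hypothetical helper: random (untrained) parameters for one layer.
    scale = 0.1
    return {
        "w_q": rng.normal(scale=scale, size=(d_model, d_model)),
        "w_k": rng.normal(scale=scale, size=(d_model, d_model)),
        "w_v": rng.normal(scale=scale, size=(d_model, d_model)),
        "w1": rng.normal(scale=scale, size=(d_model, d_ff)),
        "b1": np.zeros(d_ff),
        "w2": rng.normal(scale=scale, size=(d_ff, d_model)),
        "b2": np.zeros(d_model),
    }

def encoder(x, layers):
    # Apply each layer's two sub-layers in order: self-attention, then feed-forward.
    for p in layers:
        x = self_attention(x, p["w_q"], p["w_k"], p["w_v"])
        x = feed_forward(x, p["w1"], p["b1"], p["w2"], p["b2"])
    return x

rng = np.random.default_rng(0)
n, d_model, d_ff = 5, 8, 32                      # illustrative sizes only
layers = [init_layer(rng, d_model, d_ff) for _ in range(6)]  # same structure, separate weights
x = rng.normal(size=(n, d_model))                # stands in for the embedded inputs x1,…,xn
z = encoder(x, layers)                           # continuous representation z1,…,zn
print(z.shape)
```

Each input position gets one output vector, so `z` has the same shape as `x`. That matches the x1,…,xn to z1,…,zn mapping in the notes. "Identical" here means the layers share a structure, not their weights.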
