Distilling transformers into simple neural networks with unlabeled transfer data (paper notes)

Paper: https://arxiv.org/pdf/1910.01769.pdf

Motivation: In general, a distilled student model still trails its teacher model in accuracy. This paper narrows that gap by distilling with a large amount of in-domain unlabeled transfer data.
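The core idea can be illustrated with a toy sketch: a teacher produces soft (temperature-scaled) labels on unlabeled in-domain data, and a small student is trained to match them, with no ground-truth labels involved. Everything below is hypothetical scaffolding, not the paper's implementation: the "teacher" is just a fixed linear scorer standing in for a fine-tuned transformer, and the student is a linear softmax classifier.

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax; higher T gives softer distributions."""
    z = z / T
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
# Hypothetical in-domain unlabeled transfer set (512 examples, 8 features).
X_unlabeled = rng.normal(size=(512, 8))
# Fixed "teacher" weights standing in for a fine-tuned transformer.
W_teacher = rng.normal(size=(8, 3))
# Soft pseudo-labels from the teacher, using temperature T=2.0.
teacher_probs = softmax(X_unlabeled @ W_teacher, T=2.0)

# Train the student by gradient descent on cross-entropy against the
# teacher's soft labels; no ground-truth labels are used anywhere.
W_student = np.zeros((8, 3))
lr = 0.5
for _ in range(200):
    p = softmax(X_unlabeled @ W_student, T=2.0)
    grad = X_unlabeled.T @ (p - teacher_probs) / len(X_unlabeled)
    W_student -= lr * grad

# How often the student's hard prediction matches the teacher's.
agreement = np.mean(
    softmax(X_unlabeled @ W_student).argmax(1) == teacher_probs.argmax(1)
)
print(f"student-teacher agreement: {agreement:.2f}")
```

Because the student here can represent the teacher exactly, agreement converges toward 1.0; in the paper's setting the student (a simple network) is far less expressive than the transformer teacher, which is exactly why the extra unlabeled transfer data matters.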