Distilling transformers into simple neural networks with unlabeled transfer data (paper notes)

Paper: https://arxiv.org/pdf/1910.01769.pdf

Motivation: In general, a distilled student model still trails its teacher model in accuracy. This paper narrows that gap by distilling with a large amount of in-domain unlabeled transfer data.
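The core idea can be illustrated with a toy sketch: a teacher produces soft (temperature-scaled) labels on unlabeled in-domain data, and a small student is trained to match them, with no ground-truth labels involved. Everything below is hypothetical scaffolding, not the paper's implementation: the "teacher" is just a fixed linear scorer standing in for a fine-tuned transformer, and the student is a linear softmax classifier.

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax; higher T gives softer distributions."""
    z = z / T
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
# Hypothetical in-domain unlabeled transfer set (512 examples, 8 features).
X_unlabeled = rng.normal(size=(512, 8))
# Fixed "teacher" weights standing in for a fine-tuned transformer.
W_teacher = rng.normal(size=(8, 3))
# Soft pseudo-labels from the teacher, using temperature T=2.0.
teacher_probs = softmax(X_unlabeled @ W_teacher, T=2.0)

# Train the student by gradient descent on cross-entropy against the
# teacher's soft labels; no ground-truth labels are used anywhere.
W_student = np.zeros((8, 3))
lr = 0.5
for _ in range(200):
    p = softmax(X_unlabeled @ W_student, T=2.0)
    grad = X_unlabeled.T @ (p - teacher_probs) / len(X_unlabeled)
    W_student -= lr * grad

# How often the student's hard prediction matches the teacher's.
agreement = np.mean(
    softmax(X_unlabeled @ W_student).argmax(1) == teacher_probs.argmax(1)
)
print(f"student-teacher agreement: {agreement:.2f}")
```

Because the student here can represent the teacher exactly, agreement converges toward 1.0; in the paper's setting the student (a simple network) is far less expressive than the transformer teacher, which is exactly why the extra unlabeled transfer data matters.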