On the Efficacy of Knowledge Distillation

Date: 2021-07-14

Tags: Knowledge Distillation


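Motivation: Experiments show that a better-performing teacher does not necessarily distill (teach) a better student, so this paper sets out to identify the factors that affect distillation performance. The suspected cause is a capacity mismatch: the student model cannot mimic the teacher, and trying to do so instead pulls the main loss off course. The earlier remedy for this problem was to distill step by step, but that did not work well either. In the left panel, the teacher is WRN k-1, where k is the depth, and the students are WRN16-1 and DN40-12 (Den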
