1804.03235-Large scale distributed neural network training through online distillation.md

Date: 2021-01-13


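Existing approaches to distributed model training:

- Distributed SGD
  - Parallel (synchronous) SGD: in large-scale training, the duration of a single step is determined by the slowest machine.
  - Asynchronous SGD: unsynchronized data can push the weight updates in an unknown direction.
- Parallel multi-model training: several clusters each train a different model, and the results are then combined into the final model, but this will...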
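The point about the slowest machine can be illustrated with a small sketch. This is only a toy model under stated assumptions, not code from the paper: the function names `sync_step_time` and `idle_time` and the per-worker timings are hypothetical, and each step is assumed to finish only once every worker has reported its gradient.

```python
def sync_step_time(worker_times):
    """Wall-clock time of one synchronous step.

    Assumption: the step completes only when the slowest worker finishes,
    so the step time is the maximum of the per-worker times.
    """
    return max(worker_times)


def idle_time(worker_times):
    """Time each worker spends waiting for the slowest one in a step."""
    step = sync_step_time(worker_times)
    return [step - t for t in worker_times]


# Hypothetical per-worker compute times in seconds for one step.
times = [1.0, 1.5, 4.0]
print(sync_step_time(times))  # step time equals the slowest worker's time
print(idle_time(times))       # faster workers sit idle for the difference
```

In this toy model, adding more workers never shortens a step below the slowest worker's time, which is why stragglers become more costly as the number of machines grows.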
