TensorFlow Step 8: Nesterov's accelerated gradient descent + L2 regularization

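L2 regularization: C = C0 + lambda/(2n) * sum(w^2)

Nesterov's accelerated gradient descent

http://www.javashuo.com/article/p-snymsggd-cn.html

Looking closely at the figure above makes it clear: the difference between Nesterov momentum and classical momentum lies in the gradient used, that is, the gradient at point B versus the gradient at point C.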
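The two ideas above can be sketched in plain NumPy. This is a minimal illustration, not the post's TensorFlow code: the function names (`l2_cost`, `classical_step`, `nesterov_step`), the learning rate `lr` and the momentum coefficient `mu` are all assumptions introduced here. The sketch assumes the usual formulation, in which classical momentum evaluates the gradient at the current weights and Nesterov momentum evaluates it at the look-ahead weights `w + mu * v`. The L2 term contributes `(lam / n) * w` to the gradient, which is the derivative of `lam/(2n) * sum(w^2)`.

```python
import numpy as np

def l2_cost(c0, weights, lam, n):
    """Regularized cost C = C0 + lam/(2n) * sum(w^2) over all weight arrays."""
    return c0 + lam / (2.0 * n) * sum(np.sum(w ** 2) for w in weights)

def classical_step(w, v, grad_fn, lr=0.1, mu=0.9, lam=0.0, n=1):
    """Classical momentum: gradient evaluated at the current weights w."""
    g = grad_fn(w) + (lam / n) * w          # gradient of C0 plus L2 term
    v_new = mu * v - lr * g
    return w + v_new, v_new

def nesterov_step(w, v, grad_fn, lr=0.1, mu=0.9, lam=0.0, n=1):
    """Nesterov momentum: gradient evaluated at the look-ahead point w + mu*v."""
    w_ahead = w + mu * v
    g = grad_fn(w_ahead) + (lam / n) * w_ahead
    v_new = mu * v - lr * g
    return w + v_new, v_new

def minimize(step_fn, grad_fn, w0, steps=500, **kwargs):
    """Run a step function repeatedly from w0 with zero initial velocity."""
    w, v = np.array(w0, dtype=float), np.zeros_like(w0, dtype=float)
    for _ in range(steps):
        w, v = step_fn(w, v, grad_fn, **kwargs)
    return w

if __name__ == "__main__":
    # Hypothetical toy problem: C0(w) = 0.5 * ||w - target||^2
    target = np.array([3.0, -2.0])
    grad_fn = lambda w: w - target
    print(minimize(nesterov_step, grad_fn, [0.0, 0.0]))
    print(minimize(nesterov_step, grad_fn, [0.0, 0.0], lam=1.0, n=1))
```

With `lam > 0` the minimizer is pulled toward zero; for this toy quadratic with `n = 1` it sits at `target / (1 + lam)`, which is the weight-decay effect of the L2 term.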