Tweeted By @DeepMind
Why does Stochastic Gradient Descent generalise well in deep networks? Our team shows that if the learning rate is small but finite, the mean iterate of random shuffling SGD stays close to the path of gradient flow, but on a modified loss landscape https://t.co/JUAzPujWfP pic.twitter.com/NtlbVaMfLC
— DeepMind (@DeepMind) February 3, 2021
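For context, a minimal sketch of the kind of modified loss landscape the linked paper describes, stated to leading order in the learning rate; the symbols below are our notation rather than anything given in the tweet: C is the full-batch training loss, Ĉ_k the loss on the k-th of m mini-batches in the epoch, ε the learning rate, and ω the parameters.

\[
\tilde{C}_{\mathrm{SGD}}(\omega) \;\approx\; C(\omega) \;+\; \frac{\epsilon}{4m} \sum_{k=1}^{m} \big\| \nabla \hat{C}_k(\omega) \big\|^2
\]

Under this reading, the mean iterate of random-shuffling SGD follows gradient flow on a loss penalised by the squared norms of the mini-batch gradients, which is one candidate explanation for the implicit regularisation the tweet alludes to.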