Tweeted By @hardmaru

on 2021-04-12 (UTC)
research

It turns out that vanilla optimizers such as Nesterov momentum and Adam work just fine for large batch sizes.

Paper by @zacharynado, @jmgilmer, Chris Shallue, @_arohan_ and George Dahl conducted extensive ablations training vision and language models. https://t.co/KU8pkgAPfn https://t.co/u23KigmXTb
— hardmaru (@hardmaru) April 12, 2021
