Impressive 172 pp. paper from @DeepMind & @GoogleAI: train deep nets on ImageNet with SGD w/o batch norm, and even w/o skip connections if you substitute SGD with a better optimizer such as K-FAC or Shampoo. Shocking! And probably very useful for theory. https://t.co/CoPGVN3Csl pic.twitter.com/hmJ3N2ZMl1
— andrea panizza (@unsorsodicorda) October 10, 2021