Takeaways:
1) A prescription for adjusting SGD hyperparameters with width
2) Generalization strictly improves with width
3) Test accuracy is predicted surprisingly well by a single scalar (eq 4)
4) There is a critical width, beyond which optimal hyperparameters are unachievable
— Jascha (@jaschasd) May 12, 2019