Think you understand weight decay? Think again. We found three distinct mechanisms by which it achieves a regularization effect, depending on the architecture and optimization algorithm. New paper w/ @Guodzh, Chaoqi Wang, and Bowen Xu. https://t.co/BXN9tedEPz
— Roger Grosse (@RogerGrosse) October 30, 2018