Ceshine's Data Science Tweet Collection

by jeremyphoward on 2019-04-10 (UTC).

Weight decay doesn't regularize if you use batchnorm. Well, it does, but not how you think. See this paper from @RogerGrosse's team. Originally mentioned by van Laarhoven (2017) and explored by Hoffer et al. (2018).

Are there any other papers about this? https://t.co/IocFSvSkoU
— Jeremy Howard (@jeremyphoward) April 10, 2019
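
The claim in the tweet above rests on a property that is easy to check numerically: a layer followed by batch normalization produces the same output when its weights are rescaled, so shrinking the weights does not by itself constrain the function being computed. The sketch below is an illustration only and is not taken from the linked paper. The `batchnorm` helper, the array shapes, and the 0.1 shrink factor are all hypothetical, and the normalization is simplified (no learned scale or shift, `eps` set to zero so the invariance is exact).

```python
import numpy as np

def batchnorm(z, eps=0.0):
    """Normalize each feature (column) over the batch dimension.

    Simplified stand-in for a batchnorm layer: no learned scale/shift,
    and eps defaults to 0 so that scale invariance holds exactly.
    """
    mean = z.mean(axis=0, keepdims=True)
    var = z.var(axis=0, keepdims=True)
    return (z - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
x = rng.normal(size=(32, 8))  # hypothetical batch: 32 examples, 8 features
w = rng.normal(size=(8, 4))   # hypothetical weights of a linear layer

out = batchnorm(x @ w)
out_shrunk = batchnorm(x @ (0.1 * w))  # same weights, shrunk by a factor of 10

# The normalized activations agree up to floating-point error.
scale_invariant = np.allclose(out, out_shrunk)
print(scale_invariant)
```

With a nonzero `eps`, as real implementations use, the invariance is only approximate, but the same picture holds as long as the activation variance is much larger than `eps`.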

learning

by jeremyphoward on 2019-04-11 (UTC).

"Pretty nice" is quite the understatement. This is a wonderful in-depth discussion of the weird interactions between batchnorm, weight decay, and learning rate, including a fascinating experiment that shows that you can entirely replace weight decay with learning rate changes: https://t.co/DuCaJRRp91
— Jeremy Howard (@jeremyphoward) April 11, 2019
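
One ingredient behind the link between weight decay and learning rate can be checked numerically. If a loss is invariant to the scale of the weights (as it is when a simplified batchnorm follows the layer), then scaling the weights by a factor a scales the gradient by 1/a, so smaller weights mean larger gradients and a larger effective step. The sketch below illustrates only that scaling relation and does not reproduce the experiment in the linked post. The `batchnorm`, `loss`, and `numerical_grad` helpers, the random data, and the factor 0.5 are all hypothetical.

```python
import numpy as np

def batchnorm(z):
    """Simplified batchnorm: per-feature normalization, no eps, no learned scale/shift."""
    return (z - z.mean(axis=0, keepdims=True)) / z.std(axis=0, keepdims=True)

rng = np.random.default_rng(0)
x = rng.normal(size=(16, 5))  # hypothetical inputs
y = rng.normal(size=(16, 3))  # hypothetical regression targets
w = rng.normal(size=(5, 3))   # hypothetical layer weights

def loss(weights):
    """Mean squared error on batch-normalized activations (scale-invariant in weights)."""
    return np.mean((batchnorm(x @ weights) - y) ** 2)

def numerical_grad(f, weights, h=1e-6):
    """Central-difference estimate of the gradient of f at weights."""
    grad = np.zeros_like(weights)
    for idx in np.ndindex(*weights.shape):
        step = np.zeros_like(weights)
        step[idx] = h
        grad[idx] = (f(weights + step) - f(weights - step)) / (2 * h)
    return grad

a = 0.5  # shrink factor, standing in for the effect of weight decay on the norm
grad_full = numerical_grad(loss, w)
grad_shrunk = numerical_grad(loss, a * w)

# For a scale-invariant loss, grad(a * w) == grad(w) / a.
gradient_scales_inversely = np.allclose(grad_shrunk, grad_full / a, rtol=1e-4, atol=1e-6)
print(gradient_scales_inversely)
```

Under these assumptions, a plain SGD step of size lr on weights of norm ‖w‖ changes the weight direction roughly as a step of size lr / ‖w‖² would on unit-norm weights, which is why shrinking the weights behaves like raising the learning rate.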

learning
