Weight decay doesn't regularize, if you use batchnorm. Well it does, but not how you think. See this paper from @RogerGrosse's team. Originally mentioned by van Laarhoven (2017) and explored by Hoffer et al (2018).
— Jeremy Howard (@jeremyphoward) April 10, 2019
Are there any other papers about this? https://t.co/IocFSvSkoU
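
For anyone wondering what "not how you think" might mean: my understanding of the usual argument is that batchnorm makes a layer's output independent of the scale of the weights feeding into it, so shrinking those weights can't reduce the function's capacity directly and instead shows up as a change in the effective learning rate. Here is a minimal sketch of the scale-invariance part only, in plain Python. The names (`batchnorm`, `preactivations`, `same_output`), the toy sizes, and the 0.1 shrink factor are my own assumptions for illustration, and I've left out the epsilon and the learned scale/shift that real batchnorm layers have.

```python
import math
import random

def batchnorm(values):
    # Normalize one feature over the batch (no eps, no learned scale/shift).
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var) for v in values]

random.seed(0)
# Toy batch: 32 examples with 8 input features each (illustrative sizes).
batch = [[random.gauss(0, 1) for _ in range(8)] for _ in range(32)]
# Weights of a single linear unit that feeds into batchnorm.
w = [random.gauss(0, 1) for _ in range(8)]

def preactivations(weights):
    # Linear unit output for every example in the batch.
    return [sum(wi * xi for wi, xi in zip(weights, x)) for x in batch]

out = batchnorm(preactivations(w))
# Same weights, shrunk by a constant factor as repeated weight decay would do.
out_shrunk = batchnorm(preactivations([0.1 * wi for wi in w]))

# Compare up to floating-point rounding.
same_output = all(
    math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-9)
    for a, b in zip(out, out_shrunk)
)
print(same_output)
```

This only shows that the normalized output ignores the weight scale. With a nonzero epsilon the invariance is approximate rather than exact, and the effective-learning-rate part of the argument is not demonstrated here.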