by jeremyphoward on 2019-04-10 (UTC).

Weight decay doesn't regularize if you use batchnorm. Well, it does, but not how you think. See this paper from @RogerGrosse's team. Originally mentioned by van Laarhoven (2017) and explored by Hoffer et al. (2018).

Are there any other papers about this? https://t.co/IocFSvSkoU

— Jeremy Howard (@jeremyphoward) April 10, 2019
learning
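The claim is easy to check empirically. Below is a minimal sketch (my own illustration, not code from the tweet or the cited papers): batchnorm's output is invariant to rescaling the weights of the preceding layer, so an L2 penalty on those weights does not constrain the function the network actually computes.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(64, 10)                  # a batch of inputs

linear = nn.Linear(10, 20, bias=False)
bn = nn.BatchNorm1d(20)                  # train mode: normalizes with batch statistics

out_original = bn(linear(x))

# Rescale the weights by an arbitrary positive factor. The post-batchnorm
# output is unchanged, even though the L2 penalty on the weights has grown
# by a factor of 100.
with torch.no_grad():
    linear.weight.mul_(10.0)
out_scaled = bn(linear(x))

print(torch.allclose(out_original, out_scaled, atol=1e-4))  # True
```

In other words, weight decay can shrink or grow the weight norm without moving the model toward a simpler function, which is why it does not regularize in the usual sense here.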
by jeremyphoward on 2019-04-11 (UTC).

"Pretty nice" is quite the understatement. This is a wonderful in-depth discussion of the weird interactions between batchnorm, weight decay, and learning rate, including a fascinating experiment that shows that you can entirely replace weight decay with learning rate changes, https://t.co/DuCaJRRp91

— Jeremy Howard (@jeremyphoward) April 11, 2019
learning
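A rough way to see why that trade works (a sketch of the standard scale-invariance argument, not anything specific from the linked discussion): since the post-batchnorm loss satisfies L(αw) = L(w), its gradient with respect to w scales as 1/α. A fixed learning rate therefore moves the function less as the weight norm grows, and weight decay's only real effect is to keep that norm small, i.e. to keep the effective learning rate up. The snippet below demonstrates the 1/α gradient scaling numerically.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(64, 10)
y = torch.randn(64, 20)
base = nn.Linear(10, 20, bias=False)     # reference weights

def grad_norm(scale):
    # Same weights rescaled by `scale`, fed through a fresh batchnorm layer.
    linear = nn.Linear(10, 20, bias=False)
    with torch.no_grad():
        linear.weight.copy_(base.weight * scale)
    bn = nn.BatchNorm1d(20)
    loss = ((bn(linear(x)) - y) ** 2).mean()
    loss.backward()
    return linear.weight.grad.norm().item()

# Doubling the weight norm roughly halves the gradient, so a fixed learning
# rate moves the (scale-invariant) function only half as far. Keeping the
# norm small via weight decay therefore acts like raising the learning
# rate, and vice versa.
print(grad_norm(2.0) / grad_norm(1.0))   # ~0.5
```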
