Three Mechanisms of Weight Decay Regularization
10 Apr 2019, Prathyush SP

Weight decay doesn’t regularize if you use batchnorm. Well, it does, but not how you think. See this paper from @RogerGrosse’s team. The effect was originally mentioned by van Laarhoven (2017) and explored by Hoffer et al. (2018).
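To make the claim a bit more concrete, here is a minimal NumPy sketch of the scale-invariance argument usually given for it. It is my own illustration, not code from the paper. It assumes a plain batchnorm with no learned scale or shift and `eps = 0`, and the names `batchnorm`, `loss`, and `numerical_grad` are hypothetical helpers. Under those assumptions, rescaling the weights that feed into batchnorm leaves the output unchanged, while the gradient shrinks by the same factor, so the size of the weights mostly changes how large each update is relative to the weights.

```python
import numpy as np

def batchnorm(z, eps=0.0):
    # Normalize each feature over the batch axis (no learned scale/shift).
    mean = z.mean(axis=0, keepdims=True)
    std = z.std(axis=0, keepdims=True)
    return (z - mean) / (std + eps)

def loss(w, x, t):
    # Toy scalar loss on the normalized pre-activations (illustrative only).
    return np.sum(batchnorm(x @ w) * t)

def numerical_grad(f, w, h=1e-6):
    # Central-difference gradient of f with respect to every entry of w.
    grad = np.zeros_like(w)
    for idx in np.ndindex(*w.shape):
        step = np.zeros_like(w)
        step[idx] = h
        grad[idx] = (f(w + step) - f(w - step)) / (2 * h)
    return grad

rng = np.random.default_rng(0)
x = rng.normal(size=(64, 10))   # batch of inputs
w = rng.normal(size=(10, 4))    # weights feeding into batchnorm
t = rng.normal(size=(64, 4))    # fixed random targets for the toy loss
alpha = 3.0                     # arbitrary rescaling of the weights

# 1) The batchnorm output does not change when the weights are rescaled.
scale_invariant = np.allclose(batchnorm(x @ w), batchnorm(x @ (alpha * w)))

# 2) The gradient at alpha * w is the gradient at w divided by alpha.
g = numerical_grad(lambda v: loss(v, x, t), w)
g_scaled = numerical_grad(lambda v: loss(v, x, t), alpha * w)
grads_shrink = np.allclose(g_scaled, g / alpha, rtol=1e-4, atol=1e-6)

print("output invariant to weight scale:", scale_invariant)
print("gradient scales as 1/alpha:", grads_shrink)
```

Since shrinking the weights cannot change what the network computes here, weight decay cannot be limiting capacity in the usual sense. What it can do is keep the weight norm small, which makes each gradient step larger relative to the weights.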
For more details, visit the source.