1/5 Self-Distillation loop (feeding predictions as new target values & retraining) improves test accuracy. But why? We show it induces a regularization that progressively limits # of basis functions used to represent the solution. https://t.co/570qXFmlGj w/@farajtabar P.Bartlett pic.twitter.com/b79Q6ZSxlS
β Hossein Mobahi (@TheGradient) February 14, 2020