Ceshine's Data Science Tweet Collection

by TheGradient on 2020-02-14 (UTC).

1/5 Self-Distillation loop (feeding predictions as new target values & retraining) improves test accuracy. But why? We show it induces a regularization that progressively limits # of basis functions used to represent the solution. https://t.co/570qXFmlGj w/@farajtabar P.Bartlett pic.twitter.com/b79Q6ZSxlS
— Hossein Mobahi (@TheGradient) February 14, 2020

research learning

by TheGradient on 2020-02-14 (UTC).

2/5 Knowledge distillation by @geoffreyhinton @OriolVinyalsML @JeffDean originally motivated to transfer knowledge from large to smaller networks. Self-distillation is special case with identical architectures; predictions of model are fed back to itself as new target values.
— Hossein Mobahi (@TheGradient) February 14, 2020

learning
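
Note: the loop described in the two tweets above (fit a model, feed its predictions back as the new targets, refit) can be tried out on a toy problem. What follows is only a minimal sketch, assuming kernel ridge regression with a fixed regularization weight, which is a simplification and may not match the setup of the linked paper. The names `rbf_kernel`, `self_distill`, `top_share`, and all numeric settings (kernel width, `lam`, number of rounds, the synthetic data) are hypothetical choices made for illustration.

```python
import numpy as np


def rbf_kernel(x, width=0.2):
    """Gaussian (RBF) kernel matrix for 1-D inputs; the width is an arbitrary choice."""
    diff = x[:, None] - x[None, :]
    return np.exp(-(diff ** 2) / (2.0 * width ** 2))


def self_distill(K, y, lam=0.1, rounds=5):
    """Fit kernel ridge regression, then refit on its own predictions each round."""
    n = K.shape[0]
    targets = y.copy()
    history = []
    for _ in range(rounds):
        # Standard kernel ridge solution on the current targets.
        alpha = np.linalg.solve(K + lam * np.eye(n), targets)
        preds = K @ alpha
        history.append(preds)
        # Self-distillation step: predictions become the next round's targets.
        targets = preds
    return history


def top_share(K, preds, k=3):
    """Fraction of the squared norm of preds lying in the k leading eigenvectors of K."""
    _, eigvecs = np.linalg.eigh(K)  # eigenvalues are returned in ascending order
    energy = (eigvecs.T @ preds) ** 2
    return energy[-k:].sum() / energy.sum()


# Synthetic noisy data (assumed purely for the demonstration).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 20)
y = np.sin(2.0 * np.pi * x) + 0.3 * rng.standard_normal(x.size)

K = rbf_kernel(x)
history = self_distill(K, y)
shares = [top_share(K, p) for p in history]

for t, (p, s) in enumerate(zip(history, shares), start=1):
    print(f"round {t}: ||pred|| = {np.linalg.norm(p):.3f}, top-3 share = {s:.3f}")
```

In this simplified setting, each round multiplies the component of the fit along the j-th eigenvector of the kernel matrix by d_j / (d_j + lam), where d_j is the corresponding eigenvalue. Directions with small eigenvalues therefore fade fastest, so the fit concentrates on fewer and fewer basis functions as rounds accumulate, which is one way to picture the regularization effect the thread describes.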

by ericjang11 on 2020-02-16 (UTC).

self-distillation takes many useful forms in ML research (Q-learning, quantization, teacher-student architectures). Awesome fundamental work! https://t.co/6trYyiANxZ
— Eric Jang 🇺🇸🇹🇼 (@ericjang11) February 16, 2020

research learning
