2/5 Knowledge distillation by @geoffreyhinton @OriolVinyalsML @JeffDean originally motivated to transfer knowledge from large to smaller networks. Self-distillation is special case with identical architectures; predictions of model are fed back to itself as new target values.
— Hossein Mobahi (@TheGradient) February 14, 2020
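The loop described in the tweet can be made concrete with a short sketch. Below is a minimal, illustrative example in PyTorch; the toy MLP, synthetic data, temperature of 2.0, and training hyperparameters are all assumptions for illustration, not details from the tweet or the paper. A first model is trained on the original hard labels, and its softened predictions are then fed back as the targets for a fresh model with the identical architecture.

```python
# Minimal self-distillation sketch (illustrative only: toy MLP, synthetic data,
# temperature T=2.0 are assumptions, not choices from the tweet or paper).
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_model():
    # Identical architecture for both rounds -- the "self" in self-distillation.
    return nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 5))

def train(model, inputs, targets, loss_fn, epochs=50, lr=1e-2):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(inputs), targets)
        loss.backward()
        opt.step()
    return model

# Synthetic classification data, standing in for a real dataset.
X = torch.randn(256, 20)
y = torch.randint(0, 5, (256,))

# Round 0: train the first model on the original hard labels.
teacher = train(make_model(), X, y, F.cross_entropy)

# Round 1: the model's own predictions become the new (soft) targets
# for a second model with the identical architecture.
T = 2.0  # temperature used to soften the predictions
with torch.no_grad():
    soft_targets = F.softmax(teacher(X) / T, dim=1)

def distill_loss(student_logits, targets):
    # KL divergence between the softened student and teacher distributions.
    return F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    targets, reduction="batchmean") * (T * T)

student = train(make_model(), X, soft_targets, distill_loss)
```

In the special case of self-distillation, the only change from ordinary knowledge distillation is that teacher and student share the same architecture, so the same loop can be repeated for further rounds by treating the latest student as the next teacher.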