2/5 Knowledge distillation by @geoffreyhinton @OriolVinyalsML @JeffDean originally motivated to transfer knowledge from large to smaller networks. Self-distillation is special case with identical architectures; predictions of model are fed back to itself as new target values.
— Hossein Mobahi (@TheGradient) February 14, 2020
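The loop described in the tweet can be made concrete with a short sketch. Below is a minimal, illustrative example in PyTorch; the toy MLP, synthetic data, temperature of 2.0, and training hyperparameters are all assumptions for illustration, not details from the tweet or the paper. A first model is trained on the original hard labels, and its softened predictions are then fed back as the targets for a fresh model with the identical architecture.

```python
# Minimal self-distillation sketch (illustrative only: toy MLP, synthetic data,
# temperature T=2.0 are assumptions, not choices from the tweet or paper).
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_model():
    # Identical architecture for both rounds -- the "self" in self-distillation.
    return nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 5))

def train(model, inputs, targets, loss_fn, epochs=50, lr=1e-2):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(inputs), targets)
        loss.backward()
        opt.step()
    return model

# Synthetic classification data, standing in for a real dataset.
X = torch.randn(256, 20)
y = torch.randint(0, 5, (256,))

# Round 0: train the first model on the original hard labels.
teacher = train(make_model(), X, y, F.cross_entropy)

# Round 1: the model's own predictions become the new (soft) targets
# for a second model with the identical architecture.
T = 2.0  # temperature used to soften the predictions
with torch.no_grad():
    soft_targets = F.softmax(teacher(X) / T, dim=1)

def distill_loss(student_logits, targets):
    # KL divergence between the softened student and teacher distributions.
    return F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    targets, reduction="batchmean") * (T * T)

student = train(make_model(), X, soft_targets, distill_loss)
```

In the special case of self-distillation, the only change from ordinary knowledge distillation is that teacher and student share the same architecture, so the same loop can be repeated for further rounds by treating the latest student as the next teacher.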