Tweeted By @karpathy

on 2019-01-07 (UTC)
misc thought

The raw value of a loss (in a multitask setting) does not reflect how much your model "cares" about that component. E.g. an L1 loss can report arbitrarily large loss value based on loss scale but the gradient will always be \in {-1,1}. The grad magnitude is what actually matters.
— Andrej Karpathy (@karpathy) January 7, 2019
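
A minimal sketch of the point above, assuming a single scalar prediction and target. The function names are hypothetical and not from the tweet, and the L2 loss is added here only as a contrast. It shows the L1 loss value growing with the size of the residual while its gradient with respect to the prediction stays at -1 or 1.

```python
def l1_loss(pred, target):
    """Absolute-error (L1) loss for one scalar prediction."""
    return abs(pred - target)


def l1_grad(pred, target):
    """Derivative of the L1 loss w.r.t. pred: the sign of the residual.

    Returns -1 or 1 for a non-zero residual; at exactly zero the loss is
    not differentiable and this returns 0 (one valid subgradient).
    """
    diff = pred - target
    return (diff > 0) - (diff < 0)


def l2_loss(pred, target):
    """Squared-error (L2) loss, included only for contrast."""
    return 0.5 * (pred - target) ** 2


def l2_grad(pred, target):
    """Derivative of the L2 loss w.r.t. pred: grows with the residual."""
    return pred - target


if __name__ == "__main__":
    target = 0.0
    for pred in (0.1, 10.0, 1000.0, -1000.0):
        print(
            f"pred={pred:>8}: "
            f"L1 loss={l1_loss(pred, target):>8}, L1 grad={l1_grad(pred, target):>2}, "
            f"L2 loss={l2_loss(pred, target):>10}, L2 grad={l2_grad(pred, target):>8}"
        )
```

In a multitask sum, the same reasoning suggests comparing the gradient norms each term contributes to the shared parameters, rather than the printed loss values, when judging which component dominates training.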

