Tweeted By @Tim_Dettmers
In a few equations, here is how it relates to grad clipping. This is the most important part of LAMB/LARS to understand.
— Tim Dettmers (@Tim_Dettmers) July 6, 2021
Grad clip:
if norm(grad) > max_norm:
    grad *= max_norm/norm(grad)
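A minimal runnable sketch of the grad-clip rule, assuming a single flat gradient vector stored as a Python list (real frameworks usually compute one global norm over all parameter tensors). The scale factor is max_norm / norm(grad), which is below 1 whenever clipping triggers, so the gradient keeps its direction and its norm lands exactly on max_norm. The helper names `l2_norm` and `clip_grad` are hypothetical.

```python
import math


def l2_norm(vec):
    """Euclidean (L2) norm of a flat list of floats."""
    return math.sqrt(sum(x * x for x in vec))


def clip_grad(grad, max_norm):
    """Rescale grad so its L2 norm is at most max_norm; direction is unchanged."""
    norm = l2_norm(grad)
    if norm > max_norm:
        scale = max_norm / norm  # < 1, shrinks the vector onto the norm ball
        return [g * scale for g in grad]
    return list(grad)  # already inside the ball: return an unchanged copy


clipped = clip_grad([3.0, 4.0], max_norm=1.0)
print(clipped)
```

Here the input has norm 5, so every component is multiplied by 1/5; gradients already within max_norm pass through untouched.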
Update clip:
if norm(update) > max_unorm:
    update *= max_unorm/norm(update)
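The same rule applied to the update rather than the gradient can be sketched as below. Under the assumption that `update` is whatever the optimizer finally subtracts from the weights (for example lr times an Adam direction, computed elsewhere), clipping it bounds the actual parameter change per step, whereas grad clipping only bounds the optimizer's input. The functions `clip_update` and `apply_update` and the plain-list representation are hypothetical, chosen for illustration.

```python
import math


def l2_norm(vec):
    """Euclidean (L2) norm of a flat list of floats."""
    return math.sqrt(sum(x * x for x in vec))


def clip_update(update, max_unorm):
    """Rescale the optimizer's update so its L2 norm is at most max_unorm."""
    norm = l2_norm(update)
    if norm > max_unorm:
        scale = max_unorm / norm  # < 1, keeps the update direction
        return [u * scale for u in update]
    return list(update)


def apply_update(weights, update, max_unorm):
    """Hypothetical step: clip the final update, then subtract it from the weights."""
    clipped = clip_update(update, max_unorm)
    return [w - u for w, u in zip(weights, clipped)]


weights = [1.0, 1.0]
update = [0.3, 0.4]  # assumed output of an optimizer, e.g. lr * Adam direction
new_weights = apply_update(weights, update, max_unorm=0.1)
print(new_weights)
```

Because the proposed update has norm 0.5, it is scaled by 0.2 before being applied, so no weight moves by more than the max_unorm budget allows in total.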