Tweeted By @karpathy
Badly tuned LR decay schedules are an excellent way to silently shoot yourself in the foot. Models can often look like they are converging but it's just LR getting too low too fast. FixedLR (+optional warmup) with 1 manual decay of 10X on plateau is a safe strong baseline.
— Andrej Karpathy (@karpathy) August 27, 2021