good post & links! Touches on gradient accumulation, gradient checkpointing (no, not the normal checkpointing), the nearly unambiguous superiority of the distributed data parallel container in PyTorch, and the overall importance of understanding what's under the hood. https://t.co/2WYZRz9a2X
— Andrej Karpathy (@karpathy) October 16, 2018
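The techniques the tweet names map to concrete PyTorch idioms. Below is a minimal sketch of gradient accumulation, assuming a toy linear model, random stand-in data, and a hypothetical accumulation factor of 4 (none of these come from the tweet itself); gradient checkpointing is separately available via torch.utils.checkpoint, and multi-GPU training via torch.nn.parallel.DistributedDataParallel.

```python
import torch
import torch.nn as nn

# Hypothetical model, optimizer, and hyperparameters for illustration only.
model = nn.Linear(128, 10)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()
accum_steps = 4  # effective batch size = per-step batch size * accum_steps

optimizer.zero_grad()
for step in range(100):
    x = torch.randn(8, 128)            # stand-in mini-batch
    y = torch.randint(0, 10, (8,))
    loss = loss_fn(model(x), y)
    # Scale the loss so the accumulated gradient matches one large-batch step.
    (loss / accum_steps).backward()    # .backward() adds into .grad buffers
    if (step + 1) % accum_steps == 0:
        optimizer.step()               # apply the accumulated gradient
        optimizer.zero_grad()          # reset before the next accumulation window
```

The point of the scaling line is that backward passes sum gradients into `.grad`, so dividing each loss by `accum_steps` makes four small-batch backward passes equivalent (in expectation) to a single backward pass over a batch four times larger, trading extra compute time for lower peak memory.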