This paper gives me anxiety. BatchNorm is the most deviously subtly complex layer in deep learning. Many issues (silently) root cause to it. Yet it is ubiquitous because it works well (it multi-task helps optimization/regularization) and can be fused to affines at inference time. https://t.co/3EC2Abm8Ry
— Andrej Karpathy (@karpathy) May 18, 2021
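The "fused to affines at inference time" point refers to the fact that a BatchNorm layer with frozen running statistics is just a per-channel affine transform, so it can be folded into the weights and bias of the preceding linear/conv layer. A minimal NumPy sketch of that folding (all names here, `W`, `b`, `gamma`, `beta`, etc., are illustrative, not from any particular library):

```python
import numpy as np

# Sketch: fold an inference-time BatchNorm into the preceding linear layer.
# At inference, BN(y) = gamma * (y - mean) / sqrt(var + eps) + beta is a
# fixed per-channel affine, so it can be absorbed into W and b.

rng = np.random.default_rng(0)
d_in, d_out = 4, 3
W = rng.normal(size=(d_out, d_in))   # linear layer weight
b = rng.normal(size=d_out)           # linear layer bias

# Frozen BatchNorm statistics and learned affine parameters.
gamma = rng.normal(size=d_out)
beta = rng.normal(size=d_out)
mean = rng.normal(size=d_out)
var = rng.uniform(0.5, 2.0, size=d_out)
eps = 1e-5

def linear_then_bn(x):
    """Reference path: linear layer followed by inference-mode BatchNorm."""
    y = x @ W.T + b
    return gamma * (y - mean) / np.sqrt(var + eps) + beta

# Fold BN into the linear layer: scale each output row of W, shift b.
scale = gamma / np.sqrt(var + eps)
W_fused = W * scale[:, None]
b_fused = (b - mean) * scale + beta

x = rng.normal(size=(8, d_in))
assert np.allclose(linear_then_bn(x), x @ W_fused.T + b_fused)
```

After fusion, the BatchNorm layer disappears entirely from the inference graph, which is one reason it remains popular despite the training-time subtleties the tweet warns about.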