Tweeted By @karpathy
Great paper and thread!
- 😮that super simple MSE loss works vs. BEiT-style dVAE (multi-modal) cross-entropy
- <3 efficiency of asymmetric encoder/decoder
- 👏detailed training recipes
- +1 v curious about dataset size scaling
- bit of lack of commentary on test-time protocol https://t.co/MQFAvrqBvr
— Andrej Karpathy (@karpathy) November 13, 2021