Tweeted By @karpathy
Nice new paper improving image generation and (generative) unsupervised representation learning https://t.co/OYB21fe3sm uses ViT instead of CNN to improve VQGAN into a new "ViT-VQGAN" image patch tokenizer. Tokens are then fed into a GPT for image generation, or linear probing. pic.twitter.com/cepvrv55Jk
— Andrej Karpathy (@karpathy) October 8, 2021