Tweeted By @ak92501
When Vision Transformers Outperform ResNets without Pretraining or Strong Data Augmentations
— AK (@ak92501) June 4, 2021
pdf: https://t.co/GYknaVoNAM
abs: https://t.co/kaUxIdMVNQ
+5.3% and +11.0% top-1 accuracy on ImageNet for ViT-B/16 and Mixer-B/16, with simple Inception-style preprocessing