Tweeted by @tanmingxing:
Happy to introduce CoAtNet: combining convolution and self-attention in a principled way to obtain better capacity and better generalization.
88.56% top-1 with ImageNet21K (13M imgs), matching ViT-huge with JFT (300M imgs).
Paper: https://t.co/AQE33LuzSr pic.twitter.com/YEly0cSaTp
— Mingxing Tan (@tanmingxing) June 10, 2021
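The tweet's one-line summary is that CoAtNet stacks convolutional and self-attention stages in a single network. Below is a minimal, illustrative PyTorch sketch of that general idea, a convolutional stage followed by self-attention over spatial tokens; the layer sizes, the plain nn.MultiheadAttention, and the ConvThenAttention name are assumptions for illustration only, not the paper's actual relative-attention design.

```python
# Illustrative sketch: a toy hybrid block that applies convolution (local
# features) before self-attention (global interactions), in the spirit of
# convolution + self-attention hybrids. Not the CoAtNet architecture itself.
import torch
import torch.nn as nn


class ConvThenAttention(nn.Module):
    def __init__(self, channels: int = 64, num_heads: int = 4):
        super().__init__()
        # Convolutional stage with a residual connection.
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.GELU(),
        )
        # Self-attention stage over flattened spatial positions.
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, height, width)
        x = x + self.conv(x)                    # convolution + residual
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)   # (batch, h*w, channels)
        tokens = self.norm(tokens)
        attn_out, _ = self.attn(tokens, tokens, tokens)
        tokens = tokens + attn_out              # attention + residual
        return tokens.transpose(1, 2).reshape(b, c, h, w)


if __name__ == "__main__":
    block = ConvThenAttention()
    out = block(torch.randn(2, 64, 16, 16))
    print(out.shape)  # torch.Size([2, 64, 16, 16])
```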