Tweeted by @martin_gorner
This looks like the Vision Transformers architecture we have been waiting for: MaxViT https://t.co/WbzgJ50PjB
— Martin Görner (@martin_gorner) October 11, 2022
1/ State of the Art accuracy on ImageNet (no pre-training on huge datasets)
2/ Linear complexity w.r.t. image size (thanks to a clever attention design)
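The linear complexity claimed in point 2 comes from MaxViT's decomposed attention: instead of full self-attention over all pixels (quadratic in image size), it attends within fixed-size local windows ("block" attention) and across a fixed-size sparse grid ("grid" attention). Below is a minimal NumPy sketch of that idea, assuming square inputs and omitting the learned Q/K/V projections, multi-head logic, and MBConv blocks of the real architecture; the function names are my own, not the paper's.

```python
import numpy as np

def window_partition(x, P):
    """Split an (H, W, C) feature map into non-overlapping P x P windows.
    Returns (num_windows, P*P, C): attention within each window is local."""
    H, W, C = x.shape
    x = x.reshape(H // P, P, W // P, P, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, P * P, C)

def grid_partition(x, G):
    """Partition into a fixed G x G grid: each group gathers one token per
    grid cell, giving sparse, dilated *global* mixing at the same cost."""
    H, W, C = x.shape
    x = x.reshape(G, H // G, G, W // G, C)
    return x.transpose(1, 3, 0, 2, 4).reshape(-1, G * G, C)

def attention(tokens):
    """Plain softmax self-attention inside each group (no projections,
    for brevity). tokens: (num_groups, N, C) with N fixed (P*P or G*G)."""
    scores = tokens @ tokens.transpose(0, 2, 1) / np.sqrt(tokens.shape[-1])
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)
    return weights @ tokens
```

Because each group has a fixed token count N (P² or G²), attention costs O(N²·C) per group, and the number of groups grows as H·W/N, so the total cost is O(H·W·N·C): linear in image size, unlike the O((H·W)²·C) of full self-attention.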