Tweeted by @jeremyphoward
How Do Vision Transformers Work?
— Jeremy Howard (@jeremyphoward) February 19, 2022
"...we propose AlterNet, a model in which Conv blocks at the end of a stage are replaced with MSA blocks. AlterNet outperforms CNNs not only in large data regimes but also in small data regimes." https://t.co/edPXnu0cn8
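A minimal sketch of the block layout the quote describes, assuming a stage is just an ordered list of block types. The function name `stage_layout` and the `num_msa` parameter are hypothetical and do not come from the paper or its code; how many blocks the real AlterNet replaces per stage is not stated in the quote.

```python
def stage_layout(num_blocks: int, num_msa: int = 1) -> list[str]:
    """Return block types for one stage, with the trailing Conv blocks swapped for MSA.

    Hypothetical helper for illustration only: it models the ordering of
    blocks, not the blocks themselves.
    """
    if not 0 <= num_msa <= num_blocks:
        raise ValueError("num_msa must be between 0 and num_blocks")
    # Conv blocks come first; the blocks at the end of the stage are MSA.
    return ["conv"] * (num_blocks - num_msa) + ["msa"] * num_msa


print(stage_layout(4))  # ['conv', 'conv', 'conv', 'msa']
```

With `num_msa=0` the layout is a plain all-Conv stage, which makes the replacement easy to compare against a CNN baseline.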