Tweeted By @ak92501
Do Vision Transformers See Like Convolutional Neural Networks?
pdf: https://t.co/5Yz5F2PZwO
abs: https://t.co/bpHO2rOYDv
find striking differences between the two architectures, such as ViT having more uniform representations across all layers pic.twitter.com/0KT0KE16f9
— AK (@ak92501) August 20, 2021