We added a visualisation of the attention patterns of the Vision Transformer! https://t.co/zKZGRLCTmp
— Jean-Baptiste Cordonnier (@jb_cordonnier) December 7, 2020
Some heads learn translation-equivariant attention to extract patches at fixed shifts. Other heads rely on color similarity, or perhaps on more semantic features deeper in the network. pic.twitter.com/fHIXRQR1fS
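The fixed-shift behaviour can be checked numerically: average a head's attention map after re-centering each row at its own query patch. A translation-equivariant head concentrates its mass at one fixed offset, while a content-based head produces a diffuse pattern. Below is a minimal sketch of that idea on a toy attention matrix; the helper name and setup are illustrative and not taken from the linked notebook.

```python
import numpy as np

def average_relative_attention(attn, grid):
    """Average a head's attention over all queries, re-centered so that
    index (grid-1, grid-1) corresponds to offset (0, 0) from the query.
    attn: (grid*grid, grid*grid) attention matrix for one head."""
    rel = np.zeros((2 * grid - 1, 2 * grid - 1))
    for q in range(grid * grid):
        qy, qx = divmod(q, grid)
        a = attn[q].reshape(grid, grid)
        # Accumulate attention mass at each key offset (dy, dx).
        for ky in range(grid):
            for kx in range(grid):
                rel[ky - qy + grid - 1, kx - qx + grid - 1] += a[ky, kx]
    return rel / (grid * grid)

# Toy example: a head that always attends one patch to the right
# (clamped at the image border).
grid = 4
attn = np.zeros((grid * grid, grid * grid))
for q in range(grid * grid):
    qy, qx = divmod(q, grid)
    k = qy * grid + min(qx + 1, grid - 1)
    attn[q, k] = 1.0

rel = average_relative_attention(attn, grid)
print(np.unravel_index(rel.argmax(), rel.shape))  # (3, 4): offset (0, +1)
```

For a real ViT, `attn` would be one head's softmax attention over the patch tokens (dropping the class token); plotting `rel` as a heatmap gives the kind of per-head pattern the tweet's visualisation shows.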