We added a visualisation of the attention patterns of the Vision Transformer! https://t.co/zKZGRLCTmp
— Jean-Baptiste Cordonnier (@jb_cordonnier) December 7, 2020
Some heads learn translation-equivariant attention to extract patches at fixed shifts. Other heads rely on color similarity, or perhaps on more semantic features deeper in the network. pic.twitter.com/fHIXRQR1fS
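The fixed-shift behaviour can be checked numerically: average a head's attention map after re-centering each row at its own query patch. A translation-equivariant head concentrates its mass at one fixed offset, while a content-based head produces a diffuse pattern. Below is a minimal sketch of that idea on a toy attention matrix; the helper name and setup are illustrative and not taken from the linked notebook.

```python
import numpy as np

def average_relative_attention(attn, grid):
    """Average a head's attention over all queries, re-centered so that
    index (grid-1, grid-1) corresponds to offset (0, 0) from the query.
    attn: (grid*grid, grid*grid) attention matrix for one head."""
    rel = np.zeros((2 * grid - 1, 2 * grid - 1))
    for q in range(grid * grid):
        qy, qx = divmod(q, grid)
        a = attn[q].reshape(grid, grid)
        # Accumulate attention mass at each key offset (dy, dx).
        for ky in range(grid):
            for kx in range(grid):
                rel[ky - qy + grid - 1, kx - qx + grid - 1] += a[ky, kx]
    return rel / (grid * grid)

# Toy example: a head that always attends one patch to the right
# (clamped at the image border).
grid = 4
attn = np.zeros((grid * grid, grid * grid))
for q in range(grid * grid):
    qy, qx = divmod(q, grid)
    k = qy * grid + min(qx + 1, grid - 1)
    attn[q, k] = 1.0

rel = average_relative_attention(attn, grid)
print(np.unravel_index(rel.argmax(), rel.shape))  # (3, 4): offset (0, +1)
```

For a real ViT, `attn` would be one head's softmax attention over the patch tokens (dropping the class token); plotting `rel` as a heatmap gives the kind of per-head pattern the tweet's visualisation shows.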