Tweeted by @ak92501
Can Vision Transformers Perform Convolution?
abs: https://t.co/rsHhON89sV
A single ViT layer with image patches as the input can perform any convolution operation constructively, where the multi-head attention mechanism and the relative positional encoding play essential roles. pic.twitter.com/Qw1RqqEfjV
— AK (@ak92501) November 3, 2021
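The intuition behind the tweet can be seen in a toy setting. The sketch below is not the paper's exact construction (which works at sub-patch granularity inside a ViT layer); it is a simplified, token-level illustration assuming one attention head per kernel offset, where a relative positional bias is taken to its hard-attention limit so each head simply copies the token at one fixed offset. All names, shapes, and the zero-padding choice are illustrative assumptions.

```python
# Minimal NumPy sketch: multi-head "attention" with one-hot relative-position
# patterns, followed by an output projection, coincides with a KxK convolution
# over a grid of token embeddings.
import numpy as np

rng = np.random.default_rng(0)
H, W, d_in, d_out, K = 6, 6, 4, 5, 3                      # token grid, channels, kernel size
offsets = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]   # one head per offset
W_conv = rng.normal(size=(K * K, d_in, d_out))            # conv weights, one slice per offset
x = rng.normal(size=(H, W, d_in))                         # patch/token embeddings on a grid

# --- "Attention" path -----------------------------------------------------------
def hard_attention_head(x, offset):
    """Attention matrix is one-hot at the given relative offset (the limit of a
    softmax dominated by a relative positional bias); values use an identity W_V."""
    dy, dx = offset
    out = np.zeros_like(x)
    for i in range(H):
        for j in range(W):
            ii, jj = i + dy, j + dx
            if 0 <= ii < H and 0 <= jj < W:                # zero padding at the border
                out[i, j] = x[ii, jj]
    return out

heads = [hard_attention_head(x, off) for off in offsets]
# The output projection mixes the concatenated heads with one weight slice per offset.
attn_out = sum(h @ W_conv[k] for k, h in enumerate(heads))

# --- Direct KxK convolution over the same grid ----------------------------------
conv_out = np.zeros((H, W, d_out))
for k, (dy, dx) in enumerate(offsets):
    for i in range(H):
        for j in range(W):
            ii, jj = i + dy, j + dx
            if 0 <= ii < H and 0 <= jj < W:
                conv_out[i, j] += x[ii, jj] @ W_conv[k]

print(np.allclose(attn_out, conv_out))                     # True: the two paths agree
```

The key design point the paper formalizes is exactly this pairing: relative positional encodings select which neighbors each head attends to, and the multi-head structure plus output projection supplies the per-offset linear maps that a convolution kernel would apply.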