Focal Self-attention for Local-Global Interactions in Vision Transformers
pdf: https://t.co/2mFN1OQzVG
largest Focal Transformer yields 58.7/58.9 box mAPs and 50.9/51.3 mask mAPs on COCO mini-val/test-dev, and 55.4 mIoU on ADE20K for semantic segmentation pic.twitter.com/ij7VYIbcQR
— AK (@ak92501) July 2, 2021