Tweeted by @_akhaliq
GIT: A Generative Image-to-text Transformer for Vision and Language
— AK (@_akhaliq) May 30, 2022
abs: https://t.co/iFly0pcoXM
The model surpasses human performance for the first time on TextCaps (138.2 vs. 125.5 in CIDEr) pic.twitter.com/vn9LV98Dwr