Tweeted By @_akhaliq
CLIP-ViP: Adapting Pre-trained Image-Text Model to Video-Language Representation Alignment
— AK (@_akhaliq) September 15, 2022
abs: https://t.co/Wjx8nicsLJ pic.twitter.com/qLNPpomFdR