Tweeted By @ak92501
VLMO: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts
abs: https://t.co/Rv9o8aFIdI
introduce the Mixture-of-Modality-Experts Transformer, where each block contains a pool of modality-specific experts and a shared self-attention layer
— AK (@ak92501) November 4, 2021
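To make the block structure concrete, here is a minimal sketch of such a mixture-of-modality-experts block in PyTorch. It is an illustration based only on the tweet's description (shared self-attention plus per-modality feed-forward experts), not the paper's actual implementation; the class name `MoMEBlock`, the expert names, the pre-norm layout, and all hyperparameters are assumptions.

```python
import torch
import torch.nn as nn

class MoMEBlock(nn.Module):
    """Illustrative mixture-of-modality-experts Transformer block.

    All tokens pass through one shared self-attention layer; the
    feed-forward network is replaced by a pool of modality-specific
    experts, selected per forward pass. Expert names are hypothetical.
    """
    def __init__(self, dim=768, num_heads=12, mlp_ratio=4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        # One feed-forward expert per modality (assumed split).
        self.experts = nn.ModuleDict({
            name: nn.Sequential(
                nn.Linear(dim, dim * mlp_ratio),
                nn.GELU(),
                nn.Linear(dim * mlp_ratio, dim),
            )
            for name in ("vision", "language", "vl")
        })

    def forward(self, x, modality="vl"):
        # Shared self-attention over all tokens, regardless of modality.
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        x = x + attn_out
        # Route through the expert matching the input modality.
        x = x + self.experts[modality](self.norm2(x))
        return x

# Example: a batch of fused image-text tokens routed through the
# vision-language expert.
block = MoMEBlock()
tokens = torch.randn(2, 197, 768)   # (batch, sequence, dim)
out = block(tokens, modality="vl")
print(out.shape)                    # torch.Size([2, 197, 768])
```

The point of the design, as the tweet frames it, is that attention parameters are shared across modalities while the feed-forward capacity is modality-specific, so the same block can serve image-only, text-only, or fused inputs by switching experts.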