Tweeted By @GoogleAI
Introducing the Multimodal Bottleneck Transformer, a novel transformer-based model for multimodal fusion that restricts cross-modal attention flow to achieve state-of-the-art results on video classification tasks with less compute. Read more ↓ https://t.co/BXMVgap0ID pic.twitter.com/Pb8b3j1A5N
— Google AI (@GoogleAI) March 15, 2022