Tweeted By @NandoDF
This paper has a very clear presentation of different attention architectures in transformers. I'd be thankful if people could share their experience in trying multi-query vs standard multi-head attention. Thanks https://t.co/aY1AW5etWI
— Nando de Freitas 🏳️‍🌈 (@NandoDF) March 13, 2022
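For readers unfamiliar with the distinction the tweet raises, here is a minimal NumPy sketch of it (an illustration of the general technique, not code from the linked paper): standard multi-head attention gives every head its own key and value projections, while multi-query attention keeps per-head queries but shares a single key head and a single value head across all query heads, shrinking the K/V cache by a factor of the head count.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # q, k, v: (seq_len, d_head); scaled dot-product attention.
    scores = q @ k.swapaxes(-1, -2) / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

rng = np.random.default_rng(0)
T, d_model, n_heads = 4, 16, 4          # toy sizes, chosen for illustration
d_head = d_model // n_heads
x = rng.standard_normal((T, d_model))

# Standard multi-head attention: per-head Q, K, and V projections.
Wq = rng.standard_normal((n_heads, d_model, d_head))
Wk = rng.standard_normal((n_heads, d_model, d_head))
Wv = rng.standard_normal((n_heads, d_model, d_head))
mha_out = np.concatenate(
    [attention(x @ Wq[h], x @ Wk[h], x @ Wv[h]) for h in range(n_heads)],
    axis=-1,
)

# Multi-query attention: per-head Q, but one shared K and one shared V head.
Wk_shared = rng.standard_normal((d_model, d_head))
Wv_shared = rng.standard_normal((d_model, d_head))
k_shared, v_shared = x @ Wk_shared, x @ Wv_shared
mqa_out = np.concatenate(
    [attention(x @ Wq[h], k_shared, v_shared) for h in range(n_heads)],
    axis=-1,
)

# Same output shape either way; MQA caches n_heads x fewer K/V tensors.
print(mha_out.shape, mqa_out.shape)
```

The output shapes match, so multi-query attention is a drop-in change at the layer interface; the trade-off discussed in the literature is decoding speed and memory (one K/V head to cache) against some loss of per-head expressiveness.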