Synthesizer: Rethinking Self-Attention in Transformer Models
By @ytay017 @dara_bahri @MetzlerDonald
(1) Random alignment matrices perform surprisingly well.
(2) Learning attention weights from query-key interactions is not so important. https://t.co/pGwh83gilU
— ML Review (@ml_review) May 5, 2020
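The tweet's first claim corresponds to the paper's Random Synthesizer variant, where the attention matrix is a directly parameterized (or even fixed random) matrix rather than being computed from query-key dot products. Below is a minimal sketch of that idea in PyTorch; it is an illustration under assumptions, not the authors' implementation, and the names `RandomSynthesizerAttention`, `max_len`, and `trainable` are ours.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RandomSynthesizerAttention(nn.Module):
    """Sketch of Random Synthesizer attention: the alignment matrix is a
    learned (or fixed random) parameter, independent of the input tokens,
    so no query-key interaction is ever computed."""

    def __init__(self, max_len: int, d_model: int, trainable: bool = True):
        super().__init__()
        # Random alignment logits of shape (max_len, max_len).
        # With trainable=False this stays at its random initialization,
        # matching the "fixed random" variant described in the paper.
        self.attn_logits = nn.Parameter(
            torch.randn(max_len, max_len), requires_grad=trainable
        )
        self.value = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model), with seq_len <= max_len.
        seq_len = x.size(1)
        # Softmax over the key dimension turns logits into attention weights.
        weights = F.softmax(self.attn_logits[:seq_len, :seq_len], dim=-1)
        # Weights are input-independent: the same mixing pattern is applied
        # to every example in the batch.
        return weights @ self.value(x)
```

In contrast to standard self-attention, nothing here depends on pairwise token similarity, which is exactly why the paper's finding that such matrices "perform surprisingly well" is notable.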