Tweeted by @tanmingxing
Nyströmformer: a new linear self-attention.
It turns out a simple Nyström method is quite effective in approximating the full attention, outperforming Reformer/Linformer/Performer by +3% accuracy on LRA. @YoungXiong1
Paper: https://t.co/fTBHZ1F9Lr
Code: https://t.co/BjxpzgIMVG
— Mingxing Tan (@tanmingxing) February 10, 2021
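To make the idea concrete, here is a minimal NumPy sketch of Nyström-style attention: landmark queries and keys are formed as segment means, and the full softmax attention matrix is approximated as F·A⁺·B so the cost scales linearly in sequence length. The function name, landmark count, and the use of an exact pseudoinverse are illustrative assumptions (the paper uses an iterative Moore-Penrose approximation); this is not the repo's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def nystrom_attention(Q, K, V, num_landmarks=8):
    """Nyström-style approximation of softmax attention (illustrative sketch).

    Landmarks are segment means of Q and K; the exact pseudoinverse below
    stands in for the paper's iterative Moore-Penrose approximation.
    Assumes num_landmarks divides the sequence length n.
    """
    n, d = Q.shape
    m = num_landmarks
    # Segment-mean landmarks: average n/m consecutive rows of Q and K.
    Q_l = Q.reshape(m, n // m, d).mean(axis=1)
    K_l = K.reshape(m, n // m, d).mean(axis=1)
    scale = 1.0 / np.sqrt(d)
    F = softmax(Q @ K_l.T * scale)   # (n, m): queries vs. landmark keys
    A = softmax(Q_l @ K_l.T * scale) # (m, m): landmark queries vs. landmark keys
    B = softmax(Q_l @ K.T * scale)   # (m, n): landmark queries vs. keys
    # softmax(QK^T / sqrt(d)) ≈ F · A⁺ · B, so the output is F · A⁺ · (B V),
    # costing O(n·m·d) instead of the O(n²·d) of full attention.
    return F @ np.linalg.pinv(A) @ (B @ V)

# Quick sanity check against exact attention on random inputs.
rng = np.random.default_rng(0)
n, d = 64, 16
Q, K, V = rng.normal(size=(3, n, d))
exact = softmax(Q @ K.T / np.sqrt(d)) @ V
approx = nystrom_attention(Q, K, V, num_landmarks=8)
print("mean abs error:", np.abs(exact - approx).mean())
```

With more landmarks the approximation tightens toward full attention, while the compute and memory stay linear in n; that trade-off is what lets the method compete with Reformer/Linformer/Performer on the LRA benchmark.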