Tweeted by @evolvingstuff
Reformer: The Efficient Transformer
— Thomas Lahore (@evolvingstuff) January 20, 2020
"we replace dot-product attention by one that uses locality-sensitive hashing, changing its complexity from O(L^2) to O(L log L), where L is the length of the sequence"
paper: https://t.co/3o1scnoCCT
code: https://t.co/OjLbTyILln
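
The quoted idea can be illustrated with a minimal sketch: hash shared query/key vectors into buckets via angular LSH (random projections, as in the Reformer paper) and run softmax attention only within each bucket, so work scales with bucket size rather than the full sequence. This is an assumed simplification for illustration: it omits Reformer's multi-round hashing, sorted chunking, and causal masking, and all names here (`lsh_hash`, `lsh_attention`) are hypothetical.

```python
import numpy as np

def lsh_hash(vectors, n_buckets, rng):
    # Angular LSH: project onto n_buckets/2 random directions and
    # take the argmax over [xR; -xR] as the bucket id, so nearby
    # vectors tend to land in the same bucket.
    d = vectors.shape[-1]
    R = rng.standard_normal((d, n_buckets // 2))
    proj = vectors @ R
    return np.argmax(np.concatenate([proj, -proj], axis=-1), axis=-1)

def lsh_attention(qk, v, n_buckets=8, rng=None):
    # qk: shared query/key vectors (L, d); v: values (L, d).
    # Attention is computed only among positions in the same bucket.
    rng = rng or np.random.default_rng(0)
    buckets = lsh_hash(qk, n_buckets, rng)
    out = np.zeros_like(v)
    for b in np.unique(buckets):
        idx = np.where(buckets == b)[0]
        scores = qk[idx] @ qk[idx].T / np.sqrt(qk.shape[-1])
        # Numerically stable softmax over each bucket's scores.
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        out[idx] = weights @ v[idx]
    return out

rng = np.random.default_rng(42)
L, d = 64, 16
qk = rng.standard_normal((L, d))
v = rng.standard_normal((L, d))
out = lsh_attention(qk, v, n_buckets=8, rng=rng)
print(out.shape)
```

With buckets of roughly equal size, each bucket's attention costs O((L/b)^2) for b buckets, which is how the full method reaches O(L log L) once hashing and sorting are accounted for.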