Tweeted By @hardmaru
Another Transformer variant with lower computational complexity, suitable for long-range tasks, is Sparse Sinkhorn Attention (https://t.co/qWp2AJVdkd) by Yi Tay et al.
— hardmaru (@hardmaru) April 8, 2020
A GitHub Colab reimplementation in PyTorch (https://t.co/B5FcGuTZhy) also combined it with ideas from Reformer. https://t.co/WSwZuSRyPb pic.twitter.com/54fJrRbhEA