Tweeted By @hillbig
Reformer introduces two techniques for the transformer: 1) using LSH to reduce the cost of the key-query dot products from O(L^2) to O(L log L), and 2) RevNet-style recomputation of intermediate activations during backpropagation (the original transformer's residual structure satisfies the required condition), so they need not be stored. https://t.co/nhINaHq4O5
— Daisuke Okanohara (@hillbig) November 16, 2019
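Below is a toy sketch of the first idea, LSH-bucketed attention, assuming tied queries and keys and a single hash round; the function and parameter names (`lsh_bucket_attention`, `n_buckets`) are illustrative and this is not the paper's chunked, batched implementation.

```python
import numpy as np

def lsh_bucket_attention(q, k, v, n_buckets, rng):
    # Angular LSH: project onto random directions and take the argmax over
    # [proj, -proj]; nearby vectors tend to land in the same bucket.
    L, d = q.shape
    R = rng.normal(size=(d, n_buckets // 2))
    proj = q @ R  # Reformer ties queries and keys, so one hash suffices
    buckets = np.argmax(np.concatenate([proj, -proj], axis=-1), axis=-1)

    # Attend only among positions that share a bucket, instead of over
    # all L x L pairs, which is where the cost reduction comes from.
    out = np.zeros_like(v)
    for b in np.unique(buckets):
        idx = np.where(buckets == b)[0]
        scores = q[idx] @ k[idx].T / np.sqrt(d)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        out[idx] = weights @ v[idx]
    return out

rng = np.random.default_rng(0)
q = k = rng.normal(size=(16, 8))
v = rng.normal(size=(16, 8))
print(lsh_bucket_attention(q, k, v, n_buckets=4, rng=rng).shape)
```

And a minimal sketch of the second idea, the RevNet-style reversible residual block, with placeholder sublayers `f` and `g` standing in for attention and feed-forward: the inputs can be recomputed exactly from the outputs, so intermediate activations do not have to be kept in memory for backpropagation.

```python
import numpy as np

def f(x):
    # placeholder for the attention sublayer
    return np.tanh(x)

def g(x):
    # placeholder for the feed-forward sublayer
    return np.tanh(x)

def reversible_forward(x1, x2):
    # y1 = x1 + f(x2);  y2 = x2 + g(y1)
    y1 = x1 + f(x2)
    y2 = x2 + g(y1)
    return y1, y2

def reversible_inverse(y1, y2):
    # Recompute the inputs from the outputs alone.
    x2 = y2 - g(y1)
    x1 = y1 - f(x2)
    return x1, x2

rng = np.random.default_rng(0)
x1, x2 = rng.normal(size=(2, 4, 8))
y1, y2 = reversible_forward(x1, x2)
r1, r2 = reversible_inverse(y1, y2)
print(np.allclose(x1, r1), np.allclose(x2, r2))  # True True (up to float error)
```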