Self-attention Does Not Need O(n²) Memory
— hardmaru (@hardmaru) December 13, 2021
“We provide a practical implementation for accelerators that requires O(√n) memory, is numerically stable, and is within a few percent of the runtime of the standard implementation of attention.” https://t.co/iMQvCMnWgi
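
The core idea behind the O(√n) memory bound is to process the keys and values in chunks, keeping only per-query running statistics (a running max for numerical stability, a running softmax denominator, and a running weighted sum) instead of materializing the full n×n attention matrix. Below is a minimal sketch of that chunked, numerically stable attention computation, written in JAX since the paper's reference implementation uses JAX; the function name, chunking loop, and variable names here are illustrative assumptions, not the paper's actual code.

```python
# Minimal sketch of chunked, numerically stable single-head attention.
# q, k, v have shape (n, d); chunk_size ~ sqrt(n) gives the O(sqrt(n))
# memory behavior described in the paper. Illustrative, not reference code.
import jax
import jax.numpy as jnp


def chunked_attention(q, k, v, chunk_size):
    n, d = q.shape
    scale = 1.0 / jnp.sqrt(d)

    # Per-query running statistics: max logit, softmax denominator,
    # and the (unnormalized) weighted sum of values.
    running_max = jnp.full((n,), -jnp.inf)
    running_denom = jnp.zeros((n,))
    running_numer = jnp.zeros((n, d))

    for start in range(0, n, chunk_size):
        k_chunk = k[start:start + chunk_size]          # (c, d)
        v_chunk = v[start:start + chunk_size]          # (c, d)
        logits = (q @ k_chunk.T) * scale               # (n, c)

        chunk_max = logits.max(axis=-1)                # (n,)
        new_max = jnp.maximum(running_max, chunk_max)
        # Rescale previously accumulated statistics to the new max
        # so that exponentials never overflow (numerical stability).
        correction = jnp.exp(running_max - new_max)
        probs = jnp.exp(logits - new_max[:, None])     # (n, c)

        running_numer = running_numer * correction[:, None] + probs @ v_chunk
        running_denom = running_denom * correction + probs.sum(axis=-1)
        running_max = new_max

    return running_numer / running_denom[:, None]


# Usage check against the standard O(n^2)-memory attention.
q = k = v = jax.random.normal(jax.random.PRNGKey(0), (1024, 64))
out = chunked_attention(q, k, v, chunk_size=32)   # 32 ≈ sqrt(1024)
reference = jax.nn.softmax((q @ k.T) / jnp.sqrt(64.0), axis=-1) @ v
assert jnp.allclose(out, reference, atol=1e-4)
```

The chunk size trades memory for the number of accumulation steps: with chunks of size √n, the intermediate logits tensor is only n×√n, while the result matches standard softmax attention up to floating-point error.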