Self-attention Does Not Need O(n²) Memory
— hardmaru (@hardmaru) December 13, 2021
“We provide a practical implementation for accelerators that requires O(√n) memory, is numerically stable, and is within a few percent of the runtime of the standard implementation of attention.” https://t.co/iMQvCMnWgi
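
The core idea behind the O(√n) memory bound is to process the keys and values in chunks, keeping only per-query running statistics (a running max for numerical stability, a running softmax denominator, and a running weighted sum) instead of materializing the full n×n attention matrix. Below is a minimal sketch of that chunked, numerically stable attention computation, written in JAX since the paper's reference implementation uses JAX; the function name, chunking loop, and variable names here are illustrative assumptions, not the paper's actual code.

```python
# Minimal sketch of chunked, numerically stable single-head attention.
# q, k, v have shape (n, d); chunk_size ~ sqrt(n) gives the O(sqrt(n))
# memory behavior described in the paper. Illustrative, not reference code.
import jax
import jax.numpy as jnp


def chunked_attention(q, k, v, chunk_size):
    n, d = q.shape
    scale = 1.0 / jnp.sqrt(d)

    # Per-query running statistics: max logit, softmax denominator,
    # and the (unnormalized) weighted sum of values.
    running_max = jnp.full((n,), -jnp.inf)
    running_denom = jnp.zeros((n,))
    running_numer = jnp.zeros((n, d))

    for start in range(0, n, chunk_size):
        k_chunk = k[start:start + chunk_size]          # (c, d)
        v_chunk = v[start:start + chunk_size]          # (c, d)
        logits = (q @ k_chunk.T) * scale               # (n, c)

        chunk_max = logits.max(axis=-1)                # (n,)
        new_max = jnp.maximum(running_max, chunk_max)
        # Rescale previously accumulated statistics to the new max
        # so that exponentials never overflow (numerical stability).
        correction = jnp.exp(running_max - new_max)
        probs = jnp.exp(logits - new_max[:, None])     # (n, c)

        running_numer = running_numer * correction[:, None] + probs @ v_chunk
        running_denom = running_denom * correction + probs.sum(axis=-1)
        running_max = new_max

    return running_numer / running_denom[:, None]


# Usage check against the standard O(n^2)-memory attention.
q = k = v = jax.random.normal(jax.random.PRNGKey(0), (1024, 64))
out = chunked_attention(q, k, v, chunk_size=32)   # 32 ≈ sqrt(1024)
reference = jax.nn.softmax((q @ k.T) / jnp.sqrt(64.0), axis=-1) @ v
assert jnp.allclose(out, reference, atol=1e-4)
```

The chunk size trades memory for the number of accumulation steps: with chunks of size √n, the intermediate logits tensor is only n×√n, while the result matches standard softmax attention up to floating-point error.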