Tweeted by @hardmaru
Reformer: The Efficient Transformer
They present techniques to reduce the time and memory complexity of Transformer, allowing batches of very long sequences (64K) to fit on one GPU. Should pave way for Transformer to be really impactful beyond NLP domain. https://t.co/8IVrrSqAvq

— hardmaru (@hardmaru) December 28, 2019