Memorizing Transformers
— AK (@ak92501) March 18, 2022
abs: https://t.co/T4xmmbcOMI
an extension to the Transformer architecture, called kNN-augmented attention, which dramatically increases the length of the context that a language model can attend to by using k-nearest-neighbor lookup into a large external memory
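The idea in the tweet can be sketched in a few lines: keep a large store of past (key, value) pairs, and for each query retrieve its top-k nearest memory keys and attend over them alongside the local context. This is only a minimal illustrative sketch, not the paper's implementation (the paper combines local and memory attention with a learned gate; here the retrieved entries are simply concatenated with the local ones, and all names are made up for illustration):

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def knn_augmented_attention(q, local_k, local_v, mem_k, mem_v, top_k=4):
    """Attend over the local context plus the top_k nearest external-memory entries.

    q:        (d,)   query vector
    local_k:  (L, d) keys in the local context window
    local_v:  (L, d) values in the local context window
    mem_k:    (M, d) keys in the large external memory (M >> L)
    mem_v:    (M, d) values in the external memory
    """
    # k-nearest-neighbor lookup: select the top_k memory keys by dot-product score.
    mem_scores = mem_k @ q                        # (M,)
    idx = np.argsort(mem_scores)[-top_k:]         # indices of the top_k nearest keys
    # Attend jointly over local keys and the retrieved memory keys.
    keys = np.concatenate([local_k, mem_k[idx]], axis=0)
    vals = np.concatenate([local_v, mem_v[idx]], axis=0)
    attn = softmax(keys @ q / np.sqrt(q.shape[0]))
    return attn @ vals                            # (d,) attended output
```

Because only `top_k` memory entries enter the attention, the memory can grow far beyond the local window while per-token cost stays roughly constant (plus the cost of the nearest-neighbor search, which real systems accelerate with an approximate index).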