Tweeted By @ml_review
Augmenting Self-attention with Persistent Memory
w/ @exgrv @GuillaumeLample
Replaces the feed-forward layer with persistent memory vectors. Reduces the memory footprint of a transformer while preserving performance.
https://t.co/2BcXQupBjt pic.twitter.com/2GlSGF9bzW
— ML Review (@ml_review) August 25, 2019
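The idea in the tweet can be sketched in code: learned "persistent" key/value vectors are concatenated to the self-attention context so the attention layer can take over the role of the feed-forward sublayer. Below is a minimal single-head, PyTorch-style sketch; the module name and the hyperparameters (`d_model`, `n_persistent`) are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PersistentMemoryAttention(nn.Module):
    """Self-attention augmented with learned persistent key/value vectors.

    Sketch only: the persistent vectors act as extra, input-independent
    "virtual tokens" the layer can attend to, which is the mechanism the
    tweet describes as replacing the feed-forward sublayer.
    """

    def __init__(self, d_model: int, n_persistent: int):
        super().__init__()
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)
        # Persistent memory: learned keys/values shared across all inputs.
        self.persistent_k = nn.Parameter(torch.randn(n_persistent, d_model) * d_model ** -0.5)
        self.persistent_v = nn.Parameter(torch.randn(n_persistent, d_model) * d_model ** -0.5)
        self.scale = d_model ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        batch = x.size(0)
        q = self.q_proj(x)
        # Concatenate persistent vectors to the context keys and values.
        pk = self.persistent_k.unsqueeze(0).expand(batch, -1, -1)
        pv = self.persistent_v.unsqueeze(0).expand(batch, -1, -1)
        k = torch.cat([self.k_proj(x), pk], dim=1)
        v = torch.cat([self.v_proj(x), pv], dim=1)
        # Standard scaled dot-product attention over context + persistent memory.
        attn = F.softmax(q @ k.transpose(-2, -1) * self.scale, dim=-1)
        return self.out_proj(attn @ v)
```

Because the persistent vectors are ordinary parameters rather than per-layer feed-forward weight matrices, a block built this way needs only the attention projections plus `n_persistent` key/value rows, which is where the memory savings mentioned in the tweet would come from.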