Tweeted By @hardmaru
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

Up to 1800x faster than vanilla Transformer during evaluation. New SoTA results on Wikipedia (enwik8, text8, WikiText-103), One Billion Word, and Penn Treebank. 🔥 https://t.co/xmpTB43t29

— hardmaru (@hardmaru) January 11, 2019