Tweeted By @evolvingstuff
REALLY cool improvement upon Transformer networks that makes use of recurrence and relative positional encodings! Up to 1800x faster!
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context https://t.co/eV4iy1kOPT
TensorFlow & PyTorch: https://t.co/MqZAZKlhEn pic.twitter.com/TjrjOomYnb
— Thomas Lahore (@evolvingstuff) January 10, 2019
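The tweet names the two ingredients (segment-level recurrence and relative positional encodings) without showing how they fit together. Below is a minimal, hypothetical PyTorch sketch of the recurrence idea only: hidden states from the previous segment are cached, detached from the gradient graph, and prepended as extra context for the current segment. All names, dimensions, and the single-head/single-layer setup are illustrative assumptions, not the authors' implementation, which also handles relative positional encodings, multiple heads, and stacked layers.

```python
import torch

def attend_with_memory(q_proj, k_proj, v_proj, segment, memory):
    """Single-head attention over [memory ; segment] (illustrative sketch)."""
    context = torch.cat([memory, segment], dim=0)   # (mem_len + seg_len, d_model)
    q = q_proj(segment)                             # queries come only from the new segment
    k, v = k_proj(context), v_proj(context)         # keys/values also see the cached context
    scores = q @ k.T / k.shape[-1] ** 0.5
    return torch.softmax(scores, dim=-1) @ v

d_model, seg_len, mem_len = 32, 8, 8
q_proj = torch.nn.Linear(d_model, d_model)
k_proj = torch.nn.Linear(d_model, d_model)
v_proj = torch.nn.Linear(d_model, d_model)

memory = torch.zeros(mem_len, d_model)              # empty cache before the first segment
for _ in range(3):                                  # a stream of consecutive segments
    segment = torch.randn(seg_len, d_model)         # stand-in for embedded tokens
    out = attend_with_memory(q_proj, k_proj, v_proj, segment, memory)
    # Cache this segment's states for the next one; detach() stops gradients
    # from flowing across segment boundaries, as in the paper.
    memory = segment.detach()[-mem_len:]
```

Reusing the cache this way is also where the headline speedup comes from: at evaluation time, previous-segment states are read from memory instead of being recomputed for every new prediction.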