by evolvingstuff on 2019-01-10 (UTC).

REALLY cool improvement upon Transformer networks that makes use of recurrence and relative positional encodings! Up to 1800x faster!

Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context https://t.co/eV4iy1kOPT

TensorFlow & PyTorch: https://t.co/MqZAZKlhEn pic.twitter.com/TjrjOomYnb

— Thomas Lahore (@evolvingstuff) January 10, 2019
nlp w_code research
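The recurrence idea behind Transformer-XL is simple to state: hidden states computed for the previous segment are cached and prepended as extra context for the current one, with no gradient flowing back into the cache. Below is a minimal single-head sketch of that idea in PyTorch. It is not the released implementation; the names (attend_with_memory, mem, seg_len, mem_len) are illustrative, and multi-head attention, layer norm, and the relative positional terms are all omitted.

import torch

def attend_with_memory(h, mem, w_q, w_k, w_v):
    # h:   (seg_len, d_model) hidden states of the current segment
    # mem: (mem_len, d_model) cached hidden states of the previous segment
    context = torch.cat([mem.detach(), h], dim=0)  # reuse the cache, but stop gradients into it
    q = h @ w_q                                    # queries come only from the new segment
    k = context @ w_k                              # keys and values span memory + new segment
    v = context @ w_v
    attn = torch.softmax(q @ k.t() / k.shape[1] ** 0.5, dim=-1)
    return attn @ v

d_model, seg_len, mem_len = 16, 4, 4
w_q, w_k, w_v = (torch.randn(d_model, d_model) for _ in range(3))
mem = torch.zeros(mem_len, d_model)
for _ in range(3):                                 # slide over three consecutive segments
    h = torch.randn(seg_len, d_model)
    out = attend_with_memory(h, mem, w_q, w_k, w_v)
    mem = h[-mem_len:]                             # carry this segment's last states forward as the new cache

The caching is also where the large evaluation speedup comes from: instead of recomputing representations for the whole context from scratch at every step, the model reuses the cached segment states.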
by hardmaru on 2019-01-11 (UTC).

Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

Up to 1800x faster than vanilla Transformer during evaluation. New SoTA results on Wikipedia (enwik8, text8, WikiText-103), One Billion Words, and Penn Treebank. 🔥 https://t.co/xmpTB43t29

— hardmaru (@hardmaru) January 11, 2019
research nlp
by hardmaru on 2019-01-11 (UTC).

They also released PyTorch and TF implementations with pretrained models: https://t.co/VrE2nR5wBY

— hardmaru (@hardmaru) January 11, 2019
nlp w_code research
by jeremyphoward on 2019-01-11 (UTC).

Yeah I think it says a lot more about the problems of academic review than problems with the paper

— Jeremy Howard (@jeremyphoward) January 11, 2019
nlp research
by Smerity on 2019-01-11 (UTC).

Excited for the Transformer-XL codebase! It also extends my AWD-LSTM `https://t.co/F18vD0XbfY` script to download the One Billion Words + text8 datasets (original grabbed WikiText-2, WikiText-103, enwik8 and PTBC) whilst keeping the most important part ;) https://t.co/aWkksPxmdr pic.twitter.com/6UaO6O2BDh

— Smerity (@Smerity) January 11, 2019
nlp w_code research
by hardmaru on 2019-01-11 (UTC).

Compared to RNNs, the Transformer family of architectures seemingly scales to hundreds of millions of parameters with relative ease. pic.twitter.com/Xo7Ng5b2Gj

— hardmaru (@hardmaru) January 11, 2019
nlp research
by hardmaru on 2019-01-17 (UTC).

Transformer-XL: Combining Transformers and RNNs Into a State-of-the-art Language Model

Blog post by @HorevRani giving an overview of the model and key concepts such as the recurrence mechanism and the relative positional encoding scheme. https://t.co/ORv18GkZBv https://t.co/l1OJKvUNyc

— hardmaru (@hardmaru) January 17, 2019
learning
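For readers who want the core equation behind that relative scheme: following the paper's notation, the attention score between a query at position i and a key at position j decomposes into four terms,

A^{\mathrm{rel}}_{i,j} = \underbrace{E_{x_i}^\top W_q^\top W_{k,E}\, E_{x_j}}_{\text{(a) content}} + \underbrace{E_{x_i}^\top W_q^\top W_{k,R}\, R_{i-j}}_{\text{(b) content-dependent position}} + \underbrace{u^\top W_{k,E}\, E_{x_j}}_{\text{(c) global content bias}} + \underbrace{v^\top W_{k,R}\, R_{i-j}}_{\text{(d) global position bias}}

where R_{i-j} is a sinusoidal encoding of the relative distance only, and u and v are learned vectors that replace the absolute-position query terms. Because no term in the score depends on absolute positions, the cached hidden states from earlier segments remain valid when they are reused.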
