by chipro on 2019-05-17 (UTC).

SOTA for PTB without extra data is 46.54 perplexity (Transformer-XL: 54.5). On Papers with Code, all top models on WikiText-103 & the 1 Billion Word benchmark are Transformers, and all top models on small datasets are LSTMs. Could just be hyperparameters, but could also be something else. https://t.co/Vtms96ScKd

— Chip Huyen (@chipro) May 17, 2019
nlp research
by jeremyphoward on 2019-05-17 (UTC).

AWD-LSTM benefits from all the work done on regularization by @Smerity. Not sure there's the same richness of regularization available just yet for transformer architectures? It's particularly important for small datasets.

— Jeremy Howard (@jeremyphoward) May 17, 2019
nlp thought
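
A minimal sketch of the kind of regularization the tweet above credits to @Smerity's AWD-LSTM work: DropConnect applied to the recurrent hidden-to-hidden weights (the "weight-dropped" LSTM), with one mask shared across all timesteps. This is an illustrative PyTorch approximation, not the original awd-lstm-lm code; the class name, initialization, and dropout rate are assumptions, and the full recipe also uses embedding dropout, variational dropout, and NT-ASGD.

import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightDropLSTM(nn.Module):
    """One-layer LSTM with DropConnect on the hidden-to-hidden weights."""
    def __init__(self, input_size, hidden_size, weight_p=0.5):
        super().__init__()
        self.hidden_size = hidden_size
        self.weight_p = weight_p  # illustrative default; AWD-LSTM tunes this per dataset
        self.w_ih = nn.Parameter(torch.empty(4 * hidden_size, input_size))
        self.w_hh = nn.Parameter(torch.empty(4 * hidden_size, hidden_size))
        self.bias = nn.Parameter(torch.zeros(4 * hidden_size))
        for w in (self.w_ih, self.w_hh):
            nn.init.uniform_(w, -0.1, 0.1)

    def forward(self, x, state=None):
        # x: (batch, seq_len, input_size)
        batch, seq_len, _ = x.shape
        if state is None:
            h = x.new_zeros(batch, self.hidden_size)
            c = x.new_zeros(batch, self.hidden_size)
        else:
            h, c = state
        # DropConnect: sample one dropout mask over the recurrent weight matrix itself
        # (not the activations) and reuse it for every timestep of the sequence.
        w_hh = F.dropout(self.w_hh, p=self.weight_p, training=self.training)
        outputs = []
        for t in range(seq_len):
            gates = x[:, t] @ self.w_ih.t() + h @ w_hh.t() + self.bias
            i, f, g, o = gates.chunk(4, dim=-1)
            c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
            h = torch.sigmoid(o) * torch.tanh(c)
            outputs.append(h)
        return torch.stack(outputs, dim=1), (h, c)

# Example: a batch of 8 sequences, 35 timesteps, 100-dim inputs.
# out, (h, c) = WeightDropLSTM(100, 256)(torch.randn(8, 35, 100))
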
by m__dehghani on 2019-05-17 (UTC).

The "recurrent inductive bias" of RNNs usually helps them be more data efficient, compared to vanilla Transformer. If you introduce such a bias to Transformers (like recurrence in depth in Universal Transformers), they generalize better on small datasets: https://t.co/gWzKXz8xRU

— Mostafa Dehghani (@m__dehghani) May 17, 2019
nlp research
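
To make the "recurrence in depth" idea concrete: a Universal Transformer applies one shared block repeatedly over depth instead of stacking independently parameterized layers, which is the recurrent inductive bias the tweet refers to. The sketch below is a simplified illustration using PyTorch's stock nn.TransformerEncoderLayer; the module name, step count, and sizes are assumptions, and it leaves out the paper's per-step timing signals and adaptive computation time (ACT) halting.

import torch
import torch.nn as nn

class DepthRecurrentEncoder(nn.Module):
    """Applies the same Transformer layer repeatedly: recurrence over depth."""
    def __init__(self, d_model=256, nhead=4, num_steps=6):
        super().__init__()
        # One shared layer, so the parameter count does not grow with depth.
        self.shared_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=nhead,
            dim_feedforward=4 * d_model, batch_first=True)
        self.num_steps = num_steps

    def forward(self, x):
        # The *same* weights process the representation num_steps times, unlike a
        # vanilla encoder where each of the 6 layers gets its own parameters
        # (e.g. nn.TransformerEncoder(layer, num_layers=6), which deep-copies the layer).
        for _ in range(self.num_steps):
            x = self.shared_layer(x)
        return x

# Example: y = DepthRecurrentEncoder()(torch.randn(8, 128, 256))
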
