Another point of reference from @Smerity https://t.co/H9KcMKcdKd
— Sasha Rush (@srush_nlp) April 2, 2020
Winning response. Props to the author for responding.
— Sasha Rush (@srush_nlp) April 3, 2020
I will accept other submissions if others are motivated to find a different solution. https://t.co/Hft21inqhL
Very interesting explanation for why this is so difficult, and why it should arguably not be used in the future. pic.twitter.com/IyvtzVNkDH
The Transformer-XL results from Google Brain on language modeling could not be reproduced by some top NLP researchers (and the authors are not helping).@srush_nlp offers a bounty for whoever can reproduce the results.
— Yann LeCun (@ylecun) April 3, 2020
(I assume the authors are excluded from the challenge!). https://t.co/ssnMjSVxdd
How can you successfully train transformers on small datasets like PTB and WikiText-2? Are LSTMs better on small datasets? I ran 339 experiments worth 568 GPU hours and came up with some answers. I do not have time to write a blog post, so here is a Twitter thread instead. 1/n
— Tim Dettmers (@Tim_Dettmers) April 8, 2020
The key insight is the following: In the small dataset regime, it is all about dataset augmentation. The analog in computer vision is that you get much better results, particularly on small datasets, if you do certain dataset augmentations. This also regularizes the model.
— Tim Dettmers (@Tim_Dettmers) April 8, 2020
The most dramatic performance gain comes from discrete embedding dropout: You embed as usual, but now with a probability p you zero the entire word vector. This is akin to masked language modeling but the goal is not to predict the mask — just regular LM with uncertain context.
— Tim Dettmers (@Tim_Dettmers) April 8, 2020
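A minimal NumPy sketch of discrete embedding dropout as described above: with probability p, the entire word vector for a token is zeroed, so the model must predict from an uncertain context. The function name, tensor shapes, and the inverted-dropout rescaling are my assumptions, not code from the thread.

```python
import numpy as np

def discrete_embedding_dropout(embedded, p=0.2, rng=None, training=True):
    """Zero out entire word vectors with probability p (illustrative sketch).

    embedded: array of shape (batch, seq_len, dim) — already-embedded tokens.
    One Bernoulli draw per token position is broadcast over the embedding
    dimension, so a token is either fully kept or fully dropped.
    """
    if not training or p == 0.0:
        return embedded
    if rng is None:
        rng = np.random.default_rng()
    # One keep/drop decision per (batch, position), not per element
    keep = rng.random(embedded.shape[:2]) >= p
    # Rescale survivors so the expected activation magnitude is unchanged,
    # as in standard inverted dropout
    return embedded * keep[..., None] / (1.0 - p)
```

The broadcast over the last axis is what makes this "discrete": a whole token disappears at once, which resembles masking in masked language modeling, except the training objective stays plain language modeling.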
The second most important factor is regular input dropout: You take the embeddings and dropout elements with probability p. This also has a data augmentation effect very similar to dropping out random pixels for images. What is a good way to think about this? 1/2
— Tim Dettmers (@Tim_Dettmers) April 8, 2020
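For contrast, regular input dropout as described in the tweet above operates element-wise on the embedding tensor, dropping individual dimensions independently, much like dropping random pixels in an image. Again a sketch under my own naming and shape assumptions:

```python
import numpy as np

def input_dropout(embedded, p=0.1, rng=None, training=True):
    """Element-wise dropout on the embedding tensor (illustrative sketch).

    Unlike discrete embedding dropout, each element of each word vector is
    zeroed independently with probability p, so a token's vector is
    partially corrupted rather than removed entirely.
    """
    if not training or p == 0.0:
        return embedded
    if rng is None:
        rng = np.random.default_rng()
    # Independent keep/drop decision for every element of every vector
    keep = rng.random(embedded.shape) >= p
    # Inverted-dropout rescaling keeps the expected magnitude constant
    return embedded * keep / (1.0 - p)
```

The two functions differ only in the shape of the Bernoulli mask: per-token for the discrete variant, per-element here, which is what gives each its distinct data-augmentation effect.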