How can you successfully train transformers on small datasets like PTB and WikiText-2? Are LSTMs better on small datasets? I ran 339 experiments worth 568 GPU hours and came up with some answers. I do not have time to write a blog post, so here's a Twitter thread instead. 1/n
— Tim Dettmers (@Tim_Dettmers) April 8, 2020