Tweeted By @Tim_Dettmers
The second most important factor is regular input dropout: You take the embeddings and drop out elements with probability p. This also has a data augmentation effect very similar to dropping out random pixels for images. What is a good way to think about this? 1/2
— Tim Dettmers (@Tim_Dettmers) April 8, 2020
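To make the mechanism in the tweet concrete, here is a minimal sketch of element-wise dropout on a sequence of embedding vectors. It assumes the common "inverted dropout" convention, which is also what PyTorch's `torch.nn.Dropout` does in training mode: each element is zeroed with probability `p`, and the surviving elements are scaled by `1/(1-p)` so their expected value is unchanged. The function name `embedding_dropout` is hypothetical. The sketch uses plain Python lists so that it runs without extra dependencies, and it should only be applied during training.

```python
import random
from typing import List


def embedding_dropout(
    embeddings: List[List[float]], p: float, rng: random.Random
) -> List[List[float]]:
    """Hypothetical helper: inverted dropout applied element-wise to embeddings.

    Each element is independently set to 0.0 with probability p; the rest are
    multiplied by 1 / (1 - p) so the expected value of every element is kept.
    Intended for training only; at inference the embeddings are used as-is.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    scale = 1.0 / (1.0 - p)
    # rng.random() is uniform on [0, 1), so an element is dropped with probability p.
    return [
        [0.0 if rng.random() < p else x * scale for x in vector]
        for vector in embeddings
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    # Three token embeddings of dimension 4 (toy values).
    tokens = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8], [0.9, 1.0, 1.1, 1.2]]
    for row in embedding_dropout(tokens, p=0.25, rng=rng):
        print(row)
```

Because a different random mask is drawn on every forward pass, the model sees a slightly different "view" of the same sentence each time, which is the data augmentation effect the tweet compares to dropping random pixels in an image.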