Tweeted By @AlecRad
By the way - I think a valid (if extreme) take on GPT-2 is "lol you need 10,000x the data, 1 billion parameters, and a supercomputer to get current DL models to generalize to Penn Treebank."
— Alec Radford (@AlecRad) February 17, 2019