Tweeted By @jeremyphoward
AWD-LSTM benefits from all the work done on regularization by @Smerity. Not sure there's the same richness of regularization available just yet for transformer architectures? It's particularly important for small datasets
— Jeremy Howard (@jeremyphoward) May 17, 2019