SOTA perplexity on PTB without extra data is 46.54 (Transformer-XL: 54.5). On Papers with Code, all top models on WikiText-103 and the One Billion Word benchmark are Transformers, while all top models on the small datasets are LSTMs. Could just be hyperparameters, but could also be something else https://t.co/Vtms96ScKd
— Chip Huyen (@chipro) May 17, 2019