Tweeted by @chipro
SimpleBooks is a long-term dependency dataset that is 90% the size of WikiText-103 but has 1/3 the vocab and 1/4 the OOV. I created it last year to test, benchmark, & do tutorials for word-level language models but didn't publish it bc small datasets get 0 love 😅 https://t.co/3TNA2xoz5Z https://t.co/8zCdf1ovdd
— Chip Huyen @ NeurIPS (@chipro) December 2, 2019