Tweeted by @ak92501
Deduplicating Training Data Makes Language Models Better
— AK (@ak92501) July 15, 2021
pdf: https://t.co/w8J8NZ5v7t
abs: https://t.co/4Woo78QjST
Deduplication allows us to train models that emit memorized text ten times less frequently and that require fewer training steps to achieve the same or better accuracy. pic.twitter.com/sMMud34rj7