Another comment on the GPT-2 data: the WMT 2019 training data this year for English-German consists of 28GB of English and 58GB(!!!) of German plain text news data with document boundaries. So, similar to @OpenAI Webtext, news-domain but bilingual: https://t.co/EHOD3ZvGL7
— Marian NMT (@marian_nmt) February 27, 2019