Tweeted by @Tim_Dettmers
Turns out a lot of open-domain QA datasets have test set leakage. If you control for it, model performance drops by a mean absolute of 63%. Yikes! If we missed this for such a long time, I wonder if there are problems with other NLP datasets too. https://t.co/uPT2uYqou7
— Tim Dettmers (@Tim_Dettmers) August 7, 2020