Ceshine's Data Science Tweet Collection

by Thom_Wolf on 2019-05-03 (UTC).

First, there was growing evidence that beam-search is highly sensitive to the length of the output. Best results are obtained when the output length is predicted from the input before decoding (https://t.co/imPuU2t6hp, https://t.co/WkxOsscAe7 at EMNLP 2018) [4/9]
— Thomas Wolf (@Thom_Wolf) May 3, 2019

nlp research

by Thom_Wolf on 2019-05-03 (UTC).

Last in this recent trend of work is https://t.co/D3IOZ8CMNQ in which @universeinanegg & co show that the distribution of words in BS/greedy decoded texts is very different from the one in human texts.
Clearly BS/greedy fail to reproduce distributional aspects of human text [7/9] pic.twitter.com/7blkmtLPjB
— Thomas Wolf (@Thom_Wolf) May 3, 2019

research nlp

by Thom_Wolf on 2019-05-03 (UTC).

Finally, here is a gist showing how to code top-k and nucleus sampling in PyTorch: https://t.co/aDOlWLI3aq
[9/9]
— Thomas Wolf (@Thom_Wolf) May 3, 2019

nlp
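The linked gist is not reproduced here. What follows is a rough sketch of the two filtering steps the tweet mentions. It uses plain Python instead of PyTorch so that it runs without extra installs. It assumes the model's next-token scores arrive as a list of logits. The names `softmax`, `top_k_top_p_filter` and `sample` are made up for this illustration and are not the gist's API. Top-k keeps only the k highest-scoring tokens. Nucleus (top-p) sampling keeps the smallest set of most-probable tokens whose cumulative probability reaches `top_p`. Every other logit is set to -inf, so its probability becomes zero before sampling.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits (-inf maps to 0)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_top_p_filter(logits, top_k=0, top_p=1.0):
    """Hypothetical helper: return a copy of `logits` with filtered tokens set to -inf.

    top_k > 0   -> keep only the top_k highest logits.
    top_p < 1.0 -> nucleus filtering: keep the most probable tokens until their
                   cumulative probability reaches top_p (the token that crosses
                   the threshold is kept, so at least one token always survives).
    """
    filtered = list(logits)
    # Token indices sorted from most to least likely.
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)

    if top_k > 0:
        for i in order[top_k:]:
            filtered[i] = -math.inf

    if top_p < 1.0:
        probs = softmax(filtered)
        cumulative = 0.0
        for i in order:
            if cumulative >= top_p:
                filtered[i] = -math.inf  # outside the nucleus
            else:
                cumulative += probs[i]

    return filtered


def sample(logits, top_k=0, top_p=1.0, rng=random):
    """Hypothetical helper: draw one token index from the filtered distribution."""
    probs = softmax(top_k_top_p_filter(logits, top_k, top_p))
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Usage with made-up logits for a 4-token vocabulary.
logits = [2.0, 1.0, 0.5, -1.0]
print(top_k_top_p_filter(logits, top_k=2))  # [2.0, 1.0, -inf, -inf]
print(sample(logits, top_p=0.9, rng=random.Random(0)))  # one of 0, 1, 2
```

Both filters can be combined in one call. Top-k is applied first, and the nucleus is then computed over the tokens that remain. Greedy decoding corresponds roughly to `top_k=1`.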
