First, there was growing evidence that beam-search is highly sensitive to the length of the output. Best results are obtained when the output length is predicted from the input before decoding (https://t.co/imPuU2t6hp, https://t.co/WkxOsscAe7 at EMNLP 2018) [4/9]
— Thomas Wolf (@Thom_Wolf) May 3, 2019