"Whether to go with a decoder-only or encoder-decoder transformer?"
— Mostafa Dehghani (@m__dehghani) May 12, 2022
It turned out that this question on the architecture of the model is not actually that important!
You just need the right objective function and a simple prompting to switch mode during pretraining/finetuning. pic.twitter.com/lanaYmHynW