Tweeted by @ak92501
Transformers Generalize Linearly
abs: https://t.co/ud0iUEYDyx
Transformers fail to generalize hierarchically across a wide variety of grammatical mapping tasks, and they exhibit an even stronger preference for linear generalization than comparable recurrent networks.
— AK (@ak92501) September 27, 2021