Tweeted by @maithra_raghu
How do representations evolve as they go through the transformer? How does the Masked Language Model objective affect these compared to Language Models? How much do different tokens change and influence other tokens?
Answers in the paper by @lena_voita: https://t.co/F2wKUVqeeO! pic.twitter.com/PhPp0X50vi
— Maithra Raghu (@maithra_raghu) November 2, 2019