To balance out yesterday's much buzzier paper, today I read about analyzing the contributions of each head within the Transformer's multi-head attention, to understand what each head is doing and how necessary it is to performance. https://t.co/1ltTBhgkKP
— Cody Wild (@decodyng) May 25, 2019