Tweeted by @OriolVinyalsML
A good example of our field moving (too) fast: the all-attention layer was actually released in the original transformer paper from 2017, with similar findings.
— Oriol Vinyals (@OriolVinyalsML) August 25, 2019
Code: https://t.co/DdiM0PMziS
Paper: https://t.co/sejTvvAMh6 (only in v3)