Tweeted by @OriolVinyalsML
A good example of our field moving (too) fast: the all-attention layer was actually released in the original transformer paper from 2017, with similar findings.
— Oriol Vinyals (@OriolVinyalsML) August 25, 2019
Code: https://t.co/DdiM0PMziS
Paper: https://t.co/sejTvvAMh6 (only in v3)