Universal Transformers augment Transformers with recurrence in depth and Adaptive Computation Time. This model outperforms vanilla Transformers on machine translation (MT), bAbI, LAMBADA (LA), and Learning to Execute (LTE).
— Oriol Vinyals (@OriolVinyalsML) July 12, 2018
Paper: https://t.co/U2YAeuO6EO
Code: Soon in https://t.co/KSuQAkn5Jh
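The two ideas the tweet names can be sketched together: a single transition block whose weights are shared and applied repeatedly over depth (recurrence in depth), with Adaptive Computation Time letting each position halt once its cumulative halting probability is high enough. This is a minimal NumPy sketch, not the paper's implementation: `transition`, `act_depth_recurrence`, and `w_halt` are hypothetical names, and the transition is a toy `tanh` layer standing in for the full self-attention plus feed-forward block.

```python
import numpy as np

rng = np.random.default_rng(0)

def transition(state, W):
    # Shared weights reused at every depth step (recurrence in depth);
    # a toy stand-in for the self-attention + transition function.
    return np.tanh(state @ W)

def act_depth_recurrence(state, W, w_halt, eps=0.01, max_steps=8):
    """Adaptive Computation Time over depth: each position keeps
    refining its state until its cumulative halting probability
    exceeds 1 - eps (or max_steps is reached). The output is the
    halting-probability-weighted sum of intermediate states."""
    n, d = state.shape
    cum_halt = np.zeros(n)                 # cumulative halting probability
    out = np.zeros_like(state)             # weighted combination of states
    running = np.ones(n, dtype=bool)       # positions still computing
    for step in range(max_steps):
        state = transition(state, W)
        p = 1.0 / (1.0 + np.exp(-(state @ w_halt)))  # per-position halt prob
        # A position halts when its cumulative probability would cross the
        # threshold, or on the final step; it then contributes its remainder.
        will_halt = running & ((cum_halt + p > 1 - eps) | (step == max_steps - 1))
        weight = np.where(will_halt, 1.0 - cum_halt, p)
        weight = np.where(running, weight, 0.0)
        out += weight[:, None] * state
        cum_halt += weight
        running = running & ~will_halt
        if not running.any():
            break
    return out, cum_halt

# Toy example: 4 positions, 8-dimensional states.
n, d = 4, 8
state = rng.standard_normal((n, d))
W = rng.standard_normal((d, d)) * 0.1
w_halt = rng.standard_normal(d)
out, cum_halt = act_depth_recurrence(state, W, w_halt)
```

Because each position accumulates its own halting probability, easy positions can stop after one or two transition steps while harder ones keep refining, which is the dynamic-depth behavior the tweet refers to.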