Tweeted by @m__dehghani
The "recurrent inductive bias" of RNNs usually helps them be more data efficient, compared to vanilla Transformer. If you introduce such a bias to Transformers (like recurrence in depth in Universal Transformers), they generalize better on small datasets: https://t.co/gWzKXz8xRU
— Mostafa Dehghani (@m__dehghani) May 17, 2019
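To make "recurrence in depth" concrete, here is a minimal PyTorch sketch of the weight-tying idea behind the Universal Transformer: a single encoder layer whose weights are reused at every depth step, with a learned per-step embedding marking the recurrence step. This is an illustrative simplification, not the paper's full model; the class and parameter names (`DepthRecurrentEncoder`, `num_steps`, `step_embedding`) are assumptions, and the actual Universal Transformer also adds coordinate embeddings and an adaptive computation time (ACT) halting mechanism.

```python
import torch
import torch.nn as nn


class DepthRecurrentEncoder(nn.Module):
    """Sketch of depth recurrence: one shared Transformer layer
    applied repeatedly, instead of a stack of distinct layers."""

    def __init__(self, d_model=64, nhead=4, num_steps=6):
        super().__init__()
        # A single layer whose weights are shared across all depth steps.
        self.shared_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=nhead, batch_first=True)
        # Learned embedding telling the layer which recurrence step it is on
        # (a simplified stand-in for the paper's timestep embeddings).
        self.step_embedding = nn.Embedding(num_steps, d_model)
        self.num_steps = num_steps

    def forward(self, x):
        # x: (batch, seq_len, d_model)
        for t in range(self.num_steps):
            x = x + self.step_embedding.weight[t]  # broadcast over batch/seq
            x = self.shared_layer(x)               # same weights every step
        return x


# Usage: a batch of 2 sequences, 10 tokens each, 64-d embeddings.
model = DepthRecurrentEncoder()
out = model(torch.randn(2, 10, 64))
print(out.shape)  # torch.Size([2, 10, 64])
```

The design point the tweet highlights is visible in the loop: because the same parameters transform the representation at every step, the model is biased toward learning an iterative refinement procedure, much like an RNN unrolled over depth rather than time, which is one plausible reason for the better generalization on small datasets.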