Hypothesis: any general function approximator that is powerful enough (e.g. has enough parameters) and can be efficiently trained will eventually perform similarly. The underlying substrate (LSTMs, CNNs, Transformers, etc.) won't matter as much as the data and objective functions.
— Richard Socher (@RichardSocher) November 6, 2019