Hypothesis: any general function approximator that is powerful enough (e.g. has enough parameters) and can be efficiently trained will eventually perform similarly. The underlying substrate (LSTMs, CNNs, Transformers, etc.) won't matter as much as the data and objective functions.
— Richard Socher (@RichardSocher) November 6, 2019