Compared to RNNs, the Transformer family of architectures seemingly scale to hundreds of millions of parameters with relative ease. pic.twitter.com/Xo7Ng5b2Gj
— hardmaru (@hardmaru) January 11, 2019