Tweeted by @hardmaru
The Evolved Transformer: They perform architecture search on Transformer's stackable cells for seq2seq tasks.
— hardmaru (@hardmaru) February 1, 2019
“A much smaller, mobile-friendly, Evolved Transformer with only ~7M parameters outperforms the original Transformer by 0.7 BLEU on WMT14 EN-DE.” https://t.co/ABtfdTGIYl pic.twitter.com/Rso7GUiDe9