Tweeted by @Smerity
What happens when you mix the SHA-RNN with the SRU, similar to the QRNN? 2.5-10x less training time and darn close to SotA results on the enwik8, WikiText-103, and Billion Word language modeling datasets.
— Smerity (@Smerity) February 26, 2021
Impressive work from @taolei15949106 at @asapp!
See https://t.co/aNCqhTLnn6 https://t.co/eD3mWPJnwo
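The speedup Smerity cites comes from the SRU's design: unlike an LSTM, whose gates depend on the previous hidden state, the SRU's gates depend only on the current input, so all the heavy matrix multiplies can be computed for every timestep in parallel, leaving only cheap elementwise operations in the sequential loop. Below is a minimal NumPy sketch of one SRU layer in that spirit; the function name, argument names, and shapes are illustrative, not taken from the linked code.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sru_cell(x, W, Wf, Wr, vf, vr, bf, br):
    """Run one SRU layer over a sequence x of shape (T, d).

    The projections (W, Wf, Wr) depend only on the inputs, so they are
    batched over all timesteps up front -- this is the source of the
    SRU's speed advantage over an LSTM, whose gates need h_{t-1}.
    Assumes input and hidden dimensions are equal (d) so the highway
    connection can mix x_t directly into the output.
    """
    T, d = x.shape
    # Precompute all input projections in parallel (no recurrence here).
    u = x @ W          # candidate values, shape (T, d)
    uf = x @ Wf        # forget-gate pre-activations
    ur = x @ Wr        # reset-gate pre-activations
    c = np.zeros(d)    # internal state
    h = np.zeros((T, d))
    for t in range(T):
        # Only cheap elementwise ops remain in the sequential loop.
        f = sigmoid(uf[t] + vf * c + bf)   # forget gate
        c = f * c + (1.0 - f) * u[t]       # state update
        r = sigmoid(ur[t] + vr * c + br)   # reset gate
        h[t] = r * c + (1.0 - r) * x[t]    # highway connection to input
    return h, c
```

The SRU++ work the tweet points to goes a step further, interleaving this fast recurrence with attention, in the spirit of how the SHA-RNN pairs a recurrent backbone with a single attention head.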