Tweeted by @Smerity
What happens when you mix the SHA-RNN with the SRU, similar to the QRNN? 2.5-10x less training time and darn close to SotA results on the enwik8, WikiText-103, and Billion Word language modeling datasets.
— Smerity (@Smerity) February 26, 2021
Impressive work from @taolei15949106 at @asapp!
See https://t.co/aNCqhTLnn6 https://t.co/eD3mWPJnwo
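The speedup Smerity cites comes from the SRU's design: unlike an LSTM, whose gates depend on the previous hidden state, the SRU's gates depend only on the current input, so all the heavy matrix multiplies can be computed for every timestep in parallel, leaving only cheap elementwise operations in the sequential loop. Below is a minimal NumPy sketch of one SRU layer in that spirit; the function name, argument names, and shapes are illustrative, not taken from the linked code.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sru_cell(x, W, Wf, Wr, vf, vr, bf, br):
    """Run one SRU layer over a sequence x of shape (T, d).

    The projections (W, Wf, Wr) depend only on the inputs, so they are
    batched over all timesteps up front -- this is the source of the
    SRU's speed advantage over an LSTM, whose gates need h_{t-1}.
    Assumes input and hidden dimensions are equal (d) so the highway
    connection can mix x_t directly into the output.
    """
    T, d = x.shape
    # Precompute all input projections in parallel (no recurrence here).
    u = x @ W          # candidate values, shape (T, d)
    uf = x @ Wf        # forget-gate pre-activations
    ur = x @ Wr        # reset-gate pre-activations
    c = np.zeros(d)    # internal state
    h = np.zeros((T, d))
    for t in range(T):
        # Only cheap elementwise ops remain in the sequential loop.
        f = sigmoid(uf[t] + vf * c + bf)   # forget gate
        c = f * c + (1.0 - f) * u[t]       # state update
        r = sigmoid(ur[t] + vr * c + br)   # reset gate
        h[t] = r * c + (1.0 - r) * x[t]    # highway connection to input
    return h, c
```

The SRU++ work the tweet points to goes a step further, interleaving this fast recurrence with attention, in the spirit of how the SHA-RNN pairs a recurrent backbone with a single attention head.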