Tweeted By @Smerity
The other advantage of QRNNs that I usually bring up is that a complex recurrence function such as the LSTM's isn't always an advantage. In the Transformer-XL paper they show that a QRNN is better at their context metric than an LSTM with the same parameter budget. pic.twitter.com/QtupPkwDWQ
— Smerity (@Smerity) August 6, 2019
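
For readers unfamiliar with what makes the QRNN recurrence "simple": below is a minimal sketch of the fo-pooling step as described in the QRNN paper (Bradbury et al.), assuming the gate activations z, f and o have already been computed for every timestep by the convolutional part of the layer. The function name `qrnn_fo_pool` and the list-of-lists layout are illustrative assumptions, not taken from the tweet or from any library.

```python
def qrnn_fo_pool(z, f, o, c0=None):
    """Sketch of QRNN fo-pooling over a sequence.

    z, f, o: lists of per-timestep vectors (lists of floats), assumed to be
    the candidate, forget-gate and output-gate activations already produced
    by the convolutional layer. c0 is an optional initial cell state.
    """
    hidden = len(z[0])
    c = list(c0) if c0 is not None else [0.0] * hidden
    hs = []
    for z_t, f_t, o_t in zip(z, f, o):
        # The only sequential work is this elementwise update:
        # c_t = f_t * c_{t-1} + (1 - f_t) * z_t
        c = [f_i * c_i + (1.0 - f_i) * z_i for f_i, c_i, z_i in zip(f_t, c, z_t)]
        # h_t = o_t * c_t
        hs.append([o_i * c_i for o_i, c_i in zip(o_t, c)])
    return hs, c
```

Unlike an LSTM, the gates here do not depend on the previous hidden state, so no matrix multiplication sits inside the time loop; only the elementwise update does.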