Tweeted by @evolvingstuff
A Tensorized Transformer for Language Modeling
— Thomas Lahore (@evolvingstuff) June 25, 2019
"Multi-linear attention can ... obtain performance improvements, compared with a number of language modeling approaches, such as Transformer, Transformer-XL, and Transformer with tensor train decomposition." https://t.co/f9rZpKBXmG