Mixtape: breaking the softmax bottleneck that limits expressiveness of neural language models.
— Russ Salakhutdinov (@rsalakhu) December 12, 2019
A network with Mixtape Output Layer is only 35% slower than softmax-based network, while outperforming softmax in perplexity & translation quality #NeurIPS2019 https://t.co/ZxpIqtJomX
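The "softmax bottleneck" refers to the fact that a single softmax over one logit matrix is a low-rank factorization of the (high-rank) conditional word distribution, which limits expressiveness. Mixtape addresses this by mixing in logit space before a single softmax. Below is a minimal numpy sketch of that core idea with a simplified scalar gate; all shapes and names (`W`, `Wg`, `K`) are illustrative assumptions, not the paper's exact architecture (Mixtape uses more efficient vector gating).

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
d, V, K = 8, 20, 4                 # hidden size, vocab size, mixture components (illustrative)

h = rng.normal(size=(d,))          # hidden state from the network
W = rng.normal(size=(K, d, V))     # per-component output projections (hypothetical names)
Wg = rng.normal(size=(d, K))       # gate projection

# Gate priors: one mixing weight per component (scalar-gate simplification).
pi = softmax(h @ Wg)

# Mix the K component logits BEFORE the softmax (the logit-space mixing idea),
# so only one softmax over the vocabulary is needed.
logits = np.einsum('k,kdv,d->v', pi, W, h)
probs = softmax(logits)
```

Because the mixing happens in logit space rather than probability space, only a single softmax over the vocabulary is computed, which is what keeps the overhead modest relative to a plain softmax layer.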