Tweeted By @Thom_Wolf
Actually, re-reading "Adaptively Sparse Transformers" by Gonçalo M. Correia, Vlad Niculae and André F.T. Martins https://t.co/vPq8duBp4k, I found this nice observation of a BPE-merging head that I can't resist sharing with you as well. Isn't that a sweet head?👇
— Thomas Wolf (@Thom_Wolf) January 4, 2020
[3/3] pic.twitter.com/tFMsPIrdgg