Tweeted by @Miles_Brundage
P.S. See also the ALBERT paper, which shows stronger results on these metrics for a similarly sized model, using a different approach (using parameters better to begin with vs. compressing a big model later): https://t.co/4VY2gbuFQu
— Miles Brundage (@Miles_Brundage) September 28, 2019