ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
— ML Review (@ml_review) September 27, 2019
Parameter reduction techniques:
(i) Factorized vocabulary embedding parameterization: decouples the hidden-layer size from the vocabulary embedding size
(ii) Cross-layer parameter sharing https://t.co/dUamPpHLt1
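The two techniques above can be illustrated with a rough parameter count. This is a hedged sketch, not ALBERT's actual code: the sizes (V, H, E, L) are illustrative BERT-base-like values, and the per-layer estimate counts only the attention and feed-forward weight matrices (roughly 12H²), ignoring biases and LayerNorm.

```python
# Illustrative sizes (assumed): vocab V, hidden H, embedding E, layers L.
V, H, E, L = 30_000, 768, 128, 12

# (i) Factorized embedding: replace the single V x H embedding matrix
# with a V x E lookup followed by an E x H projection.
bert_embed = V * H            # tied to hidden size
albert_embed = V * E + E * H  # decoupled via small E
print(bert_embed)             # 23040000
print(albert_embed)           # 3938304

# (ii) Cross-layer sharing: one set of transformer-layer weights is
# reused by all L layers, so layer parameters shrink by a factor of L.
# Rough weight count per layer: 4H^2 (attention) + 8H^2 (feed-forward).
per_layer = 12 * H * H
print(L * per_layer)          # unshared: 84934656
print(per_layer)              # shared:    7077888
```

The factorization alone cuts the embedding parameters by roughly 6x at these sizes, and sharing cuts layer parameters by a factor of L.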