Tweeted By @ak92501
Primer: Searching for Efficient Transformers for Language Modeling
abs: https://t.co/JM9v7pNoSI
github: https://t.co/xhA7uGyC7H
Experiments show Primer’s gains over Transformer increase as compute scale grows and follow a power law with respect to quality at optimal model sizes. pic.twitter.com/CXq1yYMfUA
— AK (@ak92501) September 20, 2021