Tweeted By @evolvingstuff
Reducing BERT Pre-Training Time from 3 Days to 76 Minutes
"we propose the LAMB optimizer, which helps us to scale the batch size to 65536 without losing accuracy" https://t.co/WRRAh7zQFC
— Thomas Lahore (@evolvingstuff) April 2, 2019