BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding:
SOTA on 11 tasks. Main additions:
- Bidirectional LM pretraining w/ masking
- Next-sentence prediction aux task
- Bigger, more data
It seems LM pretraining is here to stay. https://t.co/lV8TkBXxY5
— Sebastian Ruder (@seb_ruder) October 12, 2018
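For context, the "bidirectional LM pretraining w/ masking" in the tweet refers to the paper's masked-LM objective: roughly 15% of input tokens are selected for prediction, and of those, 80% are replaced with [MASK], 10% with a random token, and 10% left unchanged, with the model trained to recover the original tokens. Below is a minimal sketch of that corruption step, using a toy whitespace tokenizer and a hypothetical vocabulary rather than BERT's WordPiece subwords.

```python
import random

# Hypothetical toy vocabulary; BERT actually uses a ~30k WordPiece vocabulary.
VOCAB = ["the", "cat", "sat", "on", "mat", "dog", "ran"]

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """BERT-style masked-LM corruption: pick ~15% of positions; of those,
    replace 80% with [MASK], 10% with a random token, and leave 10% as-is.
    Returns the corrupted tokens and prediction targets (None = not predicted)."""
    rng = random.Random(seed)
    corrupted, targets = list(tokens), [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            targets[i] = tok  # the model must recover the original token here
            r = rng.random()
            if r < 0.8:
                corrupted[i] = "[MASK]"
            elif r < 0.9:
                corrupted[i] = rng.choice(VOCAB)
            # else: keep the original token (10% of selected positions)
    return corrupted, targets

if __name__ == "__main__":
    toks = "the cat sat on the mat".split()
    print(mask_tokens(toks))
```

Because the model cannot tell which visible tokens were kept or swapped, it has to build a contextual representation of every position, which is what lets the encoder attend in both directions during pretraining.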