New 'BERT' model from Google just turned up on the https://t.co/ryDQeo2HU2 – Huge improvements on MNLI, CoLA, SST, ... pic.twitter.com/FBC4RokARF
— Sam Bowman (@sleepinyourhat) October 9, 2018
@GoogleAI's BERT (by Jacob Devlin and others) just rocked our @stanfordnlp SQuAD1.1 benchmark for human-level performance on reading comprehension. Key idea is masked language models to enable pre-trained deep bidirectional representations. Likely big advancement for NLP! pic.twitter.com/9Z4P8f81NH
— Pranav Rajpurkar (@pranavrajpurkar) October 12, 2018
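The masked-language-model idea Pranav highlights is what makes the representations bidirectional: instead of predicting the next word left to right, the model hides some input tokens and predicts them from context on both sides. Here is a toy sketch of how such training inputs could be built (an illustration only; the actual procedure works on WordPiece tokens, masks about 15% of positions, and uses an 80/10/10 mask/random/keep rule):

```python
import random

# Toy illustration of BERT-style masked-LM inputs (simplified: real BERT uses
# WordPiece tokens and an 80/10/10 mask/random/keep rule on ~15% of positions).
def mask_tokens(tokens, mask_rate=0.15, seed=0):
    rng = random.Random(seed)
    masked, targets = [], []
    for tok in tokens:
        if rng.random() < mask_rate:
            masked.append("[MASK]")
            targets.append(tok)    # the model must recover this token
        else:
            masked.append(tok)
            targets.append(None)   # no loss at unmasked positions
    return masked, targets

tokens = "the man went to the store to buy a gallon of milk".split()
masked, targets = mask_tokens(tokens)
print(masked)   # e.g. ['the', 'man', '[MASK]', 'to', ...]
print(targets)  # loss is computed only at the masked positions
```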
BERT is super impressive!
Amazing development of the nice OpenAI GPT!
Human level already reached on the recent SWAG dataset (EMNLP'18)!
I'm wondering if we should consider the task "solved" or if we could/should update such an adversarially generated dataset? pic.twitter.com/GIJUFrJpUu
— Thomas Wolf (@Thom_Wolf) October 12, 2018
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding:
SOTA on 11 tasks. Main additions:
- Bidirectional LM pretraining w/ masking
- Next-sentence prediction aux task
- Bigger, more data
It seems LM pretraining is here to stay. https://t.co/lV8TkBXxY5
— Sebastian Ruder (@seb_ruder) October 12, 2018
It's amazing how fast #NLProc is moving these days.
We have now reached super-human performance on SWAG, a commonsense task that will only be introduced at @emnlp2018 in November!
We need even more challenging tasks!
BERT: https://t.co/jJmVoH1632
SWAG: https://t.co/jblbPLLvj6 pic.twitter.com/n3ufh6hue2
— Sebastian Ruder (@seb_ruder) October 12, 2018
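The second addition in Ruder's list, next-sentence prediction, is an auxiliary classification task: the model sees two segments and must decide whether the second really followed the first in the corpus or was drawn at random. A rough sketch of how such pairs might be generated (my own simplification; the paper's segment packing and sampling details differ):

```python
import random

# Illustrative next-sentence-prediction pairs: 50% of the time segment B is
# the true next sentence, 50% a randomly sampled one (simplified sketch).
def make_nsp_pairs(sentences, seed=0):
    rng = random.Random(seed)
    pairs = []
    for i in range(len(sentences) - 1):
        if rng.random() < 0.5:
            pairs.append((sentences[i], sentences[i + 1], "IsNext"))
        else:
            pairs.append((sentences[i], rng.choice(sentences), "NotNext"))
    return pairs

corpus = [
    "the man went to the store",
    "he bought a gallon of milk",
    "penguins are flightless birds",
]
for a, b, label in make_nsp_pairs(corpus):
    print(f"[CLS] {a} [SEP] {b} [SEP] -> {label}")
```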
This is the most important step in NLP in months — big! Make sure to read the BERT paper even if you are doing CV etc! Simple, but lots of compute. What does it mean for NLP? We do not know yet, but it will change how we do NLP and think about it for sure https://t.co/3N5LhFHsSj
— Tim Dettmers (@Tim_Dettmers) October 12, 2018
I wrote an in-depth analysis of how GPUs would compare against TPUs for training BERT. I conclude that current GPUs are about 30-50% slower than TPUs for this task https://t.co/BG8mIqQWMj
— Tim Dettmers (@Tim_Dettmers) October 17, 2018
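To put that 30-50% figure in perspective, the BERT paper reports roughly four days of pre-training for BERT-Large on 16 Cloud TPUs. A back-of-envelope calculation (my own arithmetic, reading "X% slower" as X% more wall-clock time) of what the slowdown would mean on a comparable GPU setup:

```python
# Rough arithmetic only: ~4 days of BERT-Large pre-training on 16 Cloud TPUs
# (per the paper), scaled by Dettmers' estimated 30-50% GPU slowdown.
tpu_days = 4.0
for slowdown in (0.30, 0.50):
    gpu_days = tpu_days * (1 + slowdown)
    print(f"{int(slowdown * 100)}% slower -> ~{gpu_days:.1f} days")
# i.e. roughly 5 to 6 days on an equivalent number of GPUs.
```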
A Keras implementation of BERT -- a new transformer architecture with strong performance across a range of language tasks. https://t.co/OznxM3h51Y
— François Chollet (@fchollet) October 30, 2018
Code and pretrained weights for BERT are out now.
Includes scripts to reproduce results. BERT-Base can be fine-tuned on a standard GPU; for BERT-Large, a Cloud TPU is required (as max batch size for 12-16 GB is too small). https://t.co/CWv8GMZiX5
— Sebastian Ruder (@seb_ruder) October 31, 2018
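Why the 12-16 GB limit bites for BERT-Large: with roughly 340M parameters, the weights, gradients, and Adam's two moment buffers alone consume several gigabytes in 32-bit floats, before any activations are stored. A rough estimate (my own arithmetic, ignoring activations and framework overhead):

```python
# Back-of-envelope memory estimate for fine-tuning BERT-Large with Adam in
# fp32 (rough numbers; activations and framework overhead not included).
params = 340e6        # ~340M parameters in BERT-Large
bytes_per_value = 4   # fp32
copies = 4            # weights + gradients + Adam first/second moments
gib = params * bytes_per_value * copies / 2**30
print(f"~{gib:.1f} GiB before activations")  # ≈ 5.1 GiB
# Activations grow with batch size and sequence length, so on a 12-16 GB GPU
# only very small batches fit, hence the pointer to a Cloud TPU.
```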
We have released @TensorFlow code+models for BERT, a brand new pre-training technique which is now state-of-the-art on a wide array of natural language tasks. It can also be used on many new tasks with minimal changes and quick training! https://t.co/rLR6U7uiPj
— Google AI (@GoogleAI) November 2, 2018
The multilingual (many languages, one encoder) version of @GoogleAI's BERT appears to be online! Happy to see results on our new XNLI cross-lingual transfer dataset, too! https://t.co/2YL9hSUb5j
— Sam Bowman (@sleepinyourhat) November 5, 2018
Here is an op-for-op @PyTorch re-implementation of @GoogleAI's BERT model by @sanhestpasmoi, @timrault and I.
We made a script to load Google's pre-trained models and it performs about the same as the TF implementation in our tests (see the readme).
Enjoy! https://t.co/dChmNPGPKO
— Thomas Wolf (@Thom_Wolf) November 5, 2018
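As a taste of what the PyTorch port enables, here is a minimal feature-extraction sketch. The names (BertTokenizer, BertModel, from_pretrained) follow the repository's later README and may differ from the very first release, so treat this as an assumption rather than the package's exact API at the time:

```python
import torch
# Assumed API from the pytorch-pretrained-bert README; names may differ in
# the initial release.
from pytorch_pretrained_bert import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

text = "[CLS] who was jim henson ? [SEP] jim henson was a puppeteer [SEP]"
tokens = tokenizer.tokenize(text)
ids = tokenizer.convert_tokens_to_ids(tokens)
# Segment ids: 0 for sentence A (up to and including the first [SEP]), 1 after.
split = tokens.index("[SEP]") + 1
segments = [0] * split + [1] * (len(tokens) - split)

with torch.no_grad():
    encoded_layers, pooled = model(torch.tensor([ids]), torch.tensor([segments]))
print(len(encoded_layers), encoded_layers[-1].shape)  # hidden states per layer
```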