I've added FP16 training to our PyTorch BERT repo to easily fine-tune BERT-large on GPU.
— Thomas Wolf (@Thom_Wolf) November 12, 2018
The repo has become a showcase of all the tools you can use to train huge NNs 🙂
Got >91 F1 on SQuAD training BERT-large in a few hours on 4 GPUs.
Should take less than a day on 1 (recent) GPU.
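The FP16 training mentioned above can be sketched as a standard PyTorch mixed-precision loop. This is a minimal illustration, not the repo's actual code: the tiny linear model and synthetic data are stand-ins for BERT-large and SQuAD, and it uses PyTorch's built-in `autocast`/`GradScaler` utilities (the 2018 repo relied on NVIDIA's Apex, which follows the same pattern).

```python
# Minimal mixed-precision (FP16) training sketch in PyTorch.
# Model and data are hypothetical stand-ins for BERT-large / SQuAD.
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
use_amp = device == "cuda"  # FP16 autocast targets CUDA; fall back to FP32 on CPU

model = nn.Linear(16, 2).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)
# GradScaler rescales the loss so small FP16 gradients don't underflow to zero.
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)

x = torch.randn(8, 16, device=device)
y = torch.randint(0, 2, (8,), device=device)

for _ in range(3):
    optimizer.zero_grad()
    # Run the forward pass in half precision where it is numerically safe.
    with torch.autocast(device_type=device, enabled=use_amp):
        loss = nn.functional.cross_entropy(model(x), y)
    scaler.scale(loss).backward()   # backward on the scaled loss
    scaler.step(optimizer)          # unscales gradients, then steps
    scaler.update()                 # adjusts the scale factor for next step
```

Halving activation and gradient precision roughly halves memory use, which is what makes fitting BERT-large on a single GPU practical.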