Tweeted By @PiotrCzapla
Thank you, it should be really useful, as according to this paper https://t.co/TnYFcCMRRR, unsupervised fine-tuning, layer-wise LR, and one-cycle are crucial for BERT performance. They manage to beat ULMFiT on IMDB with BERT-Base!
— Piotr Czapla (@PiotrCzapla) November 29, 2019