The Illustrated BERT, ELMo, and co. (How NLP Cracked Transfer Learning) https://t.co/ENjHsIQ1HY
— Nando de Freitas (@NandoDF) May 29, 2019
What happens when you remove most of BERT's heads? Answer: surprisingly little! Check out @pmichelX's new preprint on pruning heads from multi-head attention models, with interesting analysis and 81% inference-time accuracy gains on BERT-based models! https://t.co/XtnUoP9stc
— Graham Neubig (@gneubig) May 28, 2019
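For intuition, the mechanical core of pruning an attention head is simply masking that head's contribution before the output projection. Below is a minimal PyTorch sketch of that mechanism; the module and its head_mask argument are illustrative, not the paper's code or its criterion for choosing which heads to prune.

```python
import torch
import torch.nn as nn

class MaskedHeadAttention(nn.Module):
    """Multi-head self-attention with a per-head mask: a masked head
    contributes nothing to the output, which mimics pruning it."""
    def __init__(self, d_model, n_heads):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x, head_mask=None):             # x: (B, T, d_model)
        B, T, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Split into heads: (B, n_heads, T, d_head)
        q, k, v = (t.reshape(B, T, self.n_heads, self.d_head).transpose(1, 2)
                   for t in (q, k, v))
        att = torch.softmax(q @ k.transpose(-2, -1) / self.d_head ** 0.5, dim=-1)
        ctx = att @ v                                  # (B, n_heads, T, d_head)
        if head_mask is not None:                      # head_mask: (n_heads,) of 0./1.
            ctx = ctx * head_mask.view(1, -1, 1, 1)
        return self.out(ctx.transpose(1, 2).reshape(B, T, -1))

# "Prune" heads 1 and 3 of a 4-head layer at inference time.
layer = MaskedHeadAttention(d_model=64, n_heads=4)
mask = torch.tensor([1., 0., 1., 0.])
y = layer(torch.randn(2, 10, 64), head_mask=mask)
```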
This is really cool - a transformer network that uses insertions and deletions as its primary operations. Roughly same performance, but up to 5x more efficient!
Levenshtein Transformer https://t.co/yWKmj1m2Mk pic.twitter.com/f0o4ic9U17
— Thomas Lahore (@evolvingstuff) May 28, 2019
Training Language GANs from Scratch
Latest work that attempts to train a language model entirely using GAN discriminator's loss function rather than maximum likelihood loss. Large population batch size and dense reward signal make REINFORCE less unhappy. https://t.co/KeEMtBjCFq pic.twitter.com/7058pHP1nU
— hardmaru (@hardmaru) May 27, 2019
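For intuition, here is a toy version of the REINFORCE update the tweet alludes to: sample sequences from the generator, score them with the discriminator as a dense reward, and subtract a batch-mean baseline so the large batch keeps gradient variance manageable. The generator.sample and discriminator interfaces are hypothetical stand-ins, not the paper's training loop.

```python
import torch

def reinforce_step(generator, discriminator, optimizer, batch_size=512, max_len=20):
    """One policy-gradient update driven by the discriminator's scores.
    Assumes generator.sample returns (sequences, per-sequence log-probs)."""
    seqs, log_probs = generator.sample(batch_size, max_len)   # log_probs: (batch,)
    with torch.no_grad():
        rewards = discriminator(seqs)                         # dense reward per sequence
    baseline = rewards.mean()                                 # batch-mean baseline reduces variance
    loss = -((rewards - baseline) * log_probs).mean()         # REINFORCE objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```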
Plotting the sentiment of the Game of Thrones audience as the finale unfolds, on a $5 budget, with Google Cloud Platform and Keras https://t.co/urLYAtGOJS pic.twitter.com/aRk4jqfU3I
— François Chollet (@fchollet) May 24, 2019
This is a great tutorial: building a Transformer with the Keras functional API in TensorFlow 2.0. https://t.co/mxgAvG8tSR
— François Chollet (@fchollet) May 24, 2019
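As a taste of what the tutorial covers, here is a single encoder block written with the Keras functional API. This is a simplified sketch rather than the tutorial's code: it assumes TensorFlow ≥ 2.4 (for layers.MultiHeadAttention) and leaves out masking, dropout, and positional encodings.

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

def transformer_encoder_block(seq_len=128, d_model=256, n_heads=4, d_ff=1024):
    inputs = keras.Input(shape=(seq_len, d_model))
    # Self-attention sub-layer with a residual connection and layer norm.
    attn = layers.MultiHeadAttention(num_heads=n_heads,
                                     key_dim=d_model // n_heads)(inputs, inputs)
    x = layers.LayerNormalization(epsilon=1e-6)(inputs + attn)
    # Position-wise feed-forward sub-layer, also with a residual connection.
    ff = layers.Dense(d_ff, activation="relu")(x)
    ff = layers.Dense(d_model)(ff)
    outputs = layers.LayerNormalization(epsilon=1e-6)(x + ff)
    return keras.Model(inputs, outputs)

block = transformer_encoder_block()
block.summary()
```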
BERT Rediscovers the Classical NLP Pipeline by I. Tenney, D. Das & E. Pavlick is 4 pages of great insights https://t.co/Cq61giOvof
Such a constant source of fascinating papers from Ellie Pavlick & her collaborators!
Here's BERT correcting its prediction across the model depth 🤯 pic.twitter.com/470iLBBEJW
— Thomas Wolf (@Thom_Wolf) May 24, 2019
I've started putting together flowcharts for solving various NLP problems with https://t.co/j3J6mQ9xJf (and beyond). Obviously none of this is foolproof – it's just a summary of our usual advice. Here's the first one for Named Entity Recognition!
📥 PDF: https://t.co/OPVLAIR7Fu pic.twitter.com/PNGDDSiFYJ
— Ines Montani 〰️ (@_inesmontani) May 21, 2019
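The flowchart's actual recommendations are in the PDF above; a natural first step for NER with spaCy is simply running a pretrained pipeline. A quick example (assumes spaCy and the en_core_web_sm model are installed; the entity labels you get depend on the model):

```python
import spacy

# Install the model first: python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")
doc = nlp("Ines Montani shared the NER flowchart from Berlin in May 2019.")

for ent in doc.ents:
    print(ent.text, ent.label_)   # e.g. "Ines Montani" PERSON, "Berlin" GPE, "May 2019" DATE
```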
Brand new library for transfer learning in NLP from @PeterMartigny and Feedly team. I am inspired that they built this on top of our #NLProc book. https://t.co/gPVP3VXo5f pic.twitter.com/SQSKac5XXm
— Delip Rao (@deliprao) May 20, 2019
Currently working on the upcoming NAACL "Transfer Learning in NLP" tutorial with @seb_ruder @mattthemathman and @swabhz. Pretty excited!
And I've discovered you can write a Transformer model like GPT-2 in less than 40 lines of code now!
40 lines of code & 40 GB of data... pic.twitter.com/VVABKHNLB7
— Thomas Wolf (@Thom_Wolf) May 18, 2019
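The screenshot shows Thomas Wolf's own 40-line model; as a rough stand-in, here is a sketch of a small GPT-style, decoder-only Transformer in PyTorch. Hyperparameters are arbitrary and this is not the code from the tweet.

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    """One pre-norm decoder block with causal self-attention."""
    def __init__(self, d_model, n_heads):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                 nn.Linear(4 * d_model, d_model))

    def forward(self, x):
        # Causal mask: position i may only attend to positions <= i.
        T = x.size(1)
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device), 1)
        h = self.ln1(x)
        x = x + self.attn(h, h, h, attn_mask=mask, need_weights=False)[0]
        return x + self.mlp(self.ln2(x))

class TinyGPT(nn.Module):
    def __init__(self, vocab_size, d_model=256, n_heads=4, n_layers=4, max_len=512):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, d_model)
        self.pos = nn.Embedding(max_len, d_model)
        self.blocks = nn.Sequential(*[Block(d_model, n_heads) for _ in range(n_layers)])
        self.ln_f = nn.LayerNorm(d_model)
        self.head = nn.Linear(d_model, vocab_size, bias=False)

    def forward(self, idx):                           # idx: (batch, seq) of token ids
        pos = torch.arange(idx.size(1), device=idx.device)
        x = self.tok(idx) + self.pos(pos)
        x = self.blocks(x)
        return self.head(self.ln_f(x))                # logits for the next token
```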
A Pytorch Implementation of Tacotron: End-to-end Text-to-speech Deep-Learning Model https://t.co/ZKt3zfsU6z #deeplearning #machinelearning #ml #ai #neuralnetworks #datascience #pytorch
— PyTorch Best Practices (@PyTorchPractice) May 17, 2019
The "recurrent inductive bias" of RNNs usually helps them be more data efficient, compared to vanilla Transformer. If you introduce such a bias to Transformers (like recurrence in depth in Universal Transformers), they generalize better on small datasets: https://t.co/gWzKXz8xRU
— Mostafa Dehghani (@m__dehghani) May 17, 2019
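The "recurrence in depth" idea boils down to applying one weight-shared Transformer layer repeatedly instead of stacking distinct layers. A simplified PyTorch sketch of that idea follows; it omits the Universal Transformer's adaptive computation time and its full position/timestep embeddings, and step_emb here is just an illustrative per-step signal.

```python
import torch
import torch.nn as nn

class RecurrentDepthEncoder(nn.Module):
    """Applies the *same* encoder layer n_steps times, i.e. weight sharing
    across depth -- the recurrent inductive bias described in the tweet."""
    def __init__(self, d_model=128, n_heads=4, n_steps=6):
        super().__init__()
        self.layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.step_emb = nn.Embedding(n_steps, d_model)  # tells the shared layer which step it is on
        self.n_steps = n_steps

    def forward(self, x):                   # x: (batch, seq, d_model)
        for t in range(self.n_steps):
            x = self.layer(x + self.step_emb.weight[t])
        return x

enc = RecurrentDepthEncoder()
out = enc(torch.randn(2, 16, 128))          # same shape in, same shape out
```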