There are a lot of models in the transformers🤗 repo. Feeling lost? I know I was, so I made a little high-level summary of the differences between each model. https://t.co/y5awjLbOrS
— Sylvain Gugger (@GuggerSylvain) June 5, 2020
The demo program at ACL is great (https://t.co/f5Pz39L4gl). Some neat ones:
— Sasha Rush (@srush_nlp) June 3, 2020
jiant - https://t.co/lPmNsu9Awp https://t.co/bnv94XM5Qu
SyntaxGym - https://t.co/shr8ZIXPJh
exBERT - https://t.co/TlNJUUoAOg
Stanza - https://t.co/dmGpNLiXjh
(props to @real_asli and Tsung-Hsien Wen) pic.twitter.com/Rbq4bZD82x
New: Cascaded Text Generation with Markov Transformers (https://t.co/KUQ4tAeH0n, Yuntian Deng)
— Sasha Rush (@srush_nlp) June 2, 2020
Beam Search Translation: Serial but fluent.
Non-Autoregressive (NAT): Parallel but disfluent (and kind of hacky...)
Why not parallel, fast, autoregressive, and accurate?
/thread pic.twitter.com/sOngb5rNyT
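For readers new to the serial-vs-parallel framing in this thread, here is a toy sketch in plain PyTorch (not the paper's cascaded Markov method; the dummy model and function names are purely illustrative) contrasting the two decoding regimes:

```python
# Toy contrast of the two decoding regimes named in the thread. This is NOT the
# paper's cascaded decoding; it only shows why autoregressive decoding is serial
# while non-autoregressive decoding is a single parallel pass.
import torch
import torch.nn as nn

def autoregressive_decode(model, max_len, bos_id=0):
    tokens = [bos_id]
    for _ in range(max_len):
        logits = model(torch.tensor([tokens]))      # re-run on the growing prefix
        tokens.append(int(logits[0, -1].argmax()))  # serial: one new token per step
    return tokens[1:]

def non_autoregressive_decode(model, max_len, bos_id=0):
    placeholder = torch.full((1, max_len), bos_id, dtype=torch.long)
    logits = model(placeholder)                     # single parallel forward pass
    return logits[0].argmax(dim=-1).tolist()        # positions chosen independently

# Stand-in "model" (embedding + linear) just so both functions run end to end.
vocab, hidden = 100, 16
dummy = nn.Sequential(nn.Embedding(vocab, hidden), nn.Linear(hidden, vocab))
print(autoregressive_decode(dummy, max_len=5))
print(non_autoregressive_decode(dummy, max_len=5))
```

The second function is fast, but each position is picked without seeing its neighbours, which is exactly the disfluency problem the cascaded approach in the paper targets.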
Excited to welcome Longformer, the transformer for long-range document tasks, to transformers 🤗(thanks to @i_beltagy).
— Hugging Face (@huggingface) June 2, 2020
Try 1 of the 7 models from the model hub: https://t.co/vcTFStsLlX
or check out how to convert a pre-trained BERT to its "long" version: https://t.co/qYXdtMzPFX. pic.twitter.com/a3QqKIuzBf
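If you want to kick the tires, a minimal sketch looks like this (assuming the allenai/longformer-base-4096 checkpoint from the hub; the input text is placeholder filler):

```python
# Minimal sketch: encode a long document with a Longformer checkpoint from the
# model hub. Assumes the allenai/longformer-base-4096 checkpoint.
import torch
from transformers import LongformerModel, LongformerTokenizer

tokenizer = LongformerTokenizer.from_pretrained("allenai/longformer-base-4096")
model = LongformerModel.from_pretrained("allenai/longformer-base-4096")

long_text = " ".join(["A very long document about long-range dependencies."] * 300)
inputs = tokenizer(long_text, return_tensors="pt", truncation=True, max_length=4096)

# Global attention on the first token, the usual choice for classification-style
# tasks; every other token uses the sliding-window local attention.
global_attention_mask = torch.zeros_like(inputs["input_ids"])
global_attention_mask[:, 0] = 1

with torch.no_grad():
    outputs = model(**inputs, global_attention_mask=global_attention_mask)
print(outputs.last_hidden_state.shape)  # (1, sequence_length, hidden_size)
```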
SysNLP is a hot area with distillation, compression, deployment, green NLP...
— Sasha Rush (@srush_nlp) June 1, 2020
Hoping this sheds light on earlier (under-read) NLP Systems papers. Some random recs:
* KenLM (https://t.co/f1F1pwAzPE)
* CPU Seq2Seq (https://t.co/Hb8evAy3s2)
* CKY GPU (https://t.co/eiwoykDzIh)
Introducing PruneBERT, fine-*P*runing BERT's encoder to the size of a high-resolution picture (11MB) while keeping 95% of its original perf!
— Hugging Face (@huggingface) June 1, 2020
Based on our latest work on movement pruning: https://t.co/jDLpUmEtcp
Code and weights: https://t.co/RnWf0rrRJB https://t.co/13gFApbOsE
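The gist of movement pruning, as the paper describes it: instead of dropping the smallest-magnitude weights, you learn an importance score per weight during fine-tuning, keep only the top-scoring ones, and feed gradients back to the scores with a straight-through estimator. A rough sketch of that idea (not the released code; the MovementPrunedLinear class and keep_ratio are made-up names for illustration):

```python
# Rough sketch of the movement-pruning idea, NOT the released implementation:
# each weight has a learned score, only the top-k scores keep their weights,
# and a straight-through estimator lets gradients reach the scores so that
# weights moving toward zero end up pruned.
import torch
import torch.nn as nn

class MovementPrunedLinear(nn.Module):
    def __init__(self, in_features, out_features, keep_ratio=0.10):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02)
        self.bias = nn.Parameter(torch.zeros(out_features))
        self.scores = nn.Parameter(torch.randn(out_features, in_features) * 1e-3)
        self.keep_ratio = keep_ratio

    def forward(self, x):
        k = max(1, int(self.keep_ratio * self.scores.numel()))
        threshold = torch.topk(self.scores.flatten(), k).values.min()
        hard_mask = (self.scores >= threshold).float()
        # Straight-through estimator: the forward pass uses the hard 0/1 mask,
        # the backward pass treats it as identity so the scores receive gradients.
        mask = hard_mask + self.scores - self.scores.detach()
        return nn.functional.linear(x, self.weight * mask, self.bias)

layer = MovementPrunedLinear(768, 768, keep_ratio=0.10)
out = layer(torch.randn(2, 768))
print(out.shape)  # torch.Size([2, 768])
```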
GPT-3 is terrifying because it's a tiny model compared to what's possible, trained in the dumbest way possible on a single impoverished modality on tiny data, yet the first version already manifests crazy runtime meta-learning—and the scaling curves 𝘴𝘵𝘪𝘭𝘭 are not bending! 😮 https://t.co/hQbW9znm3x
— 𝔊𝔴𝔢𝔯𝔫 (@gwern) May 31, 2020
NYU talks about the @SustaiNLP2020 competition for the most energy-efficient NLP model 🔥
— Thomas Wolf (@Thom_Wolf) May 30, 2020
The competition has just started and we designed it so you don’t need tens of GPUs to participate!
Join the challenge 🚀 https://t.co/1OVf1o1lb1
Scale *still* delivers! Congrats @OpenAI on showing very nice zero/few-shot language capabilities of GPT-3. #timelesstweet
— Oriol Vinyals (@OriolVinyalsML) May 29, 2020
Paper: https://t.co/SMT1n4eS1N
Endless Samples: https://t.co/arTp3Dxyo3 pic.twitter.com/LMfeR5EL4x
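"Few-shot" in the paper means conditioning on a handful of examples in the prompt, with no gradient updates. A toy illustration of that prompt format (the sentiment task and examples below are invented):

```python
# Toy illustration of the few-shot prompt format described in the GPT-3 paper:
# k labelled examples plus the query, packed into a single prompt, with no
# fine-tuning. The sentiment task and examples here are made up.
examples = [
    ("I loved this movie, a total delight.", "positive"),
    ("The plot made no sense and the acting was wooden.", "negative"),
]
query = "A beautiful, moving film."

prompt = "Classify the sentiment of each review.\n\n"
for text, label in examples:
    prompt += f"Review: {text}\nSentiment: {label}\n\n"
prompt += f"Review: {query}\nSentiment:"

print(prompt)  # feed this to the language model and read off the completion
```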
After going through GPT-3 paper, I have to remind myself that we can also do amazing things with small compute. https://t.co/pgWqQqgsJf
— hardmaru (@hardmaru) May 29, 2020
We helped BERT do better on several structured prediction tasks by learning from a language model that knows more about syntax, showcasing the benefits of structural biases in large-scale models.
— DeepMind (@DeepMind) May 29, 2020
Read about it here: https://t.co/zm418xTGhp
GPT-3 from @OpenAI got you interested in zero-shot and few-shot learning? You're in luck, because our own @joeddav has just released a demo of zero-shot topic classification!
— Hugging Face (@huggingface) May 29, 2020
Test how the model can predict a topic it has NEVER been trained on: https://t.co/3xzdEVbuAG 🤯🤯🤯 pic.twitter.com/CBPh9FU4OP
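Under the hood, the demo frames topic classification as natural language inference: each candidate topic becomes a hypothesis, and an MNLI model scores whether the input text entails it. A hedged sketch of that recipe (assuming the facebook/bart-large-mnli checkpoint; the example text and labels are made up):

```python
# Sketch of NLI-based zero-shot topic classification: score each candidate label
# by asking an MNLI model whether "This text is about {label}." is entailed.
# Assumes facebook/bart-large-mnli; example text and labels are invented.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "facebook/bart-large-mnli"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

premise = "The new transformer model cuts training costs dramatically."
labels = ["machine learning", "politics", "cooking"]

scores = {}
for label in labels:
    hypothesis = f"This text is about {label}."
    inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    # Drop the neutral class and take P(entailment) vs. contradiction.
    entail_vs_contradict = logits[0, [0, 2]]  # bart-large-mnli order: contradiction, neutral, entailment
    scores[label] = entail_vs_contradict.softmax(dim=0)[1].item()

print(max(scores, key=scores.get), scores)
```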