Great talk by @AlecRad in @pabbeel's deep unsupervised learning class - "Language Models and their Uses": https://t.co/388W4HXc4o
— Miles Brundage (@Miles_Brundage) May 17, 2019
AWD-LSTM benefits from all the work done on regularization by @Smerity. Not sure there's the same richness of regularization available just yet for transformer architectures? It's particularly important for small datasets
— Jeremy Howard (@jeremyphoward) May 17, 2019
SOTA for PTB without extra data is 46.54 (Transformer-XL 54.5). On paperswithcode, all top models on WikiText-103 & the One Billion Word benchmark are transformers, and all top models on small datasets are LSTMs. Could just be hyperparameters, but could also be something else https://t.co/Vtms96ScKd
— Chip Huyen (@chipro) May 17, 2019
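The regularization toolkit Jeremy alludes to centres on DropConnect applied to the LSTM's recurrent (hidden-to-hidden) weights. Below is a minimal PyTorch sketch of that idea, assuming a hand-rolled LSTM cell rather than the actual AWD-LSTM/fastai implementation; the class name and initialization are illustrative only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightDropLSTMCell(nn.Module):
    """A single LSTM cell with DropConnect applied to the hidden-to-hidden
    weights on every forward pass (the core AWD-LSTM-style regularizer).
    Illustrative sketch, not the AWD-LSTM codebase."""
    def __init__(self, input_size, hidden_size, weight_p=0.5):
        super().__init__()
        self.weight_p = weight_p
        self.weight_ih = nn.Parameter(torch.randn(4 * hidden_size, input_size) * 0.1)
        self.weight_hh = nn.Parameter(torch.randn(4 * hidden_size, hidden_size) * 0.1)
        self.bias = nn.Parameter(torch.zeros(4 * hidden_size))

    def forward(self, x, state):
        h, c = state
        # DropConnect: drop entries of the recurrent weight matrix, not the activations.
        w_hh = F.dropout(self.weight_hh, p=self.weight_p, training=self.training)
        gates = x @ self.weight_ih.t() + h @ w_hh.t() + self.bias
        i, f, g, o = gates.chunk(4, dim=-1)  # input, forget, cell, output gates
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c

# Toy usage: batch of 4 sequences, input dim 32, hidden dim 64, 10 timesteps.
cell = WeightDropLSTMCell(32, 64)
h = c = torch.zeros(4, 64)
for _ in range(10):
    h, c = cell(torch.randn(4, 32), (h, c))
print(h.shape)  # torch.Size([4, 64])
```

Because the dropout mask is applied to the weight matrix itself, the same recurrent connections are removed at every timestep of a sequence, which is what distinguishes this from ordinary dropout on the hidden state.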
Based on the observation that the GPT-2 medium-size model has memorized (and can spit back word-for-word) very long extracts from the web, such as the Gorilla Warfare meme, I had an idea for a very simple ML-less text generation algorithm. I spent the past 20 min implementing it. pic.twitter.com/9MiVnw4TqC
— François Chollet (@fchollet) May 16, 2019
My algo is to make search queries for the keywords in a prompt, plus the exact sequence of the last words in the prompt (trying different numbers of words to get at least one match), then stitch together result snippets by using the last words as a continuity pivot. It works decently! pic.twitter.com/WZNSllGVRR
— François Chollet (@fchollet) May 16, 2019
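To make the stitching idea concrete, here is a rough self-contained sketch in Python. It assumes a small local list of snippets standing in for real search-engine results, and the function names (`search_snippets`, `continue_text`) are made up for illustration; it is not Chollet's actual implementation.

```python
def search_snippets(query, corpus):
    """Stand-in for a web search engine: return corpus snippets containing the
    query string. Chollet's version issues real search queries; a local corpus
    is assumed here so the sketch stays self-contained."""
    return [s for s in corpus if query.lower() in s.lower()]

def continue_text(prompt, corpus, max_pivot=5, steps=3):
    """Stitch snippets onto the prompt: match the last few words of the running
    text against a snippet, then append whatever follows the match, using the
    overlapping words as the 'continuity pivot'."""
    text = prompt
    for _ in range(steps):
        words = text.split()
        appended = False
        # Try progressively shorter suffixes until some snippet yields a continuation.
        for n in range(min(max_pivot, len(words)), 0, -1):
            pivot = " ".join(words[-n:])
            for snippet in search_snippets(pivot, corpus):
                idx = snippet.lower().index(pivot.lower()) + len(pivot)
                continuation = snippet[idx:].strip()
                if continuation:
                    text += " " + continuation
                    appended = True
                    break
            if appended:
                break
        if not appended:
            break
    return text

# Toy "search result" snippets standing in for real web results.
corpus = [
    "language models are trained on large text corpora scraped from the web",
    "scraped from the web and filtered for quality before training begins",
    "before training begins researchers tokenize the data into subword units",
]
print(continue_text("Modern language models are trained", corpus))
```

Each appended snippet ends with a fresh suffix, which becomes the pivot for the next query, so the output keeps extending itself as long as matches exist.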
BERT Rediscovers the Classical NLP Pipeline
"regions responsible for each step appear in the expected sequence: POS tagging, parsing, NER, semantic roles, then coreference" https://t.co/6BB2BrPrUK pic.twitter.com/jxMtQndMVA
— Thomas Lahore (@evolvingstuff) May 16, 2019
Translatotron is our experimental model for direct end-to-end speech-to-speech translation, which demonstrates the potential for improved translation efficiency, fewer errors, and better handling of proper nouns. Learn all about it below! https://t.co/CvXL4lbPvP
— Google AI (@GoogleAI) May 15, 2019
Exciting: This appears to be the first model to beat BERT on our GLUE benchmark without adding any additional training data. https://t.co/CpiLQbKhhQ
— Sam Bowman (@sleepinyourhat) May 12, 2019
🐣 New Tutorial, open-source code & demo!
Building a SOTA Conversational AI with transfer learning & OpenAI GPT models
-Code/pretrained model from our NeurIPS 2018 ConvAI2 competition model, SOTA on automatic track
-Detailed Tutorial w. code
-Cool demo https://t.co/fcXeNmhPKy👇 pic.twitter.com/KBhTGixKZP
— Thomas Wolf (@Thom_Wolf) May 9, 2019
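As a taste of the approach, here is a simplified sketch of conditioning a pretrained language model on a persona plus dialogue history and sampling a reply. It uses GPT-2 via the transformers library for convenience; the tutorial's actual model is an OpenAI GPT double-head model with special segment tokens, which this sketch omits.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

# Condition the LM on a persona and the dialogue history, then sample a reply.
persona = "I am a robot who loves reading papers about language models."
history = ["Hi! What do you do for fun?"]
prompt = persona + "\n" + "\n".join(history) + "\n"

input_ids = tokenizer.encode(prompt, return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(
        input_ids,
        max_length=input_ids.shape[1] + 40,  # generate up to 40 new tokens
        do_sample=True,                      # top-k / nucleus sampling
        top_k=50,
        top_p=0.9,
        pad_token_id=tokenizer.eos_token_id,
    )

# Decode only the newly generated tokens as the bot's reply.
reply = tokenizer.decode(output_ids[0, input_ids.shape[1]:], skip_special_tokens=True)
print(reply)
```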
Things transformers say: https://t.co/L3hgSz5Wfe
— Ilya Sutskever (@ilyasut) May 8, 2019
I like this GPT-2 post update: Data release for detection research, Bigger (117M -> 345M) model release for your creative works. Nice follow-up from @OpenAI! https://t.co/kGKmk9XymY
— Delip Rao (@deliprao) May 4, 2019
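The data release is aimed at detection research: telling model-generated text apart from human-written text. A toy baseline for that task might look like the sketch below, a TF-IDF plus logistic-regression classifier; this is not OpenAI's detector, and the example strings are hand-written placeholders standing in for the released samples.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Placeholder examples; a real experiment would use the released WebText and
# GPT-2 output samples instead of these hand-written strings.
human_texts = [
    "The committee will meet on Tuesday to review the proposed budget.",
    "She packed her bags and caught the early train to the coast.",
]
generated_texts = [
    "The the model model generates repeated tokens tokens in strange ways.",
    "In a in a world of of text, the sampler continues continues the prompt.",
]

texts = human_texts + generated_texts
labels = [0] * len(human_texts) + [1] * len(generated_texts)  # 1 = model-generated

# Simple bag-of-ngrams detector baseline.
detector = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
detector.fit(texts, labels)

print(detector.predict(["The board approved the plan after a short discussion."]))
```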
“He and her team, which included Nanyun Peng and Percy Liang, tried to give their AI some creative wit, using insights from humor theory.” @hhexiy @percyliang. The Comedian Is in the Machine. AI Is Now Learning Puns | WIRED https://t.co/YOZUjFPoeE
— Stanford NLP Group (@stanfordnlp) May 3, 2019