We just launched a new research simulation competition on @kaggle! Write a bot or train an AI agent to control a football player https://t.co/DSGC4OU8Rp pic.twitter.com/h7cc39x0aJ
— Ben Hamner (@benhamner) September 29, 2020
Dynabench: a framework to test ML systems by asking adversarial human annotators to break them.
A good way to evaluate the robustness (or brittleness) of ML systems beyond the traditional training set/test set paradigm. https://t.co/RhoY8C7hVW
— Yann LeCun (@ylecun) September 24, 2020
Want to show off your skills with TensorFlow and deep learning using TPUs? We'd love to see what you can build in response to this task on the Chinese MNIST dataset: https://t.co/Fp66iBWt5h
— Kaggle (@kaggle) September 24, 2020
Fine-tuning data-efficient GANs: 100-shot-obama to cartoonset.
github: https://t.co/CZt3YeJ5Gd
dataset: https://t.co/IzSvkrsLyk pic.twitter.com/4vfzvmYpjd
— AK (@ak92501) September 23, 2020
Microsoft researchers propose a new natural language processing paradigm—one in which models are pretrained from scratch entirely within a specialized domain. Learn about the new BLURB benchmark, leaderboard, and state-of-the-art model for biomedical NLP: https://t.co/xHX7Z26pSa
— Microsoft Research (@MSFTResearch) August 31, 2020
Just finished assembling #DadaGP v1.0 --- a tokenized symbolic music dataset of 26181 GuitarPro songs. Totaling 115M tokens, about as big as WikiText-103. Includes GuitarPro5 encoder/decoder. Who wants to train a generator? #nlp #mir #languagemodel #transformer @huggingface pic.twitter.com/ocyrZwYHOg
— dadabots (@dadabots) August 29, 2020
New package announcement! Introducing the {friends} package 🙌
Includes the entire transcript of all 10 seasons of the beloved American sitcom Friends.
Perfect for getting your feet wet with network analysis 🌐 and text analysis 📚 https://t.co/0D8DagYqnt #rstats #tidytext pic.twitter.com/HUHC02zzUq
— Emil Hvitfeldt (@Emil_Hvitfeldt) August 27, 2020
New competition at @kaggle.
I am the host :)
A new type of task: predicting the future trajectories of different agents.
I would guess that to win you will need to be creative. Stacking 100500 models may not help.
Feel free to join! #MachineLearning #SelfDriving https://t.co/cYgG14HU0u
— Vladimir Iglovikov (@viglovikov) August 24, 2020
We're releasing a corpus of over 1 million snapshots of English-language privacy policies from over 130,000 websites spanning two decades, with an accompanying paper: https://t.co/oc1b8w1ERk
By @RyanBMAmos, Günes Acar, @elenalucherini4, Mihir Kshirsagar, @jonathanmayer, and me. pic.twitter.com/7Kk9aLOOuh
— Arvind Narayanan (@random_walker) August 24, 2020
To better understand the impact of noisy labels on #MachineLearning model training, we are announcing MentorMix, a new method to mitigate the impact of noisy labels, as well as a benchmark and dataset on real-world label noise. Learn more about it at: https://t.co/jbsxPz7xtv pic.twitter.com/FNQv7XPCWk
— Google AI (@GoogleAI) August 19, 2020
Turns out a lot of open-domain QA datasets have test set leakage. If you control for it, model performance drops by a mean absolute of 63%. Yikes! If we missed this for such a long time, I wonder if there are problems with other NLP datasets too. https://t.co/uPT2uYqou7
— Tim Dettmers (@Tim_Dettmers) August 7, 2020
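The kind of leakage described above can be probed with a simple heuristic check (this is a hedged sketch, not the method from the linked paper): normalize questions and count how many test questions also appear verbatim in the training split. The `normalize` and `leakage_rate` helpers are illustrative names, not from any library.

```python
# Hedged sketch of a train/test leakage check for QA datasets:
# normalize question strings, then count exact-match collisions.
import string

def normalize(text):
    """Lowercase, strip punctuation, and collapse whitespace."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split())

def leakage_rate(train_questions, test_questions):
    """Fraction of test questions that also appear (normalized) in train."""
    train_set = {normalize(q) for q in train_questions}
    if not test_questions:
        return 0.0
    leaked = sum(normalize(q) in train_set for q in test_questions)
    return leaked / len(test_questions)

train = ["Who wrote Hamlet?", "What is the capital of France?"]
test = ["who wrote hamlet", "When did WW2 end?"]
print(leakage_rate(train, test))  # 0.5
```

Exact-match normalization is a lower bound: near-duplicate paraphrases would need fuzzier matching (e.g. token-overlap or embedding similarity) to be caught.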
We've got a new Getting Started competition up! Check out "Contradictory, My Dear Watson: Detecting contradiction and entailment in multilingual text using TPUs" --- we can't wait to see what you create 🕵️♀️🕵️♂️ https://t.co/8IUySU4TfM
— Kaggle (@kaggle) August 3, 2020