Incidents1M: a large-scale dataset of images with natural disasters, damage, and incidents
- AK (@ak92501) January 13, 2022
abs: https://t.co/ehdybJKTr5
project page: https://t.co/HnfvJnYcSM pic.twitter.com/6Q2vHRfCZm
We're using KaoKore, a machine learning-friendly dataset, to decipher cursive and illustrations in historical Japanese art. Learn how machine learning can be used for humanities research and contribute to cultural preservation. https://t.co/nYSvuehWoR
- Boo-gle 👻 (@Google) September 30, 2021
We make all our work publicly available so that researchers and journalists can do their best work every day.
- Our World in Data (@OurWorldInData) September 30, 2021
https://t.co/qdK1aoGr9U
We bring together the COVID data from around the world every day. Thousands of articles get written based on our team's daily work. pic.twitter.com/OSxY8mzb3Y
Today I'm open sourcing my code for working with Twitter data
- Ryan J. Gallagher (on the job market!) (@ryanjgallag) September 27, 2021
It's designed to make advanced studies of social media easier by coordinating multiple API queries (stream, search, convos, quotes, user timelines) and organizing them using PostgreSQL https://t.co/2yYhfr612v
1/
What is your favorite tool for labeling data? Labelme (for image data) came to mind, but then going down the rabbit hole of this question, I learned that there is an entire "awesome-" GitHub repo of data labeling tools: https://t.co/w7ZApH9hT1
- Sebastian Raschka (@rasbt) September 16, 2021
For downloading large image datasets (1M+), I highly recommend https://t.co/U27VlPBfUK from @rom1504
- Boris Dayma 🥑 (@borisdayma) September 15, 2021
You can even monitor performance and download errors with @weights_biases pic.twitter.com/ZvrHh6B8O0
LAION-400M: open-source dataset of 400 million image-text pairs
- AK (@ak92501) September 12, 2021
project page: https://t.co/IA8aNpXZ6a pic.twitter.com/f5IoLESnRx
Datasets: A Community Library for Natural Language Processing
- AK (@ak92501) September 8, 2021
abs: https://t.co/xEpY9oQ2a5
github: https://t.co/HvY6Nlf41c
650+ unique datasets, 250+ contributors, and has helped support a variety of novel crossdataset research projects and shared tasks pic.twitter.com/AdlB21Hu2c
Challenges and Opportunities in NLP Benchmarking
- Sebastian Ruder (@seb_ruder) August 23, 2021
Recent NLP models have outpaced the benchmarks to test for them. I provide an overview of challenges and opportunities in this blog post. https://t.co/NbVfcwGX8z
QA Dataset Explosion: A Taxonomy of NLP Resources for Question Answering and Reading Comprehension
- Sebastian Ruder (@seb_ruder) August 6, 2021
This is an excellent overview of the QA landscape in NLP with numerous insightful observations by @annargrs @nlpmattg @IAugenstein. https://t.co/xD4Dmm886o pic.twitter.com/4oHABAPe40
Today we are releasing the Open Buildings Dataset, a new open-source dataset containing the locations and footprints of >500M buildings with coverage across Africa, which can support numerous scientific and humanitarian applications. Read more at https://t.co/ZAFeD3mWQt pic.twitter.com/hy9PVKx0Hy
- Google AI (@GoogleAI) July 28, 2021
There's a new version of the {{modeldata}} package on CRAN. https://t.co/p3DZGk3Xjn
- Max Kuhn (@topepos) July 15, 2021
There is a great data set (tate_text).
We're also going to remove the two OkCupid data sets in the next version. #rstats pic.twitter.com/sEU9M3srUU