[Feature Selection Using Null Importances] by Kaggler and VP of Data Science @h2oai, @ogrellier. HT @YifanX https://t.co/RMMvsgd6nC
— meg.ehh 🇨🇦 (@MeganRisdal) May 19, 2022
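For context, the null-importances idea is to compare each feature's actual importance against a "null" distribution obtained by refitting the model on a shuffled target; features that can't beat their own null distribution are likely noise. A minimal sketch, assuming a generic scikit-learn setup rather than ogrellier's original notebook:

```python
# Minimal null-importances sketch (generic scikit-learn setup, not the
# original Kaggle notebook).
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def null_importances(X, y, n_runs=50, random_state=0):
    rng = np.random.default_rng(random_state)
    # Actual importances: model fit on the true target.
    actual = RandomForestClassifier(random_state=random_state).fit(X, y).feature_importances_
    # Null importances: model fit on a shuffled (meaningless) target.
    nulls = np.empty((n_runs, X.shape[1]))
    for i in range(n_runs):
        y_perm = rng.permutation(y)
        nulls[i] = RandomForestClassifier(random_state=i).fit(X, y_perm).feature_importances_
    return actual, nulls

# Keep features whose actual importance beats, say, the 95th percentile
# of their null distribution; the exact threshold is a judgment call.
# actual, nulls = null_importances(X, y)
# keep = actual > np.percentile(nulls, 95, axis=0)
```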
Yesterday I asked some Kagglers about their favorite creative ideas with real-world applicability that they found through Competitions ...
They're very cool so I'm sharing them in a thread (add your own!) 👇
— meg.ehh 🇨🇦 (@MeganRisdal) May 19, 2022
Dialog Inpainting: Turning Documents into Dialogs
abs: https://t.co/uVnXYgkKxu
Using inpainted data to pre-train ConvQA retrieval systems advances SOTA across three benchmarks (QReCC, OR-QuAC, TREC CAsT), yielding up to 40% relative gains on standard evaluation metrics pic.twitter.com/eONAdbN1fg
— AK (@ak92501) May 19, 2022
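The core move in the paper is to treat a document's sentences as one speaker's turns and have a trained inpainter generate the missing questions in between. A rough sketch of just that data-shaping step, with hypothetical helper names (not the paper's code):

```python
# Rough sketch of the dialog-inpainting framing: interleave masked reader
# turns with verbatim document sentences; a trained inpainter model would
# then fill in the masked questions. Names here are illustrative only.
MASK = "<extra_id_0>"  # placeholder mask token

def document_to_partial_dialog(sentences):
    """Turn a list of document sentences into a partial dialog."""
    dialog = []
    for sent in sentences:
        dialog.append(("reader", MASK))   # question to be inpainted
        dialog.append(("writer", sent))   # verbatim document sentence
    return dialog

doc = [
    "Mount Everest is Earth's highest mountain above sea level.",
    "Its elevation was most recently measured in 2020.",
]
for speaker, turn in document_to_partial_dialog(doc):
    print(f"{speaker}: {turn}")
```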
Good discussion on this thread: Why do top speech/audio conferences like ICASSP and Interspeech have very high acceptance rates like 46%-48%?
IMO, low acceptance rates do not imply that the conference is any good. If anything, the opposite might be true: https://t.co/7EDybWXdi1 https://t.co/kebzEuYMz1 pic.twitter.com/AoKlp7Tjff
— hardmaru (@hardmaru) May 17, 2022
RTMV: A Ray-Traced Multi-View Synthetic Dataset for Novel View Synthesis
abs: https://t.co/4kdIL9g29r pic.twitter.com/JmXpCqFPXx
— AK (@ak92501) May 17, 2022
The Intel engineers in the PyTorch open-source community have created a new Intel® Extension for PyTorch* which maximizes deep learning inference and training performance on Intel CPUs. Get the extension to make use of these features today: @fanzhao_intel https://t.co/RMtyhRHeDE
— PyTorch (@PyTorch) May 16, 2022
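Typical usage is a one-line wrap of an existing model. A minimal inference sketch, assuming intel-extension-for-pytorch and torchvision are installed (exact APIs may differ across versions):

```python
# Minimal Intel Extension for PyTorch inference sketch.
import torch
import torchvision.models as models
import intel_extension_for_pytorch as ipex

model = models.resnet50(weights=None).eval()
# ipex.optimize applies CPU-specific operator and memory-layout optimizations.
model = ipex.optimize(model)

with torch.no_grad():
    out = model(torch.rand(1, 3, 224, 224))
```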
Stumbled upon this neat flowchart for choosing text classification methods. I usually eye-balled it, but using a samples/words-per-sample ratio cut-off seems reasonable. I.e., with samples/words-per-sample < 1500, use a bag-of-words model; with >= 1500, use a sequence model (https://t.co/gbLJEGBOJA) pic.twitter.com/2BmxTTx9Z4
— Sebastian Raschka (@rasbt) May 16, 2022
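The rule of thumb is cheap to compute. A tiny sketch, using the median words per sample as the ratio's denominator:

```python
# Decision rule from the flowchart: number of samples divided by the
# median number of words per sample, with 1500 as the cut-off.
import numpy as np

def choose_model(texts):
    words_per_sample = np.median([len(t.split()) for t in texts])
    ratio = len(texts) / words_per_sample
    # Below ~1500, n-gram / bag-of-words models tend to win; above it,
    # sequence models (e.g., embeddings + CNN/RNN) are recommended.
    return "bag-of-words (n-gram) model" if ratio < 1500 else "sequence model"

texts = ["an example document about text classification"] * 10_000
print(choose_model(texts))
```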
My editors just shared with me the feedback from early reviewers and I'm in tears 😭
With the help of so many people, I worked really hard on this book. I'm grateful that people gave it a chance.
Read the book online: https://t.co/fxph4OYIsf
Pre-order: https://t.co/5RHFYzu7kq pic.twitter.com/vyhGbsih8A
— Chip Huyen (@chipro) May 16, 2022
ML bugs are so much trickier than bugs in traditional software because rather than getting an error, you get degraded performance (and it's not obvious a priori what ideal performance is).
So ML debugging works by continual sanity checking, e.g. comparing to various baselines.
— Greg Brockman (@gdb) May 14, 2022
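A minimal illustration of that baseline-comparison sanity check, using scikit-learn's DummyClassifier as the stand-in baseline:

```python
# Sanity check: a model that can't beat a majority-class baseline is a
# red flag, even though nothing ever throws an error.
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.datasets import load_breast_cancer

X, y = load_breast_cancer(return_X_y=True)
baseline = cross_val_score(DummyClassifier(strategy="most_frequent"), X, y).mean()
model = cross_val_score(GradientBoostingClassifier(), X, y).mean()
print(f"baseline={baseline:.3f} model={model:.3f}")
assert model > baseline, "model fails the sanity check: no better than guessing"
```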
My adamance that “business logic belongs in ETL, not BI” is, fundamentally, the same as “create a metric layer.” And it’s like we’re all figuring out how to do that well as we go along. https://t.co/WUiYFUAwi1
— JD Long (@CMastication) May 14, 2022
Learn how our developer community solves real, everyday machine learning problems with PyTorch. From Advertising & Marketing to Travel and so much in between, get to know PyTorch’s features and capabilities. Read all about PyTorch’s Community Stories: https://t.co/ceOqYIL5fR
— PyTorch (@PyTorch) May 13, 2022
A short thread about DeepMind's recent GATO paper. It trains a basic transformer on an impressive number of datasets pic.twitter.com/ncpP8aLFgs
— Eric Jang (@ericjang11) May 13, 2022