We see more significant improvements from training data distribution search (data splits + oversampling factor ratios) than neural architecture search. The latter is so overrated :)
— Andrej Karpathy (@karpathy) September 20, 2019
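The "training data distribution search" Karpathy describes can be pictured as a simple sweep over data splits and oversampling factors, selecting the configuration with the best validation score. A minimal sketch, where `train_fn` and `eval_fn` are hypothetical placeholders for your training and validation routines:

```python
from itertools import product

def distribution_search(train_fn, eval_fn, splits, ratios):
    """Grid-search over data splits and oversampling factors,
    keeping the setting with the best validation score.
    train_fn/eval_fn are stand-ins, not a real API."""
    best = None
    for split, ratio in product(splits, ratios):
        model = train_fn(split, oversample_ratio=ratio)
        score = eval_fn(model)
        if best is None or score > best[0]:
            best = (score, split, ratio)
    return best
```

In practice the search space (splits, ratios) and the scoring function would be chosen per task; the point is that the model architecture stays fixed while the data distribution is what gets tuned.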
(likely an artifact of most of academia focused on finding models conditioned on standard datasets)
— Andrej Karpathy (@karpathy) September 20, 2019
One of the biggest tool gaps in ML right now is in building utilities to more easily inspect and understand data.
— Emmanuel Ameisen (@mlpowered) September 20, 2019
I gave a talk about just this at: https://t.co/0MxmhAF3zl
It also quotes your great data in industry vs data in academia slide in the conclusion @karpathy https://t.co/DxLqUIDXam
There was an interesting paper a year or two ago that compared different approaches to handling class imbalance. They found oversampling the rare class until it's equally frequent was the best approach in every dataset they tested, IIRC.
— Jeremy Howard (@jeremyphoward) September 20, 2019
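The approach Howard recalls (random oversampling of the rare class until all classes are equally frequent) can be sketched in a few lines; this is an illustrative implementation, not code from the paper:

```python
import random

def oversample_to_balance(samples, labels, seed=0):
    """Randomly duplicate minority-class examples until every class
    is as frequent as the majority class (simple random oversampling)."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        extras = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extras:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y
```

Note that the duplicates are drawn with replacement from the minority class, so the balanced set contains repeated examples rather than new ones.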
i believe this is the paper you're referring to, yes? https://t.co/tQp6EwDz6E
— Jeremy Jordan (@jeremyjordan) September 20, 2019
I think this is an important observation by @karpathy. Somehow the areas of active learning, class imbalance in ML, and experimental design in stats haven't quite addressed this (please correct me if I'm wrong). We focus too much on standard datasets and i.i.d.-ness. https://t.co/ktajEKTZjp
— Nando de Freitas (@NandoDF) September 21, 2019