We see more significant improvements from training data distribution search (data splits + oversampling factor ratios) than neural architecture search. The latter is so overrated :)
— Andrej Karpathy (@karpathy) September 20, 2019
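The "training data distribution search" Karpathy describes can be pictured as a simple sweep over data splits and oversampling factors, selecting the configuration with the best validation score. A minimal sketch, where `train_fn` and `eval_fn` are hypothetical placeholders for your training and validation routines:

```python
from itertools import product

def distribution_search(train_fn, eval_fn, splits, ratios):
    """Grid-search over data splits and oversampling factors,
    keeping the setting with the best validation score.
    train_fn/eval_fn are stand-ins, not a real API."""
    best = None
    for split, ratio in product(splits, ratios):
        model = train_fn(split, oversample_ratio=ratio)
        score = eval_fn(model)
        if best is None or score > best[0]:
            best = (score, split, ratio)
    return best
```

In practice the search space (splits, ratios) and the scoring function would be chosen per task; the point is that the model architecture stays fixed while the data distribution is what gets tuned.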
(likely an artifact of most of academia focused on finding models conditioned on standard datasets)
— Andrej Karpathy (@karpathy) September 20, 2019
One of the biggest tool gaps in ML right now is in building utilities to more easily inspect and understand data.
— Emmanuel Ameisen (@mlpowered) September 20, 2019
I gave a talk about just this at: https://t.co/0MxmhAF3zl
It also quotes your great data in industry vs data in academia slide in the conclusion @karpathy https://t.co/DxLqUIDXam
There was an interesting paper a year or two ago that compared different approaches to handling class imbalance. They found oversampling the rare class until it's equally frequent was the best approach in every dataset they tested, IIRC.
— Jeremy Howard (@jeremyphoward) September 20, 2019
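The approach Howard recalls (random oversampling of the rare class until all classes are equally frequent) can be sketched in a few lines; this is an illustrative implementation, not code from the paper:

```python
import random

def oversample_to_balance(samples, labels, seed=0):
    """Randomly duplicate minority-class examples until every class
    is as frequent as the majority class (simple random oversampling)."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        extras = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extras:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y
```

Note that the duplicates are drawn with replacement from the minority class, so the balanced set contains repeated examples rather than new ones.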
i believe this is the paper you're referring to, yes? https://t.co/tQp6EwDz6E
— Jeremy Jordan (@jeremyjordan) September 20, 2019
I think this is an important observation by @karpathy. Somehow the areas of active learning, class imbalance in ML, and experimental design in stats haven't quite addressed this (please correct me if I'm wrong). We focus too much on standard datasets and i.i.d.-ness. https://t.co/ktajEKTZjp
— Nando de Freitas (@NandoDF) September 21, 2019