When people who have never won a single @Kaggle competition tell me what skills I need to be good at Kaggling. pic.twitter.com/wXs0NSMIEi
— Bojan Tunguz (@tunguz) July 13, 2021
The more I work in prod, the more I realize how hard it is to do prob & stats right.
— Chip Huyen (@chipro) July 8, 2021
Splitting data is easy. Sampling data correctly is hard.
Writing metrics is easy. Understanding what metrics measure is hard.
Monitoring numbers is easy. Interpreting them is very, very hard.
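The "splitting is easy, sampling correctly is hard" point can be made concrete. A minimal sketch (the `user` grouping key and example rows are hypothetical, just for illustration): a naive row-level split lets rows from the same user land in both train and test, leaking information, while a group-aware split keeps each user entirely on one side.

```python
import random

def random_split(rows, test_frac=0.2, seed=0):
    """Naive row-level split: rows from the same user can land on both sides."""
    rng = random.Random(seed)
    rows = rows[:]
    rng.shuffle(rows)
    k = int(len(rows) * test_frac)
    return rows[k:], rows[:k]

def group_split(rows, key, test_frac=0.2, seed=0):
    """Group-aware split: every row of a given user goes to exactly one side."""
    rng = random.Random(seed)
    groups = sorted({key(r) for r in rows})
    rng.shuffle(groups)
    k = int(len(groups) * test_frac)
    test_groups = set(groups[:k])
    train = [r for r in rows if key(r) not in test_groups]
    test = [r for r in rows if key(r) in test_groups]
    return train, test
```

Splitting by group rather than by row is the difference between evaluating "how well do we do on users we've seen" and "how well do we generalize to new users" — the metric looks the same, but it measures something different.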
But a rough auto-scalable “template” for a healthy & efficient labeling workflow is slowly emerging along the lines of a finite state machine with a number of slots for specific roles, points of checks and balances and supporting infrastructure. Kinda. Maybe.
— Andrej Karpathy (@karpathy) July 8, 2021
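The finite-state-machine framing can be sketched in a few lines. This is an illustrative toy, not Tesla's actual workflow — the states, roles, and actions here are assumptions: an example sits in a queue, a labeler labels it, a reviewer accepts it into the training set or bounces it back.

```python
from enum import Enum, auto

class State(Enum):
    QUEUED = auto()     # waiting for a labeler
    LABELED = auto()    # waiting for a reviewer
    ACCEPTED = auto()   # enters the training set
    REJECTED = auto()   # sent back to the labeling queue

# (current state, action) -> next state; each action is a "slot" for a role,
# and the review step is the point of checks and balances
TRANSITIONS = {
    (State.QUEUED,   "label"):  State.LABELED,
    (State.LABELED,  "accept"): State.ACCEPTED,
    (State.LABELED,  "reject"): State.REJECTED,
    (State.REJECTED, "label"):  State.LABELED,
}

def step(state, action):
    nxt = TRANSITIONS.get((state, action))
    if nxt is None:
        raise ValueError(f"invalid action {action!r} in state {state}")
    return nxt
```

Making the transition table explicit is what enables the "auto-scalable" part: anything not in the table is rejected, so the workflow can't silently drift into undefined states as more people and tooling plug into it.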
The first step to mastering something is always to *have faith in your capability to learn new skills*. No matter how hard or mystifying the task seems.
— François Chollet (@fchollet) July 5, 2021
The pipeline myth won't die. Google execs are really trying to convince us that they just need to train more students, and it doesn't matter how many senior leaders they wrongly fire.
— Rachel Thomas (@math_rachel) July 5, 2021
Takeaway from @karpathy's CVPR talk:
— Chip Huyen (@chipro) June 24, 2021
The most successful ML projects in prod (Tesla, iPhone, Amazon drones, Zipline) are where you own the entire stack.
They iterate not just ML algorithms but also:
- how to collect/label data
- infrastructure
- hardware ML models run on
Something from the Wired article about @timnitGebru's experience at Google that resonated: expecting tech to self-regulate on ethical data practices is similar to asking energy companies to self-regulate on pollution. The latter is clearly ridiculous; why isn't the former?
— Chris Holdgraf (@choldgraf) June 12, 2021
I'm regularly shocked by folks who think building some crappy app on top of their data is superior to just giving a link to a zipped csv of the freaking data. https://t.co/kPnOVQyvbV
— JD Long (@CMastication) June 8, 2021
One of the most frustrating parts of interviewing as a data scientist is that hiring managers have wildly different expectations for the skillset.
— ChrisAlbon.com (@chrisalbon) May 27, 2021
Some want a BI analyst, some want a scientist, some want an MLE, etc.
I wish "data scientist" would title-split into sub-categories. https://t.co/F1TIe9irE3
An experience every data scientist either has had or will have is working for a product manager who has absolutely no concept of data but who insists their conclusions must be correct and it’s the fault of the data scientists for not being smart enough to prove them. https://t.co/oLElFHQEAf
— Emily G (not a newspaper) (@EmilyGorcenski) May 23, 2021
This paper gives me anxiety. BatchNorm is the most deviously subtly complex layer in deep learning. Many issues (silently) root cause to it. Yet it is ubiquitous because it works well (it multi-task helps optimization/regularization) and can be fused to affines at inference time. https://t.co/3EC2Abm8Ry
— Andrej Karpathy (@karpathy) May 18, 2021
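The "fused to affines at inference time" point is worth unpacking. At inference, BatchNorm applies a fixed per-channel scale and shift using its running statistics, so it can be folded into the preceding affine layer's weights and bias. A minimal NumPy sketch of that folding (a standard identity, not code from the paper):

```python
import numpy as np

def fuse_bn_into_affine(w, b, gamma, beta, mean, var, eps=1e-5):
    """Fold inference-time BatchNorm into a preceding affine layer y = x @ w + b.

    BN at inference computes gamma * (y - mean) / sqrt(var + eps) + beta,
    which is itself affine in y, so the two layers merge into one.
    """
    scale = gamma / np.sqrt(var + eps)   # per-output-channel multiplier
    w_fused = w * scale                  # scale each output column of w
    b_fused = (b - mean) * scale + beta  # fold the mean/shift into the bias
    return w_fused, b_fused
```

The fused layer gives bit-for-bit the same function with one matmul fewer — which is exactly why BatchNorm stays ubiquitous despite its train-time subtleties: at inference it costs nothing.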
+1. here's my ML consultancy for last few years.
— mat kelcey (@mat_kelcey) April 13, 2021
domain expert: we need ML to replace business rules
me: where is your eval?
DE: we don't do that
me: let's do that first
...
DE: oh! we can now see how we can improve by just reframing our question!
me: awesome, see you in a year.
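The "where is your eval?" step can be surprisingly small. A hypothetical sketch (the `rule_based` rule and the labeled examples are invented for illustration): score the existing business rules on a handful of labeled cases, and the failures show exactly where ML — or just a reframed question — could improve things.

```python
def accuracy(predict, examples):
    """Score any callable on labeled (input, expected) pairs."""
    hits = sum(predict(x) == y for x, y in examples)
    return hits / len(examples)

# hypothetical business rule to use as the baseline
def rule_based(text):
    return "refund" if "refund" in text.lower() else "other"

# a tiny hand-labeled eval set
examples = [
    ("Please refund my order", "refund"),
    ("Where is my package?", "other"),
    ("I want a REFUND now", "refund"),
    ("Cancel my subscription", "other"),
    ("Money back, please", "refund"),  # the rule misses this phrasing
]

baseline = accuracy(rule_based, examples)  # any ML model must beat this
```

The eval doubles as the acceptance bar for any future model: if a model can't beat the rules on the same labeled set, there is nothing to deploy.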