2/ Data availability is the first thing to consider. How much data do you need? Is there data already available? If not, how hard is it to collect? How expensive is it to label?
— Josh Tobin (@josh_tobin_) June 14, 2019
3/ The most challenging of these is figuring out how much data you need. It’s problem dependent, but a rule of thumb (for vision): if you can fine-tune, at least 5k images. If not, at least 100k.
— Josh Tobin (@josh_tobin_) June 14, 2019
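As a quick sanity check, the rule of thumb in the tweet above can be written down as a tiny helper. This is only a sketch: the two thresholds come straight from the tweet, while the function and constant names are made up for illustration, and the tweet itself stresses that the real answer is problem dependent.

```python
# Thresholds quoted in the tweet above (a rule of thumb for vision tasks only).
MIN_IMAGES_FINE_TUNE = 5_000       # when you can fine-tune a pretrained model
MIN_IMAGES_FROM_SCRATCH = 100_000  # when you cannot


def meets_rule_of_thumb(num_labeled_images: int, can_fine_tune: bool) -> bool:
    """Return True if the dataset size clears the rough minimum from the tweet.

    Hypothetical helper for illustration; passing this check does not
    guarantee the project is feasible.
    """
    required = MIN_IMAGES_FINE_TUNE if can_fine_tune else MIN_IMAGES_FROM_SCRATCH
    return num_labeled_images >= required


if __name__ == "__main__":
    print(meets_rule_of_thumb(8_000, can_fine_tune=True))   # clears the 5k bar
    print(meets_rule_of_thumb(8_000, can_fine_tune=False))  # short of the 100k bar
```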
5/ Why? ML projects tend to scale poorly with the accuracy requirement. It might cost you 10-100x more to build a 99.99% accurate model than a 99.9% accurate one.
— Josh Tobin (@josh_tobin_) June 14, 2019
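One way to see why the last fraction of a percent is so demanding is to restate accuracy as an error rate. The sketch below only does that arithmetic; the function names are illustrative, and it says nothing about the 10-100x cost figure itself, which is the tweet author's estimate.

```python
def errors_per_million(accuracy: float) -> float:
    """Expected number of wrong predictions per one million inputs."""
    return (1.0 - accuracy) * 1_000_000


def error_reduction_factor(accuracy_from: float, accuracy_to: float) -> float:
    """How many times smaller the error rate must become to reach the target."""
    return (1.0 - accuracy_from) / (1.0 - accuracy_to)


if __name__ == "__main__":
    # 99.9% accurate: about 1 error in every 1,000 predictions.
    print(round(errors_per_million(0.999)))
    # 99.99% accurate: about 1 error in every 10,000 predictions.
    print(round(errors_per_million(0.9999)))
    # Moving from 99.9% to 99.99% means removing roughly 9 out of every
    # 10 remaining errors, i.e. about a 10x reduction in error rate.
    print(round(error_reduction_factor(0.999, 0.9999)))
```

Seen this way, going from 99.9% to 99.99% is not a 0.09-point improvement but a tenfold cut in the errors that remain, and those are typically the hardest cases.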
6/ Lastly, you’ll need to figure out how hard the learning problem itself is. The best thing to do is find published work on a similar problem. Don’t forget to look at whether their compute / inference budgets are compatible with yours :)
— Josh Tobin (@josh_tobin_) June 14, 2019
8/ One heuristic for assessing DL project feasibility I generally avoid: @AndrewYNg’s idea that “DL can do anything a human can do in <1 second”. Lots of counterexamples (humor, sarcasm, in-hand manipulation, generalization, etc), and doesn’t capture lots of things AI can do.
— Josh Tobin (@josh_tobin_) June 14, 2019
9/ (These ideas are adapted from a lecture I gave earlier this year @full_stack_dl)
— Josh Tobin (@josh_tobin_) June 14, 2019
Video: https://t.co/QlGaiHfkie
Good thread. One of the most important skills AI teams & PMs need to develop in industry is the ability to tell the difference between easy, hard, and impossible machine learning problems. https://t.co/gJUq4lbg2v
— Peter Skomoroch (@peteskomoroch) June 14, 2019