By the way - I think a valid (if extreme) take on GPT-2 is "lol you need 10,000x the data, 1 billion parameters, and a supercomputer to get current DL models to generalize to Penn Treebank."
— Alec Radford (@AlecRad) February 17, 2019
The causality of working hard goes the other way from how I think people think about it. Clear alignment, a growth trajectory, and good morale make people work hard. Trying to affect the outcome directly is not very effective.
— Erik Bernhardsson (@fulhack) February 15, 2019
It’s mind-blowing how little the human condition has changed over the past two thousand years, despite all this technology. We strive for and worry about fundamentally the same things. So when you see something new promising radical change - be skeptical.
— Denny Britz (@dennybritz) February 15, 2019
Releasing a "restricted" model in this way has other (intentional or not) consequences - primarily that "AI is scary" is catnip to reporters. Hence by acting like a good dual use citizen you can accidentally provoke the AI hype beast. https://t.co/B41sqFJ9sh
— Smerity (@Smerity) February 15, 2019
- When does a model go from "safe" to "dual use"?
- How much of a "dual use" delay do we need to add?
- Should we release to journalists first or researchers?
- How can small labs participate in PR?
As everyone has a different point of view, it's just collisions everywhere :S
— Smerity (@Smerity) February 15, 2019
The work that caused the kerfuffle was a large scale language model from @OpenAI. Think of it as a super powered version of the predictive text on your phone that has read more data and can generate fairly coherent text. https://t.co/NFnJe5HFlp
— Smerity (@Smerity) February 15, 2019
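For readers who want to see what that "super powered predictive text" means in practice, here is a minimal sketch of next-token generation with a small public GPT-2 checkpoint. The Hugging Face transformers library, model name, and sampling settings are my own illustrative assumptions, not part of the thread or of OpenAI's release.

```python
# Minimal sketch: a language model repeatedly predicts the next token given
# everything it has seen so far. Assumes `transformers` and PyTorch are installed.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")   # small public GPT-2 checkpoint
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prompt = "The release of large language models raises questions about"
input_ids = tokenizer.encode(prompt, return_tensors="pt")

# Sample a continuation token by token; top_k keeps only the 50 most likely
# next tokens at each step, a common way to keep the output coherent.
output_ids = model.generate(
    input_ids,
    max_length=60,
    do_sample=True,
    top_k=50,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```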
3 reasons I recommend learning math on an as-needed basis (as opposed to trying to complete a bunch of math pre-reqs before you start on the thing you care about): https://t.co/4pXm4lXmRj pic.twitter.com/8FN76IAIm9
— Rachel Thomas (@math_rachel) February 14, 2019
I disagree there's a glut of data scientists. I agree there's a glut of aspiring data scientists with unrealistic expectations. "I'm a basketball player; I don't want to do boring drills. I just want to dunk like I saw on TV. NBA, here I come!"
— Monica Rogati (@mrogati) February 14, 2019
Being a Data Analyst is perfect training for that 85% of data science that's not machine learning!
Data scientists with strong SQL/data prep skills stand out, like @vboykis said, and ability to communicate via reports and verbally is also a huge differentiator when interviewing.
— Data Science Renee (@BecomingDataSci) February 14, 2019
Basically, you have two kinds of problems: yourself, and the competition. The only problem you need to care about is the first one. If you solve it, the competition is nothing to worry about. If you don't, game over.
— François Chollet (@fchollet) February 13, 2019
This shouldn't surprise anyone. If you have tabular data, a small number of variables, and a modest sample size there is *no* reason to expect ML to be superior. However, I don't think this is the scenario most have in mind when thinking about potential for ML in medicine https://t.co/hp0do1Ladz
— Andrew Beam (@AndrewLBeam) February 12, 2019
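The point about tabular data with few variables and a modest sample size can be made concrete with a quick experiment. The sketch below is my own illustration (the dataset and models are not from the tweet or the paper it discusses): on small, low-dimensional tabular data, a plain regularized linear model is often on par with a more complex learner.

```python
# Sketch: compare logistic regression to gradient boosting on a small
# clinical-style tabular dataset using cross-validated AUC.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)  # 569 samples, 30 features

models = {
    "logistic regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "gradient boosting": GradientBoostingClassifier(random_state=0),
}

for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
    print(f"{name}: AUC = {scores.mean():.3f} +/- {scores.std():.3f}")
```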
The DL CV community is having a "oh wait, bags of local features are a really strong baseline for classification" moment with the BagNet paper.
This has always been clear for text classification due to n-gram baselines. It took an embarrassingly long time for nets to beat them.
— Alec Radford (@AlecRad) February 11, 2019
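As a concrete reference point for how strong those n-gram baselines are, here is a minimal sketch (my own illustration, not from the thread) of the classic bag-of-n-grams text classifier that neural nets took a long time to beat.

```python
# Sketch of a bag-of-n-grams text classification baseline:
# TF-IDF over unigrams and bigrams feeding a linear classifier.
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.pipeline import make_pipeline

train = fetch_20newsgroups(subset="train", remove=("headers", "footers", "quotes"))
test = fetch_20newsgroups(subset="test", remove=("headers", "footers", "quotes"))

baseline = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=2),  # unigrams + bigrams
    LogisticRegression(max_iter=1000),
)
baseline.fit(train.data, train.target)
print("test accuracy:", accuracy_score(test.target, baseline.predict(test.data)))
```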