Google: BERT now used on almost every English query https://t.co/dO4AaFkNdt
— Bojan Tunguz (@tunguz) October 17, 2020
Manipulated media is not only a risk when it goes viral:
— Rachel Thomas (@math_rachel) October 9, 2020
"if a proliferation of fake comments convinces the public that a majority feels some particular way about a hot topic, that’s a success. But even merely creating cynicism or confusion is a form of success too." - @noUpside pic.twitter.com/zogOsORCnY
As a sr data scientist who also does a lot of eng – “uh let me get back to you on that (after I do some research)” is still a VERY common answer to a lot of questions I get. I STILL use the same process with new problems: fail a lot, learn, iterate, break up into smaller problems https://t.co/5DRrPeHqrA
— Mikhail Popov (@bearloga) October 9, 2020
Programming: 10% writing code. 90% figuring out why it doesn’t work
— Ben Hamner (@benhamner) October 9, 2020
Analyzing data and ML: 1% writing code. 9% figuring out why code doesn’t work. 90% figuring out what’s wrong with the data
This paper has been cited 1163 times, except it DOES NOT EXIST.
— Dan Quintana (@dsquintana) October 9, 2020
This 'paper' was used in a style guide as a citation example, was included in some papers by accident, and then propagated from there, illustrating how some authors don't read *titles* let alone abstracts or papers pic.twitter.com/oJFMVnIYi8
Are we making meaningful progress on machine learning for EHR data?
— Andrew Beam (@AndrewLBeam) October 6, 2020
New preprint with @DavidRBellamy and Leo Celi tries to answer this question through the lens of benchmarks and the answer is, unfortunately, "probably not"
Paper: https://t.co/KjgjvC0eWQ
Some highlights 👇 pic.twitter.com/5xZf5iqJQs
Finding real world datasets is hard. All companies keep metric data private. But most of @Wikimedia's metrics are public!
— Chris Albon (@chrisalbon) October 6, 2020
Here are the live metrics for our system serving models that predict article quality, edit quality etc. Enjoy! https://t.co/O7nhT5hb5F pic.twitter.com/YTc4TVSG9x
Do you guys remember that killer documentary Inside Job, that exposed so many economists as the scientific authority for ridiculous derivatives and securities products that created the financial crisis? We need an analogous doc for big tech: https://t.co/4Z2NwLXPxi
— Cathy O'Neil (@mathbabedotorg) October 5, 2020
Great source of reading pointers, as usual!
— Andrej Karpathy (@karpathy) October 4, 2020
~75% of papers now use PyTorch, still positively trending. 1,000 companies are using Hugging Face's Transformers lib in prod, with 5M+ pip installs. https://t.co/FbcNuXLIic
Good advice! For classification models, a scatter plot of the cross-entropy loss vs. prediction entropy (~confidence) for individual examples can be very revealing.
— Sander Dieleman (@sedielem) October 2, 2020
More generally: study model behaviour for individual data points, don't look at aggregate statistics exclusively. https://t.co/8BPF780BzC
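The diagnostic Dieleman describes can be sketched in a few lines of NumPy and matplotlib. This is my own minimal illustration, not his code: the function name `per_example_diagnostics` and the toy probabilities are invented for the example, and it assumes the model outputs softmax probabilities per class.

```python
import numpy as np
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so the script runs headless
import matplotlib.pyplot as plt


def per_example_diagnostics(probs, labels, eps=1e-12):
    """Per-example cross-entropy loss and prediction entropy.

    probs:  (n, k) array of predicted class probabilities (rows sum to 1)
    labels: (n,) array of integer class labels
    """
    p = np.clip(probs, eps, 1.0)
    # Cross-entropy loss for each example: -log p(true class)
    loss = -np.log(p[np.arange(len(labels)), labels])
    # Prediction entropy for each example: low entropy ~ high confidence
    entropy = -np.sum(p * np.log(p), axis=1)
    return loss, entropy


# Toy predictions: confident-and-correct, uncertain, confident-and-wrong
probs = np.array([[0.95, 0.03, 0.02],
                  [0.40, 0.35, 0.25],
                  [0.02, 0.96, 0.02]])
labels = np.array([0, 0, 0])

loss, entropy = per_example_diagnostics(probs, labels)

plt.scatter(entropy, loss)
plt.xlabel("prediction entropy (low = confident)")
plt.ylabel("cross-entropy loss")
plt.savefig("loss_vs_entropy.png")
```

The revealing region of such a plot is low entropy with high loss: examples the model gets wrong while being confident, which aggregate metrics like mean accuracy hide entirely.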
Don't cook up use cases for #AI when simple human input solved the problem. At best it is a poor substitute for humans. At worst you are propagating #AI #bias and amplifying and automating it https://t.co/uU6MQvbrRT
— Prof. Anima Anandkumar (@AnimaAnandkumar) October 1, 2020
Deploying ML systems isn't just about getting ML systems to the end-users.
— Chip Huyen (@chipro) September 29, 2020
It's about building an infrastructure so the team can be quickly alerted when something goes wrong, figure out what went wrong, test in production, and roll out or roll back updates.
It's fun!
(6/6)