by NateSilver538 on 2018-07-18 (UTC).

1. Going through some old data/code I last worked on 8 years ago. One thing I've learned since then is that when combining different datasets or doing other complicated data processing, it pays to be really anal-retentive about missing data or data that doesn't pass sanity checks

— Nate Silver (@NateSilver538) July 18, 2018
thought misc
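
As an editorial aside, here is a minimal sketch of the kind of sanity checking the tweet describes, using pandas on two hypothetical tables (all names are made up, not from the original code): fail loudly when a merge silently drops or duplicates rows, instead of discovering it downstream.

```python
import pandas as pd

def strict_left_merge(left: pd.DataFrame, right: pd.DataFrame, key: str) -> pd.DataFrame:
    """Left-join two tables and refuse to continue if rows vanish or multiply."""
    # The right-hand table should have one row per key; duplicates usually
    # mean an upstream dedup or join step went wrong.
    assert right[key].is_unique, f"duplicate {key!r} values in right table"

    merged = left.merge(right, on=key, how="left", indicator=True)

    # Every left row should find a match; unmatched keys deserve a look,
    # not a silent NaN.
    unmatched = merged[merged["_merge"] == "left_only"]
    assert unmatched.empty, f"{len(unmatched)} rows failed to match on {key!r}"

    # A one-to-one left join should preserve the row count exactly.
    assert len(merged) == len(left), "merge changed the number of rows"

    return merged.drop(columns="_merge")
```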
by NateSilver538 on 2018-07-18 (UTC).

2. Say you have a sample size of 5,000 cases, and of those cases, a dozen or so are missing or look as though they're miscoded. You might think, "That's not bad. I'll just throw those cases out and I'll be fine". And you probably will be fine if that's all there is to it.

— Nate Silver (@NateSilver538) July 18, 2018
misc thought
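
To make the scenario concrete (the column name and threshold here are hypothetical), flagging the dozen-or-so bad cases explicitly shows how many there are and keeps them around for inspection, rather than letting them disappear in a blanket dropna():

```python
import pandas as pd

def flag_suspect_cases(df: pd.DataFrame, col: str = "score") -> pd.DataFrame:
    """Return the rows whose value is missing or fails a plausibility check."""
    # Hypothetical rule: this score should always lie in [0, 100].
    bad = df[col].isna() | ~df[col].between(0, 100)
    print(f"{int(bad.sum())} of {len(df)} cases are missing or miscoded in {col!r}")
    return df[bad]
```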
by NateSilver538 on 2018-07-18 (UTC).

3. But more often than you might think, the missing/miscoded/"outlier" cases indicate a larger, more systematic problem with your code or with your data. My advice is to do due diligence on those cases BEFORE moving on to next steps, because errors tend to compound.

— Nate Silver (@NateSilver538) July 18, 2018
thought misc
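
Continuing the hypothetical sketch above, one cheap form of due diligence is to check whether the flagged cases cluster on some other variable; a strong skew toward one source file or season usually points to a parsing or merge bug rather than random bad luck.

```python
import pandas as pd

def profile_suspect_cases(suspect: pd.DataFrame, group_cols: list[str]) -> None:
    """Tabulate the flagged cases along other columns to spot systematic patterns."""
    for col in group_cols:
        print(f"--- suspect cases by {col} ---")
        print(suspect[col].value_counts(dropna=False))

# e.g. profile_suspect_cases(flag_suspect_cases(df), ["source_file", "season"])
```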
by beaurue on 2018-07-19 (UTC).

And: document your decisions and comment in your code. Lather, rinse, repeat. If you don’t write it down, you will forget what you did.

— Elle 🌊 (@beaurue) July 19, 2018
misc
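
In the same spirit as the reply, a sketch of one way to write the decision down where it can't be lost: log the count and the reason at the moment rows are dropped (names and the reason string are illustrative, not from the thread).

```python
import logging

import pandas as pd

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("cleaning")

def drop_suspect_cases(df: pd.DataFrame, bad_mask: pd.Series, reason: str) -> pd.DataFrame:
    """Drop flagged rows, recording how many were dropped and why."""
    # The reason string doubles as the written-down decision.
    log.info("Dropping %d of %d cases: %s", int(bad_mask.sum()), len(df), reason)
    return df[~bad_mask]
```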
