by NateSilver538 on 2018-07-18 (UTC).

1. Going through some old data/code I last worked on 8 years ago. One thing I've learned since then is that when combining different datasets or doing other complicated data processing, it pays to be really anal-retentive about missing data or data that doesn't pass sanity checks

— Nate Silver (@NateSilver538) July 18, 2018
thought misc
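
As an editorial aside, here is a minimal sketch of the kind of sanity checking the tweet describes, using pandas on two hypothetical tables (all names are made up, not from the original code): fail loudly when a merge silently drops or duplicates rows, instead of discovering it downstream.

```python
import pandas as pd

def strict_left_merge(left: pd.DataFrame, right: pd.DataFrame, key: str) -> pd.DataFrame:
    """Left-join two tables and refuse to continue if rows vanish or multiply."""
    # The right-hand table should have one row per key; duplicates usually
    # mean an upstream dedup or join step went wrong.
    assert right[key].is_unique, f"duplicate {key!r} values in right table"

    merged = left.merge(right, on=key, how="left", indicator=True)

    # Every left row should find a match; unmatched keys deserve a look,
    # not a silent NaN.
    unmatched = merged[merged["_merge"] == "left_only"]
    assert unmatched.empty, f"{len(unmatched)} rows failed to match on {key!r}"

    # A one-to-one left join should preserve the row count exactly.
    assert len(merged) == len(left), "merge changed the number of rows"

    return merged.drop(columns="_merge")
```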
by NateSilver538 on 2018-07-18 (UTC).

2. Say you have a sample size of 5,000 cases, and of those cases, a dozen or so are missing or look as though they're miscoded. You might think, "That's not bad. I'll just throw those cases out and I'll be fine". And you probably will be fine if that's all there is to it.

— Nate Silver (@NateSilver538) July 18, 2018
misc thought
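
To make the scenario concrete (the column name and threshold here are hypothetical), flagging the dozen-or-so bad cases explicitly shows how many there are and keeps them around for inspection, rather than letting them disappear in a blanket dropna():

```python
import pandas as pd

def flag_suspect_cases(df: pd.DataFrame, col: str = "score") -> pd.DataFrame:
    """Return the rows whose value is missing or fails a plausibility check."""
    # Hypothetical rule: this score should always lie in [0, 100].
    bad = df[col].isna() | ~df[col].between(0, 100)
    print(f"{int(bad.sum())} of {len(df)} cases are missing or miscoded in {col!r}")
    return df[bad]
```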
by NateSilver538 on 2018-07-18 (UTC).

3. But more often than you might think, the missing/miscoded/"outlier" cases indicate a larger, more systematic problem with your code or with your data. My advice is to do due diligence on those cases BEFORE moving on to next steps, because errors tend to compound.

— Nate Silver (@NateSilver538) July 18, 2018
thought misc
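
Continuing the hypothetical sketch above, one cheap form of due diligence is to check whether the flagged cases cluster on some other variable; a strong skew toward one source file or season usually points to a parsing or merge bug rather than random bad luck.

```python
import pandas as pd

def profile_suspect_cases(suspect: pd.DataFrame, group_cols: list[str]) -> None:
    """Tabulate the flagged cases along other columns to spot systematic patterns."""
    for col in group_cols:
        print(f"--- suspect cases by {col} ---")
        print(suspect[col].value_counts(dropna=False))

# e.g. profile_suspect_cases(flag_suspect_cases(df), ["source_file", "season"])
```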
by beaurue on 2018-07-19 (UTC).

And: document your decisions and comment in your code. Lather, rinse, repeat. If you don’t write it down, you will forget what you did.

— Elle 🌊 (@beaurue) July 19, 2018
misc
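
In the same spirit as the reply, a sketch of one way to write the decision down where it can't be lost: log the count and the reason at the moment rows are dropped (names and the reason string are illustrative, not from the thread).

```python
import logging

import pandas as pd

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("cleaning")

def drop_suspect_cases(df: pd.DataFrame, bad_mask: pd.Series, reason: str) -> pd.DataFrame:
    """Drop flagged rows, recording how many were dropped and why."""
    # The reason string doubles as the written-down decision.
    log.info("Dropping %d of %d cases: %s", int(bad_mask.sum()), len(df), reason)
    return df[~bad_mask]
```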
