by Smerity on 2019-09-19 (UTC).

Deep learning training tip that I realized I do but never learned from anyone - when tweaking your model for improving gradient flow / speed to converge, keep the exact same random seed (hyperparameters and weight initializations) and only modify the model interactions.

— Smerity (@Smerity) September 19, 2019
Tags: misc, tip
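Not from the original thread: a minimal sketch, assuming PyTorch, of what keeping the seed fixed while changing only the model wiring might look like. The names `set_seed`, `TinyNet`, `run_variant` and the residual-connection tweak are illustrative stand-ins, not Smerity's code.

```python
# A minimal sketch, assuming PyTorch. set_seed / TinyNet / run_variant and
# the residual-connection tweak are illustrative, not from the original thread.
import random

import numpy as np
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def set_seed(seed: int = 42) -> None:
    # Fix every RNG that influences weight initialization and batch order.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)


class TinyNet(nn.Module):
    # Both variants hold identical layers and shapes; only the forward
    # wiring (the "model interactions") changes with use_skip.
    def __init__(self, use_skip: bool):
        super().__init__()
        self.use_skip = use_skip
        self.inp = nn.Linear(32, 64)
        self.hidden = nn.Linear(64, 64)
        self.out = nn.Linear(64, 10)

    def forward(self, x):
        h = torch.relu(self.inp(x))
        h2 = torch.relu(self.hidden(h))
        return self.out(h + h2 if self.use_skip else h2)


def run_variant(use_skip: bool, seed: int = 42):
    # Re-seed right before building the model, so both variants start from
    # the same initial weights; a dedicated generator pins the batch order.
    set_seed(seed)
    model = TinyNet(use_skip)
    data = TensorDataset(torch.randn(1024, 32), torch.randint(0, 10, (1024,)))
    loader = DataLoader(data, batch_size=64, shuffle=True,
                        generator=torch.Generator().manual_seed(seed))
    return model, loader
```

Because both variants re-seed before building the model and the data loader, they start from identical weights and see batches in the same order, so any gap in their loss curves comes from the wiring change alone.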
by Smerity on 2019-09-19 (UTC).

- Your model runs will have the exact same perplexity spikes (hits confusing data at the same time)
- You can compare timestamp / batch results in early training as a pseudo-estimate of convergence
- Improved gradient flow visibly helps the same init do better

— Smerity (@Smerity) September 19, 2019
Tags: thought, tip
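Again a hedged sketch rather than anything from the thread: one way to line up matched runs batch-for-batch early in training, as the tweet above suggests. It reuses `TinyNet` and `run_variant` from the previous sketch; the training loop and printout are illustrative only.

```python
# Continues the sketch above (TinyNet / run_variant); the training loop and
# the comparison printout are illustrative, not from the original thread.
import torch
from torch import nn


def early_losses(model, loader, n_batches: int = 200):
    # One short pass: log the per-batch loss so runs can be lined up step
    # by step. With a shared seed, batch i holds the same examples in
    # every run, so the curves are directly comparable.
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = nn.CrossEntropyLoss()
    losses = []
    for i, (x, y) in enumerate(loader):
        if i >= n_batches:
            break
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
        losses.append(loss.item())
    return losses


base = early_losses(*run_variant(use_skip=False))
skip = early_losses(*run_variant(use_skip=True))

# Same init, same batches: a consistently lower loss at the same batch index
# is a rough early read on convergence, and any loss spike (confusing data)
# lands at the same step in both runs.
for i, (a, b) in enumerate(zip(base, skip)):
    print(f"batch {i:3d}  baseline {a:.3f}  with-skip {b:.3f}")
```

In practice the same comparison would be run on the real training logs (loss or perplexity per batch) rather than a toy loop.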
by Smerity on 2019-09-19 (UTC).

This may or may not be a soft pseudo-science variant of the Lottery Ticket Hypothesis / @hardmaru et al's Weight Agnostic Neural Networks. Either way it has worked multiple times over multiple datasets for me and the results seem to generalize.

— Smerity (@Smerity) September 19, 2019
Tags: misc
by jeremyphoward on 2019-09-20 (UTC).

I do the opposite and also tell my students to. I find by keeping the randomness as I experiment I get a better intuitive feel for the modeling problem.
Very interesting to hear your very different approach! :)

— Jeremy Howard (@jeremyphoward) September 20, 2019
Tags: thought, tip
