by Smerity on 2019-09-19 (UTC).

Deep learning training tip that I realized I do but never learned from anyone - when tweaking your model for improving gradient flow / speed to converge, keep the exact same random seed (hyperparameters and weight initializations) and only modify the model interactions.

— Smerity (@Smerity) September 19, 2019
Tags: misc, tip
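Not from the original thread: a minimal sketch, assuming PyTorch, of what keeping the seed fixed while changing only the model wiring might look like. The names `set_seed`, `TinyNet`, `run_variant` and the residual-connection tweak are illustrative stand-ins, not Smerity's code.

```python
# A minimal sketch, assuming PyTorch. set_seed / TinyNet / run_variant and
# the residual-connection tweak are illustrative, not from the original thread.
import random

import numpy as np
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def set_seed(seed: int = 42) -> None:
    # Fix every RNG that influences weight initialization and batch order.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)


class TinyNet(nn.Module):
    # Both variants hold identical layers and shapes; only the forward
    # wiring (the "model interactions") changes with use_skip.
    def __init__(self, use_skip: bool):
        super().__init__()
        self.use_skip = use_skip
        self.inp = nn.Linear(32, 64)
        self.hidden = nn.Linear(64, 64)
        self.out = nn.Linear(64, 10)

    def forward(self, x):
        h = torch.relu(self.inp(x))
        h2 = torch.relu(self.hidden(h))
        return self.out(h + h2 if self.use_skip else h2)


def run_variant(use_skip: bool, seed: int = 42):
    # Re-seed right before building the model, so both variants start from
    # the same initial weights; a dedicated generator pins the batch order.
    set_seed(seed)
    model = TinyNet(use_skip)
    data = TensorDataset(torch.randn(1024, 32), torch.randint(0, 10, (1024,)))
    loader = DataLoader(data, batch_size=64, shuffle=True,
                        generator=torch.Generator().manual_seed(seed))
    return model, loader
```

Because both variants re-seed before building the model and the data loader, they start from identical weights and see batches in the same order, so any gap in their loss curves comes from the wiring change alone.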
by Smerity on 2019-09-19 (UTC).

- Your model runs will have the exact same perplexity spikes (hits confusing data at the same time)
- You can compare timestamp / batch results in early training as a pseudo-estimate of convergence
- Improved gradient flow visibly helps the same init do better

— Smerity (@Smerity) September 19, 2019
Tags: thought, tip
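Again a hedged sketch rather than anything from the thread: one way to line up matched runs batch-for-batch early in training, as the tweet above suggests. It reuses `TinyNet` and `run_variant` from the previous sketch; the training loop and printout are illustrative only.

```python
# Continues the sketch above (TinyNet / run_variant); the training loop and
# the comparison printout are illustrative, not from the original thread.
import torch
from torch import nn


def early_losses(model, loader, n_batches: int = 200):
    # One short pass: log the per-batch loss so runs can be lined up step
    # by step. With a shared seed, batch i holds the same examples in
    # every run, so the curves are directly comparable.
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = nn.CrossEntropyLoss()
    losses = []
    for i, (x, y) in enumerate(loader):
        if i >= n_batches:
            break
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
        losses.append(loss.item())
    return losses


base = early_losses(*run_variant(use_skip=False))
skip = early_losses(*run_variant(use_skip=True))

# Same init, same batches: a consistently lower loss at the same batch index
# is a rough early read on convergence, and any loss spike (confusing data)
# lands at the same step in both runs.
for i, (a, b) in enumerate(zip(base, skip)):
    print(f"batch {i:3d}  baseline {a:.3f}  with-skip {b:.3f}")
```

In practice the same comparison would be run on the real training logs (loss or perplexity per batch) rather than a toy loop.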
by Smerity on 2019-09-19 (UTC).

This may or may not be a soft pseudo-science variant of the Lottery Ticket Hypothesis / @hardmaru et al's Weight Agnostic Neural Networks. Either way it has worked multiple times over multiple datasets for me and the results seem to generalize.

— Smerity (@Smerity) September 19, 2019
Tags: misc
by jeremyphoward on 2019-09-20 (UTC).

I do the opposite and also tell my students to. I find by keeping the randomness as I experiment I get a better intuitive feel for the modeling problem.
Very interesting to hear your very different approach! :)

— Jeremy Howard (@jeremyphoward) September 20, 2019
Tags: thought, tip
