Interestingly, the hyperparameters seem to equilibrate over a shorter timescale than the weights, allowing us to learn a schedule. E.g., start with low dropout, then crank it up once the network starts overfitting. Works better than any fixed value! pic.twitter.com/Mw3mp7ph3f
— Roger Grosse (@RogerGrosse) March 8, 2019
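The tweet describes a *learned* dropout schedule (hyperparameters adapting on a faster timescale than the weights). As a rough illustration of the shape of such a schedule, here is a minimal hand-crafted sketch: dropout stays low during early training, then ramps up once overfitting becomes a risk. The function name, parameters, and the linear-ramp form are all assumptions for illustration, not the method from the tweet.

```python
def dropout_schedule(epoch, total_epochs, p_min=0.0, p_max=0.5, warmup_frac=0.3):
    """Hypothetical fixed schedule mimicking the pattern described in the tweet:
    low dropout early, cranked up later in training.

    Note: the tweet refers to a schedule *learned* by hyperparameter
    optimization, not this hand-tuned linear ramp.
    """
    warmup = warmup_frac * total_epochs
    if epoch < warmup:
        # early training: little regularization needed yet
        return p_min
    # ramp linearly from p_min to p_max over the remaining epochs
    frac = (epoch - warmup) / max(total_epochs - warmup, 1)
    return p_min + (p_max - p_min) * min(frac, 1.0)
```

In a training loop, the returned probability would be assigned to the model's dropout layers (e.g. a `torch.nn.Dropout` module's `p` attribute) at the start of each epoch.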