Let's get started.
— David Page (@dcpage3) September 11, 2019
What does the original paper have to say? pic.twitter.com/SUehxk58Jt
Recent papers have studied the Hessian of the loss for deep nets experimentally:
(@leventsagun et al) https://t.co/JNJKeqZyvZ, https://t.co/Wbk3sSbIbr
(Papyan) https://t.co/l4QcB85nir.
(@_ghorbani et al) https://t.co/VUxknF5QkM compare what happens with and without BN.
— David Page (@dcpage3) September 11, 2019
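For readers who want a feel for the kind of measurement these papers make, here is a minimal sketch (my own toy setup, not taken from any of the papers above) of estimating the top eigenvalue of the loss Hessian with power iteration on Hessian-vector products in PyTorch:

```python
import torch

# Hypothetical toy model and data, stand-ins for a real net and dataset.
torch.manual_seed(0)
model = torch.nn.Sequential(torch.nn.Linear(10, 32), torch.nn.Tanh(),
                            torch.nn.Linear(32, 1))
x, y = torch.randn(256, 10), torch.randn(256, 1)

loss = torch.nn.functional.mse_loss(model(x), y)
params = list(model.parameters())
# create_graph=True keeps the graph so the gradient can be differentiated again.
grads = torch.autograd.grad(loss, params, create_graph=True)

# Power iteration on Hessian-vector products: v <- Hv / ||Hv||.
v = [torch.randn_like(p) for p in params]
for _ in range(50):
    hv = torch.autograd.grad(grads, params, grad_outputs=v, retain_graph=True)
    norm = torch.sqrt(sum((h * h).sum() for h in hv))
    v = [h / norm for h in hv]

# The Rayleigh quotient v^T H v of the converged unit vector approximates
# the Hessian eigenvalue of largest magnitude.
hv = torch.autograd.grad(grads, params, grad_outputs=v, retain_graph=True)
lam_max = sum((h * u).sum() for h, u in zip(hv, v))
print(f"estimated top Hessian eigenvalue: {lam_max.item():.4f}")
```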
So we have given precise experimental meaning to the statement that 'internal covariate shift' limits LRs and that BN works by preventing this...
— David Page (@dcpage3) September 11, 2019
...matching the intuition of the original paper!
More details here: https://t.co/09Li90gCFQ
— David Page (@dcpage3) September 11, 2019
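As a self-contained illustration of the learning-rate claim (a toy experiment of my own, not from the thread): train the same small net at a deliberately large learning rate, with and without BatchNorm. The no-BN run typically blows up while the BN run stays stable:

```python
import torch

def make_net(use_bn: bool) -> torch.nn.Sequential:
    layers = [torch.nn.Linear(10, 64)]
    if use_bn:
        layers.append(torch.nn.BatchNorm1d(64))
    layers += [torch.nn.ReLU(), torch.nn.Linear(64, 1)]
    return torch.nn.Sequential(*layers)

torch.manual_seed(0)
x, y = torch.randn(512, 10), torch.randn(512, 1)

for use_bn in (False, True):
    torch.manual_seed(1)  # identical weight init for a fair comparison
    net = make_net(use_bn)
    opt = torch.optim.SGD(net.parameters(), lr=1.0)  # deliberately large LR
    for _ in range(200):
        opt.zero_grad()
        loss = torch.nn.functional.mse_loss(net(x), y)
        loss.backward()
        opt.step()
    # Without BN this run often diverges to nan/inf; with BN it usually
    # trains stably at the same learning rate.
    print(f"BatchNorm={use_bn}: final loss {loss.item():.4g}")
```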
This is the best distillation of recent (and old!) research on batchnorm I've seen.
— Jeremy Howard (@jeremyphoward) September 11, 2019
There is so much to learn about training mechanics by studying this thread and the links it contains. https://t.co/a1PeCy7M1s
Precisely.
— Yann LeCun (@ylecun) September 12, 2019
(I've since been told by my random matrix theory colleagues at Courant that the distribution of eigenvalues of a random covariance matrix can be obtained in a much simpler manner than with the replica symmetry breaking calculations used for this paper). https://t.co/zuOlcut3vB
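The simpler route LeCun alludes to is presumably the Marchenko-Pastur law: for a sample covariance matrix built from i.i.d. entries, the limiting eigenvalue density has a closed form, no replica calculations needed. A short NumPy sketch (dimensions chosen arbitrarily) comparing an empirical spectrum against that density:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 4000, 1000                  # samples x features; arbitrary choice
X = rng.standard_normal((n, p))    # i.i.d. unit-variance entries
C = X.T @ X / n                    # sample covariance matrix
eigs = np.linalg.eigvalsh(C)

q = p / n                          # aspect ratio, here 0.25
lo, hi = (1 - np.sqrt(q)) ** 2, (1 + np.sqrt(q)) ** 2  # spectrum edges

# Empirical eigenvalue histogram vs. the Marchenko-Pastur density
# f(x) = sqrt((hi - x)(x - lo)) / (2 pi q x) on [lo, hi].
hist, edges = np.histogram(eigs, bins=50, range=(lo, hi), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
mp = (np.sqrt(np.clip((hi - centers) * (centers - lo), 0, None))
      / (2 * np.pi * q * centers))

print("mean |empirical - MP| density gap:", np.abs(hist - mp).mean())
```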