Bayesian model averaging mitigates double descent! We have just posted this new result in section 7 of our paper on Bayesian deep learning with @Pavel_Izmailov: https://t.co/midasGNPYn. The result highlights the importance of *multi-modal* marginalization with Multi-SWAG. 1/3 pic.twitter.com/ZbhxGdjW5I
— Andrew Gordon Wilson (@andrewgwils) April 28, 2020
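Multi-SWAG forms the model average by running SWAG from several independent initializations, so the approximation covers multiple posterior modes, and then mixing the resulting predictive distributions. A minimal NumPy sketch of that multi-modal marginalization; `sample_weights` and `predict` are hypothetical stand-ins, not the paper's actual implementation:

```python
import numpy as np

def multi_swag_predict(x, swag_posteriors, sample_weights, predict, n_samples=20):
    """Monte Carlo Bayesian model average across several SWAG posteriors.

    swag_posteriors: one approximate (Gaussian) posterior per independent
        training run, i.e. per mode of the true posterior.
    sample_weights:  hypothetical helper drawing one weight vector from a
        single SWAG posterior (stands in for a real SWAG implementation).
    predict:         maps (weights, x) -> predictive probability vector.
    """
    probs = []
    for posterior in swag_posteriors:      # marginalize across modes...
        for _ in range(n_samples):         # ...and within each mode
            w = sample_weights(posterior)
            probs.append(predict(w, x))
    return np.mean(probs, axis=0)          # equal-weight mixture of samples
```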
This is the model from Report 13 of the Imperial College COVID-19 response team. It fits the death data jointly from 11 European countries to estimate the reproduction number and the effect of lockdowns. Such a remarkable piece of @mcmc_stan https://t.co/Hhmbt3KVcQ
— Gilles Louppe (@glouppe) March 30, 2020
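The backbone of that model is a discrete renewal equation: new infections on day t are R_t times past infections weighted by the generation-interval distribution, and expected deaths come from convolving infections with an infection-to-death delay. A rough NumPy sketch of that forward map, not the actual Stan program; the `ifr` value and the delay distributions here are illustrative placeholders:

```python
import numpy as np

def expected_deaths(R_t, seed_infections, gen_interval, death_delay, ifr=0.01):
    """Discrete renewal model in the spirit of Report 13 (illustrative only).

    R_t:             time-varying reproduction number, length T (lockdowns
                     enter the real model as multiplicative drops in R_t)
    seed_infections: initial infection counts that start the recursion
    gen_interval:    generation-interval pmf by day, length >= T
    death_delay:     infection-to-death delay pmf by day, length >= T
    ifr:             infection fatality ratio (placeholder value)
    """
    T = len(R_t)
    infections = np.zeros(T)
    infections[: len(seed_infections)] = seed_infections
    for t in range(len(seed_infections), T):
        past = infections[:t][::-1]  # infections at t-1, t-2, ..., 0
        infections[t] = R_t[t] * np.sum(past * gen_interval[1 : t + 1])
    deaths = ifr * np.array(
        [np.sum(infections[:t][::-1] * death_delay[1 : t + 1]) for t in range(T)]
    )
    return deaths  # the paper observes deaths as negative binomial around this
```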
https://t.co/8dyihEQqk5 — careful and expensive MCMC Bayesian inference over NN parameters is *worse* than point estimates or low temperature posteriors. Supports @carlesgelada and @jacobmbuckman’s view that Bayesian NNs are not meaningful, probably because the prior is wrong.
— Ilya Sutskever (@ilyasut) February 7, 2020
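The "low temperature posteriors" here are tempered posteriors p(θ|D)^(1/T) with T < 1, which concentrate probability around the modes; as T → 0 sampling collapses onto a point estimate. A sketch of one stochastic-gradient Langevin (SGLD) step targeting such a tempered posterior, assuming a minibatch estimate of the log-posterior gradient is available:

```python
import numpy as np

def sgld_step(theta, grad_log_post, step_size, temperature=1.0, rng=None):
    """One SGLD step targeting the tempered posterior p(theta|data)^(1/T).

    temperature = 1 is the ordinary Bayes posterior; temperature < 1 is the
    "cold" posterior the tweet refers to, and temperature -> 0 reduces to
    gradient ascent on the log posterior, i.e. a MAP point estimate.
    grad_log_post: (minibatch) estimate of the gradient of log p(theta|data).
    """
    rng = np.random.default_rng() if rng is None else rng
    drift = 0.5 * step_size * grad_log_post(theta) / temperature
    noise = np.sqrt(step_size) * rng.normal(size=theta.shape)
    return theta + drift + noise
```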
Bayesian methods are *especially* compelling for deep neural networks. The key distinguishing property of a Bayesian approach is marginalization instead of optimization, not the prior or Bayes' rule. This difference will be greatest for underspecified models like DNNs. 1/18 https://t.co/YFxRK1Ho9H
— Andrew Gordon Wilson (@andrewgwils) December 27, 2019
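Marginalization versus optimization in one toy picture: with a broad (underspecified) posterior, the Bayesian model average can differ noticeably from plugging in a single optimized parameter, precisely because the predictive is nonlinear in the parameters. An illustrative snippet with a made-up Gaussian "posterior":

```python
import numpy as np

rng = np.random.default_rng(0)

def predict(theta, x):
    # Toy predictive: p(y=1 | x, theta) for a one-parameter "network"
    return 1.0 / (1.0 + np.exp(-theta * x))

# Underspecified model: the data leave theta very uncertain, so the
# (made-up) posterior is broad rather than peaked at one value.
posterior_samples = rng.normal(loc=2.0, scale=1.5, size=100_000)
theta_hat = 2.0  # the single "best" parameter an optimizer would return

x = 1.0
bma = np.mean(predict(posterior_samples, x))  # marginalization
plug_in = predict(theta_hat, x)               # optimization
print(f"marginalized: {bma:.3f}   plug-in: {plug_in:.3f}")
# ~0.81 vs ~0.88: the average is pulled toward 0.5 because the predictive is
# nonlinear in theta, so E[sigmoid(theta*x)] != sigmoid(E[theta]*x).
```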