While DropOut has been thought to regularize the model by preventing co-adaptation among hidden neurons, DropOut actually keeps gradients flowing even when the activation functions are saturated and helps the optimization converge to flat minima. https://t.co/YEH33wWqRZ
— Daisuke Okanohara (@hillbig) July 2, 2018
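The gradient-flow part of the claim can be illustrated with a toy sketch (not the paper's experiment): a sigmoid unit whose pre-activation is deep in saturation gets essentially no gradient, but inverted dropout on its inputs makes the pre-activation fluctuate from sample to sample, and on the samples where few inputs survive the unit leaves the saturated region and its local gradient becomes much larger. The input size, weights, and keep probability below are made-up values for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)

# A sigmoid unit driven deep into saturation: all 10 inputs are active
# and the pre-activation is z = 20, where sigmoid'(z) is ~2e-9, so
# almost no gradient flows back through this unit.
x = np.ones(10)
w = np.full(10, 2.0)
z_full = w @ x
print(f"no dropout:   z = {z_full:.1f}, sigmoid'(z) = {sigmoid_grad(z_full):.2e}")

# Inverted dropout on the inputs: keep each input with probability p and
# rescale survivors by 1/p. The pre-activation keeps the same mean but
# now varies across samples; when few inputs survive, z drops out of the
# saturated region and the local gradient is orders of magnitude larger.
p = 0.5
grads = []
for _ in range(10_000):
    mask = (rng.random(x.shape) < p) / p
    z_drop = w @ (x * mask)
    grads.append(sigmoid_grad(z_drop))
print(f"with dropout: mean sigmoid'(z) over samples = {np.mean(grads):.2e}")
```

Averaged over dropout masks, the per-sample gradient is several orders of magnitude larger than the saturated no-dropout gradient in this toy setup, which is the intuition behind "makes gradients flow even when the activation functions are saturated"; the flat-minima claim is a separate result discussed in the linked paper.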