Tweeted by @hardmaru
Do We Need Zero Training Loss After Achieving Zero Training Error?
— hardmaru (@hardmaru) April 20, 2020
By not letting the training loss go to zero, the model will “random walk” with the same non-zero loss and drift into an area with a flat loss landscape, which leads to better generalization. https://t.co/3hRep0ntPP
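The mechanism behind this claim is the paper's "flooding" trick (Ishida et al., 2020): instead of minimizing the training loss L(θ) directly, minimize the flooded objective L̃(θ) = |L(θ) − b| + b for a small flood level b > 0. Above b this is identical to ordinary training; below b the gradient sign flips, so the optimizer is pushed back up and the loss hovers around b instead of collapsing to zero. A minimal PyTorch-style sketch follows; the `train_step` wrapper and the `flood_level` value of 0.05 are illustrative assumptions, not taken from the paper:

```python
import torch
import torch.nn as nn

def flooding(loss: torch.Tensor, b: float) -> torch.Tensor:
    # Flooded objective: equal to `loss` while loss > b, but the gradient
    # sign flips once loss < b, so training hovers around level b instead
    # of driving the training loss all the way to zero.
    return (loss - b).abs() + b

criterion = nn.CrossEntropyLoss()

def train_step(model, optimizer, inputs, targets, flood_level=0.05):
    # flood_level is a hyperparameter to tune; 0.05 is only a placeholder.
    optimizer.zero_grad()
    loss = criterion(model(inputs), targets)
    flooding(loss, flood_level).backward()  # backprop through the flooded loss
    optimizer.step()
    return loss.item()  # log the raw loss; flooding only alters gradients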