Tweeted By @Thom_Wolf

on 2018-10-15 (UTC)
research learning survey

a group with 2 other tweets.

I've spent most of 2018 training models that could barely fit 1-4 samples/GPU.
But SGD usually needs more than few samples/batch for decent results.
I wrote a post gathering practical tips I use, from simple tricks to multi-GPU code & distributed setups: https://t.co/oLe6JlxcVw pic.twitter.com/pQTXQ9X7Ug
— Thomas Wolf (@Thom_Wolf) October 15, 2018

Tags