Code and pre-trained models to reproduce the recent paper "Scaling Neural Machine Translation" (https://t.co/mrRDmlwax1), where we train on up to 128 GPUs with half-precision floating-point operations as well as delayed batching.
— PyTorch (@PyTorch) June 16, 2018
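The "delayed batching" the tweet refers to is gradient accumulation: gradients from several mini-batches are summed before each weight update, simulating a much larger batch without extra memory. Below is a minimal sketch of that idea combined with half-precision training, using the modern `torch.cuda.amp` API rather than the original fairseq implementation; the model, synthetic data, and `accum_steps` value are illustrative assumptions.

```python
import torch

# Stand-in model and optimizer; the real work used a large NMT Transformer.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Linear(256, 256).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=5e-4)
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))  # FP16 loss scaling
accum_steps = 16  # hypothetical: apply an update only every 16 mini-batches

optimizer.zero_grad()
for step in range(64):
    x = torch.randn(32, 256, device=device)  # synthetic mini-batch
    with torch.cuda.amp.autocast(enabled=(device == "cuda")):  # half-precision forward
        loss = model(x).pow(2).mean()
    # Divide by accum_steps so the accumulated gradient averages over batches.
    scaler.scale(loss / accum_steps).backward()
    if (step + 1) % accum_steps == 0:
        # Delayed update: step the optimizer only after accumulating gradients.
        scaler.step(optimizer)
        scaler.update()
        optimizer.zero_grad()
```

On multiple GPUs, this accumulation also reduces how often gradients must be synchronized across workers, which is part of what makes the technique attractive at the 128-GPU scale the tweet describes.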