Tweeted By @apaszke
Basically lazy pays graph construction + GPU cost (because you only start computing once you see all of it), while eager pays only the GPU cost (because GPU runs async while you queue the kernels).
— Adam Paszke (@apaszke) October 6, 2018