Lazy is reasonable in C++, where application-logic gets compiled and is in order of nano-seconds, it quickly breaks down in Python, where application logic is interpreted. Doing N for-loops queuing compute for a batch-size N will effectively introduce N * 1us overhead. per op
— Soumith Chintala (@soumithchintala) October 6, 2018