I'm so excited about our current projects on energy-efficient NLP!
— Thomas Wolf (@Thom_Wolf) August 30, 2019
Distilled models are very complementary to larger models. Training costs make headlines, but as large-scale models reach production, inference time will likely account for most of a model's total environmental cost.