ZeRO & DeepSpeed: New system optimizations enable training models with over 100 billion parameters https://t.co/PbXx7zrtZm pic.twitter.com/W5xMFsfH0U
— Sebastian Raschka (@rasbt) February 11, 2020
Nice work and the accompanying library/codebase for model-parallelism in PyTorch looks really sweet! 👉 https://t.co/yqzaf5glBa https://t.co/sg4igQV6xI
— Thomas Wolf (@Thom_Wolf) February 11, 2020