Tweeted by @PyTorch
FairScale, a PyTorch extension for efficient large scale training, is releasing FullyShardedDataParallel, which shards model params across GPUs (+offload to CPU). Details: https://t.co/xshPfLeXyr. Inspired by DeepSpeed/@MSFTResearch, and made by @myleott @m1nxu @sam_shleifer pic.twitter.com/1ICMsJwtUP
— PyTorch (@PyTorch) February 25, 2021
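
To make "shards model params across GPUs" concrete, here is a minimal conceptual sketch in plain Python. It is not FairScale's API or implementation: the helper names `shard_params` and `all_gather` are hypothetical, the "GPUs" are just list indices, and CPU offload is not modeled. The only idea it illustrates is that each rank keeps one equal-sized slice of a flattened parameter list, and the full list is rebuilt by gathering the slices when it is needed.

```python
from typing import List


def shard_params(flat_params: List[float], world_size: int) -> List[List[float]]:
    """Split a flat parameter list into `world_size` equal shards.

    Hypothetical helper for illustration only. The tail is zero-padded
    so that every rank holds a shard of the same length.
    """
    shard_len = -(-len(flat_params) // world_size)  # ceiling division
    pad = shard_len * world_size - len(flat_params)
    padded = flat_params + [0.0] * pad
    return [padded[r * shard_len:(r + 1) * shard_len] for r in range(world_size)]


def all_gather(shards: List[List[float]], numel: int) -> List[float]:
    """Rebuild the full parameter list from per-rank shards, dropping padding."""
    full = [x for shard in shards for x in shard]
    return full[:numel]


params = [0.1, 0.2, 0.3, 0.4, 0.5]
shards = shard_params(params, world_size=2)   # each "rank" stores one shard
restored = all_gather(shards, len(params))    # full params, rebuilt on demand
```

With two ranks, each one stores three values instead of five, which is where the per-device memory saving comes from in this toy setting.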