Investigating the Limitations of the Transformers with Simple Arithmetic Tasks
pdf: https://t.co/jGGlcZ3l8V
abs: https://t.co/GY8swbXHBu pic.twitter.com/KLpxmQRq6j
— AK (@ak92501) February 26, 2021
What happens when you mix the SHA-RNN with the SRU, similar to the QRNN? 2.5-10x less training time and darn close to SotA results on the enwik8, WikiText-103, and Billion Word language modeling datasets.
Impressive work from @taolei15949106 at @asapp!
See https://t.co/aNCqhTLnn6 https://t.co/eD3mWPJnwo
— Smerity (@Smerity) February 26, 2021
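The speedup the tweet mentions comes from the SRU's design: all matrix multiplications depend only on the input, so they can be batched across time, leaving just a cheap elementwise recurrence to run sequentially. A minimal numpy sketch of a single SRU layer (following the published SRU equations; the function name and weight layout here are illustrative, not the library's API):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sru_layer(x, W, Wf, Wr, vf, vr, bf, br):
    """Simple Recurrent Unit over a sequence.

    x: (T, d) input; W, Wf, Wr: (d, d); vf, vr, bf, br: (d,).
    The matmuls below have no dependence on the recurrent state,
    so they run for all timesteps at once; only the elementwise
    recurrence is sequential.
    """
    T, d = x.shape
    u = x @ W      # candidate values, batched over time
    uf = x @ Wf    # forget-gate pre-activations
    ur = x @ Wr    # reset-gate pre-activations

    c = np.zeros(d)
    h = np.empty((T, d))
    for t in range(T):
        f = sigmoid(uf[t] + vf * c + bf)  # forget gate
        c = f * c + (1.0 - f) * u[t]      # elementwise state update
        r = sigmoid(ur[t] + vr * c + br)  # reset gate
        h[t] = r * c + (1.0 - r) * x[t]   # highway-style output
    return h
```

Because the recurrence touches `c` only elementwise, the per-step work is O(d) rather than O(d²), which is where the reported training-time reduction comes from.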
This is beyond concerning - it's totally inappropriate. It structurally undermines the integrity of research. https://t.co/BYRepnqY5z
— Kate Crawford (@katecrawford) February 25, 2021
FairScale, a PyTorch extension for efficient large scale training, is releasing FullyShardedDataParallel, which shards model params across GPUs (+offload to CPU). Details: https://t.co/xshPfLeXyr. Inspired by DeepSpeed/@MSFTResearch, and made by @myleott @m1nxu @sam_shleifer pic.twitter.com/1ICMsJwtUP
— PyTorch (@PyTorch) February 25, 2021
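The core idea behind fully sharded data parallelism is that each worker permanently stores only a slice of the parameters, all-gathers the full set just long enough for a forward/backward pass, and then updates only its own slice from a reduce-scattered gradient. A single-process simulation of that flow (this is an illustrative sketch of the concept, not FairScale's actual API; `fsdp_step_sim` is a hypothetical name):

```python
import numpy as np

def fsdp_step_sim(shards, grads, lr=0.1):
    """Simulate one sharded SGD step across len(shards) "workers".

    shards[i]: worker i's slice of the flat parameter vector.
    grads[i]:  the full-size gradient computed on worker i's batch.
    """
    # "All-gather": reconstruct the full parameter vector for the
    # forward/backward pass, then discard it after the step.
    full_params = np.concatenate(shards)
    # "Reduce-scatter": average gradients across workers, then hand
    # each worker only the slice matching its parameter shard.
    avg_grad = np.mean(grads, axis=0)
    offsets = np.cumsum([0] + [len(s) for s in shards])
    new_shards = [
        s - lr * avg_grad[offsets[i]:offsets[i + 1]]
        for i, s in enumerate(shards)
    ]
    return full_params, new_shards
```

Each worker's persistent memory is parameters/N plus its optimizer state slice, which is what makes room for much larger models than plain data parallelism.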
NEW: it’s a while since I’ve done a big international Covid thread, but this one feels important.
The first six weeks of 2021 have gone rather well in terms of humanity’s fight against Covid.
As well as the rollout of vaccines, global cases halved(!) between Jan 11 and Feb 18 pic.twitter.com/bnoxNkUZsu
— John Burn-Murdoch (@jburnmurdoch) February 25, 2021
Something like half the appendix of the DALL-E paper (https://t.co/fIBdsdA3lQ) describes work the authors had to do on GPUs that they wouldn't have had to do on TPUs:
- scaling fp16 mixed precision
- reducing gradient all-reduce comms w/ PowerSGD
- manual optimizer sharding
— James Bradbury (@jekbradbury) February 25, 2021
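On the PowerSGD item in the list above: the trick is to all-reduce two low-rank factors of each gradient matrix instead of the full matrix, using one step of power iteration to find them. A minimal numpy sketch of the compression (an illustration of the published PowerSGD idea, not the DALL-E authors' code; the function name is ours):

```python
import numpy as np

def powersgd_compress(grad, r=2, rng=None):
    """One round of rank-r PowerSGD-style gradient compression.

    Workers would communicate the factors P (n x r) and Q (m x r)
    instead of the full n x m gradient, cutting all-reduce traffic
    from n*m to r*(n + m) numbers.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    n, m = grad.shape
    q = rng.normal(size=(m, r))  # start vector (warm-started in practice)
    p = grad @ q                 # one power-iteration step
    p, _ = np.linalg.qr(p)       # orthonormalize the columns of P
    q = grad.T @ p               # matching right factor
    return p, q                  # low-rank approximation: grad ~ p @ q.T
```

When the gradient really is (close to) rank r, `p @ q.T` recovers it (almost) exactly; in practice warm-starting `q` from the previous step keeps the approximation error small across iterations.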
Hierarchical variational autoencoders are getting more powerful every day. This paper looks at ways to convert a VAE into an image completion generative model. It seems we no longer need GANs or adversarial losses for this level of realism? https://t.co/1pL8QTAKsC https://t.co/wNJPh9CdN4
— hardmaru (@hardmaru) February 25, 2021
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions
pdf: https://t.co/zq8nL4Kpab
abs: https://t.co/BuHoKo502s
github: https://t.co/b1SirihVe6 pic.twitter.com/ijv8bVQCbj
— AK (@ak92501) February 25, 2021
Zero-Shot Text-to-Image Generation
pdf: https://t.co/1jrZu1ibgE
abs: https://t.co/chlumVCi7H pic.twitter.com/DlQZkvDSiZ
— AK (@ak92501) February 25, 2021
I've added a quick intro to ggfx for those curious about how to use it https://t.co/PNegGjQy9I
— Thomas Lin Pedersen (@thomasp85) February 24, 2021
DALL-E code & notebook
github: https://t.co/KW8Rl9lbes pic.twitter.com/Avv9dUqe7K
— AK (@ak92501) February 24, 2021
✅ feeling lost 98% of the time and not knowing if your approach makes sense
✅ identifying and trusting good advice among so much noise
✅ finding the time to study when being a parent, a student, an employee
✅ learning how to pick projects to work on for self-study
— Radek Osmulski (@radekosmulski) February 24, 2021