An Image is Worth 16x16 Words, What is a Video Worth?
— AK (@ak92501) March 26, 2021
pdf: https://t.co/vf53DlyFwy
abs: https://t.co/2wG2qCT53o
github: https://t.co/ZGpmrFK7qy pic.twitter.com/2yUWK0tP5g
Multi-view subword regularization is simple but yields consistent improvements over pre-trained multilingual models. The best thing: It only needs to be applied during fine-tuning.
— Sebastian Ruder (@seb_ruder) March 16, 2021
Paper: https://t.co/gxTgbzVvWN
Code: https://t.co/FqUyZgEnOQ https://t.co/sTFxot6yan
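For a rough picture of the idea, here is a minimal sketch (not the paper's code): during fine-tuning, each input is tokenized twice, once with the deterministic SentencePiece segmentation and once with a sampled segmentation, and a consistency term keeps the model's predictions on the two views close. The names `model`, `mvr_weight`, and the tokenizer path are illustrative assumptions.

```python
# Minimal sketch of multi-view subword regularization at fine-tuning time.
# Assumes a SentencePiece tokenizer and a classifier `model` that maps a batch
# of token ids to logits; all names/constants here are illustrative.
import torch
import torch.nn.functional as F
import sentencepiece as spm

sp = spm.SentencePieceProcessor(model_file="spm.model")  # hypothetical path

def mvr_loss(model, text, label, mvr_weight=0.2):
    # View 1: deterministic (best) segmentation.
    ids_det = sp.encode(text, out_type=int)
    # View 2: sampled segmentation (subword regularization).
    ids_smp = sp.encode(text, out_type=int, enable_sampling=True,
                        alpha=0.1, nbest_size=-1)

    logits_det = model(torch.tensor([ids_det]))
    logits_smp = model(torch.tensor([ids_smp]))

    # Supervised loss on the deterministic view, plus a symmetric-KL
    # consistency term between the predictions on the two views.
    ce = F.cross_entropy(logits_det, torch.tensor([label]))
    p = F.log_softmax(logits_det, dim=-1)
    q = F.log_softmax(logits_smp, dim=-1)
    consistency = 0.5 * (F.kl_div(q, p.exp(), reduction="batchmean")
                         + F.kl_div(p, q.exp(), reduction="batchmean"))
    return ce + mvr_weight * consistency
```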
WIT: Wikipedia-based Image Text Dataset for Multimodal Multilingual Machine Learning
— AK (@ak92501) March 3, 2021
pdf: https://t.co/fblyzH2hGe
abs: https://t.co/tVgBdfOnQ5
github: https://t.co/NNkF3oheok pic.twitter.com/nnFUaPJaYU
DALL-E code & notebook
— AK (@ak92501) February 24, 2021
github: https://t.co/KW8Rl9lbes pic.twitter.com/Avv9dUqe7K
The new SOTA is in Transformers! DeBERTa-v2 beats the human baseline on SuperGLUE and reaches up to a crazy 91.7% dev accuracy on the MNLI task.
— Hugging Face (@huggingface) February 22, 2021
Beats T5 while 10x smaller!
DeBERTa-v2 contributed by @Pengcheng2020 from @MSFTResearch
Try it directly on the hub: https://t.co/HhlL5WrJxp pic.twitter.com/fcUUCiKE0z
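If you want to try it locally, here is a hedged example of loading a DeBERTa-v2 MNLI checkpoint through the standard transformers Auto classes; the exact checkpoint id below is an assumption, so check the hub for the current name.

```python
# Sketch: MNLI-style inference with a DeBERTa-v2 checkpoint from the hub.
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

name = "microsoft/deberta-v2-xlarge-mnli"  # assumed checkpoint id
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name)

premise = "A man is playing a guitar on stage."
hypothesis = "Someone is performing music."
inputs = tokenizer(premise, hypothesis, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
pred = logits.argmax(dim=-1).item()
print(model.config.id2label[pred])  # e.g. ENTAILMENT
```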
Taming Transformers for High-Resolution Image Synthesis https://t.co/6zdyT0HaR0 impressive work/results! (also fun to see a shoutout and my minGPT code used for the transformer :)) pic.twitter.com/cApDT7Yf67
— Andrej Karpathy (@karpathy) February 21, 2021
Intermediate Layer Optimization for Inverse Problems using Deep Generative Models
— AK (@ak92501) February 16, 2021
pdf: https://t.co/kzM10WHfnq
abs: https://t.co/rj8xvuYbNM
github: https://t.co/eQiaBZYX2g pic.twitter.com/b1WiG1TCLc
High-Performance Large-Scale Image Recognition Without Normalization
— AK (@ak92501) February 12, 2021
pdf: https://t.co/THe2NfRI1K
abs: https://t.co/Z68FevANZP
github: https://t.co/Gvw5s5HZIh pic.twitter.com/PGrLhn5oyl
Nystromformer: a new linear self-attention.
— Mingxing Tan (@tanmingxing) February 10, 2021
It turns out a simple Nyström method is quite effective at approximating full attention, outperforming Reformer/Linformer/Performer by +3% accuracy on LRA. @YoungXiong1
Paper: https://t.co/fTBHZ1F9Lr
Code: https://t.co/BjxpzgIMVG https://t.co/OWzflBMrCt pic.twitter.com/PmYepSLR7r
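Not the official implementation, just a back-of-the-envelope sketch of how Nyström attention sidesteps the full n x n softmax: pick m landmark queries/keys (simple segment means here) and combine three small softmax kernels through a pseudo-inverse. The official code uses an iterative pseudo-inverse approximation; `torch.linalg.pinv` is used below only for brevity, and `seq_len` is assumed divisible by `num_landmarks`.

```python
# Rough sketch of Nystrom-approximated self-attention.
# Shapes: q, k, v are (batch, seq_len, dim); seq_len % num_landmarks == 0.
import torch

def nystrom_attention(q, k, v, num_landmarks=64):
    b, n, d = q.shape
    scale = d ** -0.5
    # Landmarks: mean-pool queries/keys over contiguous segments.
    q_l = q.reshape(b, num_landmarks, n // num_landmarks, d).mean(dim=2)
    k_l = k.reshape(b, num_landmarks, n // num_landmarks, d).mean(dim=2)

    kernel_1 = torch.softmax(q @ k_l.transpose(-1, -2) * scale, dim=-1)    # n x m
    kernel_2 = torch.softmax(q_l @ k_l.transpose(-1, -2) * scale, dim=-1)  # m x m
    kernel_3 = torch.softmax(q_l @ k.transpose(-1, -2) * scale, dim=-1)    # m x n

    # softmax(QK^T)V is approximated by kernel_1 @ pinv(kernel_2) @ (kernel_3 @ V)
    return kernel_1 @ torch.linalg.pinv(kernel_2) @ (kernel_3 @ v)
```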
Colorization Transformer
— AK (@ak92501) February 9, 2021
pdf: https://t.co/QrvdW2sZxJ
abs: https://t.co/D63vl4Sl7E
github: https://t.co/HaEyCaQbxH pic.twitter.com/QLa118WnDv
My awesome colleagues have now released the #PyTorch version of StyleGAN2-ADA. (The initial release was in #TensorFlow.)
— Ming-Yu Liu (@liu_mingyu) February 1, 2021
ADA uses a clever adaptive data augmentation to help address limited-sample problems in #GAN training. https://t.co/75Yttri2KS
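Very loosely, the adaptive part works like this (a sketch of the heuristic as described in the ADA paper, not NVIDIA's code; the class name and constants are illustrative): track how confidently the discriminator classifies real images, and push the augmentation probability p up when that signal suggests overfitting.

```python
# Sketch of an adaptive augmentation-probability controller.
# Overfitting heuristic r_t = E[sign(D(real))] is compared to a target value,
# and p is nudged so the heuristic stays near that target.
import torch

class AdaController:
    def __init__(self, target=0.6, speed_imgs=500_000):
        self.p = 0.0                  # current augmentation probability
        self.target = target          # target value for the heuristic r_t
        self.speed_imgs = speed_imgs  # images needed to move p from 0 to 1

    def update(self, d_real_logits, batch_size):
        # r_t above the target suggests the discriminator is overfitting reals.
        r_t = torch.sign(d_real_logits).mean().item()
        step = (1 if r_t > self.target else -1) * batch_size / self.speed_imgs
        self.p = min(max(self.p + step, 0.0), 1.0)
        return self.p
```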
So cool to see Set Transformer being used for smart model ensembling. Kaggle competitions are becoming more interesting and Transformer models are getting more attention! https://t.co/6OKOutqOKQ pic.twitter.com/cMp2NSBCQu
— Alexandr Kalinin (@alxndrkalinin) January 26, 2021