Delta Tuning: A Comprehensive Study of Parameter Efficient Methods for Pre-trained Language Models
abs: https://t.co/8cPUIX2Rfh pic.twitter.com/nPOErYQcfH
— AK (@ak92501) March 17, 2022
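For readers unfamiliar with the term, "delta tuning" covers methods that keep the pre-trained backbone frozen and update only a small set of extra ("delta") parameters. Below is a minimal sketch of one such method, a bottleneck adapter; the sizes, naming convention, and placement are illustrative assumptions, not taken from the paper's taxonomy.

```python
# Minimal sketch of one delta-tuning family: a residual bottleneck adapter
# trained while the backbone stays frozen. Dimensions are assumptions.
import torch.nn as nn

class Adapter(nn.Module):
    def __init__(self, dim=768, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        self.act = nn.GELU()

    def forward(self, hidden):
        # Residual "delta" added on top of the frozen backbone's hidden states.
        return hidden + self.up(self.act(self.down(hidden)))

def freeze_backbone(model):
    # Train only parameters whose names contain "adapter" (naming is a
    # hypothetical convention for this sketch).
    for name, p in model.named_parameters():
        p.requires_grad = "adapter" in name
```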
Introducing the Multimodal Bottleneck Transformer, a novel transformer-based model for multimodal fusion that restricts cross-modal attention flow to achieve state-of-the-art results on video classification tasks with less compute. Read more ↓ https://t.co/BXMVgap0ID pic.twitter.com/Pb8b3j1A5N
— Google AI (@GoogleAI) March 15, 2022
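As a rough illustration of the idea described in the tweet (cross-modal attention restricted to a small set of shared bottleneck tokens rather than full pairwise cross-attention), here is a minimal sketch of one fusion layer. The layer sizes, the use of nn.TransformerEncoderLayer, and the simple averaging of the bottleneck updates are assumptions for illustration, not the paper's exact architecture.

```python
# Minimal sketch of bottleneck fusion between two modalities (e.g. video/audio).
# Each modality attends only to its own tokens plus shared bottleneck tokens;
# cross-modal information can flow only through that bottleneck.
import torch
import torch.nn as nn

class BottleneckFusionLayer(nn.Module):
    def __init__(self, dim=256, heads=4, num_bottlenecks=4):
        super().__init__()
        self.bottleneck = nn.Parameter(torch.randn(1, num_bottlenecks, dim))
        self.video_layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.audio_layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)

    def forward(self, video_tokens, audio_tokens):
        b = video_tokens.size(0)
        z = self.bottleneck.expand(b, -1, -1)
        n = z.size(1)
        # Each modality processes [own tokens ; bottleneck tokens] only.
        v_out = self.video_layer(torch.cat([video_tokens, z], dim=1))
        a_out = self.audio_layer(torch.cat([audio_tokens, z], dim=1))
        video_tokens, z_v = v_out[:, :-n], v_out[:, -n:]
        audio_tokens, z_a = a_out[:, :-n], a_out[:, -n:]
        # Merge the two bottleneck updates (simple average in this sketch).
        z = 0.5 * (z_v + z_a)
        return video_tokens, audio_tokens, z
```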
“Model soups”: Averaging the weights of multiple models fine-tuned with different hyperparameter configurations improves accuracy and robustness, without increasing inference time! @mitchnw et al. https://t.co/QJ4f4MvTHu
— hardmaru (@hardmaru) March 13, 2022
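The recipe is simple enough to sketch directly: fine-tune several copies of the same model with different hyperparameters, then average their weights element-wise before inference. The helper below is a minimal "uniform soup" sketch in PyTorch; the checkpoint filenames in the usage comment are hypothetical.

```python
# Minimal sketch of a uniform model soup: element-wise average of several
# state_dicts that share the same keys and shapes.
import torch

def uniform_soup(state_dicts):
    """Average a list of state_dicts from fine-tuned copies of one model."""
    soup = {}
    for key in state_dicts[0]:
        # Note: integer buffers (e.g. BatchNorm counters) would need special
        # handling; this sketch casts everything to float.
        soup[key] = torch.stack([sd[key].float() for sd in state_dicts]).mean(dim=0)
    return soup

# Usage (hypothetical checkpoint files fine-tuned with different hyperparameters):
# checkpoints = [torch.load(p, map_location="cpu") for p in ["ft_lr1e-5.pt", "ft_lr3e-5.pt"]]
# model.load_state_dict(uniform_soup(checkpoints))
```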
This paper has a very clear presentation of different attention architectures in transformers. I’d be thankful if people could share their experience in trying multi-query vs standard multi-head attention. Thanks https://t.co/aY1AW5etWI
— Nando de Freitas 🏳️🌈 (@NandoDF) March 13, 2022
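For context on the comparison being asked about: multi-query attention keeps per-head query projections but shares a single key/value head across all heads, which mainly shrinks the key/value cache at decoding time. A minimal sketch, with illustrative dimensions:

```python
# Minimal sketch of multi-query attention: per-head queries, one shared
# key/value head broadcast across all heads.
import torch
import torch.nn as nn

class MultiQueryAttention(nn.Module):
    def __init__(self, dim=512, num_heads=8):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.q_proj = nn.Linear(dim, dim)            # per-head queries
        self.k_proj = nn.Linear(dim, self.head_dim)  # single shared key head
        self.v_proj = nn.Linear(dim, self.head_dim)  # single shared value head
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, x):
        b, t, _ = x.shape
        q = self.q_proj(x).view(b, t, self.num_heads, self.head_dim).transpose(1, 2)  # (b, h, t, d)
        k = self.k_proj(x).unsqueeze(1)  # (b, 1, t, d), broadcast over heads
        v = self.v_proj(x).unsqueeze(1)
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.head_dim ** 0.5, dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, t, -1)
        return self.out_proj(out)
```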
Temporal Difference Learning for Model Predictive Control
abs: https://t.co/WaBa3e5J0s
project page: https://t.co/30YjYkXjwW pic.twitter.com/Yz0Wg4lGDO
— AK (@ak92501) March 10, 2022
On the surprising tradeoff between ImageNet accuracy and perceptual similarity
abs: https://t.co/FAWkG1OIX5
show that an inverse-U relationship exists between accuracy and perceptual similarity across a number of settings pic.twitter.com/GHnj2MJUgP
— AK (@ak92501) March 10, 2022
EdgeFormer: Improving Light-weight ConvNets by Learning from Vision Transformers
abs: https://t.co/Ju4EJMasSZ pic.twitter.com/wZSb4v8ZVv
— AK (@ak92501) March 9, 2022
The (Un)Surprising Effectiveness of Pre-Trained Vision Models for Control
abs: https://t.co/kFVZx80f2u pic.twitter.com/Tm723A7aqC
— AK (@ak92501) March 8, 2022
DiT: Self-supervised Pre-training for Document Image Transformer
abs: https://t.co/OUQ94iQ6dY
achieves sota results on downstream tasks, e.g. document image classification (91.11 → 92.69), document layout analysis (91.0 → 94.9) and table detection (94.23 → 96.55) pic.twitter.com/uZWAMGh71s
— AK (@ak92501) March 7, 2022
TableFormer: Table Structure Understanding with Transformers
abs: https://t.co/RiMdYmdstj pic.twitter.com/gapwX8EgKz
— AK (@ak92501) March 3, 2022
HyperPrompt: Prompt-based Task-Conditioning of Transformers
abs: https://t.co/OOQLIlBIv3
HyperPrompt achieves sota performance on SuperGLUE for T5 models up to XXL pic.twitter.com/Ic1XlqZiqO
— AK (@ak92501) March 3, 2022
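As a generic illustration of prompt-based task conditioning (the idea in the title, not the paper's specific HyperPrompt mechanism), here is a minimal sketch in which a learned table of per-task prompt embeddings is prepended to the input before a shared transformer; all names and sizes are illustrative assumptions.

```python
# Minimal sketch of per-task learned prompts prepended to token embeddings.
import torch
import torch.nn as nn

class TaskPrompts(nn.Module):
    def __init__(self, num_tasks, prompt_len=8, dim=512):
        super().__init__()
        # One block of learnable prompt vectors per task.
        self.prompts = nn.Parameter(torch.randn(num_tasks, prompt_len, dim) * 0.02)

    def forward(self, token_embeddings, task_id):
        b = token_embeddings.size(0)
        prompt = self.prompts[task_id].unsqueeze(0).expand(b, -1, -1)
        # The shared transformer then runs on [task prompt ; input tokens].
        return torch.cat([prompt, token_embeddings], dim=1)
```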
Introducing a new approach for training #ML models using noisy data that works by dynamically assigning importance weights to both individual instances and class labels, thus reducing the impact of noisy examples. Learn more about it at https://t.co/lKYl0fzeYD pic.twitter.com/ySCm1HAzKT
— Google AI (@GoogleAI) February 28, 2022
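One common way to realize instance-level importance weighting is to scale each example's loss by a weight that shrinks for examples that look mislabeled. The sketch below uses a softmax over negative per-example losses as that weight; this particular weighting rule and the temperature parameter are illustrative assumptions, not the scheme from the linked post.

```python
# Minimal sketch of instance reweighting under label noise: high-loss
# (likely mislabeled) examples are down-weighted in the training objective.
import torch
import torch.nn.functional as F

def reweighted_cross_entropy(logits, labels, temperature=1.0):
    per_example = F.cross_entropy(logits, labels, reduction="none")
    # Weights sum to the batch size, so the scale matches unweighted training.
    weights = torch.softmax(-per_example.detach() / temperature, dim=0) * labels.size(0)
    return (weights * per_example).mean()
```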