StyleGAN of All Trades: Image Manipulation with Only Pretrained StyleGAN
— AK (@ak92501) November 3, 2021
abs: https://t.co/TGdEjthZlk
github: https://t.co/7q09qdMTdf pic.twitter.com/sGcg5imUAM
Can Vision Transformers Perform Convolution?
— AK (@ak92501) November 3, 2021
abs: https://t.co/rsHhON89sV
A single ViT layer with image patches as input can perform any convolution operation constructively, with the multi-head attention mechanism and relative positional encoding playing essential roles pic.twitter.com/Qw1RqqEfjV
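The constructive claim is easy to make concrete with a toy example. The sketch below is my own illustration, not the paper's construction: give one attention head to each of the nine relative offsets of a 3x3 kernel, let that head's attention matrix be one-hot on the token at its offset (the hard limit of a very large relative positional bias), and fold the per-offset kernel slices into the output projection. The result matches a 3x3 convolution over the patch grid.

```python
# Toy check: 9 one-hot-attention heads == a 3x3 convolution over patch tokens.
import numpy as np

rng = np.random.default_rng(0)
H, W, d = 6, 6, 4                          # patch grid and embedding dimension
x = rng.normal(size=(H, W, d))             # patch embeddings
K = 3
kernel = rng.normal(size=(K, K, d, d))     # kernel[u, v] maps the patch at offset (u-1, v-1)

def conv_over_patches(x):
    """Plain 3x3 convolution over the patch grid, zero-padded at the borders."""
    out = np.zeros_like(x)
    for i in range(H):
        for j in range(W):
            for u in range(K):
                for v in range(K):
                    ii, jj = i + u - 1, j + v - 1
                    if 0 <= ii < H and 0 <= jj < W:
                        out[i, j] += kernel[u, v] @ x[ii, jj]
    return out

def nine_head_attention(x):
    """One head per relative offset: a one-hot attention matrix (hard limit of a
    large relative positional bias) gathers the token at that offset, and the
    output projection applies that offset's kernel slice before summing heads."""
    N = H * W
    tokens = x.reshape(N, d)
    out = np.zeros((N, d))
    for u in range(K):
        for v in range(K):
            A = np.zeros((N, N))                    # this head's attention matrix
            for i in range(H):
                for j in range(W):
                    ii, jj = i + u - 1, j + v - 1
                    if 0 <= ii < H and 0 <= jj < W:
                        A[i * W + j, ii * W + jj] = 1.0
            out += A @ tokens @ kernel[u, v].T      # values = raw tokens; per-head output projection
    return out.reshape(H, W, d)

assert np.allclose(conv_over_patches(x), nine_head_attention(x))
print("9 one-hot-attention heads reproduce the 3x3 convolution over patches")
```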
"When Attention Meets Fast Recurrence: Training Language Models with Reduced Compute" by Tao Lei @taolei15949106 - Outstanding Paper at EMNLP https://t.co/7IR25d9Sz2
— Sasha Rush (@srush_nlp) October 30, 2021
(Tao's work is always a must-read. It combines algorithmic cleverness with practical engineering and experiments.)
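For context, the "fast recurrence" in the title refers to the SRU family of element-wise recurrent layers. The sketch below is my own minimal SRU-style cell, not the paper's SRU++ (which additionally replaces the input projection with attention and includes extra peephole terms); it only shows why the recurrence is cheap: all matrix multiplies are batched over time, and the sequential loop is purely element-wise.

```python
# Minimal SRU-style cell (assumption: my own simplified sketch, not SRU++).
import numpy as np

rng = np.random.default_rng(0)
T, d = 16, 8                               # sequence length, hidden size
x = rng.normal(size=(T, d))
Wx, Wf, Wr = (rng.normal(size=(d, d)) * 0.3 for _ in range(3))
bf = br = np.zeros(d)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Batched projections: fully parallel over time, no recurrence involved.
U, F, R = x @ Wx.T, sigmoid(x @ Wf.T + bf), sigmoid(x @ Wr.T + br)

c = np.zeros(d)
h = np.zeros((T, d))
for t in range(T):                         # the only sequential work is element-wise
    c = F[t] * c + (1.0 - F[t]) * U[t]     # internal state update
    h[t] = R[t] * c + (1.0 - R[t]) * x[t]  # highway-style output
print(h.shape)
```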
We've trained a system to answer grade-school math problems with double the accuracy of a fine-tuned GPT-3 model.
— OpenAI (@OpenAI) October 29, 2021
Multistep reasoning is difficult for today's language models. We present a new technique to help. https://t.co/JRXUYZOSg7
With @YiTayML, @anuragarnab, @giffmana, and @ashVaswani, we wrote up a paper on "the efficiency misnomer": https://t.co/yM6XeykB30
— Mostafa Dehghani (@m__dehghani) October 27, 2021
TL;DR:
"No single cost indicator is sufficient for making an absolute conclusion when comparing the efficiency of different models". pic.twitter.com/EaZ4nVBWEz
Parameter Prediction for Unseen Deep Architectures
— hardmaru (@hardmaru) October 27, 2021
Their graph hypernetwork can predict all 24M parameters of a ResNet-50, achieving 60% CIFAR-10 accuracy and 50% top-5 accuracy on ImageNet. A forward pass takes only a fraction of a second, even on a CPU! https://t.co/qRJiDTVRoH
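As a toy illustration of the parameter-prediction idea (my own minimal hypernetwork, nothing like the paper's graph hypernetwork over architecture graphs): one forward pass of a small network emits all of the weights of a target MLP, which can then be evaluated immediately without its parameters ever being trained directly.

```python
# Toy hypernetwork: predict a target net's weights in one forward pass.
# All names and sizes here are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
layer_shapes = [(8, 4), (2, 8)]            # target MLP: 4 -> 8 -> 2
emb_dim, hidden = 3, 16

# Hypernetwork parameters (these are what would actually be trained).
H1 = rng.normal(size=(hidden, emb_dim)) * 0.3
H2 = {i: rng.normal(size=(np.prod(s), hidden)) * 0.1 for i, s in enumerate(layer_shapes)}

def layer_embedding(shape):
    """Crude encoding of a target layer (just its shape) as the hypernetwork input."""
    out_dim, in_dim = shape
    return np.array([out_dim, in_dim, out_dim * in_dim], dtype=float) / 10.0

def predict_parameters():
    """One forward pass of the hypernetwork -> all target-net weights."""
    weights = []
    for i, s in enumerate(layer_shapes):
        h = np.tanh(H1 @ layer_embedding(s))
        weights.append((H2[i] @ h).reshape(s))
    return weights

def target_forward(x, weights):
    for W in weights[:-1]:
        x = np.maximum(W @ x, 0.0)          # ReLU hidden layers
    return weights[-1] @ x

w = predict_parameters()                    # no training of the target net itself
print(target_forward(rng.normal(size=4), w))
```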
s2s-ft: Fine-Tuning Pretrained Transformer Encoders for Sequence-to-Sequence Learning
— AK (@ak92501) October 27, 2021
abs: https://t.co/AwURzk5Stg pic.twitter.com/hXAVat6q89
Image-Based CLIP-Guided Essence Transfer
— AK (@ak92501) October 26, 2021
abs: https://t.co/wFGwhJxRCZ
github: https://t.co/38xy9RrC9x
The new method creates a blending operator that is optimized to be simultaneously additive in both latent spaces pic.twitter.com/WqKURD8ny8
Self-Supervised Learning by Estimating Twin Class Distributions
— AK (@ak92501) October 25, 2021
abs: https://t.co/LA6IagSCTv
github: https://t.co/QkBgV8FcRU pic.twitter.com/RW5OLHfb3W
SOFT: Softmax-free Transformer with Linear Complexity
— AK (@ak92501) October 25, 2021
abs: https://t.co/EralXVH5CZ
github: https://t.co/4miqmwAGcA
Introduces a softmax-free self-attention mechanism that linearizes the Transformer's complexity in space and time pic.twitter.com/85Mw5MJOUc
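To make the complexity claim concrete, here is the generic kernelized linear-attention trick, shown only as an illustration of why dropping the softmax lets the matrix product be re-associated into linear time; SOFT's specific construction differs in its details.

```python
# Generic softmax-free (kernelized) attention vs. standard softmax attention.
import numpy as np

rng = np.random.default_rng(0)
N, d = 1024, 64                          # sequence length, head dimension
Q, K, V = (rng.normal(size=(N, d)) for _ in range(3))

def softmax_attention(Q, K, V):
    S = Q @ K.T / np.sqrt(d)             # N x N: quadratic in sequence length
    P = np.exp(S - S.max(axis=-1, keepdims=True))
    P /= P.sum(axis=-1, keepdims=True)
    return P @ V

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0.0) + 1e-6):
    # Without the softmax, the product re-associates:
    # phi(Q) @ (phi(K).T @ V) costs O(N * d^2) instead of O(N^2 * d).
    Qp, Kp = phi(Q), phi(K)
    KV = Kp.T @ V                        # d x d summary of keys and values
    Z = Qp @ Kp.sum(axis=0)              # per-query normalizer
    return (Qp @ KV) / Z[:, None]

print(softmax_attention(Q, K, V).shape, linear_attention(Q, K, V).shape)
```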
Here is a one-year perspective on @chrmanning's question (data courtesy of @SemanticScholar). Very interesting result: EMNLP has far fewer little-cited papers, but Findings has more very-highly-cited papers. Findings high-risk, sometimes high reward. 1/2 https://t.co/BnouEtU03e pic.twitter.com/d4Fj9Tv5YB
— Graham Neubig (@gneubig) October 21, 2021
FlexMatch: Boosting Semi-Supervised Learning with Curriculum Pseudo Labeling
— AK (@ak92501) October 20, 2021
abs: https://t.co/FDZTByplIY
FlexMatch outperforms FixMatch by 14.32% and 24.55% on the CIFAR-100 and STL-10 datasets, respectively, when there are only 4 labels per class pic.twitter.com/uMYQ171WoL
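The curriculum idea behind those gains is easy to sketch: instead of one fixed confidence threshold for pseudo-labeling (as in FixMatch), FlexMatch scales the threshold per class by how well that class is currently being learned, so poorly learned classes still receive pseudo-labels. A simplified version, omitting FlexMatch's warm-up and threshold-mapping function, on synthetic predictions:

```python
# Simplified class-wise flexible thresholding for pseudo-labeling.
import numpy as np

rng = np.random.default_rng(0)
num_classes, base_tau = 5, 0.95
probs = rng.dirichlet(np.ones(num_classes) * 0.3, size=1000)  # fake unlabeled predictions

conf, pred = probs.max(axis=1), probs.argmax(axis=1)

# "Learning status" per class: how many unlabeled samples pass the fixed
# threshold with that class as the prediction.
sigma = np.array([np.sum((conf > base_tau) & (pred == c)) for c in range(num_classes)])
beta = sigma / max(sigma.max(), 1)            # normalize by the best-learned class

flexible_tau = base_tau * beta                # lower thresholds for hard classes
mask = conf > flexible_tau[pred]              # which samples get pseudo-labels

print("per-class thresholds:", np.round(flexible_tau, 3))
print("pseudo-labeled fraction:", mask.mean())
```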