It's so intriguing to see the inductive biases that self-attention is able to learn from unlabelled visual data. https://t.co/4dk61zmaI2
— hardmaru (@hardmaru) May 1, 2021
Emerging Properties in Self-Supervised Vision Transformers https://t.co/UbIaCJLi9Q
Object segmentation emerges out of ViT networks trained with self-supervision. This information is directly accessible in the self-attention modules of the last block. pic.twitter.com/KCCpIg87z9
— Ankur Handa (@ankurhandos) April 30, 2021
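That claim is concrete enough to poke at directly. Below is a minimal sketch, assuming the publicly released DINO models on torch.hub (facebookresearch/dino) and that repo's get_last_selfattention() helper: load a self-supervised ViT-S/8, push one image through it, and read off the [CLS] token's attention over the patch grid from the last block. The image path is a placeholder.

```python
# Sketch: inspecting the last-block self-attention of a self-supervised ViT.
# Assumes the public DINO release on torch.hub (facebookresearch/dino) and its
# get_last_selfattention() helper; names and shapes follow that repo's README.
import torch
from PIL import Image
from torchvision import transforms

model = torch.hub.load("facebookresearch/dino:main", "dino_vits8")
model.eval()

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),
])
# "example.jpg" is a placeholder input image.
img = preprocess(Image.open("example.jpg").convert("RGB")).unsqueeze(0)

with torch.no_grad():
    attn = model.get_last_selfattention(img)  # (1, heads, tokens, tokens)

# Attention of the [CLS] token over the 28x28 patch grid, one map per head.
n_heads = attn.shape[1]
cls_attn = attn[0, :, 0, 1:].reshape(n_heads, 28, 28)
print(cls_attn.shape)  # torch.Size([6, 28, 28]) for ViT-S/8 at 224x224
```

Visualising those per-head maps is what produces the object-segmentation-like figures the tweet refers to.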
Visformer: The Vision-friendly Transformer
pdf: https://t.co/koVlCmMnuU
abs: https://t.co/1YTeVdr2Fg
github: https://t.co/wHVArHwjtv pic.twitter.com/mouLZ9DT94
— AK (@ak92501) April 27, 2021
https://t.co/riRO1eqBQw uses OpenAI CLIP to find red sky in video from https://t.co/7HyH8Eex1i https://t.co/rhInmyiyLL pic.twitter.com/fZa9iR1bpJ
— AK (@ak92501) April 17, 2021
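A rough sketch of the underlying idea, assuming OpenAI's clip package and video frames already extracted to a frames/ directory (the directory layout and the prompt wording are placeholders): embed the text query and each frame, then rank frames by cosine similarity.

```python
# Sketch: rank extracted video frames by CLIP similarity to a text prompt
# such as "a red sky". Assumes OpenAI's `clip` package
# (pip install git+https://github.com/openai/CLIP.git) and frames on disk.
import glob
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

text = clip.tokenize(["a photo of a red sky"]).to(device)
frame_paths = sorted(glob.glob("frames/*.jpg"))  # hypothetical frame dump

scores = []
with torch.no_grad():
    text_feat = model.encode_text(text)
    text_feat /= text_feat.norm(dim=-1, keepdim=True)
    for path in frame_paths:
        image = preprocess(Image.open(path)).unsqueeze(0).to(device)
        img_feat = model.encode_image(image)
        img_feat /= img_feat.norm(dim=-1, keepdim=True)
        scores.append((path, (img_feat @ text_feat.T).item()))

# Frames most likely to show a red sky come first.
for path, score in sorted(scores, key=lambda s: -s[1])[:5]:
    print(f"{score:.3f}  {path}")
```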
Image Super-Resolution via Iterative Refinement
pdf: https://t.co/gzc7bPfYqy
abs: https://t.co/GNRDXh2I6P pic.twitter.com/eViT6nX6L3
— AK (@ak92501) April 16, 2021
pytorchvideo: A deep learning library for video understanding research
github: https://t.co/oXc1YiASd4
— AK (@ak92501) April 15, 2021
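For orientation, here is a minimal sketch of loading one of the pretrained video models PyTorchVideo exposes through torch.hub and running it on a dummy clip; the model name ("slow_r50") and the expected clip shape follow the library's torch.hub examples, so treat the details as assumptions rather than the library's only entry point.

```python
# Sketch: load a pretrained video classifier from PyTorchVideo via torch.hub
# and run it on a random clip as a stand-in for real video data.
import torch

model = torch.hub.load("facebookresearch/pytorchvideo", "slow_r50", pretrained=True)
model.eval()

# slow_r50 expects clips of shape (batch, channels, time, height, width);
# here an 8-frame 224x224 random clip.
clip = torch.randn(1, 3, 8, 224, 224)
with torch.no_grad():
    logits = model(clip)
print(logits.shape)  # torch.Size([1, 400]) — Kinetics-400 class scores
```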
RepVGG: Making VGG-style ConvNets Great Again
paper: https://t.co/Y5WfgvqxHO
PyTorch code: https://t.co/ydk0RUf6JU
👌 Spells out the benefits of very simple/uniform/fast (latency, not FLOPS) deployment architectures. A lot of complexity often due to optimization, not architecture. pic.twitter.com/8GliE4JDiq
— Andrej Karpathy (@karpathy) April 11, 2021
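The trick that buys that simplicity is structural re-parameterization: train with a multi-branch block, then fold the branches into a single 3x3 conv for deployment. Below is a simplified sketch of that folding with BatchNorm omitted for brevity (the paper folds BN into the conv weights first), so it illustrates the algebra rather than reproducing RepVGG's exact conversion code.

```python
# Simplified sketch of RepVGG-style re-parameterization (BatchNorm folding
# omitted): a 3x3 conv, a 1x1 conv, and an identity branch used during
# training are merged into one 3x3 conv for deployment.
import torch
import torch.nn as nn
import torch.nn.functional as F

channels = 4
conv3x3 = nn.Conv2d(channels, channels, 3, padding=1, bias=True)
conv1x1 = nn.Conv2d(channels, channels, 1, bias=True)

# Fold the three branches into a single 3x3 kernel:
#   - keep the 3x3 weights as-is,
#   - zero-pad the 1x1 kernel to 3x3 so it becomes a centre tap,
#   - add an identity kernel (1 at the centre of each channel's own filter).
w = conv3x3.weight.data.clone()
w += F.pad(conv1x1.weight.data, [1, 1, 1, 1])
identity = torch.zeros_like(w)
for c in range(channels):
    identity[c, c, 1, 1] = 1.0
w += identity
b = conv3x3.bias.data + conv1x1.bias.data  # the identity branch adds no bias

merged = nn.Conv2d(channels, channels, 3, padding=1, bias=True)
merged.weight.data.copy_(w)
merged.bias.data.copy_(b)

# The single merged conv reproduces the three-branch training-time block.
x = torch.randn(2, channels, 8, 8)
three_branch = conv3x3(x) + conv1x1(x) + x
print(torch.allclose(merged(x), three_branch, atol=1e-5))  # True
```

At inference time only the merged conv remains, which is exactly the plain, uniform architecture the tweet praises for deployment latency.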
InfinityGAN: Towards Infinite-Resolution Image Synthesis
pdf: https://t.co/HrNqpbxnhA
abs: https://t.co/aZX57FbWS5
project page: https://t.co/KhKCzlpC8k pic.twitter.com/tE7UMTsCyE
— AK (@ak92501) April 9, 2021
SiT: Self-supervised vIsion Transformer
pdf: https://t.co/LUmX5fyCbn
abs: https://t.co/ms4ksWdHnD pic.twitter.com/0ALi1YPLTF
— AK (@ak92501) April 9, 2021
Towards General Purpose Vision Systems
pdf: https://t.co/lYmA9BIa3n
abs: https://t.co/KjXW1aQGBB
project page: https://t.co/U37GTpAxeI pic.twitter.com/PhyPZsAniH
— AK (@ak92501) April 5, 2021
Language-based Video Editing via Multi-Modal Multi-Level Transformer
pdf: https://t.co/APK6dVUCyO
abs: https://t.co/IGCqPC2zWH pic.twitter.com/j9XOJQZCOb
— AK (@ak92501) April 5, 2021
Happy to introduce EfficientNetV2: Smaller Models and Faster Training
Achieved faster training and inference speed, AND also better parameter efficiency.
Arxiv: https://t.co/YHWEb8pHmR
Thread 1/4 pic.twitter.com/LY2oZ4tSbN
— Mingxing Tan (@tanmingxing) April 2, 2021