iBoot: Image-bootstrapped Self-Supervised Video Representation Learning
— AK (@_akhaliq) June 18, 2022
abs: https://t.co/dkZUd4QC81 pic.twitter.com/pJFpxd7ckU
Disentangling visual and written concepts in CLIP
— AK (@_akhaliq) June 18, 2022
abs: https://t.co/VsyuDV4HNI
project page: https://t.co/2hTQnhR2o1 pic.twitter.com/LbWpnpTTHT
Deep nets can be overconfident (and wrong) on unfamiliar inputs. What if we directly teach them to be less confident? The idea in RCAD ("Adversarial Unlearning") is to generate images that are hard, and teach the network to be uncertain on them: https://t.co/lJf5aVv3Jr
— Sergey Levine (@svlevine) June 17, 2022
A thread: pic.twitter.com/pHj76WHnUA
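The thread describes RCAD only at a high level: ascend the loss to manufacture hard inputs, then train the model toward uncertainty on them. A minimal NumPy sketch of that two-step idea, assuming a toy linear softmax classifier (the variable names, step size `eps`, and uniform-distribution target are illustrative assumptions, not the paper's exact setup):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Toy linear classifier p(y|x) = softmax(W x); purely illustrative.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))   # 3 classes, 4 input features
x = rng.normal(size=4)
y = 1                         # true label

# 1) Generate a "hard" example: one gradient-ascent step on the
#    cross-entropy loss w.r.t. the input (grad_x CE = W^T (p - onehot)).
p = softmax(W @ x)
grad_x = W.T @ (p - np.eye(3)[y])
eps = 0.5
x_adv = x + eps * grad_x / (np.linalg.norm(grad_x) + 1e-12)

# 2) "Unlearning" loss on the hard example: push predictions toward the
#    uniform distribution, i.e. cross-entropy against 1/K per class.
p_adv = softmax(W @ x_adv)
uniform = np.full(3, 1.0 / 3.0)
unlearn_loss = -(uniform * np.log(p_adv + 1e-12)).sum()

# Sanity check: the ascent step makes the true-label loss worse.
ce = lambda q: -np.log(q[y] + 1e-12)
assert ce(p_adv) >= ce(p)
```

In training, `unlearn_loss` would be added to the ordinary supervised loss and the perturbation recomputed per batch; the sketch above only shows a single input to keep the gradient algebra visible.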
Action recognition is a challenging task but has seen improvements through multimodality. In this latest blog post, learn how @Disney uses PyTorch to improve activity recognition through multimodal approaches. https://t.co/Xqy79N7Eh9 pic.twitter.com/t5kJq5msxx
— PyTorch (@PyTorch) June 17, 2022
Beyond Supervised vs. Unsupervised: Representative Benchmarking and Analysis of Image Representation Learning
— AK (@_akhaliq) June 17, 2022
abs: https://t.co/SunKnQH3NJ
project page: https://t.co/MgIrCZlQJv pic.twitter.com/yMHmZmk3J2
Efficient Decoder-free Object Detection with Transformers
— AK (@_akhaliq) June 15, 2022
abs: https://t.co/YW4QcqztiW
Experiments on the MS COCO benchmark demonstrate that DFFT_SMALL outperforms DETR by 2.5% AP with a 28% computation cost reduction and more than 10× fewer training epochs. pic.twitter.com/ICOvqgA8xQ
Peripheral Vision Transformer
— AK (@_akhaliq) June 15, 2022
abs: https://t.co/c6R8BfNDPS
Proposes incorporating peripheral position encoding into the multi-head self-attention layers, letting the network learn from training data to partition the visual field into diverse peripheral regions. pic.twitter.com/S78e7WXDKh
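The tweet gives only the gist; one common way such a position encoding can enter self-attention is as a bias on the attention logits that depends on query-key distance. A rough NumPy sketch under that assumption (the distance buckets, bias values, and grid size are illustrative, not the paper's actual parameterization):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Toy 4x4 grid of tokens. Bucket query-key distances into "rings" and add a
# per-ring bias to the attention logits, so near vs. far (peripheral) tokens
# can be weighted differently; in practice the bias would be learned.
H, W, d = 4, 4, 8
n = H * W
coords = np.stack(np.meshgrid(np.arange(H), np.arange(W), indexing="ij"),
                  axis=-1).reshape(n, 2)
dist = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
rings = np.digitize(dist, bins=[1.0, 2.0, 3.0])    # 4 distance buckets
ring_bias = np.array([0.5, 0.0, -0.5, -1.0])       # illustrative bias values

rng = np.random.default_rng(0)
x = rng.normal(size=(n, d))
q = x @ rng.normal(size=(d, d))
k = x @ rng.normal(size=(d, d))
v = x

logits = q @ k.T / np.sqrt(d) + ring_bias[rings]   # content + peripheral bias
attn = softmax(logits)                             # rows sum to 1
out = attn @ v
```

A multi-head version would simply keep a separate bias table per head, which is what lets different heads specialize to different peripheral regions.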
DETR++: Taming Your Multi-Scale Detection Transformer
— AK (@_akhaliq) June 8, 2022
abs: https://t.co/kOQ5V4vC3C
DETR++, a new architecture that improves detection results by 1.9% AP on MS COCO 2017, 11.5% AP on RICO icon detection, and 9.1% AP on RICO layout extraction over existing baselines pic.twitter.com/Kt3EQRXwuH
Yes, machine learning is everywhere. But this is one application where it really delivers, my ugly handwriting and all. (Fun fact: my students implemented and live-demoed something similar as their class project.) pic.twitter.com/6VU8BsGcFX
— Sebastian Raschka (@rasbt) June 6, 2022
EfficientFormer: Vision Transformers at MobileNet Speed
— AK (@_akhaliq) June 3, 2022
abs: https://t.co/lfdbJdS46J
EfficientFormer-L1 achieves 79.2% top-1 accuracy on ImageNet-1K with only 1.6 ms inference latency on iPhone 12, which is even a bit faster than MobileNetV2 (1.7 ms, 71.8% top-1). pic.twitter.com/4zcvunXe57
X-ViT: High Performance Linear Vision Transformer without Softmax
— AK (@_akhaliq) May 30, 2022
abs: https://t.co/A6HZ2vXKDB pic.twitter.com/kArY0Tm4VE
GIT: A Generative Image-to-text Transformer for Vision and Language
— AK (@_akhaliq) May 30, 2022
abs: https://t.co/iFly0pcoXM
The model surpasses human performance for the first time on TextCaps (138.2 vs. 125.5 CIDEr). pic.twitter.com/vn9LV98Dwr