Incidents1M: a large-scale dataset of images with natural disasters, damage, and incidents
— AK (@ak92501) January 13, 2022
abs: https://t.co/ehdybJKTr5
project page: https://t.co/HnfvJnYcSM pic.twitter.com/6Q2vHRfCZm
QuadTree Attention for Vision Transformers
— AK (@ak92501) January 11, 2022
abs: https://t.co/ImFIQAQsZn
4.0% improvement in feature matching on ScanNet, about 50% flops reduction in stereo matching, 0.4-1.5% improvement in top-1 accuracy on ImageNet classification, 1.2-1.8% improvement on COCO object detection pic.twitter.com/0yxa0VnMCO
Impressive results via data distillation (i.e., reducing a large dataset to a smaller, synthetic one). Here, the researchers represent the 50k images in CIFAR-10 via just 10 images. A model trained on these 10 images achieves 64% accuracy on the original test set https://t.co/MZHHSaDcK5 pic.twitter.com/3LRn4xAOse
— Sebastian Raschka (@rasbt) December 29, 2021
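The distillation idea above can be sketched in miniature: learn a handful of synthetic points so that a model trained on them alone fits a much larger "real" dataset. This is a toy bilevel setup with a linear model and finite-difference gradients, purely illustrative, not the paper's actual method; every name and hyperparameter here is an assumption.

```python
# Toy dataset-distillation sketch (illustrative only, NOT the paper's method):
# distill 200 real points into 2 synthetic ones such that a linear model
# trained on the synthetic data (one inner gradient step) fits the real data.
import numpy as np

rng = np.random.default_rng(0)

# "Real" dataset: 200 points generated by a known linear rule.
n_real, dim = 200, 3
w_true = np.array([1.5, -2.0, 0.5])
X_real = rng.normal(size=(n_real, dim))
y_real = X_real @ w_true

n_syn, inner_lr = 2, 0.5  # distill down to 2 synthetic (x, y) pairs

def train_on_synthetic(params):
    """Inner loop: one gradient step from w = 0 on the synthetic data."""
    X_syn = params[: n_syn * dim].reshape(n_syn, dim)
    y_syn = params[n_syn * dim :]
    grad = X_syn.T @ (X_syn @ np.zeros(dim) - y_syn) / n_syn
    return -inner_lr * grad  # w = 0 - lr * grad

def outer_loss(params):
    """Outer objective: how well the synthetically-trained model fits real data."""
    w = train_on_synthetic(params)
    return np.mean((X_real @ w - y_real) ** 2)

# Optimize the synthetic points themselves via finite-difference gradient descent.
params = rng.normal(size=n_syn * dim + n_syn)
eps, outer_lr = 1e-5, 0.05
for _ in range(2000):
    grad = np.zeros_like(params)
    for i in range(params.size):
        bump = np.zeros_like(params)
        bump[i] = eps
        grad[i] = (outer_loss(params + bump) - outer_loss(params - bump)) / (2 * eps)
    params -= outer_lr * grad

print(f"loss on real data after distillation: {outer_loss(params):.4f}")
```

The real work uses images and deep networks, where the inner training loop and the outer gradient (through that loop) are far more involved, but the bilevel structure is the same.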
Vision Transformer for Small-Size Datasets
— AK (@ak92501) December 28, 2021
abs: https://t.co/uy4e4Mo44m
when SPT and LSA were applied to ViTs, performance improved by an average of 2.96% on Tiny-ImageNet, which is a representative small-size dataset pic.twitter.com/QxZUoqJGCQ
Part 2: Summary of 10 summaries on:
— AI Fast Track (60/60) (@ai_fast_track) December 6, 2021
Tips & Tricks & Best Practices in training (not only) object detection models.
Don't miss any of those posts, follow @ai_fast_track to catch them in your feed.
Summary of summaries: ... pic.twitter.com/VLcWNkMaph
.@Gradio Demo for Pyxelate: convert images to pixel art now on @huggingface Spaces
— AK (@ak92501) December 4, 2021
demo: https://t.co/T8cBs8lk0o
github: https://t.co/pdWuidyvaM pic.twitter.com/s0PM9KHK63
BEVT: BERT Pretraining of Video Transformers
— AK (@ak92501) December 3, 2021
abs: https://t.co/6BI5E3f9Cv pic.twitter.com/tV5ASUKHMd
AdaViT: Adaptive Vision Transformers for Efficient Image Recognition
— AK (@ak92501) December 1, 2021
abs: https://t.co/JkGgzi64CW
in experiments on ImageNet, the method obtains more than 2× improvement in efficiency compared to sota vision transformers with only a 0.8% drop in accuracy pic.twitter.com/wIcMjPA72X
Donut 🍩: Document Understanding Transformer without OCR
— AK (@ak92501) December 1, 2021
abs: https://t.co/A644UXgUuG
achieves sota performance on various document understanding tasks in public benchmark datasets and private industrial service datasets pic.twitter.com/2broMiK9r5
Pyramid Adversarial Training Improves ViT Performance
— AK (@ak92501) December 1, 2021
abs: https://t.co/oaxB6Q99R2
new sota for ImageNet-C (41.4 mCE), ImageNet-R (53.92%), and ImageNet-Sketch (41.04%) without extra data, using only the ViT-B/16 backbone and pyramid adversarial training pic.twitter.com/afdSG0J35i
Sparse DETR: Efficient End-to-End Object Detection with Learnable Sparsity
— AK (@ak92501) November 30, 2021
abs: https://t.co/U4I6gnNvej
Sparse DETR achieves better performance than Deformable DETR even with only 10% encoder tokens on the COCO dataset pic.twitter.com/gcERMMxUu4
Self-slimmed Vision Transformer
— AK (@ak92501) November 25, 2021
abs: https://t.co/BHQIZVWZlN pic.twitter.com/ToB9DODFwU