TransMix: Attend to Mix for Vision Transformers
— AK (@ak92501) November 19, 2021
abs: https://t.co/P1m7LHzOqH pic.twitter.com/PJeURpaWpw
Restormer: Efficient Transformer for High-Resolution Image Restoration
— AK (@ak92501) November 19, 2021
abs: https://t.co/ioFgvW3TcA
proposes an efficient Transformer model with several key design choices in its building blocks so that it can capture long-range pixel interactions pic.twitter.com/Ud1YQlVyAG
Swin Transformer V2: Scaling Up Capacity and Resolution
— AK (@ak92501) November 19, 2021
abs: https://t.co/vBm66uyUBZ
scales Swin Transformer up to 3 billion parameters and makes it capable of training with images of up to 1,536×1,536 resolution, setting new records on four representative vision benchmarks pic.twitter.com/2lvEvCOZ35
XLS-R is our new model for cross-lingual self-supervised speech learning trained on 436K hours of public speech in 128 languages and with up to 2B params.
— Michael Auli (@MichaelAuli) November 18, 2021
Paper: https://t.co/o0HuU7azeu
Blog: https://t.co/w9Ta8ZCLj5
Code/models: https://t.co/hIUFJLPupb https://t.co/jjZrgoz686 pic.twitter.com/iENwHgoTuj
(Of course) XGBoost does not always win on tabular datasets. Made a HW where students got to tinker w. hyperparam tuning techniques (grid search, randomized search, Hyperopt, Optuna, successive halving) and algos (GBMs and everything in scikit-learn + mlxtend). Top-10 results: pic.twitter.com/aq88Uw5dkF
— Sebastian Raschka (@rasbt) November 16, 2021
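The tuning setups mentioned in the tweet above are easy to reproduce in miniature. Below is a minimal sketch of one of them, a randomized search over a scikit-learn gradient-boosting model; the toy dataset, search space, and budget are placeholders, not the assignment's actual setup.

```python
# Hedged sketch: randomized hyperparameter search over a gradient-boosting
# model with scikit-learn. Data and search space are illustrative only.
from scipy.stats import loguniform, randint
from sklearn.datasets import make_classification
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV, train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

param_dist = {
    "learning_rate": loguniform(1e-3, 0.3),
    "max_leaf_nodes": randint(8, 128),
    "l2_regularization": loguniform(1e-6, 1.0),
}
search = RandomizedSearchCV(
    HistGradientBoostingClassifier(random_state=0),
    param_distributions=param_dist,
    n_iter=25, cv=5, n_jobs=-1, random_state=0,
)
search.fit(X_tr, y_tr)
print(search.best_params_, search.score(X_te, y_te))
```

Grid search, Optuna, Hyperopt, or successive halving slot into the same pattern: only the search object around the estimator changes.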
New tutorial for TF Decision Forests!🌲
— Josh Gordon (@random_forests) November 16, 2021
If you’re curious about using trees and neural networks 🧠 together, then check out this new example that shows how to combine them.
Thanks @mat_gb + team.
Learn more → https://t.co/SEo2izdiGZ pic.twitter.com/VgacuTMPnH
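For a rough idea of what combining the two can look like, here is a hedged sketch: train a small Keras network, then feed its penultimate-layer embeddings to a TF Decision Forests model. The two-stage recipe, feature names, and toy data are assumptions for illustration and may differ from the linked example.

```python
# Hypothetical sketch: neural-network embeddings as inputs to a TF-DF model.
import numpy as np
import pandas as pd
import tensorflow as tf
import tensorflow_decision_forests as tfdf

# Toy tabular data: 1,000 rows, 8 numeric features, binary label.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8)).astype("float32")
y = (X[:, 0] + X[:, 1] ** 2 > 0.5).astype("int32")

# Stage 1: a small neural network trained on the raw features.
nn = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(16, activation="relu", name="embedding"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
nn.compile(optimizer="adam", loss="binary_crossentropy")
nn.fit(X, y, epochs=5, verbose=0)

# Stage 2: use the penultimate-layer activations as features for a forest.
embedder = tf.keras.Sequential(nn.layers[:-1])   # drop the output layer
emb = embedder.predict(X, verbose=0)

df = pd.DataFrame(emb, columns=[f"e{i}" for i in range(emb.shape[1])])
df["label"] = y
train_ds = tfdf.keras.pd_dataframe_to_tf_dataset(df, label="label")

forest = tfdf.keras.GradientBoostedTreesModel(verbose=0)
forest.fit(train_ds)
forest.compile(metrics=["accuracy"])
print(forest.evaluate(train_ds, return_dict=True))
```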
Interesting #history map shows the territory that was still held by Nazi Germany at the time of surrender in May 1945. Source: https://t.co/pEqXrBUy1W pic.twitter.com/lYIiXBfcV7
— Simon Kuestenmacher (@simongerman600) November 14, 2021
Great paper and thread!
— Andrej Karpathy (@karpathy) November 13, 2021
- 😮that super simple MSE loss works vs. BEiT-style dVAE (multi-modal) cross-entropy
- <3 efficiency of asymmetric encoder/decoder
- 👏detailed training recipes
- +1 v curious about dataset size scaling
- bit of lack of commentary on test-time protocol https://t.co/MQFAvrqBvr
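For context on the first two bullets: the loss in question is plain mean squared error on pixels, averaged only over the masked patches, and the asymmetry means the heavy encoder only ever sees the visible patches. A toy sketch follows; the shapes, 75% mask ratio, and faked decoder output are stand-ins, not the paper's code.

```python
# Toy sketch of an MAE-style reconstruction loss: pixel MSE averaged only
# over masked patches. All shapes and the mask ratio are assumptions.
import torch

def masked_mse(pred, target, mask):
    """pred/target: (B, N, patch_dim); mask: (B, N) with 1 = masked patch."""
    per_patch = ((pred - target) ** 2).mean(dim=-1)   # MSE per patch
    return (per_patch * mask).sum() / mask.sum()      # average over masked only

B, N, D = 2, 196, 768                       # batch, patches per image, pixels per patch
patches = torch.randn(B, N, D)              # ground-truth patch pixels
mask = (torch.rand(B, N) < 0.75).float()    # ~75% of patches masked out

# Asymmetry: the heavy encoder would only process the ~25% visible patches,
# e.g. for the first image:
visible = patches[0][mask[0] == 0]          # (num_visible, D)

# A lightweight decoder reconstructs every patch; faked here with random output.
pred = torch.randn(B, N, D)
print(masked_mse(pred, patches, mask))
```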
Learning rate is one of the most important hyperparameters to tune during ML model training.
— Jean de Nyandwi (@Jeande_d) November 12, 2021
A high learning rate can speed up the training, but it can cause the model to diverge. A low rate can slow the training.
Here are different learning rate curves pic.twitter.com/GsGf4phRaA
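A tiny numerical example of the trade-off described above: plain gradient descent on f(w) = w² with three learning rates. The specific values are arbitrary, chosen only to show the three regimes.

```python
# Gradient descent on f(w) = w**2; the gradient is 2w.
def descend(lr, steps=20, w=1.0):
    for _ in range(steps):
        w -= lr * 2 * w
    return w

for lr in (0.1, 0.01, 1.1):      # reasonable, too low, too high
    print(f"lr={lr:>4}: w after 20 steps = {descend(lr):.4g}")
# lr=0.1 converges quickly toward 0, lr=0.01 barely moves,
# lr=1.1 overshoots every step and diverges.
```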
TensorFlow.js found its killer app for Virtual YouTubers https://t.co/lbh6akLMOq
— hardmaru (@hardmaru) November 12, 2021
Dense Unsupervised Learning for Video Segmentation
— AK (@ak92501) November 12, 2021
abs: https://t.co/zKV94Pn7s5
github: https://t.co/8WQyRw3jw9 pic.twitter.com/nST5odpRMU
Want to know why it's risky to assume that your ML is going to continue to work if the test regime changes from your training data? Just ask Zillow.
— Gary Marcus (@GaryMarcus) November 12, 2021
https://t.co/EUH71gZspy