by bneyshabur on 2022-03-17 (UTC).

You think the RNN era is over? Think again!

We introduce "Block-Recurrent Transformer", which applies a transformer layer in a recurrent fashion & beats Transformer-XL on LM tasks.

Paper: https://t.co/j9GOABGCsx

W. DeLesley Hutchins, Imanol Schlag, @Yuhu_ai_ & @ethansdyer

1/ pic.twitter.com/NDn8YyWoOE

— Behnam Neyshabur (@bneyshabur) March 17, 2022
research
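
A minimal sketch of the idea, not the paper's code: run an ordinary transformer layer block by block over a long sequence, carrying a small set of recurrent state vectors from one block to the next. All names and sizes here (BlockRecurrentSketch, n_state, block_len) are made up for illustration.

import torch
import torch.nn as nn

class BlockRecurrentSketch(nn.Module):
    def __init__(self, d_model=256, n_heads=4, n_state=32, block_len=128):
        super().__init__()
        self.block_len = block_len
        self.state0 = nn.Parameter(torch.zeros(n_state, d_model))  # learned initial state
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.state_update = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                nn.Linear(4 * d_model, d_model))
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x):  # x: (batch, seq_len, d_model)
        state = self.state0.unsqueeze(0).expand(x.size(0), -1, -1)
        outs = []
        for blk in x.split(self.block_len, dim=1):
            h, _ = self.self_attn(blk, blk, blk)       # attention within the block
            c, _ = self.cross_attn(h, state, state)    # tokens read the carried state
            h = self.norm(h + c + self.ff(h + c))
            state, _ = self.state_update(state, h, h)  # state reads the block: the recurrence
            outs.append(h)
        return torch.cat(outs, dim=1)
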
by rasbt on 2022-03-17 (UTC).

Really enjoy working on academic research. However, I recently realized that training models on my GPU workstation != using DL in the real world. There are lots of cool, compounding techniques to scale up your PyTorch models. A good & succinct overview: https://t.co/JtfBLvMYXF

— Sebastian Raschka (@rasbt) March 17, 2022
learning survey pytorch
by PyTorch on 2022-03-17 (UTC).

We just assessed the effectiveness of the DDP, Pipe and FSDP distributed training techniques available via PyTorch with different model sizes and network configurations. See the results here. https://t.co/vN191zRclh

— PyTorch (@PyTorch) March 17, 2022
learning pytorch survey
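
For reference, a minimal sketch (assuming a standard torchrun launch) of what two of these techniques look like in user code: DDP keeps a full model replica per GPU and all-reduces gradients, while FSDP (PyTorch >= 1.11) shards parameters, gradients, and optimizer state across ranks. Pipe is omitted here since it requires splitting the model into sequential stages.

import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def main():
    dist.init_process_group("nccl")         # torchrun supplies RANK/WORLD_SIZE
    rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(rank)

    model = torch.nn.Linear(4096, 4096).cuda()  # stand-in for a real network
    # DDP: one full replica per GPU, gradients all-reduced every step.
    model = DDP(model, device_ids=[rank])
    # FSDP alternative: shard params/grads/optimizer state so bigger models fit.
    # model = FSDP(model)

if __name__ == "__main__":
    main()
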
by ak92501 on 2022-03-17 (UTC).

.@Gradio Demo for YOLOX: Exceeding YOLO Series in 2021 on @huggingface Spaces
demo: https://t.co/8iNN2Rhqea
github: https://t.co/rym6pRl10e pic.twitter.com/cBpl9T8akZ

— AK (@ak92501) March 17, 2022
cv w_code application
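
A rough sketch of how a Gradio image demo like this is typically wired up on Spaces, not the linked Space's actual code; run_yolox is a hypothetical placeholder for real YOLOX inference.

import gradio as gr

def run_yolox(image):
    # placeholder: a real demo would run YOLOX here and draw the
    # detected boxes on the image before returning it
    return image

gr.Interface(fn=run_yolox, inputs="image", outputs="image",
             title="YOLOX object detection").launch()
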
by ak92501 on 2022-03-17 (UTC).

Delta Tuning: A Comprehensive Study of Parameter-Efficient Methods for Pre-trained Language Models
abs: https://t.co/8cPUIX2Rfh pic.twitter.com/nPOErYQcfH

— AK (@ak92501) March 17, 2022
nlp research
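
One representative pattern from the family the paper surveys, sketched in PyTorch: freeze the pretrained weights and train only a small low-rank "delta" added to a layer's output. LowRankDelta and its rank argument are illustrative names, not the paper's API.

import torch
import torch.nn as nn

class LowRankDelta(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():   # pretrained weights stay frozen
            p.requires_grad_(False)
        self.down = nn.Linear(base.in_features, rank, bias=False)
        self.up = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.up.weight)     # delta starts as an exact no-op

    def forward(self, x):
        return self.base(x) + self.up(self.down(x))  # frozen output + trainable delta

layer = LowRankDelta(nn.Linear(768, 768))  # the Linear stands in for a pretrained layer
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"{trainable} trainable of {total} total parameters")  # ~2% here
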
by karpathy on 2022-03-15 (UTC).

Excellent and unintuitive read on GPUs. The chip doing the compute has a tiny amount of memory & is connected to the main memory literally through a straw. Most of the energy goes to data movement too. Many repercussions. E.g. latency better predicted by # activations than # flops https://t.co/67PBOfEcNK

— Andrej Karpathy (@karpathy) March 15, 2022
learning survey
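
The back-of-envelope arithmetic behind this, using rough A100-class numbers (~312 TFLOP/s tensor-core compute, ~1.5 TB/s memory bandwidth; illustrative values, not taken from the linked article):

compute = 312e12    # FLOP/s
bandwidth = 1.5e12  # bytes/s

# FLOPs the chip must do per byte fetched just to stay busy:
print(compute / bandwidth)  # ~208 FLOPs per byte

# An elementwise op (e.g. an activation function) does ~1 FLOP per element
# while moving 4 bytes of fp16 traffic (read + write), so it runs hundreds
# of times below peak compute: its time is set by activation bytes moved,
# not by FLOPs.
traffic_per_elem = 2 * 2  # read + write of a 2-byte value
print(compute / bandwidth * traffic_per_elem)  # ~832x shortfall vs. peak
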
by GoogleAI on 2022-03-15 (UTC).

Introducing the Multimodal Bottleneck Transformer, a novel transformer-based model for multimodal fusion that restricts cross-modal attention flow to achieve state-of-the-art results on video classification tasks with less compute. Read more ↓ https://t.co/BXMVgap0ID pic.twitter.com/Pb8b3j1A5N

— Google AI (@GoogleAI) March 15, 2022
research
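
A minimal sketch of the bottleneck idea, not Google's implementation: the modalities never attend to each other directly; a handful of shared bottleneck tokens are the only channel for cross-modal information. Names and sizes below are made up.

import torch
import torch.nn as nn

class BottleneckFusionSketch(nn.Module):
    def __init__(self, d_model=256, n_heads=4, n_bottleneck=4):
        super().__init__()
        self.bottleneck = nn.Parameter(torch.randn(n_bottleneck, d_model) * 0.02)
        self.video_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.audio_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)

    def forward(self, video, audio):  # (batch, seq, d_model) each
        n = self.bottleneck.size(0)
        z = self.bottleneck.unsqueeze(0).expand(video.size(0), -1, -1)
        # each modality attends only within itself plus the bottleneck tokens
        v = self.video_layer(torch.cat([video, z], dim=1))
        a = self.audio_layer(torch.cat([audio, z], dim=1))
        video, z_v = v[:, :-n], v[:, -n:]
        audio, z_a = a[:, :-n], a[:, -n:]
        z = 0.5 * (z_v + z_a)  # fuse by averaging the per-modality bottleneck
                               # updates (fed to the next layer in a full model)
        return video, audio, z
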
by BMarcusMcCann on 2022-03-15 (UTC).

Today we launched YouWrite — an AI writing assistant built into https://t.co/iDkPtEe5TU.

For me, this is a special moment that ties my past research to the https://t.co/iDkPtEe5TU vision.

YouWrite example below👇with thoughts on search, research, writing, and meaning. 🧵 pic.twitter.com/hGdtiVfiUq

— Bryan McCann (@BMarcusMcCann) March 15, 2022
nlp application
by rasbt on 2022-03-15 (UTC).

Whoa. 96% of the winning solutions used Python. This is the way.

Interesting tidbit: all winning NLP solutions used transformers. However, most winning computer vision solutions were still convolutional nets (mostly EfficientNet). https://t.co/VQHbqz84Pi

— Sebastian Raschka (@rasbt) March 15, 2022
misc kaggle
by karpathy on 2022-03-14 (UTC).

New blog post!⬆️ Deep Neural Nets: 33 years ago and 33 years from now https://t.co/pbZvYh3Mck we reproduce what I think may be the earliest real-world application of a neural net trained end-to-end with backprop (LeCun et al. 1989), try to improve it with time travel, and reflect. pic.twitter.com/MKZ7S3GUdv

— Andrej Karpathy (@karpathy) March 14, 2022
misc learning
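
For scale, a rough sketch of a 1989-sized convnet like the one the post reproduces; layer sizes here are approximate, not the exact LeCun et al. architecture.

import torch.nn as nn

net = nn.Sequential(
    nn.Conv2d(1, 12, kernel_size=5, stride=2, padding=2),   # 16x16 -> 8x8
    nn.Tanh(),
    nn.Conv2d(12, 12, kernel_size=5, stride=2, padding=2),  # 8x8 -> 4x4
    nn.Tanh(),
    nn.Flatten(),
    nn.Linear(12 * 4 * 4, 30),
    nn.Tanh(),
    nn.Linear(30, 10),  # ten digit classes
)
print(sum(p.numel() for p in net.parameters()))  # on the order of 10k parameters
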
by Tim_Dettmers on 2022-03-14 (UTC).

An important but elusive quality to learn in a PhD is research style. It is valuable to be aware of this before you start a PhD. Among other updates, I added an extensive discussion on research style to my "choosing a grad school" blog post. Enjoy! https://t.co/HrMkPbGZcv

— Tim Dettmers (@Tim_Dettmers) March 14, 2022
misc
by hardmaru on 2022-03-13 (UTC).

“Model soups”: Averaging the weights of multiple models fine-tuned with different hyperparameter configurations improves accuracy and robustness, without increasing inference time! @mitchnw et al. https://t.co/QJ4f4MvTHu

— hardmaru (@hardmaru) March 13, 2022
research
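
The recipe itself is a few lines: a uniform "soup" is just an elementwise average of matching parameter tensors from models fine-tuned off the same initialization. A minimal sketch (finetuned_models stands for your own checkpoints):

import torch

def uniform_soup(state_dicts):
    """Elementwise mean of matching tensors across fine-tuned checkpoints."""
    return {k: torch.stack([sd[k].float() for sd in state_dicts]).mean(dim=0)
            for k in state_dicts[0]}

# model.load_state_dict(uniform_soup([m.state_dict() for m in finetuned_models]))
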
