by bneyshabur on 2022-03-17 (UTC).

You think the RNN era is over? Think again!

We introduce "Block-Recurrent Transformer", which applies a transformer layer in a recurrent fashion & beats Transformer-XL on LM tasks.

Paper: https://t.co/j9GOABGCsx

W. DeLesley Hutchins, Imanol Schlag, @Yuhu_ai_ & @ethansdyer

1/ pic.twitter.com/NDn8YyWoOE

— Behnam Neyshabur (@bneyshabur) March 17, 2022
research
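
A minimal sketch of the idea, not the paper's code: run an ordinary transformer layer block by block over a long sequence, carrying a small set of recurrent state vectors from one block to the next. All names and sizes here (BlockRecurrentSketch, n_state, block_len) are made up for illustration.

import torch
import torch.nn as nn

class BlockRecurrentSketch(nn.Module):
    def __init__(self, d_model=256, n_heads=4, n_state=32, block_len=128):
        super().__init__()
        self.block_len = block_len
        self.state0 = nn.Parameter(torch.zeros(n_state, d_model))  # learned initial state
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.state_update = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                nn.Linear(4 * d_model, d_model))
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x):  # x: (batch, seq_len, d_model)
        state = self.state0.unsqueeze(0).expand(x.size(0), -1, -1)
        outs = []
        for blk in x.split(self.block_len, dim=1):
            h, _ = self.self_attn(blk, blk, blk)       # attention within the block
            c, _ = self.cross_attn(h, state, state)    # tokens read the carried state
            h = self.norm(h + c + self.ff(h + c))
            state, _ = self.state_update(state, h, h)  # state reads the block: the recurrence
            outs.append(h)
        return torch.cat(outs, dim=1)
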
by rasbt on 2022-03-17 (UTC).

Really enjoy working on academic research. However, I recently realized that training models on my GPU workstation != using DL in the real world. There are lots of cool, compounding techniques to scale up your PyTorch models. A good & succinct overview: https://t.co/JtfBLvMYXF

— Sebastian Raschka (@rasbt) March 17, 2022
learning survey pytorch
by PyTorch on 2022-03-17 (UTC).

We just assessed the effectiveness of the DDP, Pipe and FSDP distributed training techniques available via PyTorch with different model sizes and network configurations. See the results here. https://t.co/vN191zRclh

— PyTorch (@PyTorch) March 17, 2022
learning pytorch survey
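
For reference, a minimal sketch (assuming a standard torchrun launch) of what two of these techniques look like in user code: DDP keeps a full model replica per GPU and all-reduces gradients, while FSDP (PyTorch >= 1.11) shards parameters, gradients, and optimizer state across ranks. Pipe is omitted here since it requires splitting the model into sequential stages.

import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def main():
    dist.init_process_group("nccl")         # torchrun supplies RANK/WORLD_SIZE
    rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(rank)

    model = torch.nn.Linear(4096, 4096).cuda()  # stand-in for a real network
    # DDP: one full replica per GPU, gradients all-reduced every step.
    model = DDP(model, device_ids=[rank])
    # FSDP alternative: shard params/grads/optimizer state so bigger models fit.
    # model = FSDP(model)

if __name__ == "__main__":
    main()
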
by ak92501 on 2022-03-17 (UTC).

.@Gradio Demo for YOLOX: Exceeding YOLO Series in 2021 on @huggingface Spaces
demo: https://t.co/8iNN2Rhqea
github: https://t.co/rym6pRl10e pic.twitter.com/cBpl9T8akZ

— AK (@ak92501) March 17, 2022
cv w_code application
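
A rough sketch of how a Gradio image demo like this is typically wired up on Spaces, not the linked Space's actual code; run_yolox is a hypothetical placeholder for real YOLOX inference.

import gradio as gr

def run_yolox(image):
    # placeholder: a real demo would run YOLOX here and draw the
    # detected boxes on the image before returning it
    return image

gr.Interface(fn=run_yolox, inputs="image", outputs="image",
             title="YOLOX object detection").launch()
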
by ak92501 on 2022-03-17 (UTC).

Delta Tuning: A Comprehensive Study of Parameter-Efficient Methods for Pre-trained Language Models
abs: https://t.co/8cPUIX2Rfh pic.twitter.com/nPOErYQcfH

— AK (@ak92501) March 17, 2022
nlp research
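
One representative pattern from the family the paper surveys, sketched in PyTorch: freeze the pretrained weights and train only a small low-rank "delta" added to a layer's output. LowRankDelta and its rank argument are illustrative names, not the paper's API.

import torch
import torch.nn as nn

class LowRankDelta(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():   # pretrained weights stay frozen
            p.requires_grad_(False)
        self.down = nn.Linear(base.in_features, rank, bias=False)
        self.up = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.up.weight)     # delta starts as an exact no-op

    def forward(self, x):
        return self.base(x) + self.up(self.down(x))  # frozen output + trainable delta

layer = LowRankDelta(nn.Linear(768, 768))  # the Linear stands in for a pretrained layer
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"{trainable} trainable of {total} total parameters")  # ~2% here
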
by karpathy on 2022-03-15 (UTC).

Excellent and unintuitive read on GPUs. The chip doing the compute has a tiny amount of memory & is connected to the main memory literally through a straw. Most of the energy goes to data movement too. Many repercussions. E.g. latency better predicted by # activations than # flops https://t.co/67PBOfEcNK

— Andrej Karpathy (@karpathy) March 15, 2022
learning survey
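
The back-of-envelope arithmetic behind this, using rough A100-class numbers (~312 TFLOP/s tensor-core compute, ~1.5 TB/s memory bandwidth; illustrative values, not taken from the linked article):

compute = 312e12    # FLOP/s
bandwidth = 1.5e12  # bytes/s

# FLOPs the chip must do per byte fetched just to stay busy:
print(compute / bandwidth)  # ~208 FLOPs per byte

# An elementwise op (e.g. an activation function) does ~1 FLOP per element
# while moving 4 bytes of fp16 traffic (read + write), so it runs hundreds
# of times below peak compute: its time is set by activation bytes moved,
# not by FLOPs.
traffic_per_elem = 2 * 2  # read + write of a 2-byte value
print(compute / bandwidth * traffic_per_elem)  # ~832x shortfall vs. peak
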
by GoogleAI on 2022-03-15 (UTC).

Introducing the Multimodal Bottleneck Transformer, a novel transformer-based model for multimodal fusion that restricts cross-modal attention flow to achieve state-of-the-art results on video classification tasks with less compute. Read more ↓ https://t.co/BXMVgap0ID pic.twitter.com/Pb8b3j1A5N

— Google AI (@GoogleAI) March 15, 2022
research
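
A minimal sketch of the bottleneck idea, not Google's implementation: the modalities never attend to each other directly; a handful of shared bottleneck tokens are the only channel for cross-modal information. Names and sizes below are made up.

import torch
import torch.nn as nn

class BottleneckFusionSketch(nn.Module):
    def __init__(self, d_model=256, n_heads=4, n_bottleneck=4):
        super().__init__()
        self.bottleneck = nn.Parameter(torch.randn(n_bottleneck, d_model) * 0.02)
        self.video_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.audio_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)

    def forward(self, video, audio):  # (batch, seq, d_model) each
        n = self.bottleneck.size(0)
        z = self.bottleneck.unsqueeze(0).expand(video.size(0), -1, -1)
        # each modality attends only within itself plus the bottleneck tokens
        v = self.video_layer(torch.cat([video, z], dim=1))
        a = self.audio_layer(torch.cat([audio, z], dim=1))
        video, z_v = v[:, :-n], v[:, -n:]
        audio, z_a = a[:, :-n], a[:, -n:]
        z = 0.5 * (z_v + z_a)  # fuse by averaging the per-modality bottleneck
                               # updates (fed to the next layer in a full model)
        return video, audio, z
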
by BMarcusMcCann on 2022-03-15 (UTC).

Today we launched YouWrite — an AI writing assistant built into https://t.co/iDkPtEe5TU.

For me, this is a special moment that ties my past research to the https://t.co/iDkPtEe5TU vision.

YouWrite example below👇with thoughts on search, research, writing, and meaning. 🧵 pic.twitter.com/hGdtiVfiUq

— Bryan McCann (@BMarcusMcCann) March 15, 2022
nlp application
by rasbt on 2022-03-15 (UTC).

Whoa. 96% of the winning solutions used Python. This is the way.

Interesting tidbit: all winning NLP solutions used transformers. However, most winning computer vision solutions were still convolutional nets (mostly EfficientNet). https://t.co/VQHbqz84Pi

— Sebastian Raschka (@rasbt) March 15, 2022
misc kaggle
by karpathy on 2022-03-14 (UTC).

New blog post!⬆️ Deep Neural Nets: 33 years ago and 33 years from now https://t.co/pbZvYh3Mck we reproduce what I think may be the earliest real-world application of a neural net trained end-to-end with backprop (LeCun et al. 1989), try to improve it with time travel, and reflect. pic.twitter.com/MKZ7S3GUdv

— Andrej Karpathy (@karpathy) March 14, 2022
misc learning
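
For scale, a rough sketch of a 1989-sized convnet like the one the post reproduces; layer sizes here are approximate, not the exact LeCun et al. architecture.

import torch.nn as nn

net = nn.Sequential(
    nn.Conv2d(1, 12, kernel_size=5, stride=2, padding=2),   # 16x16 -> 8x8
    nn.Tanh(),
    nn.Conv2d(12, 12, kernel_size=5, stride=2, padding=2),  # 8x8 -> 4x4
    nn.Tanh(),
    nn.Flatten(),
    nn.Linear(12 * 4 * 4, 30),
    nn.Tanh(),
    nn.Linear(30, 10),  # ten digit classes
)
print(sum(p.numel() for p in net.parameters()))  # on the order of 10k parameters
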
by Tim_Dettmers on 2022-03-14 (UTC).

An important but elusive quality to learn in a PhD is research style. It is valuable to be aware of this before you start a PhD. Among other updates, I added an extensive discussion on research style to my "choosing a grad school" blog post. Enjoy! https://t.co/HrMkPbGZcv

— Tim Dettmers (@Tim_Dettmers) March 14, 2022
misc
by hardmaru on 2022-03-13 (UTC).

“Model soups”: Averaging the weights of multiple models fine-tuned with different hyperparameter configurations improves accuracy and robustness, without increasing inference time! @mitchnw et al. https://t.co/QJ4f4MvTHu

— hardmaru (@hardmaru) March 13, 2022
research
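
The recipe itself is a few lines: a uniform "soup" is just an elementwise average of matching parameter tensors from models fine-tuned off the same initialization. A minimal sketch (finetuned_models stands for your own checkpoints):

import torch

def uniform_soup(state_dicts):
    """Elementwise mean of matching tensors across fine-tuned checkpoints."""
    return {k: torch.stack([sd[k].float() for sd in state_dicts]).mean(dim=0)
            for k in state_dicts[0]}

# model.load_state_dict(uniform_soup([m.state_dict() for m in finetuned_models]))
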
