Keras implementation of DeepLabV3+ for image segmentation: https://t.co/LtI5EALWWG pic.twitter.com/6vXUkKSTQS
— François Chollet (@fchollet) October 22, 2018
PyTorch implementation of DeepLabV3, trained on the Cityscapes dataset. https://t.co/skw18iP6A7 #deeplearning #machinelearning #ml #ai #neuralnetworks #datascience #pytorch
— PyTorch Best Practices (@PyTorchPractice) October 21, 2018
By breaking the training of sequence-to-sequence models into 5 steps (encoding, decoding, attending, predicting, search), SEQ2SEQ-VIS visualizes training and possible errors at each step. It'd be great to have this built-in with TensorFlow / PyTorch https://t.co/PJFlrNoAKv
— Chip Huyen (@chipro) October 18, 2018
They also released code to reproduce all of the experiments in the paper that they claim achieve new state-of-the-art results: https://t.co/cGUYNpcRzG
— hardmaru (@hardmaru) October 17, 2018
I wrote an in-depth analysis of how GPUs would compare against TPUs for training BERT. I conclude that current GPUs are about 30-50% slower than TPUs for this task https://t.co/BG8mIqQWMj
— Tim Dettmers (@Tim_Dettmers) October 17, 2018
There’s more to online influencing than “fake news” and censorship: examining agenda-setting and framing in Russian News—collaborative #emnlp2018 #NLProc paper by @anjalie_f @KligerD Shuly Wintner @jenjpan @jurafsky Yulia Tsvetkov. https://t.co/PmR75sOa7K pic.twitter.com/3MaWJ7exsJ
— Stanford NLP Group (@stanfordnlp) October 17, 2018
Temporal Value Transport - new heuristic for dealing with long-term credit assignment in RL with memory-augmented NNs: https://t.co/vPImOO3DPJ . Work like this and RUDDER are trying to address a fundamental problem with RL. pic.twitter.com/nVY7SRPDXn
— Kaixhin (@KaiLashArul) October 17, 2018
Quite exciting to see several(!) independent ongoing works @CERN using our adversarial decorrelation algorithm (https://t.co/6ZNCkf8mRt) for real physics analysis! pic.twitter.com/lHOr2fh9lz
— Gilles Louppe (@glouppe) October 17, 2018
The discriminator often knows something about the data distribution that the generator didn't manage to capture. By using rejection sampling, it's possible to knock out a lot of bad samples. https://t.co/J1KHhgKfca
— Ian Goodfellow (@goodfellow_ian) October 17, 2018
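The mechanism is easy to sketch: draw samples from the generator, then accept each one with a probability derived from the discriminator's score, so samples the discriminator confidently flags as fake are mostly thrown away. Below is a minimal, illustrative PyTorch sketch, not the exact scheme from the paper; the `generator` and `discriminator` callables, the logit output, and the max-normalisation are all assumptions.

```python
import torch

def rejection_sample(generator, discriminator, n_samples, latent_dim=100, batch=256):
    """Keep generated samples with probability tied to the discriminator's
    estimate of how 'real' they look (simplified sketch, assumed interfaces)."""
    kept = []
    while sum(x.shape[0] for x in kept) < n_samples:
        z = torch.randn(batch, latent_dim)
        with torch.no_grad():
            fake = generator(z)
            # assume the discriminator returns logits; sigmoid gives P(real)
            d = torch.sigmoid(discriminator(fake)).squeeze()
        # the density ratio p_data / p_g is roughly d / (1 - d);
        # normalise by the batch maximum to get acceptance probabilities
        ratio = d / (1 - d + 1e-8)
        accept_prob = ratio / ratio.max()
        mask = torch.rand_like(accept_prob) < accept_prob
        kept.append(fake[mask])
    return torch.cat(kept)[:n_samples]
```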
Dynamic word embeddings: instead of using one type of embedding, the model chooses a linear combination of different embeddings (GloVe, word2vec, fastText). Outperforms a single embedding on SNLI and sentiment analysis. https://t.co/OJsAOc3gav
— Chip Huyen (@chipro) October 16, 2018
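A rough sketch of the idea described in the tweet: project each pretrained embedding into a shared space and learn token-wise attention weights over them. The class name, dimensions, and scoring layer below are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

class DynamicMetaEmbedding(nn.Module):
    """Combine several frozen pretrained embeddings (e.g. GloVe, word2vec,
    fastText) with learned, per-token attention weights. Minimal sketch."""
    def __init__(self, embeddings, proj_dim=256):
        super().__init__()
        self.embeddings = nn.ModuleList(embeddings)  # list of nn.Embedding
        self.projections = nn.ModuleList(
            nn.Linear(e.embedding_dim, proj_dim) for e in embeddings
        )
        self.scorer = nn.Linear(proj_dim, 1)  # one attention logit per embedding type

    def forward(self, token_ids):
        # project every embedding type into the shared space
        projected = torch.stack(
            [proj(emb(token_ids)) for emb, proj in zip(self.embeddings, self.projections)],
            dim=-2,
        )  # (batch, seq_len, n_embeddings, proj_dim)
        weights = torch.softmax(self.scorer(projected), dim=-2)
        # weighted sum over the embedding types
        return (weights * projected).sum(dim=-2)  # (batch, seq_len, proj_dim)
```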
Cool stuff! 👏@lukasheinrich_ @HEPfeickert @pablodecm @kratsg created a pure python (with @TensorFlow & @PyTorch backends) implementation of HistFactory, a tool I originally wrote with @HerbieLewis & Akira Shibata. @diana_hep https://t.co/jXDRTFSCS8 https://t.co/twCKRrQhZr
— Kyle Cranmer (@KyleCranmer) October 16, 2018
I've spent most of 2018 training models that could barely fit 1-4 samples/GPU. But SGD usually needs more than a few samples per batch for decent results. I wrote a post gathering practical tips I use, from simple tricks to multi-GPU code & distributed setups: https://t.co/oLe6JlxcVw pic.twitter.com/pQTXQ9X7Ug
— Thomas Wolf (@Thom_Wolf) October 15, 2018
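A common remedy for this situation, and the kind of trick the post gathers, is gradient accumulation: run several small forward/backward passes and only step the optimizer once their gradients add up to an effectively larger batch. A minimal PyTorch sketch, assuming `model`, `optimizer`, and `loader` are supplied by the caller and a cross-entropy objective:

```python
import torch.nn.functional as F

def train_with_gradient_accumulation(model, optimizer, loader, accumulation_steps=8):
    """Accumulate gradients over several small batches before stepping,
    so the effective batch size is per-GPU batch * accumulation_steps."""
    model.train()
    optimizer.zero_grad()
    for step, (inputs, labels) in enumerate(loader):
        loss = F.cross_entropy(model(inputs), labels)
        # scale the loss so the accumulated gradient averages over the large batch
        (loss / accumulation_steps).backward()
        if (step + 1) % accumulation_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
```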