🔥Pytorch-Transformers 1.0🔥
Six NLU/NLG architectures: BERT, GPT, GPT-2, Transfo-XL, XLNet, XLM
Total: 27 pretrained models
Still the same:
- SOTA scripts: GLUE, SQuAD, text generation
- Access to hidden states, attentions...
— Thomas Wolf (@Thom_Wolf) July 16, 2019
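Not from the announcement itself, but as a minimal sketch of what the advertised access to hidden states and attentions looks like in pytorch-transformers 1.0 (the `bert-base-uncased` checkpoint and the example sentence are assumptions):

```python
import torch
from pytorch_transformers import BertModel, BertTokenizer

# Load one of the pretrained checkpoints; the two flags ask the model
# to return all hidden states and attention maps alongside its output.
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased',
                                  output_hidden_states=True,
                                  output_attentions=True)
model.eval()

input_ids = torch.tensor([tokenizer.encode("Hello, PyTorch-Transformers!")])
with torch.no_grad():
    last_hidden, pooled, hidden_states, attentions = model(input_ids)

print(last_hidden.shape)    # (1, seq_len, 768)
print(len(hidden_states))   # 13: embedding layer + 12 transformer layers
print(attentions[0].shape)  # (1, 12, seq_len, seq_len)
```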
For all the attention the Wasserstein distance gets as a loss function, there didn't seem to be an efficient, batch-stable implementation. Now we have one as a @PyTorch extension. On that occasion, we look at a general technique for reductions on the GPU: https://t.co/2hkFKQ0Qww
— Thomas Viehmann (@ThomasViehmann) July 8, 2019
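The linked extension and its GPU reduction technique aren't reproduced here. As a hedged illustration of the Wasserstein distance as a differentiable loss, though, here is the much simpler 1-D special case in plain PyTorch: for equal-size empirical samples, the 1-D p-Wasserstein distance reduces to a mean over sorted pairwise differences (the function name and the p=1 default are my choices):

```python
import torch

def wasserstein_1d(x, y, p=1):
    """Empirical p-Wasserstein distance between batches of 1-D samples.

    For equal-size samples the optimal transport plan in 1-D simply
    matches sorted values, so the distance is a mean of sorted
    differences. x, y: shape (batch, n_samples); fully differentiable.
    """
    x_sorted, _ = torch.sort(x, dim=-1)
    y_sorted, _ = torch.sort(y, dim=-1)
    return (x_sorted - y_sorted).abs().pow(p).mean(dim=-1).pow(1.0 / p)

# Usage as a loss: pull generated samples toward a reference distribution.
target = torch.randn(8, 256)                    # batch of reference samples
pred = torch.randn(8, 256, requires_grad=True)  # e.g. generator output
loss = wasserstein_1d(pred, target).mean()
loss.backward()                                 # gradients flow through sort
print(loss.item(), pred.grad.shape)
```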
Super excited to be on my way to the International Summer School on Deep Learning in Gdansk, Poland @dl_iss! To pass the time between layovers, I just finished up my slides & code (DL for Ordinal Regression) and uploaded them to GitHub, in case you are interested: https://t.co/ZWtnCmLwXV
— Sebastian Raschka (@rasbt) June 29, 2019
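The tweet doesn't summarize the slides themselves, but one common deep-learning formulation of ordinal regression, offered here only as an assumed illustration of the topic, casts a K-class ordinal label as K-1 cumulative binary tasks ([y > 0], [y > 1], ...), trains them with binary cross-entropy, and recovers the rank by counting confident positives:

```python
import torch
import torch.nn as nn

K = 5  # number of ordered classes, e.g. ratings 0..4 (assumed)

# Shared trunk with K-1 binary outputs, one per threshold "y > k".
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, K - 1))

def ordinal_targets(y, num_classes):
    # y=3, K=5  ->  [1, 1, 1, 0]: the label exceeds thresholds 0, 1, 2.
    thresholds = torch.arange(num_classes - 1)
    return (y.unsqueeze(1) > thresholds).float()

x = torch.randn(32, 20)         # toy features
y = torch.randint(0, K, (32,))  # toy ordinal labels
logits = model(x)
loss = nn.functional.binary_cross_entropy_with_logits(
    logits, ordinal_targets(y, K))
loss.backward()

# Predicted rank = number of thresholds passed with probability > 0.5.
pred = (torch.sigmoid(logits) > 0.5).sum(dim=1)
print(loss.item(), pred[:5])
```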