nVidia sets World Record BERT Training Time - 47mins https://t.co/VaEdfNyN14
— /MachineLearning (@slashML) February 14, 2020
Here is a @Kaggle dataset of 28,752 Project Gutenberg books that has been used for the recent @DeepMind long-range memory research. https://t.co/e0OzBD1wlf #AI #ML #deeplearning #NLP @KaggleDatasets pic.twitter.com/hsTQciAY7C
— Bojan Tunguz (@tunguz) February 11, 2020
This is great work that collects corpora and evaluates models for two extremely low-resource languages spoken in Africa, Twi and Yoruba.
— Sebastian Ruder (@seb_ruder) February 11, 2020
Link to the paper: https://t.co/Cm5YhzTbBL https://t.co/mjAvuVyPjE
The search engine of the future will know "who Jason Mraz is engaged to" not by querying some semi-manually curated semantic triplet graph, but simply running a "fact answering neural network" on the raw question. https://t.co/3MfRxQVEfN
— Eric Jang (@ericjang11) February 11, 2020
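For contrast, the "semi-manually curated semantic triplet graph" route looks roughly like the toy lookup below; the store, the relation name, and the missing value are all hypothetical, and the "fact answering neural network" alternative is sketched after the T5 thread that follows.

```python
# Purely hypothetical illustration of the curated-triple-store approach:
# a fact is only answerable if someone has entered this exact triple by hand.
triples = {
    # ("Jason Mraz", "engaged_to"): "...",  # must be curated and kept up to date manually
}

def kg_answer(subject: str, relation: str) -> str:
    # Any fact missing from the curated store is simply unanswerable.
    return triples.get((subject, relation), "unknown")

print(kg_answer("Jason Mraz", "engaged_to"))  # "unknown" until a human adds the triple
```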
We evaluated on Natural Questions, WebQuestions, and TriviaQA, outperforming all previous open-domain systems on NQ and WQ.
— Adam Roberts (@ada_rob) February 11, 2020
(3/5) pic.twitter.com/Xv23zOBDm9
New preprint: How Much Knowledge Can You Pack into the Parameters of a Language Model?
— Adam Roberts (@ada_rob) February 11, 2020
We show that T5 outperforms all previous open-domain QA systems *without using any external knowledge or context*.
Joint work w/ @colinraffel & Noam Shazeer. https://t.co/Ojg3wSUDQq
(1/5) pic.twitter.com/3adQ59LFYr
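A minimal sketch of the closed-book setup with the transformers library: the question goes in with no supporting passage, so the answer has to come out of the model weights alone. The checkpoint id and the absence of a task prefix are assumptions on my part; check the repo linked in the tweet for the actual released checkpoints and their expected input format.

```python
# Hedged sketch of closed-book QA with a T5-style seq2seq model.
# "google/t5-small-ssm-nq" is an assumed checkpoint id; the paper's released
# checkpoints (and any required task prefix) may be named differently.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "google/t5-small-ssm-nq"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# No context passage: the model must answer from its parameters alone.
question = "who is the lead singer of the band queen?"
inputs = tokenizer(question, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```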
Turing-NLG: A 17-billion-parameter language model
— hardmaru (@hardmaru) February 11, 2020
“Any model with more than 1.3B parameters cannot fit into a single GPU (even one with 32GB memory)… The resulting T-NLG model has 78 Transformer layers with a hidden size of 4256 and 28 attention heads.” https://t.co/bRjEacrZma
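The headline number checks out with the usual back-of-the-envelope Transformer parameter count: roughly 4·H² parameters per layer for the attention projections plus 8·H² for a feed-forward block with the standard 4H inner dimension, ignoring embeddings, biases, and layer norms.

```python
# Back-of-the-envelope parameter count for T-NLG from the quoted architecture.
# Assumes the standard Transformer layout with a 4x feed-forward expansion;
# embeddings, biases, and layer norms are ignored.
layers, hidden, heads = 78, 4256, 28

attention_params = 4 * hidden * hidden       # Q, K, V and output projections
ffn_params = 2 * hidden * (4 * hidden)       # two feed-forward projections
per_layer = attention_params + ffn_params    # ~12 * hidden^2

print(hidden // heads)                       # 152-dimensional attention heads
print(f"{layers * per_layer / 1e9:.1f}B")    # ~17.0B, matching the headline figure
```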
Two major methods for learning multilingual embeddings are 1. monolingual training then alignment and 2. joint training. Our #ICLR2020 paper asks "why not do both?" Result: even jointly-trained embeddings still benefit significantly from alignment: https://t.co/HanXbBhDko pic.twitter.com/7ozKHjSmAz
— Graham Neubig (@gneubig) February 7, 2020
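The alignment step is classically an orthogonal Procrustes rotation fit on a bilingual seed dictionary. The numpy sketch below shows that generic recipe on stand-in data; it is not a reproduction of the paper's exact procedure.

```python
import numpy as np

# Generic orthogonal-Procrustes alignment of two embedding spaces (not the
# paper's exact setup). X and Y hold embeddings for n translation pairs.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 300))  # source-language embeddings (stand-in data)
Y = rng.normal(size=(1000, 300))  # target-language embeddings (stand-in data)

# Solve min_W ||X W - Y||_F over orthogonal W via the SVD of X^T Y.
U, _, Vt = np.linalg.svd(X.T @ Y)
W = U @ Vt

X_aligned = X @ W  # source embeddings rotated into the target space
```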
New dataset: 4.5B parallel sentences in 576 language pairs. https://t.co/nWY5egojho
— Yann LeCun (@ylecun) February 7, 2020
TyDi QA is a new multilingual dataset for information-seeking question answering featuring 11 Typologically Diverse languages and over 200k QA pairs. Learn more and start experimenting with the data and code β https://t.co/azeYPUvqXZ
— Google AI (@GoogleAI) February 6, 2020
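For quick experiments the data may also be mirrored on the Hugging Face datasets hub; the dataset id and config name below are assumptions, so fall back to the official links in the tweet if they do not resolve.

```python
# Hedged sketch: load TyDi QA via the datasets library.
# The id "tydiqa" and config "primary_task" are assumptions; the canonical
# data and baseline code live behind the link in the tweet above.
from datasets import load_dataset

tydiqa = load_dataset("tydiqa", "primary_task")
print(tydiqa)                       # splits and sizes
print(sorted(tydiqa["train"][0]))   # field names of one example
```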
New models from:
— Julien Chaumond (@julien_c) February 3, 2020
- @Wietsedv (Dutch BERT),
- @douwekiela at Facebook AI (MMBT, multi-modal model)
- @formiel, @laurent_besacie et al. (FlauBERT, French-trained XLM-like)
- @loretoparisi, @simofrancia et al. at @musixmatch (UmBERTo, Italian CamemBERT-like) pic.twitter.com/qgJ0fwqiwC
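All of these ship through the transformers model hub, so loading any of them is the same couple of lines. The model id below (for the Dutch BERT) is written from memory and may have moved, so check the hub listing first.

```python
# Hedged sketch: load one of the newly added community models with transformers.
# The model id is an assumption from memory; browse https://huggingface.co/models
# for the current Dutch BERT / FlauBERT / UmBERTo identifiers.
from transformers import AutoModel, AutoTokenizer

model_id = "wietsedv/bert-base-dutch-cased"  # assumed id for the Dutch BERT
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

inputs = tokenizer("Dit is een Nederlandse zin.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, seq_len, hidden_size)
```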
There is a little-known project called News Crawl from the folks at @CommonCrawl that has a giant real-time S3 archive in WARC format of articles from over 50K news publication feeds: https://t.co/RZUIx7D3Zo GitHub code is also here: https://t.co/DBSiIdR0BE
— Peter Skomoroch (@peteskomoroch) January 30, 2020
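Since the archive is plain WARC, any WARC reader works. Here is a minimal sketch with the warcio package; the file path is a placeholder, so grab a real archive key from the bucket linked in the tweet first.

```python
# Minimal sketch for iterating over a downloaded News Crawl WARC file with warcio.
# "news-crawl-sample.warc.gz" is a placeholder; list the real keys in the S3
# bucket linked in the tweet and download one of those archives first.
from warcio.archiveiterator import ArchiveIterator

with open("news-crawl-sample.warc.gz", "rb") as stream:
    for record in ArchiveIterator(stream):
        if record.rec_type == "response":  # fetched article pages
            url = record.rec_headers.get_header("WARC-Target-URI")
            html = record.content_stream().read()
            print(url, len(html), "bytes")
```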