If you're interested in interpretability and better understanding #NLProc models 🔎, read this excellent TACL '19 survey by @boknilev. Clearly covers important research areas. — Sebastian Ruder (@seb_ruder) January 11, 2019
Appendix (categorizing all methods): https://t.co/a8mFNzNd7i
I expected the Transformer-based BERT models to be bad on syntax-sensitive dependencies, compared to LSTM-based models. — (((ل()(ل() 'yoav)))) (@yoavgo) January 6, 2019
So I ran a few experiments. I was mistaken; they actually perform *very well*.
More details in this tech report: https://t.co/6hV9YoOvN8
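The report's evaluation boils down to asking BERT's masked language model which verb form it prefers at a masked position in an agreement stimulus. Here is a minimal sketch of that style of probe, assuming the current Hugging Face `transformers` API (the report predates it) and a made-up example sentence; this is not the report's actual code or data:

```python
# Hedged sketch of a masked-LM subject-verb agreement probe, in the spirit
# of the tech report. The model name, API, and example sentence are
# assumptions for illustration.
import torch
from transformers import BertForMaskedLM, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

# Mask the verb; a syntax-aware model should prefer the form that agrees
# with the head noun "keys", not the intervening attractor "cabinet".
inputs = tokenizer("the keys to the cabinet [MASK] on the table",
                   return_tensors="pt")
mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero().item()

with torch.no_grad():
    logits = model(**inputs).logits[0, mask_pos]

probs = logits.softmax(dim=-1)
scores = {verb: probs[tokenizer.convert_tokens_to_ids(verb)].item()
          for verb in ("are", "is")}  # correct vs. incorrect agreement
print(scores)
```

A pair counts as correct when the probability of the agreeing verb form exceeds that of the non-agreeing one; accuracy is then aggregated over many such pairs.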
"Elimination of All Bad Local Minima in Deep Learning— ML Review (@ml_review) January 5, 2019
Proves, without any strong assumption, that adding one neuron per output unit can eliminate all suboptimal local minima for multi-class classification/regression with an arbitrary loss function. https://t.co/6PmJ8mJp1s
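To make the idea concrete, here is an illustrative sketch only, following the exponential-neuron-with-skip-connection recipe from this line of work; the paper's exact formulation may differ, and the function names, penalty weight, and choice of cross-entropy loss are assumptions. Each output unit k gets one auxiliary neuron a_k * exp(w_k . x + b_k) added on top, with a penalty on the scales a:

```python
# Illustrative only: one exponential auxiliary neuron per output unit,
# fed by a skip connection from the input, plus an l2 penalty on its
# scale a. The paper proves a guarantee of this flavor for its own
# construction: the added neurons are inactive (a = 0) at every local
# minimum, and that point is a global minimum of the original loss.
import torch
import torch.nn.functional as F

def augmented_output(f_x, x, a, w, b):
    # f_x: (batch, K) original network outputs; x: (batch, d) inputs
    # a, b: (K,) per-output scales and biases; w: (K, d) skip weights
    return f_x + a * torch.exp(x @ w.T + b)

def augmented_loss(f_x, x, y, a, w, b, lam=1e-2):
    # Original loss on the augmented outputs plus the regularizer on a.
    out = augmented_output(f_x, x, a, w, b)
    return F.cross_entropy(out, y) + lam * (a ** 2).sum()
```

The extra parameters (a, w, b) are trained jointly with the original network; because the regularizer drives a to zero at any local minimum, the auxiliary neurons can be discarded after training.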