The other dimension here is monolingual vs multilingual models. I think monolingual models in low-resource languages currently have an edge e.g. as seen in our MultiFiT paper.
— Sebastian Ruder (@seb_ruder) January 28, 2020
I feel like good old LSTMs (or QRNNs) are usually better for text classification, indeed.
— Thomas Wolf (@Thom_Wolf) January 28, 2020
Note that for those who want to give text classification with pretrained BERT models a try, you can take a look at the experimental section of this paper https://t.co/w4MWPTB79u
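For readers who want to try this themselves, here is a minimal sketch of fine-tuning a pretrained BERT model for text classification with the Hugging Face transformers library. The model name, toy examples, and hyperparameters are illustrative choices on my part, not the setup used in the linked paper.

```python
# Minimal sketch: text classification with a pretrained BERT model via transformers.
# Model name, example texts, and hyperparameters are illustrative only.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# Toy batch: two sentences with binary sentiment labels.
texts = ["A gripping, well-acted thriller.", "Dull and far too long."]
labels = torch.tensor([1, 0])
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

# One fine-tuning step: the model returns the cross-entropy loss when labels are passed.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
outputs = model(**batch, labels=labels)
outputs.loss.backward()
optimizer.step()

# Inference: argmax over the classification logits.
model.eval()
with torch.no_grad():
    preds = model(**batch).logits.argmax(dim=-1)
print(preds)
```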
Even my continued fiddling with the SHA-RNN model shows there's a _lot_ to be studied and explored. I haven't published new incremental progress but you can tie the RNN across the 4 layers to substantially decrease total params yet get nearly equivalent perplexity results.
— Smerity (@Smerity) January 28, 2020
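To make the weight-tying idea concrete, here is a toy PyTorch sketch of the general technique: one recurrent module is reused at every layer of the stack, so its parameters are counted once instead of four times. This is my own illustration under assumed dimensions and layer count, not code from the SHA-RNN repository.

```python
# Toy illustration of tying one RNN's weights across a 4-layer stack.
# Not the SHA-RNN code; dimensions and layer count are assumptions.
import torch
import torch.nn as nn

class TiedStackedRNN(nn.Module):
    def __init__(self, vocab_size, d_model=512, n_layers=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # A single recurrent module shared (tied) across all layers.
        self.rnn = nn.LSTM(d_model, d_model, batch_first=True)
        self.n_layers = n_layers
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens):
        x = self.embed(tokens)
        for _ in range(self.n_layers):
            # The same weights process the sequence at every layer;
            # a residual connection keeps the stacked layers trainable.
            out, _ = self.rnn(x)
            x = x + out
        return self.head(x)

# Compare recurrent parameter counts against an untied 4-layer stack.
tied = TiedStackedRNN(vocab_size=10000)
untied = nn.LSTM(512, 512, num_layers=4, batch_first=True)
print(sum(p.numel() for p in tied.rnn.parameters()))  # counted once
print(sum(p.numel() for p in untied.parameters()))    # roughly 4x as many
```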