by _inesmontani on 2019-07-17 (UTC).

Here's the built-in spaCy function for aligning different tokenizations (e.g. 🤗's BERT tokens and spaCy tokens):

```
from spacy.gold import align

cost, a2b, b2a, a2b_multi, b2a_multi = align(bert_tokens, spacy_tokens)
```

— Ines Montani 〰️ (@_inesmontani) July 17, 2019
nlp tool
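To make the idea concrete, here is a minimal pure-Python sketch of one-to-one token alignment via character offsets. This is not spaCy's actual implementation, and the tokens below are made-up illustration data; it only shows what an `a2b` mapping looks like: for each token in tokenization A, the index of the matching token in tokenization B, or -1 when there is no single counterpart.

```python
# Sketch: align two tokenizations of the same text by character offsets.
# (Illustration only, not spaCy's implementation.)

def char_spans(tokens):
    """Return (start, end) character offsets for each token, assuming
    the tokens concatenate to the same underlying text."""
    spans, pos = [], 0
    for tok in tokens:
        spans.append((pos, pos + len(tok)))
        pos += len(tok)
    return spans

def align(a_tokens, b_tokens):
    """For each token in a_tokens, return the index of the token in
    b_tokens with an identical character span, or -1 if the token has
    no one-to-one match (e.g. it maps to part of a b token)."""
    b_index = {span: i for i, span in enumerate(char_spans(b_tokens))}
    return [b_index.get(span, -1) for span in char_spans(a_tokens)]

bert_tokens = ["obama", "'", "s", "podcast"]
spacy_tokens = ["obama", "'s", "podcast"]
print(align(bert_tokens, spacy_tokens))  # [0, -1, -1, 2]
```

Here "obama" and "podcast" align one-to-one, while "'" and "s" together cover spaCy's single "'s" token, so they get -1 in the one-to-one map; the `*_multi` maps returned by spaCy's `align` exist to cover exactly those cases.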
by _inesmontani on 2019-07-17 (UTC).

📖 Also just added usage and API docs for tokenization alignment, including an example of how to interpret the generated alignment information: https://t.co/00puBek3df pic.twitter.com/fLgYV1rEDj

— Ines Montani 〰️ (@_inesmontani) July 17, 2019
nlp tool
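As a rough illustration of how the "multi" part of the alignment information can be interpreted, here is a hypothetical helper (again character-offset based, not spaCy's API): when a token has no identical-span partner, it is mapped to the token in the other tokenization whose span contains it.

```python
# Sketch: the "multi" alignment map. A tokens without an exact-span
# partner are mapped to the B token whose span contains them.
# (Hypothetical helper for illustration, not spaCy's API.)

def char_spans(tokens):
    spans, pos = [], 0
    for tok in tokens:
        spans.append((pos, pos + len(tok)))
        pos += len(tok)
    return spans

def a2b_multi(a_tokens, b_tokens):
    """Map each a-token index that lacks an identical-span partner to
    the index of the b token whose character span fully contains it."""
    a_spans, b_spans = char_spans(a_tokens), char_spans(b_tokens)
    exact = set(b_spans)
    mapping = {}
    for i, (s, e) in enumerate(a_spans):
        if (s, e) in exact:
            continue  # one-to-one case, covered by the plain a2b map
        for j, (bs, be) in enumerate(b_spans):
            if bs <= s and e <= be:
                mapping[i] = j
                break
    return mapping

bert_tokens = ["obama", "'", "s", "podcast"]
spacy_tokens = ["obama", "'s", "podcast"]
print(a2b_multi(bert_tokens, spacy_tokens))  # {1: 1, 2: 1}
```

Reading the result: BERT tokens 1 ("'") and 2 ("s") both belong to spaCy token 1 ("'s"), which is the kind of many-to-one relationship the `a2b_multi`/`b2a_multi` outputs express.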
by revodavid on 2019-07-17 (UTC).

Microsoft open-sources scripts and notebooks to pre-train and finetune BERT natural language model with domain-specific texts: https://t.co/JRe0e6joP2

— David Smith (@revodavid) July 17, 2019
nlp tool w_code
