*Vokenization*: a visually-supervised language model attempt in our #emnlp2020 paper: https://t.co/r9MZNniAhn (w. @mohitban47)
— Hao Tan (@HaoTan5) October 15, 2020
To improve language pre-training, we extrapolate multimodal alignments to lang-only data by contextually mapping tokens to related images ("vokens") 1/4 pic.twitter.com/wuXt1K58BH
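The "contextual mapping of tokens to related images" described above can be sketched as nearest-image retrieval: score each contextual token embedding against a bank of image embeddings and assign the best match as that token's voken. This is a minimal illustration with toy vectors, not the paper's actual model; the function name `retrieve_vokens` and the cosine-similarity scoring are assumptions for the example.

```python
import numpy as np

def retrieve_vokens(token_embs, image_embs):
    # token_embs: (T, d) contextual token embeddings
    # image_embs: (V, d) embeddings of candidate images ("vokens")
    # Normalize rows so the dot product is cosine similarity.
    t = token_embs / np.linalg.norm(token_embs, axis=1, keepdims=True)
    v = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    sims = t @ v.T                # (T, V) token-image relevance scores
    return sims.argmax(axis=1)    # best-matching image index per token

# Toy example: 3 tokens and 2 candidate images in a 2-D embedding space.
tokens = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]])
images = np.array([[1.0, 0.1], [0.1, 1.0]])
vokens = retrieve_vokens(tokens, images)
print(vokens)  # tokens 0 and 2 align with image 0; token 1 with image 1
```

In the full method these assignments would supervise language pre-training on text-only corpora, extending image-text alignment beyond paired data.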