Tweeted By @srush_nlp
Still cropping and modifying BERT diagrams from Devlin et al. (2019)? Maybe don't?
— Sasha Rush (@srush_nlp) July 21, 2020
Jimmy's diagram below is super awesome. But for most cases BERT is a (very useful magic) feed-forward network. Draw a box. https://t.co/Gsox1y89Mr pic.twitter.com/DZ6y9rzj07