Tweeted By @hardmaru
I love how some of the nitty gritty details are explained.
— hardmaru (@hardmaru) August 20, 2021
For example, the Transformer and it’s positional encoding scheme is relied upon to create a vector representation of the world from images (which are first processed through a common ‘virtual camera’). pic.twitter.com/jjBtmDAfme