Have you wondered why object detection, unlike classification, has so many sophisticated algorithms?
— Ting Chen (@tingchenai) September 23, 2021
With Pix2Seq (https://t.co/ygsG3aAIbG), we simply cast object detection as a language modeling task conditioned on pixels!
(with @srbhsxn, Lala Li, @fleet_dj, @geoffreyhinton) pic.twitter.com/aTYZ5IvJc9