Tweeted By @hardmaru
Thinking Like Transformers
— hardmaru (@hardmaru) June 16, 2021
RNNs have direct parallels in finite state machines, but Transformers have no such familiar parallel. This paper aims to change that. They propose a computational model for the Transformer in the form of a programming language. https://t.co/OuPBSrS1EJ pic.twitter.com/zYyAcH2zQd