Tweeted By @OriolVinyalsML
MuZero removed simulators in MBRL vs AlphaGo. VQ Models for Planning generalize to partial observable & stochastic environments. How?
1. Discretize states w/ VQVAE
2. Train a LM over states
3. Plan w/ MCTS using the LM
Led by @yazhe_li & @sherjilozair https://t.co/thvB6Ke1EA pic.twitter.com/tsXGcrweTZ
— Oriol Vinyals (@OriolVinyalsML) June 11, 2021
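
The three steps in the tweet can be sketched on a toy problem. This is only an illustration under heavy assumptions, not the authors' implementation: the codebook is hand-picked rather than learned by a VQ-VAE, the "LM" is a count-based table over (code, action) pairs rather than a neural sequence model, and the planner is a depth-limited expectimax search standing in for MCTS. All names here (`CODEBOOK`, `quantize`, `TransitionModel`, `reward`, `plan`) and the sample transitions are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical 1-D codebook: each entry is the centre of one discrete state.
# A real VQ-VAE would learn these vectors; here they are fixed by hand.
CODEBOOK = [0.0, 1.0, 2.0, 3.0]


def quantize(obs):
    """Step 1: map a continuous observation to the index of its nearest code."""
    return min(range(len(CODEBOOK)), key=lambda k: abs(obs - CODEBOOK[k]))


class TransitionModel:
    """Step 2 (stand-in for the LM): counts estimating p(next_code | code, action)."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def fit(self, transitions):
        for obs, action, next_obs in transitions:
            self.counts[(quantize(obs), action)][quantize(next_obs)] += 1

    def probs(self, code, action):
        seen = self.counts.get((code, action))
        if not seen:
            # Assumption: an unseen (code, action) pair leaves the state unchanged.
            return {code: 1.0}
        total = sum(seen.values())
        return {nxt: n / total for nxt, n in seen.items()}


def reward(code):
    """Hypothetical reward: 1 for landing on the last code, 0 otherwise."""
    return 1.0 if code == len(CODEBOOK) - 1 else 0.0


def plan(model, code, actions, depth):
    """Step 3 (stand-in for MCTS): depth-limited expectimax over the learned model.

    Returns (expected return, best first action).
    """
    if depth == 0:
        return 0.0, None
    best_value, best_action = float("-inf"), None
    for action in actions:
        value = 0.0
        # Average over the stochastic outcomes predicted by the model.
        for nxt, p in model.probs(code, action).items():
            future, _ = plan(model, nxt, actions, depth - 1)
            value += p * (reward(nxt) + future)
        if value > best_value:
            best_value, best_action = value, action
    return best_value, best_action


# Made-up (observation, action, next observation) samples. Moving right (+1)
# from the first state fails one time in three, so the model is stochastic.
transitions = [
    (0.1, +1, 0.9), (0.1, +1, 1.1), (0.0, +1, 0.2),
    (1.0, +1, 2.1), (2.0, +1, 2.9), (3.0, +1, 3.0),
    (1.1, -1, 0.1), (2.1, -1, 1.0), (3.0, -1, 2.0),
]

model = TransitionModel()
model.fit(transitions)
value, action = plan(model, quantize(0.05), actions=[-1, +1], depth=3)
print(f"best first action: {action:+d}, expected return: {value:.3f}")
```

Exhaustive search is only workable because this toy has four codes and two actions. With a large codebook and long horizons the tree grows too quickly, which is where a sampling-based search such as the MCTS named in the tweet becomes the natural choice.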