Tweeted By @OriolVinyalsML

on 2021-06-11 (UTC)
research rl

MuZero removed simulators in MBRL vs AlphaGo. VQ Models for Planning generalize to partial observable & stochastic environments. How?

1. Discretize states w/ VQVAE
2. Train a LM over states
3. Plan w/ MCTS using the LM

Led by @yazhe_li & @sherjilozair https://t.co/thvB6Ke1EA pic.twitter.com/tsXGcrweTZ
— Oriol Vinyals (@OriolVinyalsML) June 11, 2021
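
Step 1 of the list above can be illustrated with a toy sketch of the quantization lookup, which turns a continuous state embedding into a discrete token. This is not the authors' implementation: the `quantize` helper, the hand-picked codebook, and the example embeddings are all assumptions made for illustration. A real VQ-VAE learns both the encoder and the codebook.

```python
# Toy illustration of vector quantization (step 1 only).
# Assumption: codebook and "encoder outputs" are hand-picked, not learned.

def quantize(vector, codebook):
    """Return the index of the codebook entry nearest to `vector`
    under squared Euclidean distance."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: sq_dist(vector, codebook[i]))

# Hypothetical 2-D codebook with three codes.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]

# Hypothetical encoder outputs for three consecutive states.
trajectory = [(0.1, -0.2), (0.9, 0.1), (0.2, 0.8)]

# Each state becomes a discrete token: a sequence a language model could be trained on.
tokens = [quantize(z, codebook) for z in trajectory]
print(tokens)
```

The resulting token sequence is what makes steps 2 and 3 possible in principle: a sequence model over discrete tokens can be sampled from, so a tree search can branch over its predictions.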
