We've been studying why deep RL is so hard, and we think we have another reason: implicit under-parameterization: https://t.co/haeE1YX4Ue
— Sergey Levine (@svlevine) October 30, 2020
Iteratively training on your own targets is a kind of "self-distillation," and leads to a loss of rank ->
w/ Aviral Kumar @agarwl_ @its_dibya
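The "loss of rank" in the tweet refers to the effective rank of the network's learned feature matrix shrinking as bootstrapped targets are fitted repeatedly. A minimal sketch of one common way to measure effective rank from the singular-value spectrum (the delta threshold, matrix shapes, and the synthetic low-rank example are illustrative assumptions, not details from the tweet or paper):

```python
import numpy as np

def effective_rank(features: np.ndarray, delta: float = 0.01) -> int:
    """Smallest k such that the top-k singular values account for
    at least a (1 - delta) fraction of the total spectrum mass."""
    sv = np.linalg.svd(features, compute_uv=False)
    cumulative = np.cumsum(sv) / np.sum(sv)
    return int(np.searchsorted(cumulative, 1.0 - delta) + 1)

rng = np.random.default_rng(0)

# A generic random feature matrix (rows = states, cols = features)
# has a flat spectrum and thus a high effective rank.
full = rng.normal(size=(256, 64))

# A matrix that is approximately rank-4 (dominant low-rank structure
# plus small noise) collapses to a much smaller effective rank --
# the kind of spectrum the rank-collapse claim is about.
low = rng.normal(size=(256, 4)) @ rng.normal(size=(4, 64)) \
      + 1e-3 * rng.normal(size=(256, 64))

print(effective_rank(full), effective_rank(low))
```

Tracking this quantity over training iterations is one way to observe the collapse the tweet describes: if it drops steadily while fitting successive bootstrapped targets, the feature matrix is losing expressive capacity.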