We've been studying why deep RL is so hard, and we think we have another reason: implicit under-parameterization: https://t.co/haeE1YX4Ue
— Sergey Levine (@svlevine) October 30, 2020
Iteratively training on your own targets is a kind of "self-distillation," and leads to a loss of rank ->
w/ Aviral Kumar @agarwl_ @its_dibya
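The "loss of rank" in the tweet refers to the effective rank of the network's learned feature matrix shrinking as bootstrapped targets are fitted repeatedly. A minimal sketch of one common way to measure effective rank from the singular-value spectrum (the delta threshold, matrix shapes, and the synthetic low-rank example are illustrative assumptions, not details from the tweet or paper):

```python
import numpy as np

def effective_rank(features: np.ndarray, delta: float = 0.01) -> int:
    """Smallest k such that the top-k singular values account for
    at least a (1 - delta) fraction of the total spectrum mass."""
    sv = np.linalg.svd(features, compute_uv=False)
    cumulative = np.cumsum(sv) / np.sum(sv)
    return int(np.searchsorted(cumulative, 1.0 - delta) + 1)

rng = np.random.default_rng(0)

# A generic random feature matrix (rows = states, cols = features)
# has a flat spectrum and thus a high effective rank.
full = rng.normal(size=(256, 64))

# A matrix that is approximately rank-4 (dominant low-rank structure
# plus small noise) collapses to a much smaller effective rank --
# the kind of spectrum the rank-collapse claim is about.
low = rng.normal(size=(256, 4)) @ rng.normal(size=(4, 64)) \
      + 1e-3 * rng.normal(size=(256, 64))

print(effective_rank(full), effective_rank(low))
```

Tracking this quantity over training iterations is one way to observe the collapse the tweet describes: if it drops steadily while fitting successive bootstrapped targets, the feature matrix is losing expressive capacity.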