Tweeted By @svlevine

on 2020-10-29 (UTC)
rl research

Conservative safety critics use conservative Q-learning (CQL) to learn a safety critic, exploiting the lower bound property of CQL to provide guarantees on safety.

w/ @mangahomanga, Aviral Kumar, @nick_rhinehart, @florian_shkurti, @animesh_garg https://t.co/6BjwHz9Zjx

-> pic.twitter.com/9dYxUjBSEn
— Sergey Levine (@svlevine) October 29, 2020
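
A minimal sketch of the CQL-style regularizer the tweet refers to, assuming discrete actions and a small batch of Q-values. The function names, `alpha`, and the `sign` switch are hypothetical and not taken from the linked paper. Flipping the sign so that a failure-probability critic errs toward overestimation is an assumption about how the conservative bound could be applied to safety.

```python
import numpy as np

def logsumexp(q, axis=-1):
    # Numerically stable log-sum-exp over the action axis.
    m = q.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(q - m).sum(axis=axis, keepdims=True))).squeeze(axis)

def cql_gap(q_values, data_actions):
    # Soft maximum of Q over all actions minus Q at the logged action,
    # averaged over the batch. q_values has shape (batch, num_actions).
    q_data = q_values[np.arange(len(data_actions)), data_actions]
    return float(np.mean(logsumexp(q_values) - q_data))

def critic_loss(q_values, data_actions, td_targets, alpha=1.0, sign=1.0):
    # sign=+1: usual CQL penalty, which pushes Q down on actions absent
    #          from the data (conservative, lower-bound-style estimate).
    # sign=-1: ASSUMPTION for a safety critic that predicts failure
    #          probability, pushing estimates up on unseen actions instead.
    q_data = q_values[np.arange(len(data_actions)), data_actions]
    td_error = 0.5 * float(np.mean((q_data - td_targets) ** 2))
    return td_error + sign * alpha * cql_gap(q_values, data_actions)
```

In practice the Q-values would come from a neural network and the loss would be minimized by gradient descent; the arrays here only illustrate the shape of the objective.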
