Tweeted by @svlevine
Conservative safety critics use conservative Q-learning (CQL) to learn a safety critic, exploiting the lower-bound property of CQL to provide guarantees on safety.
— Sergey Levine (@svlevine) October 29, 2020
w/ @mangahomanga, Aviral Kumar, @nick_rhinehart, @florian_shkurti, @animesh_garg https://t.co/6BjwHz9Zjx
-> pic.twitter.com/9dYxUjBSEn
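
For readers unfamiliar with CQL, here is a minimal sketch of the regularizer commonly associated with it for discrete actions: the log-sum-exp of Q over all actions minus Q at the action seen in the data, averaged over a batch. Adding this term to the usual Bellman loss pushes Q down on actions the data does not support, which is where the lower-bound behavior comes from. The function names and the list-based inputs are illustrative assumptions, not code from the tweet or the linked paper. How the term is adapted for a safety critic (for example, so that an estimate of failure probability errs on the high side) is not stated in the tweet and is not shown here.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def cql_regularizer(q_values, data_actions):
    """Illustrative CQL-style penalty for discrete actions (hypothetical helper).

    q_values: list of rows, one per state, each row holding Q(s, a) for every action.
    data_actions: list of action indices actually taken in the dataset, one per state.
    Returns the batch mean of logsumexp_a Q(s, a) - Q(s, a_data), which is never negative.
    """
    gaps = [logsumexp(row) - row[a] for row, a in zip(q_values, data_actions)]
    return sum(gaps) / len(gaps)


# The penalty is small when the dataset action already has the highest Q-value
# and large when some other action is rated much higher.
penalty_supported = cql_regularizer([[1.0, 5.0]], [1])
penalty_unsupported = cql_regularizer([[1.0, 5.0]], [0])
```

In a training loop this value would be scaled by a weight and added to the critic's temporal-difference loss; the weight controls how conservative the learned critic is.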