Steve's YouTube channel is awesome. I found the control bootcamp series useful a few years ago to review some basic concepts. https://t.co/tuRzCqg0jT https://t.co/rD6pAsP4gS
— hardmaru (@hardmaru) February 16, 2021
Introducing the World Models Library, an open-source, platform-agnostic suite of tasks and tools for examination of world model design and performance in visual model-based reinforcement learning. Learn more and grab the code at https://t.co/l9jKoBWS2V pic.twitter.com/sT6W4m2FDe
— Google AI (@GoogleAI) February 3, 2021
We thank @svlevine for his excellent talk "Data-Driven Reinforcement Learning: Deriving Common Sense from Past Experience" last Friday, now available on our YouTube channel. https://t.co/RfFMxmLivj
— UCL CSML (@uclcsml) January 10, 2021
Offline model-based RL for goal reaching: learn a distance "Q-like" function from offline data, and a video prediction model, then use them to accomplish visually indicated goals.
— Sergey Levine (@svlevine) January 10, 2021
w/ Stephen Tian et al. https://t.co/pmXL8fGHXv https://t.co/x9XXI7PN06
🧵> pic.twitter.com/G3t23nBWXo
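One way to combine a learned distance function with a learned dynamics model for visually indicated goals is random-shooting planning: sample candidate action sequences, roll them through the model, and keep the sequence ending closest to the goal. A minimal sketch, where `predict_next`, `distance`, and `sample_action` are hypothetical stand-ins for the learned components (not the paper's actual planner):

```python
import numpy as np

def plan_to_goal(state, goal, predict_next, distance, sample_action,
                 horizon=5, n_candidates=64):
    # Random-shooting planner: evaluate candidate action sequences under
    # the learned model and return the one whose final predicted state
    # minimizes the learned distance to the goal.
    best_cost, best_seq = np.inf, None
    for _ in range(n_candidates):
        seq = [sample_action() for _ in range(horizon)]
        s = state
        for a in seq:
            s = predict_next(s, a)  # roll the model forward
        cost = distance(s, goal)
        if cost < best_cost:
            best_cost, best_seq = cost, seq
    return best_seq
```

The actual method plans with a video prediction model over pixels; this sketch only illustrates the model-plus-distance planning loop.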
Aviral Kumar and I have posted our NeurIPS offline reinforcement learning tutorial on YouTube for your enjoyment :)
— Sergey Levine (@svlevine) December 17, 2020
Slides, colab exercise, etc.: https://t.co/S639WkAroh
Part 1: https://t.co/OozPaXLVhF
Part 2: https://t.co/MPLhyipS1K
PyGeneses is a Deep Reinforcement Learning framework that attempts to simulate artificial agents in bio-inspired environments. One of the use cases features researching various possible behavior trends and drawing parallels with the real world. https://t.co/sfNC5WJsy8
— PyTorch (@PyTorch) December 4, 2020
One Learning to RL them all:
— Yann LeCun (@ylecun) December 4, 2020
ReBeL (Recursive Belief-based Learning) is a general RL+Search method that works for all two-player zero-sum games, including imperfect-information games (poker, liar's dice,...) and perfect-information games (chess, go....). https://t.co/2sw8Zbe8rg
"Coax is Plug-n-Play reinforcement learning in Python usingΒ @OpenAI Gym,Β JAX, and @DeepMind's Haiku.
β π©βπ» Paige Bailey @ 127.0.0.1 π (@DynamicWebPaige) November 6, 2020
For the full documentation, including many examples, go toΒ https://t.co/cVMYPnbp8j."https://t.co/PXdfWLZnWR pic.twitter.com/UZwLeMOGU7
We've been studying why deep RL is so hard, and we think we have another reason: implicit under-parameterization: https://t.co/haeE1YX4Ue
— Sergey Levine (@svlevine) October 30, 2020
Iteratively training on your own targets is a kind of "self-distillation," and leads to loss of rank ->
w/ Aviral Kumar @agarwl_ @its_dibya pic.twitter.com/h97OiV7d4Y
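The "loss of rank" above is typically measured as an effective rank of the penultimate-layer feature matrix: the number of singular values needed to capture most of the spectrum's mass. A hedged sketch of one such metric (the paper's exact definition and threshold may differ):

```python
import numpy as np

def effective_rank(features, delta=0.01):
    # Smallest k such that the top-k singular values account for a
    # (1 - delta) fraction of the total singular-value mass.
    s = np.linalg.svd(features, compute_uv=False)  # sorted descending
    cum = np.cumsum(s) / np.sum(s)
    return int(np.searchsorted(cum, 1.0 - delta) + 1)
```

Tracking this quantity over training iterations is how rank collapse under repeated bootstrapped self-distillation can be observed.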
Gamma-models are dynamics models without a fixed time step. Instead, gamma models predict discounted averages of future state visitations, allowing us to train "infinite horizon" models with TD.
— Sergey Levine (@svlevine) October 29, 2020
w/ @michaeljanner & @IMordatch https://t.co/QYga0n3Vy4 https://t.co/j4FKAQmrWK
->
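The TD training idea behind gamma-models can be sketched as a bootstrapped sampling target: with probability 1 − γ the target is the observed next state, and otherwise a sample from the model itself queried at the next state. A minimal sketch with hypothetical helper names (not the paper's code):

```python
import numpy as np

def gamma_model_td_target(next_state, sample_model, gamma=0.99,
                          rng=np.random):
    # TD target for a generative gamma-model: mixes the single-step
    # successor (weight 1 - gamma) with a bootstrapped sample from the
    # (target) model at the next state (weight gamma). Averaged over
    # training, this yields the discounted future state visitation.
    if rng.random() < 1.0 - gamma:
        return next_state
    return sample_model(next_state)
```

At γ = 0 this recovers an ordinary single-step dynamics model; larger γ extends the model's effective horizon without a fixed time step.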
Conservative safety critics use conservative Q-learning (CQL) to learn a safety critic, exploiting the lower bound property of CQL to provide guarantees on safety.
— Sergey Levine (@svlevine) October 29, 2020
w/ @mangahomanga, Aviral Kumar, @nick_rhinehart, @florian_shkurti, @animesh_garg https://t.co/6BjwHz9Zjx
-> pic.twitter.com/9dYxUjBSEn
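The CQL regularizer that gives this bound can be sketched as a simple penalty added to the Bellman error: push Q-values down on actions from the current policy and up on actions from the dataset, so the learned Q lower-bounds the true value. A hedged sketch with hypothetical inputs (for a safety critic over costs, the sign is flipped so failure probability is conservatively *over*estimated; the paper's exact objective may differ):

```python
import numpy as np

def cql_penalty(q_policy, q_data, alpha=1.0):
    # CQL regularizer: added to the standard Bellman loss, it penalizes
    # high Q-values on policy-sampled (potentially out-of-distribution)
    # actions relative to dataset actions, producing a conservative
    # (lower-bound) value estimate.
    return alpha * (np.mean(q_policy) - np.mean(q_data))
```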
I recorded an extended version of my offline RL talk, as practice for a live presentation earlier this week: https://t.co/DvnjzagxsN
— Sergey Levine (@svlevine) October 20, 2020
Covers the following:
AWAC: https://t.co/JYyprRInhR
MOPO: https://t.co/53VtOZKbcx
CQL: https://t.co/TYL7RTNbO7
D4RL: https://t.co/MvBXpSghkM