The deep learning moment of deep RL: https://t.co/25qB39K3HL
— Ilya Sutskever (@ilyasut) June 25, 2018
OpenAI demonstrates remarkable progress in a limited version of 5v5 Dota using two concepts that we didn't think could learn long time-scale strategies: self-play and LSTMs. The carefully designed reward functions are notable -- intermediate, global, team-spirit. https://t.co/GBTw1e7ERR
— Soumith Chintala (@soumithchintala) June 25, 2018
Great work by @OpenAI team. More evidence that scaling up simple RL methods (rather than designing complicated algorithms) enables solving increasingly complex problems. https://t.co/1wpazo69hU
— Arthur Juliani (@awjuliani) June 25, 2018
Some megascale RL results from @OpenAI:
We've scaled existing methods to train AIs with sufficient teamwork skills to solve hard problems within Dota 2
- Scaled-up PPO+LSTM
- ~120,000 CPUs + 256 GPUs
- Self-play
- Hyperparameter called "Team Spirit" to teach AIs to collaborate https://t.co/lcSGWw0yr5
— Jack Clark (@jackclarkSF) June 25, 2018
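The "Team Spirit" idea in the tweet above can be sketched as a simple reward blend: each hero's reward is interpolated between its own reward and the team average, with the interpolation weight annealed from selfish (0) toward fully shared (1) during training. The function name and exact weighting below are illustrative assumptions, not OpenAI's implementation.

```python
import numpy as np

def blend_rewards(individual_rewards, team_spirit):
    """Blend each hero's own reward with the team-average reward.

    team_spirit = 0.0 -> each hero keeps only its own reward (selfish);
    team_spirit = 1.0 -> every hero receives the team mean (fully shared).
    Illustrative sketch of a "team spirit" style hyperparameter; the
    name and formula are assumptions, not OpenAI Five's actual code.
    """
    r = np.asarray(individual_rewards, dtype=float)
    team_mean = r.mean()
    return (1.0 - team_spirit) * r + team_spirit * team_mean

# One hero scores a kill (reward 1.0); with team_spirit = 0.5 the credit
# is partially redistributed to its four teammates.
print(blend_rewards([1.0, 0.0, 0.0, 0.0, 0.0], 0.5))
```

At `team_spirit = 0.5` the scoring hero gets 0.6 and each teammate 0.1, so cooperative behavior (helping a teammate secure a kill) is rewarded without entirely removing individual credit.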
Amazing what a single-layer 1024-unit LSTM can be trained to do with a bit of engineering! OpenAI Five Model Architecture: pic.twitter.com/mRbD02KpNc
— hardmaru (@hardmaru) June 25, 2018
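To put the "single-layer 1024-unit LSTM" remark in perspective, a quick back-of-the-envelope parameter count shows how much capacity even one such layer holds. The sketch below uses the standard LSTM parameterization (four gates, input-to-hidden and hidden-to-hidden weights, two bias vectors, as in PyTorch's `nn.LSTM`); the 1024-dimensional input size is an assumption for illustration, not the actual OpenAI Five observation width.

```python
def lstm_param_count(input_size, hidden_size):
    """Parameter count of a single LSTM layer.

    Uses the common convention (e.g. PyTorch's nn.LSTM): four gates,
    each with an input-to-hidden weight matrix, a hidden-to-hidden
    weight matrix, and two bias vectors.
    """
    gates = 4
    return gates * (hidden_size * input_size      # W_ih
                    + hidden_size * hidden_size   # W_hh
                    + 2 * hidden_size)            # b_ih + b_hh

# A 1024-unit LSTM with a (hypothetical) 1024-dim input:
# roughly 8.4 million parameters in the recurrent core alone.
print(lstm_param_count(1024, 1024))  # -> 8396800
```

Most of those parameters sit in the 1024x1024 recurrent weight matrices, which is where the long time-scale strategy the tweets discuss has to live.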
I want to wax poetic about the models (LSTM+PPO pushed far beyond what people likely thought possible, mirroring @Smerity et al + @GaborMelis et al in LSTM language modeling), the game (DotA as a complex test-bed), or the stupendous compute (180 years of gaming per day), but ... pic.twitter.com/cgUJ4w39G0
— Smerity (@Smerity) June 25, 2018