Mastering Stacking of Diverse Shapes with Large-Scale Iterative Reinforcement Learning on Real Robots

Thomas Lampe, Abbas Abdolmaleki, Sarah Bechtle, Sandy H. Huang, Jost Tobias Springenberg, Michael Bloesch, Oliver Groth, Roland Hafner, Tim Hertweck, Michael Neunert, Markus Wulfmeier, Jingwei Zhang, Francesco Nori, Nicolas Heess, Martin Riedmiller

Reinforcement learning solely from an agent’s self-generated data is often believed to be infeasible for learning on real robots, due to the amount of data needed. However, if done right, agents learning from real data can be surprisingly efficient through re-using previously collected sub-optimal data. In this paper we demonstrate how the increased understanding of off-policy learning methods and...