Value learning from trajectory optimization and Sobolev descent: A step toward reinforcement learning with superlinear convergence properties
Amit Parag, Sébastien Kleff, Léo Saci, Nicolas Mansard, Olivier Stasse
The recent successes in deep reinforcement learning largely rely on the ability to generate large amounts of data, which in turn implies the use of a simulator. In particular, current progress in multi-body dynamic simulators is underpinning the implementation of reinforcement learning for end-to-end control of robotic systems. Yet simulators are mostly treated as black boxes while we have th...