Enforcing the consensus between Trajectory Optimization and Policy Learning for precise robot control

Quentin Le Lidec, Wilson Jallet, Ivan Laptev, Cordelia Schmid, Justin Carpentier

Reinforcement learning (RL) and trajectory optimization (TO) offer strongly complementary advantages. On the one hand, RL approaches can learn global control policies directly from data, but generally require large sample sizes to converge properly towards feasible policies. On the other hand, TO methods can exploit gradient-based information extracted from simulators to quickly conve...
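To make the contrast concrete, the sketch below illustrates what "exploiting gradient-based information from a simulator" can look like in its simplest form. It assumes a toy 1-D point mass (a double integrator integrated with explicit Euler) standing in for a differentiable simulator, computes the exact gradient of a trajectory cost with a hand-written reverse (adjoint) pass, and runs plain gradient descent on the open-loop controls. This is not the method proposed in the paper; the functions `rollout`, `cost_and_grad`, and `optimize`, and all parameter values, are hypothetical choices for illustration.

```python
# Hypothetical toy example: single-shooting trajectory optimization of a
# 1-D point mass, using exact gradients from a hand-differentiated simulator.
# Illustration only; not the method proposed in the paper.

def rollout(u, p0=0.0, v0=0.0, dt=0.1):
    """Simulate a double integrator with explicit Euler; return all states."""
    p, v = p0, v0
    states = [(p, v)]
    for ut in u:
        p, v = p + dt * v, v + dt * ut
        states.append((p, v))
    return states


def cost_and_grad(u, target=1.0, r=1e-2, w=100.0, dt=0.1):
    """Cost = control effort + terminal penalty; gradient via an adjoint pass."""
    p_T, v_T = rollout(u, dt=dt)[-1]
    cost = 0.5 * r * sum(ut * ut for ut in u)
    cost += 0.5 * w * ((p_T - target) ** 2 + v_T ** 2)
    # Costates at the final time: dJ/dp_T and dJ/dv_T.
    lam_p, lam_v = w * (p_T - target), w * v_T
    grad = [0.0] * len(u)
    for t in reversed(range(len(u))):
        # u_t only enters the dynamics through v_{t+1} = v_t + dt * u_t.
        grad[t] = r * u[t] + dt * lam_v
        # p_{t+1} = p_t + dt * v_t, so v_t also influences the position;
        # lam_p is unchanged because dp_{t+1}/dp_t = 1.
        lam_v = lam_v + dt * lam_p
    return cost, grad


def optimize(horizon=20, iters=2000, lr=0.02):
    """Plain gradient descent on the open-loop control sequence."""
    u = [0.0] * horizon
    for _ in range(iters):
        _, g = cost_and_grad(u)
        u = [ut - lr * gt for ut, gt in zip(u, g)]
    return u


if __name__ == "__main__":
    u_opt = optimize()
    p_T, v_T = rollout(u_opt)[-1]
    print(f"final position {p_T:.3f}, final velocity {v_T:.3f}")
```

Because every gradient step uses exact derivatives of the rollout, the controls converge in a modest number of deterministic iterations, whereas a model-free RL method would have to estimate the same descent direction from sampled returns. The price is that the result is a single open-loop trajectory from one initial state rather than a global policy.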