Enforcing the consensus between Trajectory Optimization and Policy Learning for precise robot control
Quentin Le Lidec, Wilson Jallet, Ivan Laptev, Cordelia Schmid, Justin Carpentier
Reinforcement learning (RL) and trajectory optimization (TO) present strong complementary advantages. On one hand, RL approaches are able to learn global control policies directly from data, but generally require large sample sizes to properly converge towards feasible policies. On the other hand, TO methods are able to exploit gradient-based information extracted from simulators to quickly conve...


