Extremum-Seeking Action Selection for Accelerating Policy Optimization

Ya-Chien Chang, Sicun Gao

Reinforcement learning for control over continuous spaces typically uses high-entropy stochastic policies, such as Gaussian distributions, both for local exploration and for estimating the policy gradient used to optimize performance. Many robotic control problems involve complex unstable dynamics, where applying actions that lie off the feasible control manifolds can quickly lead to undesirable divergence. In su...