Steady State Analysis of Episodic Reinforcement Learning

Huang Bojun

In this paper, we prove that unique steady-state distributions exist pervasively in the learning environments of episodic tasks, and that the marginal distribution of the system state indeed approaches the steady state in essentially all episodic tasks. This observation supports a mindset that interestingly reverses conventional wisdom: while steady states are traditionally presumed to exist in continual learning and considered less relevant in episodic learning, it turns out they are guaranteed to exist for the latter under any behavior policy. We further develop connections between important concepts that have been treated separately in episodic and continual RL. On the practical side, the existence of a unique and approachable steady state implies a general, reliable, and efficient way to collect data in episodic RL algorithms. We apply this method to policy gradient algorithms, based on a new steady-state policy gradient theorem. We also propose and experimentally evaluate a perturbation method that enforces faster convergence to the steady state in real-world episodic RL tasks.
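As an illustrative sketch only (not the exact statement proved in the paper), the steady-state view allows the policy gradient to be written in the familiar likelihood-ratio form, with states drawn from a steady-state distribution rather than from per-episode trajectories; the notation $d_{\pi_\theta}$ for the steady-state distribution and $q_{\pi_\theta}$ for the action-value function is assumed here for illustration, and the paper's theorem may differ in constants and horizon treatment:

% Hedged sketch: likelihood-ratio policy gradient under an assumed
% steady-state distribution d_{\pi_\theta}; for illustration only.
\[
\nabla_\theta J(\pi_\theta)
\;\propto\;
\mathbb{E}_{s \sim d_{\pi_\theta},\; a \sim \pi_\theta(\cdot \mid s)}
\big[\, \nabla_\theta \log \pi_\theta(a \mid s)\; q_{\pi_\theta}(s, a) \,\big]
\]

Under this form, data collected while the state distribution is at (or near) its steady state yields unbiased gradient samples without tracking episode boundaries, which is the practical benefit the abstract alludes to.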