Understanding Self-Predictive Learning for Reinforcement Learning
Yunhao Tang, Zhaohan Daniel Guo, Pierre Harvey Richemond, Bernardo Avila Pires, Yash Chandak, Remi Munos, Mark Rowland, Mohammad Gheshlaghi Azar, Charline Le Lan, Clare Lyle, András György, Shantanu Thakoor, Will Dabney, Bilal Piot, Daniele Calandriello, Michal Valko
We study the learning dynamics of self-predictive learning for reinforcement learning, a family of algorithms that learn representations by minimizing the prediction error of their own future latent representations. Despite their recent empirical success, such algorithms have an apparent defect: trivial representations (such as constants) minimize the prediction error, yet converging to such solutions is clearly undesirable. Our central insight is that careful design of the optimization dynamics is critical to learning meaningful representations. We identify that a faster-paced optimization of the predictor and semi-gradient updates on the representation are crucial to preventing representation collapse. Then, in an idealized setup, we show that self-predictive learning dynamics carry out a spectral decomposition of the state transition matrix, effectively capturing information about the transition dynamics. Building on these theoretical insights, we propose bidirectional self-predictive learning, a novel self-predictive algorithm that learns two representations simultaneously. We examine the robustness of our theoretical insights with a number of small-scale experiments and showcase the promise of the novel representation learning algorithm with large-scale experiments.
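The two design choices named above can be illustrated with a minimal NumPy sketch. It is not the paper's implementation and rests on several assumptions: states are tabular (one-hot), the transition matrix `T` is a random row-stochastic matrix, the "faster-paced" predictor is taken to its limit by solving for the least-squares predictor exactly at every step, and the representation `phi` is updated with a semi-gradient that treats the target `T @ phi` as a constant. All function names and hyperparameters are hypothetical.

```python
import numpy as np


def optimal_predictor(phi, T):
    # Assumption: the predictor is optimized "infinitely fast", i.e. it is the
    # least-squares solution of min_P ||phi @ P - T @ phi||_F^2 for the current phi.
    return np.linalg.lstsq(phi, T @ phi, rcond=None)[0]


def semi_gradient_step(phi, T, lr):
    # One semi-gradient update on the representation.
    P = optimal_predictor(phi, T)
    # The target T @ phi is treated as a constant (stop-gradient), so only the
    # prediction branch phi @ P contributes to the gradient.
    residual = phi @ P - T @ phi
    grad = residual @ P.T
    return phi - lr * grad


def run(n_states=8, dim=2, steps=1000, lr=1e-3, seed=0):
    rng = np.random.default_rng(seed)
    # Hypothetical environment: a random row-stochastic transition matrix.
    T = rng.random((n_states, n_states))
    T /= T.sum(axis=1, keepdims=True)
    # Orthonormal initial representation, so phi0.T @ phi0 is the identity.
    phi0, _ = np.linalg.qr(rng.standard_normal((n_states, dim)))
    phi = phi0.copy()
    for _ in range(steps):
        phi = semi_gradient_step(phi, T, lr)
    return phi0, phi


if __name__ == "__main__":
    phi0, phi = run()
    # Deviation of the Gram matrix phi.T @ phi from its initial value.
    print(np.abs(phi.T @ phi - phi0.T @ phi0).max())
    print(np.linalg.matrix_rank(phi))
```

In this sketch, the first-order change of `phi.T @ phi` vanishes whenever the predictor is the exact least-squares solution, so for a small step size the Gram matrix stays close to its initial value and the representation keeps full column rank rather than collapsing to a constant. The sketch does not attempt to demonstrate the spectral decomposition result, which the abstract states only for an idealized setup.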


