Stochastically Dominant Distributional Reinforcement Learning

John Martin, Michal Lyskawinski, Xiaohu Li, Brendan Englot

We describe a new approach for managing aleatoric uncertainty in the Reinforcement Learning (RL) paradigm. Instead of selecting actions according to a single statistic, we propose a distributional method based on the second-order stochastic dominance (SSD) relation. This relation compares the inherent dispersion of the random returns induced by different actions, producing a comprehensive evaluation of the environment's uncertainty. The necessary conditions for SSD require estimators to predict accurate second moments. To accommodate this, we map the distributional RL problem to a Wasserstein gradient flow, treating the distributional Bellman residual as a potential energy functional. We propose a particle-based algorithm for which we prove optimality and convergence. Our experiments characterize the algorithm's performance and demonstrate how uncertainty and performance are better balanced using an SSD policy than with other risk measures.
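To illustrate the SSD relation the abstract refers to, the following sketch checks second-order stochastic dominance between two empirical return samples. This is not the paper's algorithm; it uses a standard characterization for equal-size, equal-weight samples (partial sums of sorted values), and the return values are hypothetical.

```python
import numpy as np

def ssd_dominates(x, y, tol=1e-12):
    """Check whether empirical sample x second-order stochastically
    dominates empirical sample y.

    For equal-size, equal-weight samples, X >=_SSD Y holds iff every
    partial sum of the ascending-sorted X values is at least the
    corresponding partial sum of the sorted Y values.
    """
    x, y = np.sort(np.asarray(x)), np.sort(np.asarray(y))
    if len(x) != len(y):
        raise ValueError("this check assumes equal sample sizes")
    return bool(np.all(np.cumsum(x) >= np.cumsum(y) - tol))

# A mean-preserving spread is SSD-dominated: same mean, more dispersion.
tight = [2.0, 2.0]   # hypothetical returns of action A
spread = [1.0, 3.0]  # hypothetical returns of action B
print(ssd_dominates(tight, spread))  # True: A dominates B
print(ssd_dominates(spread, tight))  # False: B does not dominate A
```

The example shows why SSD is a natural criterion for risk-sensitive action selection: between two actions with equal expected return, the one with less dispersion dominates.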