Multi-User Reinforcement Learning with Low Rank Rewards
Dheeraj Mysore Nagaraj, Suhas S Kowshik, Naman Agarwal, Praneeth Netrapalli, Prateek Jain
We consider collaborative multi-user reinforcement learning, where multiple users have the same state-action space and transition probabilities but different rewards. Under the assumption that the reward matrix of the $N$ users has a low-rank structure — a standard and practically successful assumption in the collaborative filtering setting — we design algorithms with significantly lower sample complexity compared to ones that learn the MDP individually for each user. Our main contribution is an algorithm which explores rewards collaboratively across the $N$ user-specific MDPs and can learn rewards efficiently in two key settings: tabular MDPs and linear MDPs. When $N$ is large and the rank is constant, the sample complexity per MDP depends logarithmically on the size of the state space, an exponential reduction (in the state-space size) compared to standard "non-collaborative" algorithms. Our main technical contribution is a method to construct policies which obtain data such that low-rank matrix completion is possible (without a generative model). This goes beyond the regular RL framework and is closely related to mean field limits of multi-agent RL.
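To make the low-rank reward assumption concrete, the following is a minimal sketch (not the paper's algorithm) of the matrix-completion step it relies on: given a partially observed $N \times |S||A|$ reward matrix of assumed rank $r$, recover the full matrix via alternating least squares. The function name `als_complete`, the observation mask, and all parameters are illustrative assumptions; the paper's actual contribution is constructing exploration policies so that such a mask of observed entries admits completion.

```python
import numpy as np

def als_complete(R_obs, mask, rank, n_iters=50, reg=1e-3, seed=0):
    """Alternating least squares for low-rank matrix completion (hypothetical sketch).

    R_obs: (N, D) array of observed rewards (values where mask is 0 are ignored).
    mask:  (N, D) binary array, 1 where a reward entry was observed.
    rank:  assumed rank r of the true reward matrix.
    """
    rng = np.random.default_rng(seed)
    N, D = R_obs.shape
    U = rng.standard_normal((N, rank))
    V = rng.standard_normal((D, rank))
    for _ in range(n_iters):
        # Fix V; solve a small ridge regression for each user's factor (row of U).
        for i in range(N):
            idx = mask[i].astype(bool)
            Vi = V[idx]
            U[i] = np.linalg.solve(Vi.T @ Vi + reg * np.eye(rank),
                                   Vi.T @ R_obs[i, idx])
        # Fix U; solve for each state-action pair's factor (row of V).
        for j in range(D):
            idx = mask[:, j].astype(bool)
            Uj = U[idx]
            V[j] = np.linalg.solve(Uj.T @ Uj + reg * np.eye(rank),
                                   Uj.T @ R_obs[idx, j])
    return U @ V.T  # completed reward-matrix estimate

# Toy usage: rank-2 rewards for N=100 users over D=500 state-action pairs,
# with only ~10% of entries observed.
rng = np.random.default_rng(1)
R_true = rng.standard_normal((100, 2)) @ rng.standard_normal((2, 500))
mask = (rng.random((100, 500)) < 0.1).astype(float)
R_hat = als_complete(R_true * mask, mask, rank=2)
rel_err = np.linalg.norm(R_hat - R_true) / np.linalg.norm(R_true)
```

Because each user only needs enough observed entries to pin down an $r$-dimensional factor, the per-user sample cost can be far below $|S||A|$ when $r$ is constant, which is the intuition behind the logarithmic per-MDP sample complexity claimed above.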


