Scalable Evaluation of Multi-Agent Reinforcement Learning with Melting Pot

Joel Z Leibo, Edgar A Dueñez-Guzman, Alexander Vezhnevets, John P Agapiou, Peter Sunehag, Raphael Koster, Jayd Matyas, Charlie Beattie, Igor Mordatch, Thore Graepel

Unlike benchmarks in supervised learning, existing evaluation suites for multi-agent reinforcement learning (MARL) do not treat generalization to novel situations as their primary objective. Our contribution, Melting Pot, is a MARL evaluation suite that fills this gap and uses reinforcement learning itself to reduce the human labor required to create novel test scenarios. This works because one agent's behavior constitutes (part of) another agent's environment. To demonstrate scalability, we have created over 80 unique test scenarios covering a broad range of research topics, such as social dilemmas, reciprocity, resource sharing, and task partitioning. We apply these test scenarios to standard MARL training algorithms and demonstrate how Melting Pot reveals weaknesses not apparent from training performance alone.
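
To make the evaluation idea concrete, below is a minimal Python sketch of the protocol the abstract describes: a test scenario pairs an environment ("substrate") with a fixed, pre-trained background population, and the focal population under test is scored only on its own members' returns. All names here (`Policy`, `run_episode`, the dm_env-style `reset`/`step` interface) are illustrative assumptions; the actual Melting Pot API differs.

```python
"""Sketch of a Melting Pot-style test-scenario evaluation.

Hypothetical interfaces throughout; this illustrates the protocol's
structure, not the real meltingpot package API.
"""

from typing import Callable, Sequence

# A policy maps an observation to an action. In practice these are trained
# agents; any callable suffices for the sketch.
Policy = Callable[[object], int]


def run_episode(substrate,
                focal_policies: Sequence[Policy],
                background_policies: Sequence[Policy]) -> float:
    """Rolls out one episode and returns the focal per-capita return.

    The background policies are held fixed: because each agent's behavior
    is part of every other agent's environment, a pre-trained background
    population turns the substrate into a novel test situation for the
    focal population.
    """
    policies = list(focal_policies) + list(background_policies)
    timestep = substrate.reset()  # assumed dm_env-style multi-agent interface
    focal_returns = [0.0] * len(focal_policies)
    while not timestep.last():
        actions = [pi(obs) for pi, obs in zip(policies, timestep.observation)]
        timestep = substrate.step(actions)
        # Only focal agents are scored; background agents are part of the
        # test, not the thing being tested.
        for i in range(len(focal_policies)):
            focal_returns[i] += timestep.reward[i]
    return sum(focal_returns) / len(focal_policies)
```

Scoring only the focal population is the key design choice: it lets the same substrate yield many distinct test scenarios simply by swapping in different background populations, which is how scenario creation scales without additional human authoring.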