MAMBPO: Sample-efficient multi-robot reinforcement learning using learned world models
Daniël Willemsen, Mario Coppola, Guido C.H.E. de Croon
Multi-robot systems can benefit from reinforcement learning (RL) algorithms that learn behaviours in a small number of trials, a property known as sample efficiency. This research thus investigates the use of learned world models to improve sample efficiency. We present a novel multi-agent model-based RL algorithm: Multi-Agent Model-Based Policy Optimization (MAMBPO), utilizing the Centralized Lea...