Efficient Performance Bounds for Primal-Dual Reinforcement Learning from Demonstrations

Angeliki Kamoutsi, Goran Banjac, John Lygeros

We consider large-scale Markov decision processes with an unknown cost function and address the problem of learning a policy from a finite set of expert demonstrations.