Control Synthesis from Linear Temporal Logic Specifications using Model-Free Reinforcement Learning

Alper Kamil Bozkurt, Yu Wang, Michael M. Zavlanos, Miroslav Pajic

We present a reinforcement learning (RL) framework to synthesize a control policy from a given linear temporal logic (LTL) specification in an unknown stochastic environment that can be modeled as a Markov decision process (MDP). Specifically, we learn a policy that maximizes the probability of satisfying the LTL formula without learning the transition probabilities. We introduce a novel rewardin...