Efficient Planning in Large MDPs with Weak Linear Function Approximation
Roshan Shariff, Csaba Szepesvári
Large-scale Markov decision processes (MDPs) require planning algorithms with runtime independent of the number of states of the MDP. We consider the planning problem in MDPs using linear value function approximation with only weak requirements: low approximation error for the optimal value function, and a small set of “core” states whose features span those of the other states. In particular, we make no assumptions about the representability of policies or value functions of non-optimal policies. Our algorithm produces almost-optimal actions for any state using a generative oracle (simulator) for the MDP, while its computation time scales polynomially with the number of features, core states, and actions, and with the effective horizon.
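The "core states" assumption in the abstract can be illustrated with a minimal sketch: if the feature vectors of a few core states span the features of every other state, then a linear value estimate at an arbitrary state is a weighted combination of the core-state values. All feature vectors and values below are hypothetical, chosen only to make the span property concrete; the paper's actual algorithm is not reproduced here.

```python
# Sketch of the core-states idea: express an arbitrary state's features
# as a combination of core-state features, then combine core-state value
# estimates with the same weights. All numbers are illustrative.

def solve_2x2(a, b):
    """Solve the 2x2 linear system a @ x = b by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    x0 = (b[0] * a[1][1] - b[1] * a[0][1]) / det
    x1 = (a[0][0] * b[1] - a[1][0] * b[0]) / det
    return (x0, x1)

# Columns of this matrix are the feature vectors of two core states:
# phi(c1) = (1.0, 0.5), phi(c2) = (0.0, 1.0).
core_features = [[1.0, 0.0],
                 [0.5, 1.0]]
core_values = (2.0, 3.0)      # value estimates at the core states

# Features of some other state, which lie in the span of the core features.
phi_s = (0.8, 1.4)
w = solve_2x2(core_features, phi_s)   # weights over the core states

# Linear value estimate at s: V(s) ~ w . V(core) = 0.8*2.0 + 1.0*3.0.
v_s = w[0] * core_values[0] + w[1] * core_values[1]
print(v_s)  # -> 4.6
```

With more features and core states the same idea uses a least-squares solve instead of an exact 2x2 system, which is where the low approximation error for the optimal value function enters.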


