Efficient Planning in Large MDPs with Weak Linear Function Approximation
Roshan Shariff, Csaba Szepesvári
Large-scale Markov decision processes (MDPs) require planning algorithms with runtime independent of the number of states of the MDP. We consider the planning problem in MDPs using linear value function approximation with only weak requirements: low approximation error for the optimal value function, and a small set of “core” states whose features span those of the other states. In particular, we make no assumptions about the representability of policies or value functions of non-optimal policies. Our algorithm produces almost-optimal actions for any state using a generative oracle (simulator) for the MDP, while its computation time scales polynomially with the number of features, core states, and actions, and with the effective horizon.
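The "core states" assumption in the abstract can be illustrated with a minimal sketch: if the feature vectors of a few core states span the features of every other state, then a linear value estimate at an arbitrary state is a weighted combination of the core-state values. All feature vectors and values below are hypothetical, chosen only to make the span property concrete; the paper's actual algorithm is not reproduced here.

```python
# Sketch of the core-states idea: express an arbitrary state's features
# as a combination of core-state features, then combine core-state value
# estimates with the same weights. All numbers are illustrative.

def solve_2x2(a, b):
    """Solve the 2x2 linear system a @ x = b by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    x0 = (b[0] * a[1][1] - b[1] * a[0][1]) / det
    x1 = (a[0][0] * b[1] - a[1][0] * b[0]) / det
    return (x0, x1)

# Columns of this matrix are the feature vectors of two core states:
# phi(c1) = (1.0, 0.5), phi(c2) = (0.0, 1.0).
core_features = [[1.0, 0.0],
                 [0.5, 1.0]]
core_values = (2.0, 3.0)      # value estimates at the core states

# Features of some other state, which lie in the span of the core features.
phi_s = (0.8, 1.4)
w = solve_2x2(core_features, phi_s)   # weights over the core states

# Linear value estimate at s: V(s) ~ w . V(core) = 0.8*2.0 + 1.0*3.0.
v_s = w[0] * core_values[0] + w[1] * core_values[1]
print(v_s)  # -> 4.6
```

With more features and core states the same idea uses a least-squares solve instead of an exact 2x2 system, which is where the low approximation error for the optimal value function enters.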


