Q-learning with Long-term Action-space Shaping to Model Complex Behavior for Autonomous Lane Changes

Gabriel Kalweit, Maria Huegle, Moritz Werling, Joschka Boedecker

In autonomous driving applications, reinforcement learning agents often have to exhibit complex behavior, which can translate into optimizing multiple objectives while following certain rules. Encoding traffic rules and desires such as safety and comfort via classical methods based on reward shaping (i.e., a weighted combination of different objectives in the reward signal) or Lagrangian methods (i...