Vision-Language Interpreter for Robot Task Planning

Keisuke Shirai, Cristian C. Beltran-Hernandez, Masashi Hamaya, Atsushi Hashimoto, Shohei Tanaka, Kento Kawaharazuka, Kazutoshi Tanaka, Yoshitaka Ushiku, Shinsuke Mori

Large language models (LLMs) are accelerating the development of language-guided robot planners, while symbolic planners retain the advantage of interpretability. This paper proposes a new task that bridges these two trends: multimodal planning problem specification. The aim is to generate a problem description (PD), a machine-readable file that symbolic planners take as input to find a plan. By gener...
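To make the PD concrete: symbolic planners commonly consume problem files in PDDL, so a generated PD could look like the sketch below. This is an illustrative assumption, not the paper's actual pipeline; the `make_problem_pd` helper, the `tabletop` domain, and all object and predicate names are hypothetical.

```python
def make_problem_pd(objects, init, goal, domain="tabletop"):
    """Render a PDDL-style problem description (PD) as a string.

    All names here are hypothetical placeholders; a real PD must match
    the predicates and types declared in the planner's domain file.
    """
    obj_line = " ".join(objects)
    init_block = "\n    ".join(f"({p})" for p in init)
    goal_block = "\n      ".join(f"({p})" for p in goal)
    return (
        f"(define (problem pick-and-place)\n"
        f"  (:domain {domain})\n"
        f"  (:objects {obj_line})\n"
        f"  (:init\n    {init_block})\n"
        f"  (:goal (and\n      {goal_block})))\n"
    )

# Example: a PD a multimodal specifier might emit after observing a scene.
pd = make_problem_pd(
    objects=["cube", "table", "bin"],
    init=["on cube table", "hand-empty"],
    goal=["in cube bin"],
)
print(pd)
```

Because the PD is plain text with a fixed grammar, it can be inspected or hand-corrected before planning, which is the interpretability benefit the abstract points to.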