Chat with the Environment: Interactive Multimodal Perception Using Large Language Models

Xufeng Zhao, Mengdi Li, Cornelius Weber, Muhammad Burhan Hafez, Stefan Wermter

Programming robot behavior in a complex world faces challenges on multiple levels, from dextrous low-level skills to high-level planning and reasoning. Recent pre-trained Large Language Models (LLMs) have shown remarkable reasoning ability in few-shot robotic planning. However, it remains challenging to ground LLMs in multimodal sensory input and continuous action output, while enabling a robot to...