SMArT: Training Shallow Memory-aware Transformers for Robotic Explainability
Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara
The ability to generate natural language explanations conditioned on visual perception is a crucial step towards autonomous agents that can explain themselves and communicate with humans. While research efforts in image and video captioning are giving promising results, these results often come at a high computational cost, limiting the applicability of the approaches to r...