SMArT: Training Shallow Memory-aware Transformers for Robotic Explainability

Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara

The ability to generate natural language explanations conditioned on visual perception is a crucial step towards autonomous agents that can explain themselves and communicate with humans. While research in image and video captioning is giving promising results, this often comes at the expense of the computational requirements of the approaches, limiting their applicability to r...