Look and Listen: A Multi-modality Late Fusion Approach to Scene Classification for Autonomous Machines

Jordan J. Bird, Diego R. Faria, Cristiano Premebida, Anikó Ekárt, George Vogiatzis

The novelty of this study lies in a multi-modality approach to scene classification, in which image and audio complement each other through deep late fusion. The approach is demonstrated on a difficult classification problem consisting of two synchronised and balanced datasets of 16,000 data objects, encompassing 4.4 hours of video of 8 environments with varying degrees of similarity. We...