
Multimodal Emotion Recognition Harnessing the Complementarity of Speech, Language, and Vision

  • Thomas Thebaud
  • Anna Favaro
  • Yaohan Guan
  • Yuchen Yang
  • Prabhav Singh
  • Jesus Villalba
  • Laureano Moro-Velazquez
  • Najim Dehak

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In the realm of audiovisual emotion recognition, a significant challenge lies in developing neural network architectures capable of effectively harnessing and integrating multimodal information. This study introduces an advanced methodology for the Empathic Virtual Agent Challenge (EVAC), utilizing state-of-the-art speech, language, and image models. Specifically, we leverage cutting-edge pre-trained models, including multilingual variants fine-tuned in French for each modality, and integrate them using late fusion techniques. Through extensive experimentation and validation, we demonstrate the efficacy of our approach in achieving competitive results on the challenge dataset. Our findings show that multimodal approaches outperform unimodal methods on both the Core Affect Presence and Intensity and the Appraisal Dimensions tasks, underscoring the effectiveness of integrating diverse modalities. These results highlight the importance of leveraging multiple sources of information to capture nuanced emotional states more accurately and robustly in real-world applications.
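The late fusion described in the abstract combines the outputs of independent unimodal models rather than their internal features. The paper does not specify its fusion rule; the sketch below illustrates one common variant, weighted averaging of per-modality softmax scores, with hypothetical three-class logits for the speech, language, and vision branches (all names and values are illustrative, not taken from the paper).

```python
import numpy as np

def late_fusion(logits_by_modality, weights=None):
    """Fuse per-modality class logits by (weighted) averaging of softmax scores.

    Each unimodal model produces its own class logits; only these final
    scores are combined, which is what distinguishes late fusion from
    feature-level (early) fusion.
    """
    probs = []
    for logits in logits_by_modality:
        e = np.exp(logits - np.max(logits))  # numerically stable softmax
        probs.append(e / e.sum())
    if weights is None:
        weights = [1.0 / len(probs)] * len(probs)  # default: equal weights
    fused = sum(w * p for w, p in zip(weights, probs))
    return int(np.argmax(fused))

# Hypothetical 3-class example: the speech and vision branches favor
# class 0 while the language branch favors class 1; the fused decision
# reflects the overall evidence.
speech_logits = np.array([2.0, 0.5, 0.1])
text_logits = np.array([0.3, 1.8, 0.2])
vision_logits = np.array([1.5, 0.4, 0.3])
label = late_fusion([speech_logits, text_logits, vision_logits])
```

Because fusion happens only at the score level, each branch can be trained, fine-tuned, or replaced independently, which is convenient when the modalities use very different pre-trained backbones.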

Original language: English (US)
Title of host publication: ICMI 2024 - Proceedings of the 26th International Conference on Multimodal Interaction
Publisher: Association for Computing Machinery
Pages: 684-689
Number of pages: 6
ISBN (Electronic): 9798400704628
DOIs
State: Published - Nov 4 2024
Event: 26th International Conference on Multimodal Interaction, ICMI 2024 - San Jose, Costa Rica
Duration: Nov 4 2024 - Nov 8 2024

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 26th International Conference on Multimodal Interaction, ICMI 2024
Country/Territory: Costa Rica
City: San Jose
Period: 11/4/24 - 11/8/24

All Science Journal Classification (ASJC) codes

  • Human-Computer Interaction
  • Computer Networks and Communications
  • Computer Vision and Pattern Recognition
  • Software
