Speech Emotion Recognition Using Deep CNNs Trained on Log-Frequency Spectrograms

Mainak Biswas, Mridu Sahu, Maroi Agrebi, Pawan Kumar Singh, Youakim Badr

Research output: Chapter in Book/Report/Conference proceeding (Chapter)

1 Scopus citation

Abstract

Speech is the most important means of communication between humans, and every phrase a person speaks carries some emotion. A natural goal, therefore, is to build a system that understands the mood and feelings of the speaker. Speech emotion detection has many real-world applications, ranging from improving recommendation systems (which adapt to the emotion the user is experiencing) to monitoring people with chronic depression and suicidal tendencies. In this chapter, we propose a model for recognizing emotions from speech data using log-frequency spectrograms and a deep convolutional neural network (CNN). To make the model resilient to noise, we augment our data with noise of varying loudness drawn from various contexts. Spectrograms are then extracted from the augmented data, and these spectrogram images are used to train the deep CNNs proposed in this chapter. The model is designed to be independent of linguistic features, speaker-dependent features, the gender of the speakers, and the intensity of the expressed emotion. This is ensured by using the RAVDESS dataset, in which the same sentences are spoken by 24 speakers (12 male and 12 female) with different expressions at two levels of intensity. The model achieved an accuracy of 98.13% on this dataset, and the experimental results indicate that the proposed model classifies emotions from human speech effectively. The source code of the proposed model is available at: https://github.com/mainak-biswas1999/Spoken_Emotion_classification.git.
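The abstract names two preprocessing steps without giving their parameters: mixing in noise of varied loudness, and extracting log-frequency spectrograms. The NumPy sketch below shows one plausible way to implement both. It is not the authors' code, which is in the linked repository. Several assumptions are made. Loudness is controlled through a target signal-to-noise ratio (SNR) in dB. "Log-frequency" is read as an STFT power spectrogram resampled onto geometrically spaced frequency bins. The sample rate, FFT size, hop length, bin count, and minimum frequency are illustrative values only. The function names `add_noise_at_snr` and `log_frequency_spectrogram` are hypothetical.

```python
import numpy as np


def add_noise_at_snr(signal: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Mix `noise` into `signal` at the requested SNR in dB.

    Assumption: "varied loudness" is modelled as varied SNR; the chapter may scale noise differently.
    """
    # Tile the noise if it is shorter than the signal, then crop it to the signal's length.
    reps = int(np.ceil(len(signal) / len(noise)))
    noise = np.tile(noise, reps)[: len(signal)]
    signal_power = np.mean(signal ** 2)
    noise_power = np.mean(noise ** 2)
    # Choose the scale so that 10*log10(signal_power / scaled_noise_power) == snr_db.
    scale = np.sqrt(signal_power / (noise_power * 10 ** (snr_db / 10)))
    return signal + scale * noise


def log_frequency_spectrogram(signal: np.ndarray, sr: int, n_fft: int = 1024,
                              hop: int = 256, n_bins: int = 128,
                              fmin: float = 50.0) -> np.ndarray:
    """Return a dB power spectrogram on log-spaced frequency bins, shape (n_bins, n_frames).

    Assumption: all default parameter values are illustrative, not taken from the chapter.
    """
    if len(signal) < n_fft:
        raise ValueError("signal must contain at least n_fft samples")
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    # Slice the signal into overlapping Hann-windowed frames: shape (n_frames, n_fft).
    frames = np.stack([signal[i * hop: i * hop + n_fft] * window for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2  # (n_frames, n_fft // 2 + 1)
    linear_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    # Geometrically spaced target frequencies from fmin up to the Nyquist frequency.
    log_freqs = np.geomspace(fmin, sr / 2, n_bins)
    # Linearly interpolate each frame's power spectrum at the log-spaced frequencies.
    log_power = np.stack([np.interp(log_freqs, linear_freqs, frame) for frame in power], axis=1)
    # Convert to decibels; the small constant avoids log10(0).
    return 10.0 * np.log10(log_power + 1e-10)


if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr
    clean = np.sin(2 * np.pi * 440 * t)  # placeholder 1-second tone, not RAVDESS audio
    noise = np.random.default_rng(0).standard_normal(4000)  # placeholder white noise
    noisy = add_noise_at_snr(clean, noise, snr_db=10.0)
    spec = log_frequency_spectrogram(noisy, sr)
    print(spec.shape)  # (128, 59)
```

A lower `snr_db` produces louder noise. Generating several copies of each clip at different SNRs is one simple way to obtain noise of varied loudness. The resulting `(n_bins, n_frames)` array can either be saved as an image or passed directly to a 2-D CNN.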

Original language: English (US)
Title of host publication: Studies in Big Data
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 83-108
Number of pages: 26
State: Published - 2023

Publication series

Name: Studies in Big Data
Volume: 134
ISSN (Print): 2197-6503
ISSN (Electronic): 2197-6511

All Science Journal Classification (ASJC) codes

  • Control and Systems Engineering
  • Engineering (miscellaneous)
  • Computer Science Applications
  • Artificial Intelligence
