Constructing a heterogeneous training dataset for emotion classification

Research output: Contribution to journal › Conference article › peer-review

2 Scopus citations


Emotion classification deals with identifying the emotions expressed within a text. Social media generates a vast amount of emotion-rich data in the form of tweets, status updates, blog posts, etc. Tweets are a good representation of the emotions a person usually expresses publicly. By analyzing the emotions in these tweets, one can get an idea of how a person feels about the subject they are referring to. Machine Learning (ML) techniques are widely used for analyzing emotions within tweets. However, there are no balanced training datasets that can be used for training ML classifiers. As a result, supervised classifiers perform poorly when classifying emotions within texts, particularly within tweets. In addition, none of the available datasets is useful for training classifiers to identify and classify emotions within tweets. Therefore, in this paper we propose a novel approach for constructing a balanced heterogeneous training dataset for emotion classification of tweets. Using the lexicon-based NRC classifier, we classified the textual instances into four emotions: joy, sadness, anger, and surprise. Using this as a training dataset, we trained six different machine learning models: Multiclass Logistic Regression (MLR), Multinomial Naïve Bayes (MLB), Random Forest (RF), Support Vector Machine (SVM), Convolutional Neural Network (CNN), and Recurrent Neural Network (RNN). Our study reveals that this approach has the potential to boost the performance of supervised classifiers for emotion classification within tweets.
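
The labeling-and-balancing idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the tiny word list `TOY_LEXICON` is a hypothetical stand-in for the NRC lexicon, the helper names `label_text` and `build_balanced_dataset` are invented for illustration, and balancing by downsampling to the smallest class is an assumption, since the abstract does not say how balance is achieved.

```python
import random
from collections import Counter, defaultdict

# Hypothetical toy lexicon (word -> emotion). This is NOT the NRC lexicon,
# only a stand-in so the sketch is self-contained.
TOY_LEXICON = {
    "happy": "joy", "love": "joy", "great": "joy",
    "sad": "sadness", "miss": "sadness", "cry": "sadness",
    "angry": "anger", "hate": "anger", "furious": "anger",
    "wow": "surprise", "unexpected": "surprise", "shocked": "surprise",
}

def label_text(text, lexicon=TOY_LEXICON):
    """Return the emotion with the most lexicon hits.

    Returns None when the text has no lexicon hits or when the top two
    emotions tie, so ambiguous texts are left out of the dataset.
    """
    words = (w.strip(".,!?;:\"'()").lower() for w in text.split())
    counts = Counter(lexicon[w] for w in words if w in lexicon)
    if not counts:
        return None
    ranked = counts.most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]

def build_balanced_dataset(texts, seed=0):
    """Label texts with the lexicon, then downsample every emotion class
    to the size of the smallest one (an assumed balancing strategy)."""
    by_label = defaultdict(list)
    for text in texts:
        label = label_text(text)
        if label is not None:
            by_label[label].append(text)
    if not by_label:
        return []
    n = min(len(items) for items in by_label.values())
    rng = random.Random(seed)
    dataset = []
    for label in sorted(by_label):
        for text in rng.sample(by_label[label], n):
            dataset.append((text, label))
    return dataset
```

The resulting list of `(text, label)` pairs could then be passed to any supervised classifier, such as the six model families named in the abstract.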

Original language: English (US)
Pages (from-to): 73-79
Number of pages: 7
Journal: Procedia Computer Science
State: Published - 2020
Event: 2020 Complex Adaptive Systems Conference, CAS 2019 - Malvern, United States
Duration: Nov 13 2019 to Nov 15 2019

All Science Journal Classification (ASJC) codes

  • General Computer Science

