TY - JOUR
T1 - Hope Speech Detection Using Social Media Discourse (Posi-Vox-2024)
T2 - A Transfer Learning Approach
AU - Ahmad, Muhammad
AU - Usman, Sardar
AU - Farid, Humaira
AU - Ameer, Iqra
AU - Muzammil, Muhammad
AU - Hamza, Ameer
AU - Sidorov, Grigori
AU - Batyrshin, Ildar
N1 - Publisher Copyright:
© 2024 National Research University, Higher School of Economics. All rights reserved.
PY - 2024
Y1 - 2024
AB - Background: Hope is characterized as an optimistic expectation or anticipation of favorable outcomes. In the age of extensive social media usage, research on hope speech detection has primarily focused on monolingual techniques, and the Urdu and Arabic languages have not been addressed. Purpose: This study addresses joint multilingual hope speech detection in Urdu, English, and Arabic using a transfer learning paradigm. We developed a new multilingual dataset named Posi-Vox-2024 and employed a joint multilingual technique to design a universal classifier for the multilingual dataset. We explored a fine-tuned BERT model, which performed well in capturing semantic and contextual information. Method: The framework includes (1) preprocessing, (2) data representation using BERT, (3) fine-tuning, and (4) classification of hope speech into binary (‘hope’ and ‘not hope’) and multiclass (realistic, unrealistic, and generalized hope) categories. Results: Our proposed model (BERT) set the benchmark performance on our dataset, achieving 0.78 accuracy in binary classification and 0.66 in multiclass classification, relative improvements of 0.04 and 0.08, respectively, over the Logistic Regression baseline (0.75 in binary and 0.61 in multiclass classification). Conclusion: Our findings can be applied to improve automated systems for detecting and promoting supportive content in English, Arabic, and Urdu on social media platforms, fostering positive online discourse. This work sets new benchmarks for multilingual hope speech detection, advancing existing knowledge and enabling future research in underrepresented languages.
UR - https://www.scopus.com/pages/publications/85215077832
UR - https://www.scopus.com/inward/citedby.url?scp=85215077832&partnerID=8YFLogxK
U2 - 10.17323/jle.2024.22443
DO - 10.17323/jle.2024.22443
M3 - Article
AN - SCOPUS:85215077832
SN - 2411-7390
VL - 10
SP - 31
EP - 43
JO - Journal of Language and Education
JF - Journal of Language and Education
IS - 4
ER -