Big team science reveals promises and limitations of machine learning efforts to model physiological markers of affective experience

Nicholas A. Coles, Bartosz Perz, Maciej Behnke, Johannes C. Eichstaedt, Soo Hyung Kim, Tu N. Vu, Chirag Raman, Julian Tejada, Van Thong Huynh, Guangyi Zhang, Tanming Cui, Sharanyak Podder, Rushi Chavda, Shubham Pandey, Arpit Upadhyay, Jorge I. Padilla-Buritica, Carlos J. Barrera Causil, Linying Ji, Felix Dollack, Kiyoshi Kiyokawa, Huakun Liu, Monica Perusquia-Hernandez, Hideaki Uchiyama, Xin Wei, Houwei Cao, Ziqing Yang, Alessia Iancarelli, Kieran McVeigh, Yiyu Wang, Isabel M. Berwian, Jamie C. Chiu, Dan Mircea Mirea, Erik C. Nook, Henna I. Vartiainen, Claire Whiting, Young Won Cho, Sy Miin Chow, Zachary F. Fisher, Yanling Li, Xiaoyue Xiong, Yuqi Shen, Enzo Tagliazucchi, Leandro A. Bugnon, Raydonal Ospina, Nicolas M. Bruno, Tomas A. D'Amelio, Federico Zamberlan, Luis R. Mercado Diaz, Javier O. Pinzon-Arenas, Hugo F. Posada-Quintero, Maneesh Bilalpur, Saurabh Hinduja, Fernando Marmolejo-Ramos, Shaun Canavan, Liza Jivnani, Stanisław Saganowski

Research output: Contribution to journal › Article › peer-review

1 Scopus citation

Abstract

Researchers are increasingly using machine learning to study physiological markers of emotion. We evaluated the promises and limitations of this approach via a big team science competition. Twelve teams competed to predict self-reported affective experiences using a multi-modal set of peripheral nervous system measures. Models were trained and tested in multiple ways, with data divided by participant, targeted emotion, induction, and time. In 100% of tests, teams outperformed baseline models that made random predictions. In 46% of tests, teams also outperformed baseline models that relied on the simple average of ratings from the training datasets. More notably, results uncovered a methodological challenge: multiplicative constraints on generalizability. Inferences about the accuracy and theoretical implications of machine learning efforts depended not only on model architecture, but also on how models were trained, tested, and evaluated. For example, some teams performed better when tested on observations from the same subjects seen during training than on observations from different subjects. Such results could be interpreted as evidence against claims of universality. However, such conclusions would be premature because other teams exhibited the opposite pattern. Taken together, results illustrate how big team science can be leveraged to understand the promises and limitations of machine learning methods in affective science and beyond.
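The evaluation logic described in the abstract (two baselines, and train/test splits that either hold out whole participants or keep the same participants in both sets) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the competition's actual pipeline: every function name here is hypothetical, physiological features are omitted because both baselines ignore them, RMSE is an assumed error metric, and the random baseline is assumed to draw uniformly from the range of training ratings (the paper's exact definitions may differ).

```python
import math
import random
from statistics import mean

# Hypothetical record format: (participant_id, self-reported rating).
Observation = tuple[str, float]


def split_by_participant(data: list[Observation], test_participants: set[str]):
    """Cross-subject split: every test observation comes from a subject
    never seen during training."""
    train = [obs for obs in data if obs[0] not in test_participants]
    test = [obs for obs in data if obs[0] in test_participants]
    return train, test


def split_within_participants(data: list[Observation], n_test_per_participant: int = 1):
    """Within-subject split: hold out each participant's last n observations,
    so every test subject also contributes observations to training."""
    by_pid: dict[str, list[Observation]] = {}
    for obs in data:
        by_pid.setdefault(obs[0], []).append(obs)
    train, test = [], []
    for obs_list in by_pid.values():
        cut = max(len(obs_list) - n_test_per_participant, 0)
        train.extend(obs_list[:cut])
        test.extend(obs_list[cut:])
    return train, test


def mean_baseline(train: list[Observation]):
    """Predict the simple average of the training ratings for every case."""
    avg = mean(r for _, r in train)
    return lambda _obs: avg


def random_baseline(train: list[Observation], seed: int = 0):
    """Predict a uniform random value within the training rating range
    (an assumed definition of a 'random predictions' baseline)."""
    rng = random.Random(seed)
    lo = min(r for _, r in train)
    hi = max(r for _, r in train)
    return lambda _obs: rng.uniform(lo, hi)


def rmse(predict, test: list[Observation]) -> float:
    """Root-mean-square error of a predictor on the test observations."""
    return math.sqrt(mean((predict(obs) - obs[1]) ** 2 for obs in test))
```

For example, with toy data `[("p1", 2.0), ("p1", 4.0), ("p2", 3.0), ("p2", 5.0), ("p3", 1.0), ("p3", 3.0)]`, holding out `"p3"` gives a mean baseline of 3.5 and an RMSE of about 1.80 on p3's two ratings. Under this sketch, a model "outperforms" a baseline when its RMSE on the same split is lower; running the same comparison under both split functions is one way to see how conclusions can shift with the train/test design.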

Original language: English (US)
Article number: 241778
Journal: Royal Society Open Science
Volume: 12
Issue number: 6
State: Published - Jun 25 2025

All Science Journal Classification (ASJC) codes

  • General

