TY - JOUR
T1 - Multilingual Semantic Distance
T2 - Automatic Verbal Creativity Assessment in Many Languages
AU - Patterson, John D.
AU - Merseal, Hannah M.
AU - Johnson, Dan R.
AU - Agnoli, Sergio
AU - Baas, Matthijs
AU - Baker, Brendan S.
AU - Barbot, Baptiste
AU - Benedek, Mathias
AU - Borhani, Khatereh
AU - Chen, Qunlin
AU - Christensen, Julia F.
AU - Corazza, Giovanni Emanuele
AU - Forthmann, Boris
AU - Karwowski, Maciej
AU - Kazemian, Nastaran
AU - Kreisberg-Nitzav, Ariel
AU - Kenett, Yoed N.
AU - Link, Allison
AU - Lubart, Todd
AU - Mercier, Maxence
AU - Miroshnik, Kirill
AU - Ovando-Tellez, Marcela
AU - Primi, Ricardo
AU - Puente-Díaz, Rogelio
AU - Said-Metwaly, Sameh
AU - Stevenson, Claire
AU - Vartanian, Meghedi
AU - Volle, Emmanuelle
AU - van Hell, Janet G.
AU - Beaty, Roger E.
N1 - Publisher Copyright: © 2023 American Psychological Association
PY - 2023
Y1 - 2023
N2 - Creativity research commonly involves recruiting human raters to judge the originality of responses to divergent thinking tasks, such as the alternate uses task (AUT). These manual scoring practices have benefited the field, but they also have limitations, including labor-intensiveness and subjectivity, which can adversely impact the reliability and validity of assessments. To address these challenges, researchers are increasingly employing automatic scoring approaches, such as distributional models of semantic distance. However, semantic distance has primarily been studied in English-speaking samples, with very little research in the many other languages of the world. In a multilab study (N = 6,522 participants), we aimed to validate semantic distance on the AUT in 12 languages: Arabic, Chinese, Dutch, English, Farsi, French, German, Hebrew, Italian, Polish, Russian, and Spanish. We gathered AUT responses and human creativity ratings (N = 107,672 responses), as well as criterion measures for validation (e.g., creative achievement). We compared two deep learning-based semantic models, multilingual bidirectional encoder representations from transformers and cross-lingual language model RoBERTa, to compute semantic distance and validate this automated metric with human ratings and criterion measures. We found that the top-performing model for each language correlated positively with human creativity ratings, with correlations ranging from medium to large across languages. Regarding criterion validity, semantic distance showed small-to-moderate effect sizes (comparable to human ratings) for openness, creative behavior/achievement, and creative self-concept. We provide open access to our multilingual dataset for future algorithmic development, along with Python code to compute semantic distance in 12 languages.
AB - Creativity research commonly involves recruiting human raters to judge the originality of responses to divergent thinking tasks, such as the alternate uses task (AUT). These manual scoring practices have benefited the field, but they also have limitations, including labor-intensiveness and subjectivity, which can adversely impact the reliability and validity of assessments. To address these challenges, researchers are increasingly employing automatic scoring approaches, such as distributional models of semantic distance. However, semantic distance has primarily been studied in English-speaking samples, with very little research in the many other languages of the world. In a multilab study (N = 6,522 participants), we aimed to validate semantic distance on the AUT in 12 languages: Arabic, Chinese, Dutch, English, Farsi, French, German, Hebrew, Italian, Polish, Russian, and Spanish. We gathered AUT responses and human creativity ratings (N = 107,672 responses), as well as criterion measures for validation (e.g., creative achievement). We compared two deep learning-based semantic models, multilingual bidirectional encoder representations from transformers and cross-lingual language model RoBERTa, to compute semantic distance and validate this automated metric with human ratings and criterion measures. We found that the top-performing model for each language correlated positively with human creativity ratings, with correlations ranging from medium to large across languages. Regarding criterion validity, semantic distance showed small-to-moderate effect sizes (comparable to human ratings) for openness, creative behavior/achievement, and creative self-concept. We provide open access to our multilingual dataset for future algorithmic development, along with Python code to compute semantic distance in 12 languages.
UR - http://www.scopus.com/inward/record.url?scp=85177178542&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85177178542&partnerID=8YFLogxK
U2 - 10.1037/aca0000618
DO - 10.1037/aca0000618
M3 - Article
AN - SCOPUS:85177178542
SN - 1931-3896
VL - 17
SP - 495
EP - 507
JO - Psychology of Aesthetics, Creativity, and the Arts
JF - Psychology of Aesthetics, Creativity, and the Arts
IS - 4
ER -