TY - JOUR
T1 - Automatic Scoring of Metaphor Creativity with Large Language Models
AU - DiStefano, Paul V.
AU - Patterson, John D.
AU - Beaty, Roger E.
N1 - Publisher Copyright:
© 2024 The Author(s). Published with license by Taylor & Francis Group, LLC.
PY - 2024
Y1 - 2024
AB - Metaphor is crucial in human cognition and creativity, facilitating abstract thinking, analogical reasoning, and idea generation. Typically, human raters manually score the originality of responses to creative thinking tasks, a laborious and error-prone process. Previous research sought to remedy these problems by scoring creativity tasks automatically using semantic distance and large language models (LLMs). Here, we extend research on automatic creativity scoring to metaphor generation: the ability to creatively describe episodes and concepts using nonliteral language. Metaphor is arguably more abstract and naturalistic than prior targets of automated creativity assessment. We collected 4,589 responses from 1,546 participants to various metaphor prompts, along with corresponding human creativity ratings. We fine-tuned two open-source LLMs (RoBERTa and GPT-2), effectively “teaching” them to score metaphors like humans, before testing their ability to accurately assess the creativity of new metaphors. Results showed that both models reliably predicted new human creativity ratings (RoBERTa r = .72, GPT-2 r = .70), significantly more strongly than semantic distance (r = .42). Importantly, the fine-tuned models generalized accurately to metaphor prompts they had not been trained on (RoBERTa r = .68, GPT-2 r = .63). We provide open access to the fine-tuned models, allowing researchers to assess metaphor creativity in a reproducible and timely manner.
UR - http://www.scopus.com/inward/record.url?scp=85189565459&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85189565459&partnerID=8YFLogxK
DO - 10.1080/10400419.2024.2326343
M3 - Article
AN - SCOPUS:85189565459
SN - 1040-0419
JO - Creativity Research Journal
JF - Creativity Research Journal
ER -