GPT-4 as an Effective Zero-Shot Evaluator for Scientific Figure Captions

Ting Yao Hsu, Chieh Yang Huang, Ryan Rossi, Sungchul Kim, Clyde Lee Giles, Ting Hao Kenneth Huang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Scopus citations

Abstract

There is growing interest in systems that generate captions for scientific figures. However, assessing these systems' output poses a significant challenge. Human evaluation requires academic expertise and is costly, while automatic evaluation depends on often low-quality author-written captions. This paper investigates using large language models (LLMs) as a cost-effective, reference-free method for evaluating figure captions. We first constructed SCICAP-EVAL, a human evaluation dataset that contains human judgments for 3,600 scientific figure captions, both original and machine-made, for 600 arXiv figures. We then prompted LLMs like GPT-4 and GPT-3 to score (1-6) each caption based on its potential to aid reader understanding, given relevant context such as figure-mentioning paragraphs. Results show that GPT-4, used as a zero-shot evaluator, outperformed all other models and even surpassed assessments made by Computer Science and Informatics undergraduates, achieving a Kendall correlation score of 0.401 with Ph.D. students' rankings.
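The agreement figure reported above is a Kendall rank correlation between a model's caption scores and the Ph.D. students' rankings. The abstract does not say which Kendall variant or tooling the authors used, so the following is only a minimal sketch that assumes the tie-aware tau-b form. The function name `kendall_tau_b` and the two score lists are hypothetical and are not taken from SCICAP-EVAL.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(xs, ys):
    """Kendall's tau-b between two equal-length score lists (ties allowed)."""
    if len(xs) != len(ys):
        raise ValueError("score lists must have the same length")
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied in both lists: contributes to neither count
        if dx == 0:
            ties_x += 1  # tied only in xs
        elif dy == 0:
            ties_y += 1  # tied only in ys
        elif dx * dy > 0:
            concordant += 1  # both lists order the pair the same way
        else:
            discordant += 1  # the lists disagree on the pair's order
    untied = concordant + discordant
    denom = sqrt((untied + ties_x) * (untied + ties_y))
    return (concordant - discordant) / denom if denom else float("nan")


# Hypothetical 1-6 scores for six captions (illustrative values only).
llm_scores = [5, 3, 4, 2, 6, 1]
expert_scores = [6, 2, 4, 3, 5, 1]
print(round(kendall_tau_b(llm_scores, expert_scores), 3))  # 0.733
```

In this toy example, 13 of the 15 caption pairs are ordered the same way by both raters and 2 are reversed, giving (13 - 2) / 15 ≈ 0.733. A value of 1 would mean identical orderings and -1 fully reversed ones.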

Original language: English (US)
Title of host publication: Findings of the Association for Computational Linguistics
Subtitle of host publication: EMNLP 2023
Publisher: Association for Computational Linguistics (ACL)
Pages: 5464-5474
Number of pages: 11
ISBN (Electronic): 9798891760615
State: Published - 2023
Event: 2023 Findings of the Association for Computational Linguistics: EMNLP 2023 - Singapore, Singapore
Duration: Dec 6 2023 → Dec 10 2023

Publication series

Name: Findings of the Association for Computational Linguistics: EMNLP 2023

Conference

Conference: 2023 Findings of the Association for Computational Linguistics: EMNLP 2023
Country/Territory: Singapore
City: Singapore
Period: 12/6/23 → 12/10/23

All Science Journal Classification (ASJC) codes

  • Computational Theory and Mathematics
  • Computer Science Applications
  • Information Systems
  • Language and Linguistics
  • Linguistics and Language
