Multi-LLM Collaborative Caption Generation in Scientific Documents

  • Jaeyoung Kim
  • Jongho Lee
  • Hong Jun Choi
  • Ting Yao Hsu
  • Chieh Yang Huang
  • Sungchul Kim
  • Ryan Rossi
  • Tong Yu
  • Clyde Lee Giles
  • Ting Hao ‘Kenneth’ Huang
  • Sungchul Choi

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Scientific figure captioning is a complex task that requires generating contextually appropriate descriptions of visual content. However, existing methods often fall short by utilizing incomplete information, treating the task solely as either an image-to-text or text summarization problem. This limitation hinders the generation of high-quality captions that fully capture the necessary details. Moreover, existing data sourced from arXiv papers contain low-quality captions, posing significant challenges for training large language models (LLMs). In this paper, we introduce a framework called Multi-LLM Collaborative Figure Caption Generation (MLBCAP) to address these challenges by leveraging specialized LLMs for distinct sub-tasks. Our approach unfolds in three key modules: (Quality Assessment) We utilize multimodal LLMs to assess the quality of training data, enabling the filtration of low-quality captions. (Diverse Caption Generation) We then employ a strategy of fine-tuning/prompting multiple LLMs on the captioning task to generate candidate captions. (Judgment) Lastly, we prompt a prominent LLM to select the highest quality caption from the candidates, followed by refining any remaining inaccuracies. Human evaluations demonstrate that informative captions produced by our approach rank better than human-written captions, highlighting its effectiveness. Our code is available at https://github.com/teamreboott/MLBCAP
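
The three modules described in the abstract can be read as a filter, generate, then judge pipeline. The following is a minimal sketch of that control flow only, not the authors' implementation (see their repository for that). Every name here (`FigureExample`, `filter_training_data`, `generate_candidates`, `select_and_refine`, and the `toy_*` functions) is hypothetical, and the toy functions are plain-Python stand-ins for what would be multimodal or text LLM calls in the actual framework.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class FigureExample:
    """One figure with its surrounding text and an existing caption (hypothetical schema)."""
    figure_id: str
    context: str   # e.g. paragraphs that mention the figure
    caption: str   # existing caption, possibly low quality


def filter_training_data(
    examples: Sequence[FigureExample],
    score_quality: Callable[[FigureExample], float],
    threshold: float,
) -> List[FigureExample]:
    """Quality Assessment: keep only examples whose caption scores at or above the threshold."""
    return [ex for ex in examples if score_quality(ex) >= threshold]


def generate_candidates(
    example: FigureExample,
    generators: Sequence[Callable[[FigureExample], str]],
) -> List[str]:
    """Diverse Caption Generation: collect one candidate caption per generator."""
    return [generate(example) for generate in generators]


def select_and_refine(
    example: FigureExample,
    candidates: Sequence[str],
    pick_best: Callable[[FigureExample, Sequence[str]], int],
    refine: Callable[[FigureExample, str], str],
) -> str:
    """Judgment: choose the best candidate by index, then refine it."""
    if not candidates:
        raise ValueError("at least one candidate caption is required")
    best = candidates[pick_best(example, candidates)]
    return refine(example, best)


# Toy stand-ins below; in practice each would wrap a model call (assumption).
def toy_quality_score(example: FigureExample) -> float:
    # Word count as a crude proxy for caption informativeness.
    return float(len(example.caption.split()))


def toy_generator_short(example: FigureExample) -> str:
    return f"Results for {example.figure_id}"


def toy_generator_contextual(example: FigureExample) -> str:
    return f"Results for {example.figure_id}: {example.context}"


def toy_pick_longest(example: FigureExample, candidates: Sequence[str]) -> int:
    # Index of the longest candidate; a real judge would compare content.
    return max(range(len(candidates)), key=lambda i: len(candidates[i]))


def toy_refine(example: FigureExample, caption: str) -> str:
    # Trim whitespace and make sure the caption ends with a period.
    caption = caption.strip()
    return caption if caption.endswith(".") else caption + "."
```

Passing the scorer, generators, judge, and refiner as callables keeps the pipeline independent of any particular model or API, so each stage can be swapped without touching the others.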

Original language: English (US)
Title of host publication: AI for Research and Scalable, Efficient Systems - Second International Workshop, AI4Research 2025, and First International Workshop, SEAS 2025, Held in Conjunction with AAAI 2025, Proceedings
Editors: Qingyun Wang, Wenpeng Yin, Abhishek Aich, Yumin Suh, Kuan-Chuan Peng
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 142-160
Number of pages: 19
ISBN (Print): 9789819689118
State: Published - 2025
Event: 2nd AI4Research Workshop: Towards a Knowledge-Grounded Scientific Research Lifecycle, AI4Research 2025 and 1st Workshop on Scalable and Efficient Artificial Intelligence Systems, SEAS 2025, held in conjunction with the 39th Annual AAAI Conference on Artificial Intelligence, AAAI 2025 - Philadelphia, United States
Duration: Feb 25 2025 to Mar 4 2025

Publication series

Name: Communications in Computer and Information Science
Volume: 2533 CCIS
ISSN (Print): 1865-0929
ISSN (Electronic): 1865-0937

Conference

Conference: 2nd AI4Research Workshop: Towards a Knowledge-Grounded Scientific Research Lifecycle, AI4Research 2025 and 1st Workshop on Scalable and Efficient Artificial Intelligence Systems, SEAS 2025, held in conjunction with the 39th Annual AAAI Conference on Artificial Intelligence, AAAI 2025
Country/Territory: United States
City: Philadelphia
Period: 2/25/25 to 3/4/25

All Science Journal Classification (ASJC) codes

  • General Computer Science
  • General Mathematics
