Do Language Model Agents Align With Humans in Rating Visualizations? An Empirical Study

  • Zekai Shao
  • Yi Shan
  • Yixuan He
  • Yuxuan Yao
  • Junhong Wang
  • Xiaolong Zhang
  • Yu Zhang
  • Siming Chen

Research output: Contribution to journal › Article › peer-review

Abstract

Large language models (LLMs) show potential in understanding visualizations and may capture design knowledge. However, their ability to predict human feedback remains unclear. To explore this, we conduct three studies evaluating the alignment between LLM-based agents and human ratings in visualization tasks. The first study replicates a human-subject study, showing promising agent performance in human-like reasoning and rating and informing the subsequent experiments. The second study simulates six prior studies with agents and finds that alignment correlates with experts' pre-experiment confidence. The third study tests enhancement techniques, such as input preprocessing and knowledge injection, revealing limitations in robustness and potential bias. These findings suggest that LLM-based agents can simulate human ratings when guided by high-confidence hypotheses from expert evaluators. We also demonstrate a usage scenario in rapidly prototyping study designs and discuss future directions. We note that such simulations can only complement, not replace, user studies.

Original language: English (US)
Pages (from-to): 14-28
Number of pages: 15
Journal: IEEE Computer Graphics and Applications
Volume: 45
Issue number: 6
DOIs
State: Published - 2025

All Science Journal Classification (ASJC) codes

  • Software
  • Computer Graphics and Computer-Aided Design
