Eyes on the Text: Assessing Readability of Artificial Intelligence and Ophthalmologist Responses to Patient Surgery Queries

  • Sai S. Kurapati
  • Derek J. Barnett
  • Antonio Yaghy
  • Cameron J. Sabet
  • David N. Younessi
  • Dang Nguyen
  • John C. Lin
  • Ingrid U. Scott

Research output: Contribution to journal › Article › peer-review

Abstract

Introduction: Generative artificial intelligence (AI) technologies such as GPT-4 can provide health information to patients instantaneously; however, how the readability of these outputs compares with that of ophthalmologist-written responses is unknown. This study aimed to evaluate the readability of GPT-4-generated and ophthalmologist-written responses to patient queries about ophthalmic surgery.

Methods: This retrospective cross-sectional study used 200 randomly selected patient questions about ophthalmic surgery extracted from the American Academy of Ophthalmology's EyeSmart platform. The questions were entered into GPT-4, and the generated responses were recorded. Ophthalmologist-written replies to the same questions were compiled for comparison. Readability of GPT-4 and ophthalmologist responses was assessed using six validated metrics: Flesch-Kincaid Reading Ease (FK-RE), Flesch-Kincaid Grade Level (FK-GL), Gunning Fog Score (GFS), SMOG Index (SI), Coleman-Liau Index (CLI), and Automated Readability Index (ARI). Descriptive statistics, one-way ANOVA, Shapiro-Wilk, and Levene's tests (α = 0.05) were used to compare readability between the two groups.

Results: GPT-4 used a higher percentage of complex words (24.42%) than ophthalmologists (17.76%), although mean (standard deviation) word count per sentence was similar (18.43 [2.95] vs. 18.01 [6.09]). Across all six metrics (FK-RE; FK-GL; GFS; SI; CLI; ARI), GPT-4 responses were more difficult to read (34.39 [8.51]; 13.19 [2.63]; 16.37 [2.04]; 12.18 [1.43]; 15.72 [1.40]; 12.99 [1.86]) than ophthalmologists' responses (50.61 [15.53]; 10.71 [2.99]; 14.13 [3.55]; 10.07 [2.46]; 12.64 [2.93]; 10.40 [3.61]); note that a lower FK-RE score indicates harder text, whereas the other five metrics report grade levels. Both sources required at least a 12th-grade education for comprehension. ANOVA showed significance (p < 0.05) for all comparisons except word count per sentence (p = 0.438).

Conclusion: The National Institutes of Health advises that health information be written at a 6th- to 7th-grade level. Both GPT-4- and ophthalmologist-written answers exceeded this recommendation, with GPT-4 showing the greater gap. Information accessibility is vital when designing patient resources, particularly with the rise of AI as an educational tool.
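The analysis described in the abstract can be reproduced with off-the-shelf tools. Below is a minimal sketch, not the authors' code: it assumes the open-source Python packages `textstat` (for the six readability metrics) and `scipy` (for the per-metric one-way ANOVA), and the two response lists are hypothetical placeholders standing in for the study's 200 paired responses.

```python
# A minimal sketch, assuming the `textstat` and `scipy` packages; the
# response texts below are hypothetical placeholders, not study data.
import textstat
from scipy import stats

# The six validated readability metrics named in the abstract.
METRICS = {
    "FK-RE": textstat.flesch_reading_ease,        # higher = easier to read
    "FK-GL": textstat.flesch_kincaid_grade,       # U.S. grade level
    "GFS":   textstat.gunning_fog,
    "SI":    textstat.smog_index,
    "CLI":   textstat.coleman_liau_index,
    "ARI":   textstat.automated_readability_index,
}

def score_all(responses):
    """Return {metric name: list of per-response scores}."""
    return {name: [fn(text) for text in responses] for name, fn in METRICS.items()}

# Hypothetical placeholder texts; the study used 200 responses per group.
gpt4_responses = [
    "Phacoemulsification fragments the crystalline lens ultrasonically.",
    "Postoperative endophthalmitis prophylaxis warrants topical antibiotics.",
]
md_responses = [
    "During cataract surgery, the cloudy lens is gently removed.",
    "You will use antibiotic eye drops after surgery to prevent infection.",
]

gpt4_scores = score_all(gpt4_responses)
md_scores = score_all(md_responses)

for name in METRICS:
    a, b = gpt4_scores[name], md_scores[name]
    # One-way ANOVA per metric, as in the abstract; with two groups this is
    # equivalent to an unpaired t-test. The study also checked normality
    # (stats.shapiro) and equality of variances (stats.levene) at alpha = 0.05.
    f_stat, p_value = stats.f_oneway(a, b)
    print(f"{name}: GPT-4 mean={sum(a)/len(a):.2f}, "
          f"MD mean={sum(b)/len(b):.2f}, F={f_stat:.2f}, p={p_value:.3f}")
```

For reference, FK-RE is computed as 206.835 − 1.015 × (words per sentence) − 84.6 × (syllables per word), so lower scores indicate harder text; the five grade-level indices run in the opposite direction, which is why GPT-4's lower FK-RE and higher grade-level scores point the same way.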

Original language: English (US)
Pages (from-to): 149-159
Number of pages: 11
Journal: Ophthalmologica
Volume: 248
Issue number: 3
DOIs
State: Published - Jul 1 2025

All Science Journal Classification (ASJC) codes

  • Ophthalmology
  • Sensory Systems
