Development of an Algorithm for Estimating the Likelihood of Venous Thromboembolism in Primary Care Using Structured and Unstructured Electronic Health Record Data

  • Siona Prasad
  • Patricia C. Dykes
  • Richard Schreiber
  • Shadi Hijjawi
  • Khalid Nawab
  • Alice Kim
  • Stuart Lipsitz
  • Ania Syrowatka
  • Lipika Samal
  • David W. Bates
  • Veysel Karani Baris
  • Tien Thai
  • Michael Sainlaire
  • Frank Y. Chang
  • John Novoa-Laurentiev
  • Gregory Piazza
  • Wenyu Song

Research output: Contribution to journal › Article › peer-review

Abstract

Venous thromboembolism (VTE) is a major public health concern. It is often clinically difficult to diagnose and affects up to 900 000 individuals annually in the United States. Delayed or missed VTE diagnosis can impact treatment and increase morbidity and mortality. This retrospective study utilized structured and unstructured electronic health record (EHR) data from a large integrated care network in the northeastern US, focusing on 4678 adult patients presenting with at least one VTE-associated sign or symptom at primary care visits during 2019–2020. Feature selection incorporated expert-guided and data-driven approaches, resulting in a final set of demographic, clinical history, and sign/symptom risk factors. The primary analysis developed seven machine learning models to predict VTE incidence. Secondary analyses included the prediction of timely and delayed VTE diagnoses. All models showed predictive ability with area under the curve (AUC) of 0.83–0.88. The logistic regression model demonstrated robust performance in predicting incident VTE cases, achieving an AUC of 0.88 (95% CI: 0.86–0.90). Multiple risk factors were identified, including cancer history, smoking history, and spinal cord trauma. Variations in the top risk factors between timely and delayed prediction models highlighted how certain patients were more likely to have a delayed or missed diagnosis. This study highlights the potential for data-driven tools to facilitate timely, point-of-care VTE detection by leveraging structured and unstructured EHR data. The prediction model accurately estimated the likelihood of incident VTEs, especially in cases diagnosed late, showing potential to reduce costly diagnostic delays.
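
The abstract reports model discrimination as an AUC with a 95% confidence interval but does not say how the interval was obtained. As a rough illustration only, the sketch below computes a rank-based AUC and a percentile-bootstrap interval using the Python standard library. The bootstrap method, the function names `auc` and `bootstrap_auc_ci`, and the synthetic labels and scores are all assumptions made for this example; none of them come from the study.

```python
import random


def auc(labels, scores):
    """Rank-based AUC: the probability that a randomly chosen positive case
    scores higher than a randomly chosen negative case (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=300, alpha=0.05, seed=0):
    """Percentile-bootstrap interval for the AUC (an assumed method,
    not necessarily the one used in the study)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        # Skip resamples that contain only one class; AUC is undefined there.
        if 0 < sum(ys) < n:
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Synthetic data only: hypothetical risk scores, higher on average for cases.
rng = random.Random(42)
labels = [1 if rng.random() < 0.3 else 0 for _ in range(120)]
scores = [rng.gauss(1.5 if y == 1 else 0.0, 1.0) for y in labels]

point = auc(labels, scores)
lower, upper = bootstrap_auc_ci(labels, scores)
print(f"AUC = {point:.2f} (95% CI: {lower:.2f}-{upper:.2f})")
```

The pairwise comparison is quadratic in the number of cases, which is acceptable for a small illustration; a cohort of several thousand patients would normally call for a sort-based AUC or a library routine.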

Original language: English (US)
Pages (from-to): 2238-2247
Number of pages: 10
Journal: American Journal of Hematology
Volume: 100
Issue number: 12
DOIs
State: Published - Dec 2025

All Science Journal Classification (ASJC) codes

  • Hematology

