TY - JOUR
T1 - Development of an Algorithm for Estimating the Likelihood of Venous Thromboembolism in Primary Care Using Structured and Unstructured Electronic Health Record Data
AU - Prasad, Siona
AU - Dykes, Patricia C.
AU - Schreiber, Richard
AU - Hijjawi, Shadi
AU - Nawab, Khalid
AU - Kim, Alice
AU - Lipsitz, Stuart
AU - Syrowatka, Ania
AU - Samal, Lipika
AU - Bates, David W.
AU - Baris, Veysel Karani
AU - Thai, Tien
AU - Sainlaire, Michael
AU - Chang, Frank Y.
AU - Novoa-Laurentiev, John
AU - Piazza, Gregory
AU - Song, Wenyu
N1 - Publisher Copyright:
© 2025 Wiley Periodicals LLC.
PY - 2025/12
Y1 - 2025/12
N2 - Venous thromboembolism (VTE) is a major public health concern. It is often clinically difficult to diagnose and affects up to 900 000 individuals annually in the United States. Delayed or missed VTE diagnosis can impact treatment and increase morbidity and mortality. This retrospective study utilized structured and unstructured electronic health record (EHR) data from a large integrated care network in the northeastern US, focusing on 4678 adult patients presenting with at least one VTE-associated sign or symptom at primary care visits during 2019–2020. Feature selection incorporated expert-guided and data-driven approaches, resulting in a final set of demographic, clinical history, and sign/symptom risk factors. The primary analysis developed seven machine learning models to predict VTE incidence. Secondary analyses included the prediction of timely and delayed VTE diagnoses. All models showed predictive ability with area under the curve (AUC) of 0.83–0.88. The logistic regression model demonstrated robust performance in predicting incident VTE cases, achieving an AUC of 0.88 (95% CI: 0.86–0.90). Multiple risk factors were identified, including cancer history, smoking history, and spinal cord trauma. Variations in the top risk factors between timely and delayed prediction models highlighted how certain patients were more likely to have a delayed or missed diagnosis. This study highlights the potential for data-driven tools to facilitate timely, point-of-care VTE detection by leveraging structured and unstructured EHR data. The prediction model accurately estimated the likelihood of incident VTEs, especially in cases diagnosed late, showing potential to reduce costly diagnostic delays.
UR - https://www.scopus.com/pages/publications/105018233426
U2 - 10.1002/ajh.70096
DO - 10.1002/ajh.70096
M3 - Article
C2 - 41036578
AN - SCOPUS:105018233426
SN - 0361-8609
VL - 100
SP - 2238
EP - 2247
JO - American Journal of Hematology
JF - American Journal of Hematology
IS - 12
ER -