TY - JOUR
T1 - An Extraction Tool for Venous Thromboembolism Symptom Identification in Primary Care Notes to Facilitate Electronic Clinical Quality Measure Reporting
T2 - Algorithm Development and Validation Study
AU - Novoa-Laurentiev, John
AU - Bowen, Mica
AU - Pullman, Avery
AU - Song, Wenyu
AU - Syrowatka, Ania
AU - Chen, Jin
AU - Sainlaire, Michael
AU - Chang, Frank
AU - Gray, Krissy
AU - Panta, Purushottam
AU - Liu, Luwei
AU - Nawab, Khalid
AU - Hijjawi, Shadi
AU - Schreiber, Richard
AU - Zhou, Li
AU - Dykes, Patricia C.
N1 - Publisher Copyright:
© John Novoa-Laurentiev, Mica Bowen, Avery Pullman, Wenyu Song, Ania Syrowatka, Jin Chen, Michael Sainlaire, Frank Chang, Krissy Gray, Purushottam Panta, Luwei Liu, Khalid Nawab, Shadi Hijjawi, Richard Schreiber, Li Zhou, Patricia C Dykes.
PY - 2025
Y1 - 2025
N2 - Background: Diagnosis of venous thromboembolism (VTE) is often delayed, and facilitating earlier diagnosis may improve associated morbidity and mortality. Clinical notes contain information not found elsewhere in the medical record that could facilitate timely VTE diagnosis and accurate quality measurement. However, extracting relevant information from unstructured clinical notes is complex. Today, there are relatively few electronic clinical quality measures (eCQMs) in our national payment program and none that use natural language processing (NLP) techniques for data extraction. NLP holds great promise for making quality measurement more accurate and more efficient. Given the potential of NLP-based applications to facilitate more accurate VTE detection, primary care is one clinical setting in urgent need of this type of tool. Objective: This study aimed to develop a tool that extracts VTE symptoms from clinical notes for use within an eCQM to quantify the rate of delayed diagnosis of VTE in primary care settings. Methods: We iteratively developed an NLP-based data extraction tool, venous thromboembolism symptom extractor (VTExt), on an internal dataset using a rule-based approach to extract VTE symptoms from primary care clinical note text. The VTE symptoms lexicon was derived and optimized with physician guidance and externally validated using datasets from 2 independent health care organizations. We performed 26 rounds of performance evaluation of notes sampled from the case cohort (17,585 patient progress note sentences from 279 patient notes), and 5 rounds of evaluation of the control cohort (2838 patient progress note sentences from 50 patient notes). VTExt’s performance was evaluated using evaluation metrics, including area under the curve, positive predictive value, negative predictive value, sensitivity, and specificity. Results: VTExt achieved near-perfect performance in extracting VTE symptoms from primary care notes sampled from records of patients diagnosed with or without VTE. In external validation, VTExt achieved promising performance in 2 additional geographically distant organizations using different electronic health record systems. When compared against a deep learning model and 4 machine learning models, VTExt exhibited similar or even improved performance across all metrics. Conclusions: This study demonstrates a data-driven NLP-based approach to clinical note information extraction that can be generalized to different electronic health record systems across different institutions. Due to the robust performance of this tool, VTExt is the first NLP application to be used in a nationally endorsed eCQM.
AB - Background: Diagnosis of venous thromboembolism (VTE) is often delayed, and facilitating earlier diagnosis may improve associated morbidity and mortality. Clinical notes contain information not found elsewhere in the medical record that could facilitate timely VTE diagnosis and accurate quality measurement. However, extracting relevant information from unstructured clinical notes is complex. Today, there are relatively few electronic clinical quality measures (eCQMs) in our national payment program and none that use natural language processing (NLP) techniques for data extraction. NLP holds great promise for making quality measurement more accurate and more efficient. Given the potential of NLP-based applications to facilitate more accurate VTE detection, primary care is one clinical setting in urgent need of this type of tool. Objective: This study aimed to develop a tool that extracts VTE symptoms from clinical notes for use within an eCQM to quantify the rate of delayed diagnosis of VTE in primary care settings. Methods: We iteratively developed an NLP-based data extraction tool, venous thromboembolism symptom extractor (VTExt), on an internal dataset using a rule-based approach to extract VTE symptoms from primary care clinical note text. The VTE symptoms lexicon was derived and optimized with physician guidance and externally validated using datasets from 2 independent health care organizations. We performed 26 rounds of performance evaluation of notes sampled from the case cohort (17,585 patient progress note sentences from 279 patient notes), and 5 rounds of evaluation of the control cohort (2838 patient progress note sentences from 50 patient notes). VTExt’s performance was evaluated using evaluation metrics, including area under the curve, positive predictive value, negative predictive value, sensitivity, and specificity. Results: VTExt achieved near-perfect performance in extracting VTE symptoms from primary care notes sampled from records of patients diagnosed with or without VTE. In external validation, VTExt achieved promising performance in 2 additional geographically distant organizations using different electronic health record systems. When compared against a deep learning model and 4 machine learning models, VTExt exhibited similar or even improved performance across all metrics. Conclusions: This study demonstrates a data-driven NLP-based approach to clinical note information extraction that can be generalized to different electronic health record systems across different institutions. Due to the robust performance of this tool, VTExt is the first NLP application to be used in a nationally endorsed eCQM.
UR - https://www.scopus.com/pages/publications/105015330090
UR - https://www.scopus.com/pages/publications/105015330090#tab=citedBy
U2 - 10.2196/63720
DO - 10.2196/63720
M3 - Article
AN - SCOPUS:105015330090
SN - 2291-9694
VL - 13
JO - JMIR Medical Informatics
JF - JMIR Medical Informatics
M1 - e63720
ER -