Machine Learning Prediction of 1-Year Mortality and Recurrence after Ischemic Stroke Using Enriched EHR data

Project: Research project

Project Details


PROJECT SUMMARY / ABSTRACT
Stroke is a leading cause of death and disability worldwide. The 1-year risks of death and recurrence after a stroke are estimated at around 15% and 10%, respectively. Furthermore, a recent report from the Global Burden of Disease (GBD) study has shown a substantial increase in the annual number of strokes and secondary deaths, especially in low-income groups. Recurrent strokes, which are becoming more frequent, carry a higher rate of death and disability. Thus, it is imperative to identify patients at risk of recurrence and death to enable proper and timely evaluation, resource allocation, and targeted prevention.
The investigators’ recently published review indicates that the multiple clinical scores developed for predicting stroke recurrence have only limited clinical utility. Similarly, current stroke prognostic models vary widely in quality; prediction models of post-stroke mortality are limited by their validation cohort size, breadth of clinical variables, and overall usefulness. The investigators have recently developed machine learning-based models of post-stroke all-cause mortality and recurrence using electronic health records (EHR) data. Despite promising results, the current pilot predictive models are limited to a single health system and may have inadequate generalizability due to implicit bias. This proposal seeks to expand and improve the predictive models through the creative use of vetted EHR data for ischemic stroke patients from three large and different health systems (Penn State Health, Geisinger, and Johns Hopkins), which together care for more than eight million people in rural and urban areas. This project will further explore the predictive value of social determinants of health (SDoH) when added to the clinical data.
The investigators propose an integrative approach to design parameter-optimized and interpretable models, leveraging enriched EHR data to identify the risk of ischemic stroke recurrence and all-cause mortality.
Aim 1: Standardize EHR-based data across health care centers to identify clusters of ischemic stroke patients with common traits.
Aim 2: Develop optimal interpretable ensemble models to predict 1-year mortality and recurrence after an ischemic stroke.
Aim 3: Validate, prospectively and externally, ensemble models for 1-year mortality and stroke recurrence.
This proposal includes model development with internal, external, and temporal validation and lays the foundation for an impact study to provide evidence of clinical utility. The investigators envision that this study will lead to EHR-based screening tools that can flag high-risk stroke patients for more targeted secondary prevention.
Effective start/end date: 8/11/23 to 7/31/24


  • National Institute of Neurological Disorders and Stroke: $730,806.00

