Introduction: The cut-point for defining the age of young ischemic stroke (IS) is clinically and epidemiologically important, yet it is arbitrary and differs across studies. In this study, we leveraged electronic health records (EHRs) and data science techniques to estimate an optimal cut-point for defining the age of young IS. Methods: Patient-level EHRs were extracted from 13 hospitals in Pennsylvania, and used in two parallel approaches. The first approach included ICD9/10, from IS patients to group comorbidities, and computed similarity scores between every patient pair. We determined the optimal age of young IS by analyzing the trend of patient similarity with respect to their clinical profile for different ages of index IS. The second approach used the IS cohort and control (without IS), and built three sets of machine-learning models—generalized linear regression (GLM), random forest (RF), and XGBoost (XGB)—to classify patients for seventeen age groups. After extracting feature importance from the models, we determined the optimal age of young IS by analyzing the pattern of comorbidity with respect to the age of index IS. Both approaches were completed separately for male and female patients. Results: The stroke cohort contained 7555 ISs, and the control included 31,067 patients. In the first approach, the optimal age of young stroke was 53.7 and 51.0 years in female and male patients, respectively. In the second approach, we created 102 models, based on three algorithms, 17 age brackets, and two sexes. The optimal age was 53 (GLM), 52 (RF), and 54 (XGB) for female, and 52 (GLM and RF) and 53 (RF) for male patients. Different age and sex groups exhibited different comorbidity patterns. Discussion: Using a data-driven approach, we determined the age of young stroke to be 54 years for women and 52 years for men in our mainly rural population, in central Pennsylvania. Future validation studies should include more diverse populations.
All Science Journal Classification (ASJC) codes