TY - JOUR
T1 - Early stage machine learning–based prediction of US county vulnerability to the COVID-19 pandemic
T2 - Machine learning approach
AU - Mehta, Mihir
AU - Julaiti, Juxihong
AU - Griffin, Paul
AU - Kumara, Soundar
N1 - Publisher Copyright: © Mihir Mehta, Juxihong Julaiti, Paul Griffin, Soundar Kumara.
PY - 2020/7
Y1 - 2020/7
AB - Background: The rapid spread of COVID-19 means that government and health services providers have little time to plan and design effective response policies. It is therefore important to quickly provide accurate predictions of how vulnerable geographic regions such as counties are to the spread of this virus. Objective: The aim of this study is to develop county-level prediction around near future disease movement for COVID-19 occurrences using publicly available data. Methods: We estimated county-level COVID-19 occurrences for the period March 14 to 31, 2020, based on data fused from multiple publicly available sources inclusive of health statistics, demographics, and geographical features. We developed a three-stage model using XGBoost, a machine learning algorithm, to quantify the probability of COVID-19 occurrence and estimate the number of potential occurrences for unaffected counties. Finally, these results were combined to predict the county-level risk. This risk was then used as an estimated after-five-day-vulnerability of the county. Results: The model predictions showed a sensitivity over 71% and specificity over 94% for models built using data from March 14 to 31, 2020. We found that population, population density, percentage of people aged >70 years, and prevalence of comorbidities play an important role in predicting COVID-19 occurrences. We observed a positive association at the county level between urbanicity and vulnerability to COVID-19. Conclusions: The developed model can be used for identification of vulnerable counties and potential data discrepancies. Limited testing facilities and delayed results introduce significant variation in reported cases, which produces a bias in the model.
UR - http://www.scopus.com/inward/record.url?scp=85097902191&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85097902191&partnerID=8YFLogxK
U2 - 10.2196/19446
DO - 10.2196/19446
M3 - Article
C2 - 32784193
AN - SCOPUS:85097902191
SN - 2369-2960
VL - 6
JO - JMIR Public Health and Surveillance
JF - JMIR Public Health and Surveillance
IS - 3
M1 - e19446
ER -