TY - JOUR
T1 - Interpretable machine learning for analysing heterogeneous drivers of geographic events in space-time
AU - Masrur, Arif
AU - Yu, Manzhu
AU - Mitra, Prasenjit
AU - Peuquet, Donna
AU - Taylor, Alan
N1 - Publisher Copyright:
© 2021 Informa UK Limited, trading as Taylor & Francis Group.
PY - 2022
Y1 - 2022
AB - Machine learning (ML) interpretability has become increasingly crucial for identifying accurate and relevant structural relationships between spatial events and the factors that explain them. Methodologically aspatial ML algorithms with apparently high predictive power ignore non-stationary domain relationships in spatio-temporal data (e.g. dependence, heterogeneity), leading to incorrect interpretations and poor management decisions. This study addresses this critical methodological issue of ‘interpretability’ in ML-based modeling of structural relationships, using the example of heterogeneous drivers of wildfires across the United States. Specifically, we present and evaluate a spatio-temporally interpretable random forest (iST-RF) that uses spatio-temporal sampling-based training and weighted prediction. Although the ultimate scientific objective is to derive interpretation in space-time, experiments show that iST-RF can improve predictive accuracy (76%) compared to the aspatial RF approach (70%) while enhancing interpretation of the spatio-temporal relevance of the trained sub-models to the ensemble prediction. This novel approach can help balance prediction and interpretation with fidelity in a spatial data science life cycle. However, challenges exist for predictive modeling when the dataset is very small, because in such cases the prediction performance of locally optimized sub-models can be suboptimal. With that caveat, our proposed approach is an ideal choice for identifying drivers of spatio-temporal events in country- or regional-scale studies.
UR - http://www.scopus.com/inward/record.url?scp=85113972092&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85113972092&partnerID=8YFLogxK
U2 - 10.1080/13658816.2021.1965608
DO - 10.1080/13658816.2021.1965608
M3 - Article
AN - SCOPUS:85113972092
SN - 1365-8816
VL - 36
SP - 692
EP - 719
JO - International Journal of Geographical Information Science
JF - International Journal of Geographical Information Science
IS - 4
ER -