TY - JOUR
T1 - Automated Machine Learning to Evaluate the Information Content of Tropospheric Trace Gas Columns for Fine Particle Estimates Over India
T2 - A Modeling Testbed
AU - Zheng, Zhonghua
AU - Fiore, Arlene M.
AU - Westervelt, Daniel M.
AU - Milly, George P.
AU - Goldsmith, Jeff
AU - Karambelas, Alexandra
AU - Curci, Gabriele
AU - Randles, Cynthia A.
AU - Paiva, Antonio R.
AU - Wang, Chi
AU - Wu, Qingyun
AU - Dey, Sagnik
N1 - Funding Information:
We acknowledge ExxonMobil Technology and Engineering Company and Columbia Data Science Institute Seed Funds for supporting this work. We are grateful for helpful discussions with Dr. Ruth S. DeFries and Dr. Marianthi‐Anna Kioumourtzoglou. We would like to acknowledge high‐performance computing support from Cheyenne ( https://doi.org/10.5065/D6RX99HX ) provided by NCAR’s Computational and Information Systems Laboratory, sponsored by the National Science Foundation. This material is based upon work supported by the National Center for Atmospheric Research, which is a major facility sponsored by the National Science Foundation under Cooperative Agreement No. 1755088. ZZ acknowledges support from NCAR Advanced Study Program Postdoctoral Fellowship. SD acknowledges support from IIT Delhi for Chair Professor Fellowship. We appreciate the careful reading of our manuscript and the many insightful comments and suggestions from three anonymous reviewers.
Publisher Copyright:
© 2023 ExxonMobil Technology and Engineering Company (EMTEC) and The Authors.
PY - 2023/3
Y1 - 2023/3
AB - India is largely devoid of high-quality and reliable on-the-ground measurements of fine particulate matter (PM2.5). Ground-level PM2.5 concentrations are estimated from publicly available satellite Aerosol Optical Depth (AOD) products combined with other information. Prior research has largely overlooked the possibility of gaining additional accuracy and insights into the sources of PM using satellite retrievals of tropospheric trace gas columns. We evaluate the information content of tropospheric trace gas columns for PM2.5 estimates over India within a modeling testbed using an Automated Machine Learning (AutoML) approach, which selects from a menu of different machine learning tools based on the data set. We then quantify the relative information content of tropospheric trace gas columns, AOD, meteorological fields, and emissions for estimating PM2.5 over four Indian sub-regions on daily and monthly time scales. Our findings suggest that, regardless of the specific machine learning model assumptions, incorporating trace gas modeled columns improves PM2.5 estimates. We use the ranking scores produced from the AutoML algorithm and Spearman’s rank correlation to infer or link the possible relative importance of primary versus secondary sources of PM2.5 as a first step toward estimating particle composition. Our comparison of AutoML-derived models to selected baseline machine learning models demonstrates that AutoML is at least as good as user-chosen models. The idealized pseudo-observations (chemical-transport model simulations) used in this work lay the groundwork for applying satellite retrievals of tropospheric trace gases to estimate fine particle concentrations in India and serve to illustrate the promise of AutoML applications in atmospheric and environmental research.
UR - http://www.scopus.com/inward/record.url?scp=85151046942&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85151046942&partnerID=8YFLogxK
U2 - 10.1029/2022MS003099
DO - 10.1029/2022MS003099
M3 - Article
AN - SCOPUS:85151046942
SN - 1942-2466
VL - 15
JO - Journal of Advances in Modeling Earth Systems
JF - Journal of Advances in Modeling Earth Systems
IS - 3
M1 - e2022MS003099
ER -