TY - JOUR
T1 - The Data Synergy Effects of Time-Series Deep Learning Models in Hydrology
AU - Fang, Kuai
AU - Kifer, Daniel
AU - Lawson, Kathryn
AU - Feng, Dapeng
AU - Shen, Chaopeng
N1 - Publisher Copyright: © 2022. American Geophysical Union. All Rights Reserved.
PY - 2022/4
Y1 - 2022/4
N2 - When fitting statistical models to variables in geoscientific disciplines such as hydrology, it is a customary practice to stratify a large domain into multiple regions (or regimes) and study each region separately. Traditional wisdom suggests that models built for each region separately will have higher performance because of homogeneity within each region. However, each stratified model has access to fewer and less diverse data points. Here, through two hydrologic examples (soil moisture and streamflow), we show that conventional wisdom may no longer hold in the era of big data and deep learning (DL). We systematically examined an effect we call data synergy, where the results of the DL models improved when data were pooled together from characteristically different regions. The performance of the DL models benefited from modest diversity in the training data compared to a homogeneous training set, even with similar data quantity. Moreover, allowing heterogeneous training data makes eligible much larger training datasets, which is an inherent advantage of DL. A large, diverse data set is advantageous in terms of representing extreme events and future scenarios, which has strong implications for climate change impact assessment. The results here suggest the research community should place greater emphasis on data sharing.
UR - http://www.scopus.com/inward/record.url?scp=85128046438&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85128046438&partnerID=8YFLogxK
U2 - 10.1029/2021WR029583
DO - 10.1029/2021WR029583
M3 - Article
AN - SCOPUS:85128046438
SN - 0043-1397
VL - 58
JO - Water Resources Research
JF - Water Resources Research
IS - 4
M1 - e2021WR029583
ER -