Dissolved oxygen (DO) reflects river metabolic pulses and is an essential water quality measure. Our ability to forecast DO, however, remains limited. Water quality data, and DO data in particular, often have large gaps and sparse areal and temporal coverage. Earth surface and hydrometeorology data, on the other hand, have become widely available. Here we ask: can a Long Short-Term Memory (LSTM) model learn river DO dynamics from sparse DO data and intensive (daily) hydrometeorology data? We used CAMELS-chem, a new data set with DO concentrations from 236 minimally disturbed watersheds across the U.S. The model generally learns the theory of DO solubility, capturing the decrease in DO with increasing water temperature. It shows potential for predicting DO in "chemically ungauged basins", defined as basins without any measurements of DO or, more broadly, of water quality. The model, however, misses some DO peaks and troughs when in-stream biogeochemical processes become important. Surprisingly, the model does not perform better where more data are available. Instead, it performs better in basins with low variations of streamflow and DO, high runoff ratio (>0.45), and winter precipitation peaks. These results suggest that more data collection at DO peaks and troughs and in sparsely monitored areas is essential to overcome data scarcity, an outstanding challenge in the water quality community.
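One way to picture how a model that predicts every day can be scored against sparse DO samples is a loss that skips days without a measurement. The sketch below is a minimal illustration under that assumption; the abstract does not state how gaps were handled in training, and the function name `masked_mse` and the one-week values are hypothetical.

```python
import math


def masked_mse(predicted, observed):
    """Mean squared error over days that have an observation.

    Days where `observed` is NaN (no DO sample) are skipped, so a daily
    prediction series can be scored against sparse measurements.
    """
    if len(predicted) != len(observed):
        raise ValueError("series must have the same length")
    squared_errors = [
        (p - o) ** 2
        for p, o in zip(predicted, observed)
        if not math.isnan(o)
    ]
    if not squared_errors:
        return float("nan")  # no observations at all: loss is undefined
    return sum(squared_errors) / len(squared_errors)


# Hypothetical one-week example (assumed values, not from the study):
# daily model predictions and only two DO samples, in mg/L.
nan = float("nan")
daily_prediction = [9.1, 9.0, 8.8, 8.7, 8.9, 9.2, 9.3]
sparse_observation = [nan, 9.4, nan, nan, nan, 9.0, nan]
loss = masked_mse(daily_prediction, sparse_observation)
```

Only the two sampled days contribute to `loss` here; the other five predictions are still produced but leave the score unchanged, which is what lets daily forcing data be paired with infrequent DO sampling.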
All Science Journal Classification (ASJC) codes
- Environmental Chemistry