AI predicts river water quality with weather data
June 3, 2021
The difficulty and expense of collecting river water samples in remote areas has led
to significant — and in some cases, decades-long — gaps in available water
chemistry data, according to a Penn State-led team of researchers. The team is
using artificial intelligence (AI) to predict water quality and fill the gaps in
the data. Their efforts could lead to an improved understanding of how rivers
react to human disturbances and climate change.
The researchers developed a model that forecasts dissolved oxygen (DO), a key
indicator of water’s capability to support aquatic life, in lightly monitored
watersheds across the United States. They published their results in
Environmental Science & Technology.
Generally, the amount of oxygen dissolved in rivers and streams reflects their
ecosystems, as certain organisms produce oxygen while others consume it. DO also
varies based on the season and elevation, and the area’s local weather
conditions cause fluctuations, too, according to Li Li, professor of civil and
environmental engineering at Penn State.
“People usually think about DO as being driven by stream biological and
geochemical processes, like fish breathing in the water or aquatic plants making
DO on sunny days,” Li said. “But weather can also be a major driver.
Hydrometeorological conditions, including temperature and sunlight, are
influencing the life in the water, and this in turn influences the concentration
levels of DO.”
Hydrometeorological data, which tracks how water moves between the surface of
the Earth and the atmosphere, is recorded far more frequently and with more
spatial coverage than water chemistry data, according to Wei Zhi, postdoctoral
researcher in the Department of Civil and Environmental Engineering and first
author of the paper. The team theorized that a nationwide hydrometeorological
database, which would include measurements like air temperature, precipitation
and stream flow rate, could be used to forecast DO concentrations in remote areas.
“There is a lot of hydrometeorological data available, and we wanted to see if
there was enough correlation, even indirectly, to make a prediction and help
fill in the river water chemistry data gaps,” Zhi said.
The model was created through an AI framework known as a Long Short-Term Memory
(LSTM) network, an approach used to model natural “storage and release” systems,
according to Chaopeng Shen, associate professor of civil and environmental
engineering at Penn State.
“Think of it like a box,” Shen said. “It can take in water and store it in a
tank at certain rates, while on the other side releasing it at different rates,
and each of those rates are determined by the training. We have used it in the
past to model soil moisture, rain flow, water temperature and now, DO.”
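As a rough illustration of the "storage and release" idea, the sketch below steps a single-unit LSTM cell by hand. It is not the researchers' model: the weights are hypothetical hand-picked numbers rather than trained values, and the names `lstm_step`, `run` and `WEIGHTS` are invented for this example. The forget gate sets how much of the stored state is kept, the input gate how much new signal is admitted, and the output gate how much is released.

```python
import math


def sigmoid(x: float) -> float:
    """Squash a value into (0, 1) so it can act as a rate."""
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, w):
    """Advance a single-unit LSTM cell by one time step.

    w maps a gate name to (input weight, recurrent weight, bias).
    All weights here are hypothetical; in practice they come from training.
    """
    def pre(name):
        w_x, w_h, b = w[name]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre("forget"))  # fraction of the stored state that is kept
    i = sigmoid(pre("input"))   # fraction of the new candidate that is stored
    o = sigmoid(pre("output"))  # fraction of the state that is released
    g = math.tanh(pre("cand"))  # candidate value offered for storage

    c = f * c_prev + i * g      # updated "tank" contents (cell state)
    h = o * math.tanh(c)        # released output (hidden state)
    return h, c


# Hypothetical weights: keep most of the state, release about half of it.
WEIGHTS = {
    "forget": (0.0, 0.0, 2.0),
    "input": (1.0, 0.0, 0.0),
    "output": (0.0, 0.0, 0.0),
    "cand": (1.0, 0.0, 0.0),
}


def run(inputs, w=WEIGHTS):
    """Feed a sequence through the cell and return (h, c) at each step."""
    h, c = 0.0, 0.0
    states = []
    for x in inputs:
        h, c = lstm_step(x, h, c, w)
        states.append((h, c))
    return states


states = run([1.0, 0.0, 0.0])
```

With one pulse of input followed by zeros, the stored state is filled once and then drains a little at each step, which is the behavior the tank analogy describes.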
The researchers received data from the Catchment Attributes and Meteorology for
Large-sample Studies (CAMELS) hydrology database, which included a recent
addition of river water chemistry data from 1980 to 2014 for minimally disturbed
watersheds. Of the 505 watersheds included in the “CAMELS-chem” data set, the
team found 236 with the needed minimum of ten DO concentration measurements in
the 35-year span.
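The screening step described above can be sketched in a few lines. This is an illustration only: the data layout (a dictionary of site names mapped to year and concentration pairs), the function name `select_watersheds` and the toy values are all assumptions, not the CAMELS-chem format or the team's code.

```python
from typing import Dict, List, Tuple

Sample = Tuple[int, float]  # (year, DO concentration in mg/L), assumed layout


def select_watersheds(
    samples_by_site: Dict[str, List[Sample]],
    min_samples: int = 10,
    start_year: int = 1980,
    end_year: int = 2014,
) -> Dict[str, List[Sample]]:
    """Keep only sites with enough DO measurements inside the study window."""
    selected = {}
    for site, samples in samples_by_site.items():
        in_window = [s for s in samples if start_year <= s[0] <= end_year]
        if len(in_window) >= min_samples:
            selected[site] = in_window
    return selected


# Toy data with made-up values, for illustration only.
toy = {
    "site_a": [(1980 + k, 9.0) for k in range(12)],     # 12 samples in window
    "site_b": [(1990, 8.5), (1991, 8.7), (1992, 8.1)],  # too few samples
    "site_c": [(2010 + k, 7.5) for k in range(10)],     # only 5 fall in window
}
kept = select_watersheds(toy)
```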
To train the LSTM network and create a model, they used watershed data from 1980
to 2000, including DO concentrations, daily hydrometeorological measurements and
watershed attributes like topography, land cover and vegetation.
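A chronological split of this kind, in which earlier years are used for training and later years are held back for evaluation, might look like the following. The record layout and the function name `split_by_year` are hypothetical, and the values are invented for the example.

```python
def split_by_year(records, last_train_year=2000):
    """Split records into an earlier training set and a later held-out set."""
    train = [r for r in records if r["year"] <= last_train_year]
    held_out = [r for r in records if r["year"] > last_train_year]
    return train, held_out


# Made-up records: one DO value per year from 1980 through 2014.
records = [{"year": y, "do_mg_l": 9.0} for y in range(1980, 2015)]
train, held_out = split_by_year(records)
```

Splitting by time rather than at random keeps every training sample earlier than every evaluation sample, so the evaluation resembles forecasting years the model has not seen.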
According to Zhi, the team then tested the model’s accuracy against the
remaining DO data from 2001 to 2014, finding that the model had generally
learned the dynamics of DO solubility, including how oxygen levels decrease in warmer
water and at higher elevations. It also proved to have strong
predictive capability in almost three-quarters of test cases.
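The physical relationship the model picked up can be illustrated with a standard empirical formula, independent of the study. The sketch below uses the Benson-Krause equation for oxygen saturation in fresh water and scales it by a rough estimate of atmospheric pressure at elevation. The pressure term is a simplifying assumption (an exponential atmosphere with a scale height of about 8,400 meters, ignoring water vapor), and the function name is invented for this example.

```python
import math


def do_saturation_mg_per_l(temp_c: float, elevation_m: float = 0.0) -> float:
    """Approximate saturated dissolved oxygen in fresh water, in mg/L.

    Uses the Benson-Krause temperature relation at 1 atm, then scales by a
    rough pressure ratio for elevation (assumed scale height of 8400 m).
    """
    t_k = temp_c + 273.15
    ln_do = (
        -139.34411
        + 1.575701e5 / t_k
        - 6.642308e7 / t_k**2
        + 1.243800e10 / t_k**3
        - 8.621949e11 / t_k**4
    )
    pressure_atm = math.exp(-elevation_m / 8400.0)  # rough approximation
    return math.exp(ln_do) * pressure_atm


cold = do_saturation_mg_per_l(5.0)
warm = do_saturation_mg_per_l(25.0)
sea_level = do_saturation_mg_per_l(20.0)
mountain = do_saturation_mg_per_l(20.0, elevation_m=2000.0)
```

Under these assumptions, saturation at sea level and 20 degrees Celsius comes out at roughly 9 mg/L. It falls as the water warms or as elevation rises.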
“It is a really strong tool,” Zhi said. “It surprised us to see how well the
model learned DO dynamics across many different watershed conditions on a [...]”
Zhi added that the model performed best in areas with steadier DO levels and stable
water flow conditions, but more data would be needed to improve forecasting
capabilities for watersheds with higher DO and streamflow variability.
“If we can collect more samples that capture the high peaks and low troughs of
DO levels, we will be able to reflect that in the training process and improve
performance in the future,” Zhi said.
Penn State researchers Dapeng Feng, doctoral candidate in environmental
engineering, and Wen-Ping Tsai, postdoctoral researcher in the Department of
Civil and Environmental Engineering, and University of Nevada, Reno researchers
Adrian Harpold, associate professor of mountain ecohydrology, and Gary Sterle,
graduate research assistant in hydrological sciences, also contributed to the research.
A seed grant from Penn State’s Institute of Computation and Data Science, the
U.S. Department of Energy Subsurface Biogeochemical Research program, and the
National Science Foundation supported this research.