Deep representation learning enables cross-basin water quality prediction under data-scarce conditions
Authors: Yue Zheng, Xiaoran Zhang, Yongchao Zhou, Yiping Zhang, Tuqiao Zhang, Raziyeh Farmani
Journal: npj Clean Water
Publication date: 2025-04-26
Publication type: Journal Article
DOI: https://doi.org/10.1038/s41545-025-00466-2
Impact factor: 10.4
JCR: Q1 (Engineering, Chemical)
Citations: 0
Abstract
Artificial intelligence has been used extensively to predict surface water quality and thereby assess the health of aquatic ecosystems proactively. However, water quality prediction under data-scarce conditions remains a challenge, especially when the data are heterogeneous and come from monitoring sites with dissimilar water quality, which hinders information transfer between sites. A deep learning model is proposed that uses representation learning to capture knowledge from source river basins during the pre-training stage and incorporates meteorological data to predict water quality accurately. The model is implemented and validated using data from 149 monitoring sites across inland China. The results show that the model achieves high prediction accuracy across all sites, with a mean Nash-Sutcliffe efficiency of 0.80, and has a significant advantage in multi-indicator prediction. The model maintains this performance even when trained with only half of the data. This can be attributed to the representation learning used in the pre-training stage, which enables extensive and accurate prediction under data-scarce conditions. The developed model holds significant potential for cross-basin water quality prediction, which could substantially advance water environment system management.
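The abstract reports accuracy as a mean Nash-Sutcliffe efficiency (NSE) across sites. As a reading aid, the following is a minimal sketch of the standard NSE definition, NSE = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2), averaged over sites. It is not the authors' evaluation code; the function names and the sample values are hypothetical and chosen only for illustration.

```python
from statistics import fmean


def nash_sutcliffe_efficiency(observed, simulated):
    """Standard NSE: 1 minus the ratio of squared error to observed variance."""
    if len(observed) != len(simulated) or len(observed) == 0:
        raise ValueError("observed and simulated must be non-empty and equal in length")
    mean_obs = fmean(observed)
    # Sum of squared prediction errors.
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    # Sum of squared deviations of the observations from their own mean.
    variance = sum((o - mean_obs) ** 2 for o in observed)
    if variance == 0:
        raise ValueError("NSE is undefined when all observations are identical")
    return 1.0 - residual / variance


def mean_nse(sites):
    """Average NSE over an iterable of (observed, simulated) pairs, one per site."""
    return fmean(nash_sutcliffe_efficiency(obs, sim) for obs, sim in sites)


if __name__ == "__main__":
    # Hypothetical values for two sites, not data from the paper.
    site_a = ([2.0, 4.0, 6.0, 8.0], [2.0, 4.0, 6.0, 6.0])
    site_b = ([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0])
    print(mean_nse([site_a, site_b]))
```

An NSE of 1 indicates a perfect match, while an NSE of 0 means the model is no better than always predicting the observed mean, so values can also be negative for poor models.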
npj Clean Water
Subject area: Environmental Science - Water Science and Technology
CiteScore: 15.30
Self-citation rate: 2.60%
Articles published: 61
Review time: 5 weeks
About the journal:
npj Clean Water publishes high-quality papers that report cutting-edge science, technology, applications, policies, and societal issues contributing to a more sustainable supply of clean water. The journal's publications may also support and accelerate the achievement of Sustainable Development Goal 6, which focuses on clean water and sanitation.