Prediction of ammonia and total nitrogen in large freshwater lake watershed based on small sample data and analysis of their spatiotemporal variation and driving mechanism

IF 7.8 | Zone 2 (Environmental Science and Ecology) | JCR Q1 (Engineering, Chemical)

Process Safety and Environmental Protection Pub Date : 2025-09-17 DOI:10.1016/j.psep.2025.107887

Chengming Luo , Xihua Wang , Y. Jun Xu , Shunqing Jia , Zejun Liu , Boyang Mao , Qinya Lv , Xuming Ji , Yanxin Rong , Yan Dai

Volume 203, Article 107887. Full text: https://www.sciencedirect.com/science/article/pii/S0957582025011541

Citations: 0

Abstract

Ammonia nitrogen (NH₃-N) and total nitrogen (TN) pollution pose serious threats to freshwater lake ecosystems, making accurate prediction essential for watershed management. However, limited and variable-quality data challenge the performance of existing prediction models. This study proposed an integrated prediction framework combining sample enhancement, adaptive feature selection, and multiple machine learning methods to improve NH₃-N and TN prediction in the Poyang Lake watershed. A Wasserstein Generative Adversarial Network with Gradient Penalty (WGAN-GP) was used to generate high-quality virtual samples, enhancing data availability. Recursive Feature Elimination (RFE) was then applied to identify key variables and remove redundancy, improving model efficiency. Four models, Long Short-Term Memory (LSTM), Bidirectional LSTM (Bi-LSTM), Gated Recurrent Unit (GRU), and Extreme Learning Machine, were built and compared. In parallel, Spearman correlation analysis and principal component analysis were used to identify the main sources of TN and NH₃-N pollution. Results showed clear spatiotemporal heterogeneity in NH₃-N and TN levels, with the Fuhe River sub-basin being the most polluted. Agricultural runoff, domestic sewage, and industrial discharge were identified as key pollution sources. WGAN-GP and RFE significantly improved model performance: the R² of the best prediction model for TN (GRU) improved from 0.515 to 0.709, and that of the best prediction model for NH₃-N (Bi-LSTM) improved from 0.369 to 0.909. The deep learning models demonstrated similar predictive capabilities and could be integrated to enhance accuracy and stability. This study offers an effective, data-efficient approach for water quality prediction under small-sample conditions and provides scientific guidance for watershed environmental management.
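
The abstract reports model skill as R² and uses Spearman correlation for source analysis. As a minimal sketch of how these two quantities are commonly computed (not the authors' code), the Python below implements both with the standard library only. The function names and the sample values are hypothetical and chosen purely for illustration; they are not data from the study.

```python
from statistics import mean


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    obs_mean = mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - obs_mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def average_ranks(values):
    """Rank values from 1 to n; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical concentrations (mg/L), made up for illustration only.
tn_observed = [1.2, 1.8, 2.4, 3.1, 2.0]
tn_predicted = [1.4, 1.7, 2.2, 2.9, 2.3]
nh3n_observed = [0.2, 0.5, 0.6, 1.1, 0.4]

print("R2 (TN):", round(r_squared(tn_observed, tn_predicted), 3))
print("Spearman (TN vs NH3-N):", round(spearman(tn_observed, nh3n_observed), 3))
```

Spearman correlation works on ranks rather than raw values, so it captures monotonic relationships and is less sensitive to outliers, which is one plausible reason to prefer it for skewed water quality data.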

Source journal

Process Safety and Environmental Protection (Environmental Science; Engineering, Chemical)

CiteScore: 11.40
Self-citation rate: 15.40%
Articles published: 929
Review time: 8.0 months

Journal description: Process Safety and Environmental Protection (PSEP) is an international journal that publishes original research papers in engineering, specifically on the safety of industrial processes and environmental protection. The journal encourages submissions that present new developments in safety and environmental aspects, particularly those that show how research findings can be applied in process engineering design and practice. PSEP is particularly interested in research that brings fresh perspectives to established engineering principles, identifies unsolved problems, or suggests directions for future research. The journal also values contributions that push the boundaries of traditional engineering and welcomes multidisciplinary papers. PSEP's articles are abstracted and indexed by ANTE, Chemical Abstracts, Chemical Hazards in Industry, Current Contents, Elsevier Engineering Information database, Pascal Francis, Web of Science, Scopus, Engineering Information Database EnCompass LIT (Elsevier), and INSPEC.