Chengming Luo, Xihua Wang, Y. Jun Xu, Shunqing Jia, Zejun Liu, Boyang Mao, Qinya Lv, Xuming Ji, Yanxin Rong, Yan Dai
Process Safety and Environmental Protection, Vol. 203, Article 107887 (published 2025-09-17). DOI: 10.1016/j.psep.2025.107887. https://www.sciencedirect.com/science/article/pii/S0957582025011541
Prediction of ammonia and total nitrogen in large freshwater lake watershed based on small sample data and analysis of their spatiotemporal variation and driving mechanism
Ammonia nitrogen (NH₃-N) and total nitrogen (TN) pollution pose serious threats to freshwater lake ecosystems, making accurate prediction essential for watershed management. However, scarce data of variable quality limit the performance of existing prediction models. This study proposes an integrated prediction framework that combines sample enhancement, adaptive feature selection, and multiple machine learning methods to improve NH₃-N and TN prediction in the Poyang Lake watershed. A Wasserstein Generative Adversarial Network with Gradient Penalty (WGAN-GP) was used to generate high-quality virtual samples and thereby enlarge the available data. Recursive Feature Elimination (RFE) was then applied to identify key variables and remove redundant ones, improving model efficiency. Four models, Long Short-Term Memory (LSTM), Bidirectional LSTM (Bi-LSTM), Gated Recurrent Unit (GRU), and Extreme Learning Machine (ELM), were built and compared. In parallel, Spearman correlation analysis and principal component analysis (PCA) were used to identify the main sources of TN and NH₃-N pollution. The results showed clear spatiotemporal heterogeneity in NH₃-N and TN levels, with the Fuhe River sub-basin the most polluted. Agricultural runoff, domestic sewage, and industrial discharge were identified as the key pollution sources. WGAN-GP and RFE significantly improved model performance: the R² of the best TN model (GRU) rose from 0.515 to 0.709, and that of the best NH₃-N model (Bi-LSTM) rose from 0.369 to 0.909. The deep learning models showed similar predictive capabilities and could be combined to improve accuracy and stability. This study offers an effective, data-efficient approach to water quality prediction under small-sample conditions and provides scientific guidance for watershed environmental management.
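The abstract names WGAN-GP but does not give its objective. For reference, the standard critic and generator losses from the original WGAN-GP formulation (Gulrajani et al., 2017) are shown below. Here P_r is the distribution of real (measured) samples, P_g the distribution of generated virtual samples, D the critic, and λ the penalty weight. The value of λ, the network architectures, and the training schedule used in this study are not stated in the abstract; λ = 10 is only the default of the original formulation.

```latex
\begin{aligned}
L_D &= \mathbb{E}_{\tilde{x}\sim\mathbb{P}_g}\!\left[D(\tilde{x})\right]
      - \mathbb{E}_{x\sim\mathbb{P}_r}\!\left[D(x)\right]
      + \lambda\,\mathbb{E}_{\hat{x}\sim\mathbb{P}_{\hat{x}}}\!\left[\left(\lVert\nabla_{\hat{x}} D(\hat{x})\rVert_2 - 1\right)^2\right],\\
\hat{x} &= \epsilon x + (1-\epsilon)\tilde{x}, \qquad \epsilon \sim U[0,1],\\
L_G &= -\,\mathbb{E}_{\tilde{x}\sim\mathbb{P}_g}\!\left[D(\tilde{x})\right], \qquad \tilde{x} = G(z).
\end{aligned}
```

The penalty term pushes the critic's gradient norm toward 1 on points interpolated between real and generated samples, which softly enforces the 1-Lipschitz constraint that weight clipping enforced in the earlier WGAN.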
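RFE repeatedly refits a model on the remaining features and discards the least important one. The abstract does not say which estimator drives the elimination in this study, so the minimal sketch below assumes an ordinary least-squares model on standardized features, ranks features by absolute standardized coefficient, and removes one feature per round. The helper names (`solve`, `standardized_ols_coefs`, `rfe`) and the toy data are hypothetical illustrations, not the authors' code or data; in practice, scikit-learn's `sklearn.feature_selection.RFE` wraps any estimator that exposes coefficients or feature importances.

```python
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: features are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def standardized_ols_coefs(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Least-squares slopes of y on z-scored features (intercept absorbed by centering)."""
    n, p = len(X), len(X[0])
    cols = []
    for j in range(p):
        col = [float(row[j]) for row in X]
        mu = sum(col) / n
        sd = (sum((v - mu) ** 2 for v in col) / n) ** 0.5
        if sd == 0:
            raise ValueError(f"feature {j} is constant")
        cols.append([(v - mu) / sd for v in col])
    y_mean = sum(y) / n
    yc = [v - y_mean for v in y]
    # Normal equations (Z^T Z) beta = Z^T y on the centered data.
    ztz = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    zty = [sum(a * b for a, b in zip(ci, yc)) for ci in cols]
    return solve(ztz, zty)


def rfe(X, y, names: List[str], n_keep: int) -> List[str]:
    """Recursive feature elimination: refit, drop the weakest feature, repeat."""
    active = list(range(len(names)))
    while len(active) > n_keep:
        subset = [[row[j] for j in active] for row in X]
        coefs = standardized_ols_coefs(subset, y)
        # Importance = |standardized coefficient|, recomputed after every removal.
        weakest = min(range(len(active)), key=lambda k: abs(coefs[k]))
        del active[weakest]
    return [names[j] for j in active]


# Hypothetical toy data: y depends on x0 and x2 only; x1 is a distractor.
X = [[1, 2, 1], [2, 1, 0], [3, 4, 1], [4, 3, 0], [5, 6, 1], [6, 5, 1]]
y = [2 * r[0] - 3 * r[2] for r in X]
print(rfe(X, y, ["x0", "x1", "x2"], n_keep=2))  # ['x0', 'x2']
```

Refitting after each removal is what distinguishes RFE from a one-shot ranking: once a redundant variable is gone, the importances of the remaining, correlated variables can change.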
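Spearman correlation, which the study uses to link nitrogen levels to candidate pollution sources, measures monotonic rather than strictly linear association. A minimal sketch of the standard definition is shown below: Spearman's rho is the Pearson correlation of the rank-transformed series, with tied values given the average of the ranks they occupy. The function names and example values are hypothetical; `scipy.stats.spearmanr` offers the same statistic together with a p-value.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the last element of the run of ties that starts at i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the rank-transformed series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical values: a monotonic but nonlinear relation still gives rho = 1.
print(round(spearman([1, 2, 3, 4, 5], [1, 4, 9, 16, 25]), 6))  # 1.0
```

Because only ranks enter the calculation, the statistic is insensitive to skewed concentration distributions and occasional extreme values, which is a common reason to prefer it over Pearson correlation for water-quality records.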
About the journal:
The Process Safety and Environmental Protection (PSEP) journal is a leading international journal that publishes high-quality, original research papers in engineering, specifically on the safety of industrial processes and on environmental protection. The journal encourages submissions that present new developments in safety and environmental aspects, particularly those that show how research findings can be applied in process engineering design and practice.
PSEP is particularly interested in research that brings fresh perspectives to established engineering principles, identifies unsolved problems, or suggests directions for future research. The journal also values contributions that push the boundaries of traditional engineering and welcomes multidisciplinary papers.
PSEP's articles are abstracted and indexed by a range of databases and services, which helps to ensure that the journal's research is accessible and recognized in the academic and professional communities. These databases include ANTE, Chemical Abstracts, Chemical Hazards in Industry, Current Contents, Elsevier Engineering Information database, Pascal Francis, Web of Science, Scopus, Engineering Information Database EnCompass LIT (Elsevier), and INSPEC. This wide coverage facilitates the dissemination of the journal's content to a global audience interested in process safety and environmental engineering.