Augmented machine learning for sewage quality assessment with limited data
Jia-Qiang Lv, Wan-Xin Yin, Jia-Min Xu, Hao-Yi Cheng, Zhi-Ling Li, Ji-Xian Yang, Ai-Jie Wang, Hong-Cheng Wang
Environmental Science and Ecotechnology, volume 23, Article 100512. Published 2024-11-17. DOI: 10.1016/j.ese.2024.100512
https://www.sciencedirect.com/science/article/pii/S2666498424001261
Citations: 0
Abstract
Physical, chemical, and biological processes within sewers significantly alter sewage composition during conveyance. This leads to the formation of sulfide and methane, compounds that contribute to sewer corrosion and greenhouse gas emissions. Reliable modeling of these compounds is essential for effective sewer management, but the development of machine learning (ML) models is hindered by differences in data accessibility and sampling frequencies of water quality variables. Here we present a mechanistically enhanced hybrid (ME-Hybrid) model that combines mechanistic modeling with data-driven approaches. This model harmonizes datasets with varying sampling frequencies and generates synthetic samples for ML training, thereby enhancing the monitoring of methane and sulfide in sewers. The optimal ME-Hybrid model integrates the backpropagation neural network with mechanistic frequency harmonization. We demonstrate that the ME-Hybrid model outperforms pure ML and linear interpolation in capturing fluctuating trends and extremes of sulfide concentrations, achieving a coefficient of determination (R²) of 0.94. Synthetic samples generated through mechanistic augmentation closely approximate real samples in modeling performance, statistical distribution, and data structure. This enables the model to maintain high predictive accuracy (R² > 0.76) for sulfide even when trained on only 50% of the dataset. Additionally, the ME-Hybrid model successfully assesses sewer methane concentrations with an R² of 0.94, validating its applicability and generalization ability. Our results provide a reliable methodological framework for modeling and prediction under data scarcity. By facilitating better monitoring and management of sewer systems, the ME-Hybrid model aids in the development of strategies that minimize environmental impacts, enhance urban resilience, and ultimately lead to sustainable urban water systems.
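To make two terms from the abstract concrete, the following minimal Python sketch shows the linear interpolation baseline that the ME-Hybrid model is compared against, together with the coefficient of determination used to score predictions. It is not the authors' implementation and does not reproduce the mechanistic harmonization step. The function names, the 4-hour sampling interval, and all data values are assumptions chosen purely for illustration.

```python
from bisect import bisect_left


def linear_interpolate(times, values, query_times):
    """Linearly interpolate sparse (times, values) samples onto query_times.

    `times` must be sorted ascending. Queries outside the sampled range are
    clamped to the nearest endpoint value.
    """
    result = []
    for t in query_times:
        if t <= times[0]:
            result.append(values[0])
        elif t >= times[-1]:
            result.append(values[-1])
        else:
            j = bisect_left(times, t)  # first index with times[j] >= t
            t0, t1 = times[j - 1], times[j]
            v0, v1 = values[j - 1], values[j]
            result.append(v0 + (v1 - v0) * (t - t0) / (t1 - t0))
    return result


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical data: a variable sampled every 4 h, harmonized to an hourly grid.
sparse_hours = [0, 4, 8]
sparse_values = [1.0, 3.0, 2.0]
hourly_grid = list(range(9))
harmonized = linear_interpolate(sparse_hours, sparse_values, hourly_grid)
```

Linear interpolation can only draw straight lines between sparse samples, so any peak that falls between two sampling times is flattened. This is consistent with the abstract's report that the baseline captures fluctuating trends and extremes less well than the hybrid approach.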
About the journal:
Environmental Science & Ecotechnology (ESE) is an international, open-access journal publishing original research in environmental science, engineering, ecotechnology, and related fields. Authors publishing in ESE can immediately, permanently, and freely share their work. They have license options and retain copyright. Published by Elsevier, ESE is co-organized by the Chinese Society for Environmental Sciences, Harbin Institute of Technology, and the Chinese Research Academy of Environmental Sciences, under the supervision of the China Association for Science and Technology.