Augmented machine learning for sewage quality assessment with limited data

IF 14 | Zone 1 (Environmental Science and Ecology) | JCR Q1 (Environmental Sciences)
Jia-Qiang Lv, Wan-Xin Yin, Jia-Min Xu, Hao-Yi Cheng, Zhi-Ling Li, Ji-Xian Yang, Ai-Jie Wang, Hong-Cheng Wang
Environmental Science and Ecotechnology, volume 23, Article 100512. Published 2024-11-17.
DOI: 10.1016/j.ese.2024.100512
https://www.sciencedirect.com/science/article/pii/S2666498424001261
Citation count: 0

Abstract

Physical, chemical, and biological processes within sewers significantly alter sewage composition during conveyance. These processes produce sulfide and methane, compounds that contribute to sewer corrosion and greenhouse gas emissions. Reliable modeling of these compounds is essential for effective sewer management, but the development of machine learning (ML) models is hindered by differences in data accessibility and sampling frequencies among water quality variables. Here we present a mechanistically enhanced hybrid (ME-Hybrid) model that combines mechanistic modeling with data-driven approaches. The model harmonizes datasets with varying sampling frequencies and generates synthetic samples for ML training, thereby enhancing the monitoring of methane and sulfide in sewers. The optimal ME-Hybrid model integrates a backpropagation neural network with mechanistic frequency harmonization. We demonstrate that the ME-Hybrid model outperforms pure ML and linear interpolation in capturing fluctuating trends and extremes of sulfide concentrations, achieving a coefficient of determination (R²) of 0.94. Synthetic samples generated through mechanistic augmentation closely approximate real samples in modeling performance, statistical distribution, and data structure. This enables the model to maintain high predictive accuracy (R² > 0.76) for sulfide even when trained on only 50% of the dataset. Additionally, the ME-Hybrid model successfully assesses sewer methane concentrations with an R² of 0.94, validating its applicability and generalization ability. Our results provide a reliable methodological framework for modeling and prediction under data scarcity. By facilitating better monitoring and management of sewer systems, the ME-Hybrid model aids in the development of strategies that minimize environmental impacts, enhance urban resilience, and ultimately lead to sustainable urban water systems.
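To make two terms from the abstract concrete, the following is a minimal sketch of the linear-interpolation baseline for bringing a sparsely sampled variable onto a denser time grid, together with the coefficient of determination (R²) used to score predictions. It is not the authors' ME-Hybrid model or their code. The function names, the 4-hour sampling interval, and the sample values are hypothetical and chosen only for illustration.

```python
from bisect import bisect_right


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when observations are constant")
    return 1.0 - ss_res / ss_tot


def linear_interpolate(times, values, query_times):
    """Linearly interpolate sparse (times, values) onto query_times.

    `times` must be strictly increasing. Queries outside the sampled
    range are clamped to the nearest endpoint value.
    """
    result = []
    for t in query_times:
        if t <= times[0]:
            result.append(values[0])
        elif t >= times[-1]:
            result.append(values[-1])
        else:
            i = bisect_right(times, t)  # times[i - 1] <= t < times[i]
            t0, t1 = times[i - 1], times[i]
            v0, v1 = values[i - 1], values[i]
            result.append(v0 + (v1 - v0) * (t - t0) / (t1 - t0))
    return result


# Hypothetical data: a variable sampled every 4 h that is needed hourly.
sparse_times = [0, 4, 8]
sparse_values = [1.0, 3.0, 2.0]
hourly = linear_interpolate(sparse_times, sparse_values, list(range(9)))
```

Linear interpolation can only draw straight segments between measured points, so any peak or trough that falls between two samples is lost. This is consistent with the abstract's report that the mechanistic harmonization captures fluctuating trends and extremes better than linear interpolation.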

Source journal metrics: CiteScore 20.40; self-citation rate 6.30%; articles published: 11; review time: 18 days
About the journal: Environmental Science & Ecotechnology (ESE) is an international, open-access journal publishing original research in environmental science, engineering, ecotechnology, and related fields. Authors publishing in ESE can immediately, permanently, and freely share their work. They have license options and retain copyright. Published by Elsevier, ESE is co-organized by the Chinese Society for Environmental Sciences, Harbin Institute of Technology, and the Chinese Research Academy of Environmental Sciences, under the supervision of the China Association for Science and Technology.