Jia-Qiang Lv, Wan-Xin Yin, Jia-Min Xu, Hao-Yi Cheng, Zhi-Ling Li, Ji-Xian Yang, Ai-Jie Wang, Hong-Cheng Wang
Journal: Environmental Science and Ecotechnology, Volume 23, Article 100512 (Journal Article)
Publication date: 2024-11-17
DOI: 10.1016/j.ese.2024.100512
URL: https://www.sciencedirect.com/science/article/pii/S2666498424001261
Impact factor: 14.0 (JCR Q1, Environmental Sciences)
Citations: 0
Abstract
Augmented machine learning for sewage quality assessment with limited data
Physical, chemical, and biological processes within sewers significantly alter sewage composition during conveyance. This leads to the formation of sulfide and methane, compounds that contribute to sewer corrosion and greenhouse gas emissions. Reliable modeling of these compounds is essential for effective sewer management, but the development of machine learning (ML) models is hindered by differences in data accessibility and sampling frequencies of water quality variables. Here we present a mechanistically enhanced hybrid (ME-Hybrid) model that combines mechanistic modeling with data-driven approaches. This model harmonizes datasets with varying sampling frequencies and generates synthetic samples for ML training, thereby enhancing the monitoring of methane and sulfide in sewers. The optimal ME-Hybrid model integrates the backpropagation neural network with mechanistic frequency harmonization. We demonstrate that the ME-Hybrid model outperforms pure ML and linear interpolation in capturing fluctuating trends and extremes of sulfide concentrations, achieving a coefficient of determination (R²) of 0.94. Synthetic samples generated through mechanistic augmentation closely approximate real samples in modeling performance, statistical distribution, and data structure. This enables the model to maintain high predictive accuracy (R² > 0.76) for sulfide even when trained on only 50% of the dataset. Additionally, the ME-Hybrid model successfully assesses sewer methane concentrations with an R² of 0.94, validating its applicability and generalization ability. Our results provide a reliable methodological framework for modeling and prediction under data scarcity. By facilitating better monitoring and management of sewer systems, the ME-Hybrid model aids in the development of strategies that minimize environmental impacts, enhance urban resilience, and ultimately lead to sustainable urban water systems.
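Two quantities in the abstract can be illustrated with a short sketch: the coefficient of determination used to report accuracy, and linear interpolation, which the abstract names as a baseline for harmonizing sampling frequencies. The code below is only an illustration under assumptions. It is not the ME-Hybrid method, and the function names, the 4-hour sampling interval, and the data values are hypothetical.

```python
import numpy as np


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return 1.0 - ss_res / ss_tot


def harmonize_linear(t_sparse, y_sparse, t_dense):
    """Resample a sparsely sampled variable onto a denser time grid.

    This is the plain linear-interpolation baseline, not the mechanistic
    frequency harmonization described in the abstract.
    """
    return np.interp(t_dense, t_sparse, y_sparse)


# Hypothetical data: a variable measured every 4 hours, resampled to hourly.
t_sparse = np.array([0.0, 4.0, 8.0])
y_sparse = np.array([1.0, 3.0, 2.0])
t_dense = np.arange(0.0, 9.0, 1.0)
y_dense = harmonize_linear(t_sparse, y_sparse, t_dense)
```

Linear interpolation can only draw straight lines between measured points, so any peak that falls between two samples is flattened. That is consistent with the abstract's claim that the baseline struggles with fluctuating trends and extremes.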
About the journal:
Environmental Science & Ecotechnology (ESE) is an international, open-access journal publishing original research in environmental science, engineering, ecotechnology, and related fields. Authors publishing in ESE can immediately, permanently, and freely share their work. They have license options and retain copyright. Published by Elsevier, ESE is co-organized by the Chinese Society for Environmental Sciences, Harbin Institute of Technology, and the Chinese Research Academy of Environmental Sciences, under the supervision of the China Association for Science and Technology.