Enhancing data quality in wastewater processes: Missing data imputation with deep Variational Autoencoders and genetic algorithms

IF 3.9 | Zone 2, Engineering & Technology | JCR Q2, Computer Science, Interdisciplinary Applications
Christian Kazadi Mbamba, Philip Keymer, Maira Alvi, Sebastian O.N. Topalian, Fareed Ud Din, Damien J. Batstone
Computers & Chemical Engineering, Volume 199, Article 109123. Journal Article. Published 2025-03-31.
DOI: 10.1016/j.compchemeng.2025.109123
https://www.sciencedirect.com/science/article/pii/S0098135425001279
Citations: 0

Abstract


Enhancing data quality in wastewater processes: Missing data imputation with deep Variational Autoencoders and genetic algorithms

Missing data is a persistent challenge in wastewater analysis, often leading to biased results and reduced accuracy. This study introduces an innovative Automated Machine Learning (AutoML) framework that combines deep learning-based variational autoencoders (VAEs) and genetic algorithms (GAs) to address this issue. VAEs are employed to impute missing values by learning latent data representations, while GAs optimize the VAE architecture and hyperparameters, including the size of the latent space. The framework is specifically designed to handle the complex and nonlinear relationships in wastewater datasets.
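The search described above, a genetic algorithm choosing the VAE architecture and hyperparameters, can be illustrated with a minimal sketch. Everything named here is an assumption for illustration: the abstract does not give the search space, the GA operators, or the fitness measure, so `SEARCH_SPACE`, the uniform crossover, the truncation selection, and `placeholder_fitness` are hypothetical. In a real setup the fitness would train a VAE with the candidate settings and return its validation imputation error.

```python
import random

# Hypothetical search space; the abstract only says the GA tunes the VAE
# architecture and hyperparameters, including the latent-space size.
SEARCH_SPACE = {
    "latent_dim": [2, 4, 8, 16, 32],
    "hidden_units": [32, 64, 128, 256],
    "learning_rate": [1e-4, 3e-4, 1e-3, 3e-3],
}


def random_individual(rng):
    """Draw one candidate configuration uniformly from the search space."""
    return {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}


def placeholder_fitness(ind):
    """Stand-in for 'train a VAE with `ind`, return validation error'.

    Lower is better. This toy function only exists so the sketch runs.
    """
    return (abs(ind["latent_dim"] - 8) / 8
            + abs(ind["hidden_units"] - 128) / 128
            + abs(ind["learning_rate"] - 1e-3) / 1e-3)


def crossover(a, b, rng):
    """Uniform crossover: each gene is copied from one of the two parents."""
    return {name: rng.choice([a[name], b[name]]) for name in SEARCH_SPACE}


def mutate(ind, rng, rate=0.2):
    """Resample each gene from its allowed values with probability `rate`."""
    return {name: (rng.choice(SEARCH_SPACE[name]) if rng.random() < rate else value)
            for name, value in ind.items()}


def run_ga(fitness, pop_size=20, generations=15, elite=2, seed=0):
    rng = random.Random(seed)
    pop = [random_individual(rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)            # best (lowest error) first
        parents = pop[: pop_size // 2]   # truncation selection
        children = []
        while len(children) < pop_size - elite:
            p1, p2 = rng.sample(parents, 2)
            children.append(mutate(crossover(p1, p2, rng), rng))
        pop = pop[:elite] + children     # elitism keeps the best unchanged
    return min(pop, key=fitness)


best = run_ga(placeholder_fitness)
print(best, placeholder_fitness(best))
```

Because the elite individuals are carried over unchanged, the best fitness never gets worse from one generation to the next, which is a common reason to include elitism when each fitness evaluation (here, a full model training run) is expensive.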
The framework was trained and validated using data from a full-scale water resource recovery facility. The imputed data from the optimized VAE, developed using the GA-based AutoML framework, is then used to train predictive models. Experimental evaluations demonstrate the effectiveness of the proposed approach over traditional imputation methods. The results reveal that the models can accurately predict key variables such as ammonia nitrogen (NH4-N), nitrate nitrogen (NO3-N), pH, and biogas flow rate, using imputed data. The scalability and adaptability of this framework make it valuable for real-time wastewater monitoring and predictive analytics.
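A common way to compare imputation methods, sketched here under stated assumptions, is to hide a fraction of the observed values, impute them, and score the result against the hidden truth. The abstract does not say which metric or baselines were used, so the column-mean baseline, the RMSE score, the 20% masking rate, and the synthetic two-column data below are illustrative choices and not the authors' protocol. A VAE's imputations would be scored with the same `masked_rmse` call.

```python
import math
import random


def mask_values(rows, frac, rng):
    """Hide a random fraction of observed cells; return the masked copy
    and the list of (row, column) positions that were hidden."""
    cells = [(i, j) for i, row in enumerate(rows)
             for j, v in enumerate(row) if v is not None]
    hidden = rng.sample(cells, int(frac * len(cells)))
    masked = [list(row) for row in rows]
    for i, j in hidden:
        masked[i][j] = None
    return masked, hidden


def mean_impute(rows):
    """Traditional baseline: fill each gap with its column's observed mean.
    Assumes every column keeps at least one observed value."""
    means = []
    for j in range(len(rows[0])):
        observed = [row[j] for row in rows if row[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if v is None else v for j, v in enumerate(row)]
            for row in rows]


def masked_rmse(truth, imputed, hidden):
    """Root-mean-square error over the hidden cells only."""
    squared = sum((truth[i][j] - imputed[i][j]) ** 2 for i, j in hidden)
    return math.sqrt(squared / len(hidden))


rng = random.Random(0)
# Synthetic data (not from the paper): the second column depends
# nonlinearly on the first, which mean imputation cannot exploit.
truth = [[x, x * x] for x in (rng.uniform(0, 3) for _ in range(200))]
masked, hidden = mask_values(truth, 0.2, rng)
baseline_rmse = masked_rmse(truth, mean_impute(masked), hidden)
print(f"column-mean baseline RMSE on hidden cells: {baseline_rmse:.3f}")
```

Scoring only the hidden cells keeps the comparison fair: observed cells are copied through unchanged by any imputer, so including them would dilute the differences between methods.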
Source journal
Computers & Chemical Engineering (Engineering & Technology: Chemical Engineering)
CiteScore: 8.70
Self-citation rate: 14.00%
Articles published: 374
Review time: 70 days
Journal description: Computers & Chemical Engineering is primarily a journal of record for new developments in the application of computing and systems technology to chemical engineering problems.