Enhancing data quality in wastewater processes: Missing data imputation with deep Variational Autoencoders and genetic algorithms
Christian Kazadi Mbamba, Philip Keymer, Maira Alvi, Sebastian O.N. Topalian, Fareed Ud Din, Damien J. Batstone
Computers & Chemical Engineering, Volume 199, Article 109123. Published 2025-03-31. DOI: 10.1016/j.compchemeng.2025.109123. https://www.sciencedirect.com/science/article/pii/S0098135425001279
Missing data is a persistent challenge in wastewater analysis, often leading to biased results and reduced accuracy. This study introduces an innovative Automated Machine Learning (AutoML) framework that combines deep learning-based variational autoencoders (VAEs) and genetic algorithms (GAs) to address this issue. VAEs are employed to impute missing values by learning latent data representations, while GAs optimize the VAE architecture and hyperparameters, including the size of the latent space. The framework is specifically designed to handle the complex and nonlinear relationships in wastewater datasets.
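As a rough illustration of how a genetic algorithm can search over VAE settings such as the latent-space size, here is a minimal sketch. Everything in it is an assumption made for illustration: the search space, the GA operators (tournament selection, uniform crossover, random-reset mutation, elitism), and in particular `placeholder_fitness`, which is an invented error surface standing in for "train a VAE with these settings and return its validation imputation error". The abstract does not specify any of these details.

```python
import random

# Hypothetical search space; the abstract does not give the actual ranges.
SEARCH_SPACE = {
    "latent_dim": [2, 4, 8, 16, 32],
    "hidden_units": [16, 32, 64, 128],
    "n_layers": [1, 2, 3],
}


def random_individual(rng):
    """Draw one candidate configuration uniformly from the search space."""
    return {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}


def placeholder_fitness(ind):
    """Invented stand-in for the validation imputation error (lower is better).

    In a real pipeline this would train a VAE with the given settings and
    score its imputations on held-out values.
    """
    return (abs(ind["latent_dim"] - 8) / 8
            + abs(ind["hidden_units"] - 64) / 64
            + abs(ind["n_layers"] - 2))


def crossover(a, b, rng):
    """Uniform crossover: each gene comes from one parent at random."""
    return {name: rng.choice([a[name], b[name]]) for name in SEARCH_SPACE}


def mutate(ind, rng, rate=0.2):
    """Random-reset mutation: resample each gene with probability `rate`."""
    return {name: (rng.choice(SEARCH_SPACE[name]) if rng.random() < rate else value)
            for name, value in ind.items()}


def tournament(pop, fitness, rng, k=3):
    """Pick the best of k randomly chosen individuals."""
    return min(rng.sample(pop, k), key=fitness)


def run_ga(fitness, pop_size=20, generations=15, seed=0):
    rng = random.Random(seed)
    pop = [random_individual(rng) for _ in range(pop_size)]
    for _ in range(generations):
        # Elitism: carry the current best configuration over unchanged.
        children = [min(pop, key=fitness)]
        while len(children) < pop_size:
            parent_a = tournament(pop, fitness, rng)
            parent_b = tournament(pop, fitness, rng)
            children.append(mutate(crossover(parent_a, parent_b, rng), rng))
        pop = children
    return min(pop, key=fitness)


best = run_ga(placeholder_fitness)
print(best, placeholder_fitness(best))
```

Because of elitism, the best score never gets worse from one generation to the next. Replacing `placeholder_fitness` with a function that actually trains and scores a model is the expensive step in practice, which is why population size and generation count are kept small here.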
The framework was trained and validated using data from a full-scale water resource recovery facility. The imputed data from the optimized VAE, developed using the GA-based AutoML framework, is then used to train predictive models. Experimental evaluations demonstrate the effectiveness of the proposed approach over traditional imputation methods. The results reveal that the models can accurately predict key variables such as ammonia nitrogen (NH4-N), nitrate nitrogen (NO3-N), pH, and biogas flow rate, using imputed data. The scalability and adaptability of this framework make it valuable for real-time wastewater monitoring and predictive analytics.
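A common way to compare imputation methods is to hide values that are actually known, impute them, and score the estimates against the hidden truth. The sketch below shows that scoring loop under stated assumptions: the signal is synthetic (a daily sinusoid, not plant data), the masking pattern is arbitrary, and mean imputation and linear interpolation are chosen here only as examples of "traditional" baselines. The abstract does not name the baselines or the evaluation protocol the authors used, and a learned imputer such as a VAE would simply be scored as one more entry in `methods`.

```python
import math


def rmse(truth, estimate, idx):
    """Root-mean-square error restricted to the artificially hidden positions."""
    return math.sqrt(sum((truth[i] - estimate[i]) ** 2 for i in idx) / len(idx))


def mean_impute(series):
    """Replace every missing value (None) with the mean of the observed values."""
    observed = [x for x in series if x is not None]
    mean = sum(observed) / len(observed)
    return [mean if x is None else x for x in series]


def linear_interpolate(series):
    """Fill gaps by linear interpolation between the nearest observed neighbours.

    Gaps at either end are filled with the nearest observed value.
    """
    known = [i for i, x in enumerate(series) if x is not None]
    out = list(series)
    for i, x in enumerate(series):
        if x is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            out[i] = series[right]
        elif right is None:
            out[i] = series[left]
        else:
            weight = (i - left) / (right - left)
            out[i] = series[left] + weight * (series[right] - series[left])
    return out


# Synthetic hourly signal with a 24-step cycle (illustrative only).
truth = [10 + 3 * math.sin(2 * math.pi * t / 24) for t in range(96)]

# Hide every fifth value to simulate missing sensor readings.
hidden = {i for i in range(len(truth)) if i % 5 == 2}
masked = [None if i in hidden else x for i, x in enumerate(truth)]

methods = {"mean": mean_impute, "linear": linear_interpolate}
scores = {name: rmse(truth, fn(masked), sorted(hidden)) for name, fn in methods.items()}
for name, score in scores.items():
    print(f"{name}: RMSE = {score:.3f}")
```

On this smooth toy signal, interpolation scores far better than the mean. Real wastewater data with long sensor outages and correlated variables is where simple per-variable baselines tend to struggle, which is the setting the framework targets.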
About the journal:
Computers & Chemical Engineering is primarily a journal of record for new developments in the application of computing and systems technology to chemical engineering problems.