A novel hybrid variable cross layer-based machine learning model improves the accuracy and interpretation of energy intensity prediction of wastewater treatment plant.
Authors: Yucheng Li, Chen Cai, Erwu Liu, Xiaofeng Lin, Ying Zhang, Hongjing Chen, Zhongqing Wei, Xiangfeng Huang, Ru Guo, Kaiming Peng, Jia Liu
Journal: Journal of Environmental Management, volume 371, article 123209 (impact factor 8.0; JCR Q1, Environmental Sciences)
Publication date: 2024-11-13
DOI: 10.1016/j.jenvman.2024.123209 (https://doi.org/10.1016/j.jenvman.2024.123209)
Citations: 0
Abstract
Energy intensity (EI) prediction in wastewater treatment plants (WWTPs) suffers from low accuracy and poor interpretability because of poor data quality, complex mechanisms, and numerous confounding variables. This study devised a novel hybrid variable cross layer-based machine learning (VCL-ML) model, which generates new domain knowledge from monitoring indicators (e.g., COD) and then embeds both the domain knowledge and the monitoring indicators into the ML model. The VCL-ML model achieves a root-mean-square error (RMSE) of 0.021 kWh/m³, an 8.7% improvement over the conventional ML (Con-ML) model. Shapley additive explanations showed that domain knowledge features rank high and carry interpretable meaning for the model; examples include capacity utilization (CU), which measures the efficiency of resource use, and the total nitrogen remaining rate (TN_rr), which indicates nitrogen retention in the system. Partial dependence interactions between domain knowledge features (e.g., sludge yield) and monitoring indicators (e.g., influent pH) can help interpret real plant behavior. A comparison of feature categories between the VCL-ML and Con-ML models showed that temporal information (e.g., month) and removal information (e.g., TN_rr) played an important role in the performance improvement. This result highlights the strong correlation of WWTP energy intensity with pollutant removal and temporal information, while the contribution of other, redundant features is weakened. The VCL-ML model improves the accuracy and interpretability of EI prediction for WWTPs and can support their optimal operation and sustainable management.
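To make the idea of deriving domain knowledge features from monitoring indicators concrete, here is a minimal Python sketch. It is not the authors' implementation: the abstract does not give formulas, so the definitions of CU (treated flow divided by design capacity) and TN_rr (effluent TN divided by influent TN), the record field names, and the helper function names are all assumptions made for illustration. The RMSE function follows the standard definition.

```python
import math


def capacity_utilization(flow_m3_per_day, design_capacity_m3_per_day):
    """Assumed definition of CU: treated flow as a fraction of design capacity."""
    return flow_m3_per_day / design_capacity_m3_per_day


def tn_remaining_rate(tn_influent_mg_l, tn_effluent_mg_l):
    """Assumed definition of TN_rr: fraction of influent total nitrogen left in the effluent."""
    return tn_effluent_mg_l / tn_influent_mg_l


def add_domain_features(record):
    """Return a copy of a monitoring record with the derived features appended.

    The keys 'flow', 'design_capacity', 'tn_in', and 'tn_out' are hypothetical
    field names, not taken from the paper.
    """
    enriched = dict(record)
    enriched["CU"] = capacity_utilization(record["flow"], record["design_capacity"])
    enriched["TN_rr"] = tn_remaining_rate(record["tn_in"], record["tn_out"])
    return enriched


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


if __name__ == "__main__":
    # Illustrative values only; these are not data from the study.
    sample = {
        "flow": 8000.0,
        "design_capacity": 10000.0,
        "tn_in": 40.0,
        "tn_out": 10.0,
        "month": 3,
    }
    print(add_domain_features(sample))
    print(rmse([0.30, 0.32], [0.31, 0.30]))
```

In a workflow of this kind, the enriched records (monitoring indicators plus derived features) would be passed to whichever regression model is being trained, and the RMSE would be computed on held-out EI values in kWh/m³.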
About the journal:
The Journal of Environmental Management publishes peer-reviewed, original research on all aspects of management and the managed use of the environment, both natural and man-made. Critical review articles are also welcome; submission of these is strongly encouraged.