Performance analysis of machine learning algorithms for the prediction of disinfection byproducts formation during chlorination: Effect of background water characteristics.

IF 8; Zone 2: Environmental Science and Ecology; Q1 ENVIRONMENTAL SCIENCES
Journal of Environmental Management Pub Date : 2025-08-01 Epub Date: 2025-06-14 DOI:10.1016/j.jenvman.2025.126144
Gamze Ersan, Eda Goz, Tanju Karanfil
Journal of Environmental Management, volume 389, article 126144. Publication type: Journal Article. https://doi.org/10.1016/j.jenvman.2025.126144
Citations: 0

Abstract

This study compared nonlinear machine learning algorithms with linear regression models for predicting the formation of trihalomethanes (THM4), haloacetic acids (HAA5 and HAA9), and haloacetonitriles (HAN4 and HAN6) under uniform formation conditions in chlorinated waters. A wide range of water sources, including wastewater effluent organic matter (EfOM), laboratory-grown algal organic matter (AOM) samples from different algal species, and raw, treated, and isolated natural organic matter (NOM) samples, was selected to investigate the effect of background water on model performance. Models for THM4, HAA5, HAA9, HAN4, and HAN6 formation were developed for all water sources combined (NOM, AOM, and EfOM-impacted waters) and for NOM separately. Across all water sources, the least squares support vector machine (LS-SVM) delivered the best performance for the regulated THM (R² train/test: 0.92/0.80) and HAA5 (R² train/test: 0.91/0.72), while the kernel extreme learning machine (KELM) outperformed the other models for the unregulated HAN4 (R² train/test: 0.89/0.70) and HAN6 (R² train/test: 0.91/0.41). For NOM waters alone, the artificial neural network (ANN) model performed best in predicting THMs (R² train/test: 0.97/0.94), HAA9 (R² train/test: 0.92/0.84), HAN4 (R² train/test: 0.98/0.96), and HAN6 (R² train/test: 0.98/0.89), emphasizing its ability to generalize within narrower, more specific datasets. This suggests that LS-SVM and KELM models are more effective for modeling both regulated and unregulated disinfection byproducts (DBPs) as the variability in water source characteristics increases, whereas the ANN model excels for more homogeneous DBP precursor types. These findings indicate the importance of both the choice of modeling approach and the characteristics of the dataset in DBP modeling.
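One way to read the reported figures is to look at the difference between train and test R² for each model, since a large difference hints at weaker generalization. The sketch below only does arithmetic on the R² pairs quoted in the abstract; the dictionary layout and the helper names `r2_gaps` and `mean_gap` are illustrative assumptions and are not part of the study.

```python
# Reported R^2 (train, test) pairs, copied from the abstract.
# Keys are (dataset scope, model, DBP group); this layout is an assumption
# made for illustration only.
REPORTED_R2 = {
    ("all sources", "LS-SVM", "THM"): (0.92, 0.80),
    ("all sources", "LS-SVM", "HAA5"): (0.91, 0.72),
    ("all sources", "KELM", "HAN4"): (0.89, 0.70),
    ("all sources", "KELM", "HAN6"): (0.91, 0.41),
    ("NOM only", "ANN", "THM"): (0.97, 0.94),
    ("NOM only", "ANN", "HAA9"): (0.92, 0.84),
    ("NOM only", "ANN", "HAN4"): (0.98, 0.96),
    ("NOM only", "ANN", "HAN6"): (0.98, 0.89),
}


def r2_gaps(reported):
    """Return {key: train R^2 minus test R^2}, rounded to 2 decimals."""
    return {key: round(train - test, 2) for key, (train, test) in reported.items()}


def mean_gap(gaps, scope):
    """Average train/test gap over all entries with the given dataset scope."""
    values = [gap for (s, _, _), gap in gaps.items() if s == scope]
    return round(sum(values) / len(values), 3)


GAPS = r2_gaps(REPORTED_R2)

if __name__ == "__main__":
    # List entries from the largest gap to the smallest.
    for (scope, model, dbp), gap in sorted(GAPS.items(), key=lambda kv: -kv[1]):
        print(f"{scope:12s} {model:7s} {dbp:5s} gap = {gap:.2f}")
    print("mean gap, all sources:", mean_gap(GAPS, "all sources"))
    print("mean gap, NOM only:   ", mean_gap(GAPS, "NOM only"))
```

On these numbers the largest gap belongs to KELM for HAN6 on the combined dataset (0.91 versus 0.41), and the NOM-only ANN models show smaller gaps on average, which is consistent with the abstract's point about more homogeneous precursor types. The comparison is only indicative, because the two groups use different models and datasets.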

Source journal
Journal of Environmental Management (Environmental Sciences)
CiteScore: 13.70
Self-citation rate: 5.70%
Articles published: 2477
Review time: 84 days
About the journal: The Journal of Environmental Management publishes peer-reviewed, original research on all aspects of management and the managed use of the environment, both natural and man-made. Critical review articles are also welcome; submission of these is strongly encouraged.