Performance analysis of machine learning algorithms for the prediction of disinfection byproducts formation during chlorination: Effect of background water characteristics.
Authors: Gamze Ersan, Eda Goz, Tanju Karanfil
Journal of Environmental Management, volume 389, 126144. Published 2025-08-01 (Epub 2025-06-14).
DOI: https://doi.org/10.1016/j.jenvman.2025.126144
This study compared nonlinear machine learning algorithms with linear regression models for predicting the formation of trihalomethanes (THM4), haloacetic acids (HAA5 and HAA9), and haloacetonitriles (HAN4 and HAN6) in chlorinated waters under uniform formation conditions. To investigate the effect of background water on model performance, a wide range of water sources was selected: wastewater effluent organic matter (EfOM), laboratory-grown algal organic matter (AOM) samples from different algal species, and raw, treated, and isolated natural organic matter (NOM) samples. Models for THM4, HAA5, HAA9, HAN4, and HAN6 formation were developed for all water sources combined (NOM, AOM, and EfOM-impacted waters) and for NOM separately. Across all water sources, the least squares support vector machine (LS-SVM) delivered the best performance for the regulated THM4 (R² train/test: 0.92/0.80) and HAA5 (R² train/test: 0.91/0.72), while the kernel extreme learning machine (KELM) outperformed the other models for the unregulated HAN4 (R² train/test: 0.89/0.70) and HAN6 (R² train/test: 0.91/0.41). For NOM waters alone, the artificial neural network (ANN) model performed best in predicting THM4 (R² train/test: 0.97/0.94), HAA9 (R² train/test: 0.92/0.84), HAN4 (R² train/test: 0.98/0.96), and HAN6 (R² train/test: 0.98/0.89), indicating that it generalizes well on narrower, more specific datasets. This suggests that LS-SVM and KELM models are more effective for modeling both regulated and unregulated disinfection byproducts (DBPs) as the variability in water source characteristics increases, whereas the ANN model excels for more homogeneous DBP precursor types. These findings highlight the importance of matching the modeling approach to the characteristics of the dataset in DBP modeling.
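The train/test R² pairs reported above can be read as a rough indicator of how well each model generalizes. The following is a minimal sketch, not the authors' code: it defines the standard coefficient of determination and tabulates the gap between training and test R² using only the values quoted in the abstract. The names `r_squared`, `REPORTED`, `generalization_gaps`, and `largest_gap` are illustrative assumptions, and the "THM4" label is assumed for the entries the abstract calls "THM" and "THMs".

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# (dataset, DBP group, best model) -> (R2_train, R2_test).
# Values are copied from the abstract; the key layout is an assumption.
REPORTED = {
    ("all waters", "THM4", "LS-SVM"): (0.92, 0.80),
    ("all waters", "HAA5", "LS-SVM"): (0.91, 0.72),
    ("all waters", "HAN4", "KELM"): (0.89, 0.70),
    ("all waters", "HAN6", "KELM"): (0.91, 0.41),
    ("NOM only", "THM4", "ANN"): (0.97, 0.94),
    ("NOM only", "HAA9", "ANN"): (0.92, 0.84),
    ("NOM only", "HAN4", "ANN"): (0.98, 0.96),
    ("NOM only", "HAN6", "ANN"): (0.98, 0.89),
}


def generalization_gaps(reported):
    """Return R2_train - R2_test for each entry, rounded to two decimals."""
    return {key: round(train - test, 2) for key, (train, test) in reported.items()}


def largest_gap(reported):
    """Return the key whose train/test R2 gap is largest."""
    gaps = generalization_gaps(reported)
    return max(gaps, key=gaps.get)


if __name__ == "__main__":
    for key, gap in generalization_gaps(REPORTED).items():
        print(key, gap)
    print("largest gap:", largest_gap(REPORTED))
```

On these numbers the gaps for the NOM-only ANN models stay below 0.10, while the combined-water models show larger gaps, with HAN6 under KELM the widest. A gap alone does not establish overfitting; it is only a quick way to compare the reported pairs.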
About the journal:
The Journal of Environmental Management publishes peer-reviewed, original research on all aspects of management and the managed use of the environment, both natural and man-made. Critical review articles are also welcome; submission of these is strongly encouraged.