Predicting reaction kinetics of reactive bromine species with organic compounds by machine learning: Feature combination and knowledge transfer with reactive chlorine species
Wenlei Qin, Shanshan Zheng, Kaiheng Guo, Ming Yang, Jingyun Fang
Journal of Hazardous Materials (Impact Factor 12.2; JCR Q1, Engineering, Environmental). Published 2024-11-05. DOI: 10.1016/j.jhazmat.2024.136410 (https://doi.org/10.1016/j.jhazmat.2024.136410)
Abstract
Reactive bromine species (RBS), such as the bromine atom (Br•) and the dibromine radical (Br₂•⁻), are important oxidative species responsible for the transformation of organic compounds in bromide-containing water. This study developed quantitative structure–activity relationship (QSAR) models that use machine learning (ML) to predict the second-order rate constants (k) of RBS, and it applied knowledge transfer between RBS and reactive chlorine species (RCS, e.g., Cl• and Cl₂•⁻) to improve model performance. The ML-based models (RMSE_test = 0.476–0.712) outperformed the multiple linear regression-based models (RMSE_test = 0.572–3.68) in predicting k of RBS. In addition, combining molecular fingerprints (MFs) and quantum descriptors (QDs) as input features improved the performance of the ML-based models (RMSE_test = 0.476–0.712) relative to models built on MFs alone (RMSE_test = 0.524–0.834) or QDs alone (RMSE_test = 0.572–0.806). SHAP analysis identified E_HOMO and E_gap as the most important features affecting k of RBS. A unified model integrating the datasets of four reactive halogen species (RHS: Br•, Br₂•⁻, Cl• and Cl₂•⁻) was further developed (R²_test = 0.802) and showed better predictive performance than the individual models (R²_test = 0.521–0.776). The effect of knowledge transfer among RHS depended on the species pair: model performance improved for Br•/Cl•, was mixed for Br•/Br₂•⁻ and Cl•/Cl₂•⁻, and worsened for Br₂•⁻/Cl₂•⁻. This study provides useful tools for predicting k of RHS in aqueous environments.
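The feature combination and the two test-set metrics quoted in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the 4-bit fingerprint, the descriptor values, and the rate-constant values are all hypothetical, and the sketch assumes the metrics are computed on log-transformed k, which the abstract does not state.

```python
import math


def combine_features(fingerprint_bits, quantum_descriptors):
    """Concatenate a binary fingerprint with continuous descriptors
    into a single feature vector (hypothetical helper)."""
    return [float(b) for b in fingerprint_bits] + [float(d) for d in quantum_descriptors]


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up compound: a 4-bit fingerprint plus (E_HOMO, E_gap) in eV.
features = combine_features([1, 0, 0, 1], [-6.2, 4.8])

# Made-up observed and predicted log k values for four test compounds.
y_true = [9.0, 8.0, 7.0, 10.0]
y_pred = [9.5, 8.0, 6.5, 10.0]

test_rmse = rmse(y_true, y_pred)
test_r2 = r2(y_true, y_pred)
```

In a real workflow the combined vector would be fed to a regression model. A lower RMSE and a higher R² on held-out compounds indicate better predictions, which is how the abstract compares the feature sets.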