Machine learning prediction of DOC–water partitioning coefficients for organic pollutants from diverse DOM origins

Impact factor: 3.9; Zone 3 (Environmental Science and Ecology); JCR Q1 (Chemistry, Analytical)
Ruyue Jin, Yuzhen Liang and Zhenqing Shi
Environmental Science: Processes & Impacts, 7, 1889-1901. Published 2025-05-30. DOI: 10.1039/D5EM00029G
https://pubs.rsc.org/en/content/articlelanding/2025/em/d5em00029g

Abstract

This study aims to improve the prediction and understanding of dissolved organic carbon–water partitioning coefficients (K_DOC), a crucial parameter in environmental risk assessment. A dataset of 709 datapoints covering 190 unique organic pollutants and various types of dissolved organic matter (DOM) was compiled. Molecular descriptors characterizing each compound's properties and structure were calculated using Multiwfn, PaDEL and RDKit. Separate machine learning models were established for four DOM groupings: all DOM, natural aquatic DOM, natural terrestrial DOM and commercial DOM. These models exhibited excellent goodness-of-fit, internal stability, and predictive performance, with R²_train > 0.771, R²_valid > 0.602, R²_test > 0.629, and RMSE_test ranging from 0.413 to 0.580. Shapley additive explanation analysis identified CrippenLogP and MATS2m as the most influential descriptors. CrippenLogP, reflecting hydrophobicity, positively influenced K_DOC, while MATS2m, characterizing molecular branching and compactness, had a negative effect. Mor29m, for which lower values indicate a higher abundance of heteroatoms such as halogens, also showed a negative impact, likely due to enhanced interactions with polar DOM groups. SlogP_VSA1, another descriptor related to hydrophobicity, correlated positively with log K_DOC in natural aquatic DOM, while its negative correlation in the all-DOM group may reflect the great diversity of DOM properties in that group. Partial dependence plots revealed that organic pollutants tended to partition more into DOM when CrippenLogP > 6, Mor29m was between 0.45 and 0.52, MATS2m < −0.015, and SlogP_VSA1 < 7. These findings support the application of machine learning models for assessing pollutant interactions with DOM, contributing to improved environmental risk predictions.
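As a rough illustration of the quantities named in the abstract, the sketch below computes R² and RMSE for a set of observed and predicted log K_DOC values, and checks which of a compound's descriptors fall in the ranges the abstract associates with stronger partitioning into DOM. It is not the authors' code. The function names are hypothetical, the descriptor values are invented for illustration, treating the four ranges as a checklist is an assumption (the abstract reports them as per-descriptor partial dependence trends), and so is the choice to treat the Mor29m bounds as inclusive.

```python
from math import sqrt

# Descriptor ranges quoted in the abstract (read off partial dependence plots).
# Using them as a checklist is an illustrative assumption, not the authors' model.
# The inclusive bounds on Mor29m are also an assumption.
FAVORABLE_RANGES = {
    "CrippenLogP": lambda v: v > 6,
    "Mor29m": lambda v: 0.45 <= v <= 0.52,
    "MATS2m": lambda v: v < -0.015,
    "SlogP_VSA1": lambda v: v < 7,
}


def favorable_descriptors(descriptors):
    """Return the names of the supplied descriptors that fall in the quoted ranges."""
    return [
        name
        for name, in_range in FAVORABLE_RANGES.items()
        if name in descriptors and in_range(descriptors[name])
    ]


def r2_and_rmse(observed, predicted):
    """Return (R^2, RMSE) for paired observed and predicted values."""
    n = len(observed)
    mean_obs = sum(observed) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot, sqrt(ss_res / n)


# Hypothetical descriptor values for a single compound (not from the paper).
compound = {"CrippenLogP": 6.8, "Mor29m": 0.47, "MATS2m": 0.02, "SlogP_VSA1": 5.1}
print(favorable_descriptors(compound))

# Hypothetical observed vs. predicted log K_DOC values (not from the paper).
r2, rmse = r2_and_rmse([3.1, 4.0, 5.2, 4.6], [3.3, 3.8, 5.0, 4.9])
print(f"R2 = {r2:.3f}, RMSE = {rmse:.3f}")
```

The same two metrics would be computed separately on the training, validation and test splits to obtain figures comparable to the R²_train, R²_valid, R²_test and RMSE_test values reported above.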

Source journal
Environmental Science: Processes & Impacts (Chemistry, Analytical; Environmental Sciences)
CiteScore: 9.50
Self-citation rate: 3.60%
Articles published: 202
Review time: 1 month
About the journal: Environmental Science: Processes & Impacts publishes high quality papers in all areas of the environmental chemical sciences, including chemistry of the air, water, soil and sediment. We welcome studies on the environmental fate and effects of anthropogenic and naturally occurring contaminants, both chemical and microbiological, as well as related natural element cycling processes.