Predicting sorption of organic pollutants on soils with interpretable machine learning

IF 7.3 · Zone 2 (Environmental Science and Ecology) · JCR Q1 (ENVIRONMENTAL SCIENCES)
Qian Wang, Jianmin Bian, Enze Ma, Jiangwei Zhang
Journal: Environmental Pollution, volume 382, Article 126665
Publication date: 2025-06-13
DOI: 10.1016/j.envpol.2025.126665
URL: https://www.sciencedirect.com/science/article/pii/S0269749125010383
Citations: 0

Abstract

The sorption of organic pollutants (OPs) on soils plays a critical role in determining the environmental fate and transport of these compounds and has been studied extensively. However, the complex nonlinear relationships between adsorption capacity and its multiple influencing factors, as well as the relative contributions of these factors to adsorption behavior, remain inadequately understood. This study develops five machine learning (ML) models, namely support vector machine (SVM), deep neural network (DNN), extreme gradient boosting (XGBT), random forest (RF), and gradient boosting decision tree (GBDT), using a dataset of 352 data points from previous studies to predict OP sorption on soils from multiple factors. Shapley additive explanations (SHAP) are applied to interpret the best-performing model. In addition, a distribution map of the sorption capacities of 12 OPs across mainland China is generated using the interpretable ML model. The results indicate that the XGBT model performs best, achieving a coefficient of determination of 0.952 and a root mean square error of 0.103 on the testing dataset. Interpretability analysis reveals that the electronic effects (E) of OPs and soil organic matter (SOM) content are the most influential factors, which underscores the dominant roles of π-π interactions and hydrophobic partitioning in the sorption mechanisms. The distribution map indicates that high sorption capacities are predominantly located in southern and southwestern regions, correlating with reduced environmental risks. This study presents a novel interpretable ML framework for predicting OP adsorption potential and offers insights into the mechanisms governing OP sorption on soils. Furthermore, the framework supports environmental management applications in risk assessment, land remediation planning, and soil protection policy.
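
The two quantities the abstract relies on, the test-set error metrics and the Shapley attributions, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: `toy_model`, its coefficients, the feature order [E, SOM, pH], and the input values are hypothetical and are not taken from the paper. The Shapley values are computed by brute-force enumeration of feature coalitions against a fixed baseline, which is only practical for a handful of features; it is not the authors' pipeline, and SHAP implementations for tree models typically use faster dedicated algorithms.

```python
from itertools import combinations
from math import factorial, sqrt


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean square error between observations and predictions."""
    n = len(y_true)
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    A feature outside the coalition is replaced by its baseline value.
    Cost grows as 2**n, so this is only suitable for small n.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(z):
    """Hypothetical sorption surrogate; coefficients are invented."""
    e, som, ph = z
    return 0.8 * e + 0.5 * som + 0.1 * e * som - 0.05 * ph


x = [1.2, 3.0, 6.5]          # hypothetical sample: [E, SOM, pH]
baseline = [1.0, 2.0, 7.0]   # hypothetical reference sample
phi = shapley_values(toy_model, x, baseline)

for name, contribution in zip(["E", "SOM", "pH"], phi):
    print(f"{name}: {contribution:+.3f}")
```

The attributions sum to the difference between the prediction at `x` and the prediction at `baseline`, which is the additivity property that makes a ranking of features by Shapley contribution interpretable.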

Source journal
Environmental Pollution (category: Environmental Sciences)
CiteScore: 16.00
Self-citation rate: 6.70%
Articles published: 2082
Review time: 2.9 months
Journal description: Environmental Pollution is an international peer-reviewed journal that publishes high-quality research papers and review articles covering all aspects of environmental pollution and its impacts on ecosystems and human health. Subject areas include, but are not limited to:
• Sources and occurrences of pollutants that are clearly defined and measured in environmental compartments, food and food-related items, and human bodies;
• Interlinks between contaminant exposure and biological, ecological, and human health effects, including those of climate change;
• Contaminants of emerging concern (including but not limited to antibiotic-resistant microorganisms or genes, microplastics/nanoplastics, electronic wastes, light, and noise) and/or their biological, ecological, or human health effects;
• Laboratory and field studies on the remediation/mitigation of environmental pollution via new techniques and with clear links to biological, ecological, or human health effects;
• Modeling of pollution processes, patterns, or trends that is of clear environmental and/or human health interest;
• New techniques that measure and examine environmental occurrences, transport, behavior, and effects of pollutants within the environment or the laboratory, provided that they can be clearly used to address problems within regional or global environmental compartments.